Posted 2 Jun 2015

LINQ for PHP comparison: YaLinqo, Ginq, Pinq

2 Jun 2015, CC (ASA 3U), 9 min read
Comparison of full-featured LINQ ports to PHP (YaLinqo, Ginq, Pinq), centered mostly on performance

Introduction

This article compares LINQ ports from .NET to PHP, mostly in terms of performance. In .NET, LINQ is used for writing SQL-like queries over various collections, including databases. In PHP, it is usually used for transforming arrays, like the built-in functions array_filter and array_map, but in a more readable form and with many more features. Due to the limitations of PHP and the current state of the libraries, LINQ ports are best suited for transforming relatively small datasets, for example those returned from web services.

This is not an introductory article, so I will not include LINQ tutorials; I will likely write another, more detailed article for beginners. However, some examples should be self-explanatory, so even if you have not used LINQ before, you can compare the code.

You are expected to know how to use closures. Knowing LINQ is a huge bonus.

Background

Before developing yet another port of LINQ from .NET to PHP, I thoroughly investigated all available libraries. There were plenty of them: LINQ for PHP, Phinq, PHPLinq and Plinq. Unfortunately, none of them supports lazy evaluation, most of them lack tests (if they have any at all), and their documentation is either missing or incomplete. Overall, they are clearly not production-ready.

This is why YaLinqo was born. At the time, it was the only LINQ port that truly implemented LINQ to Objects. It has 100% test coverage, very detailed PHPDoc, supports "string lambdas" and does not lose keys during transformations. The first version was implemented for PHP 5.3; later it was updated to rely on generators (yield), available since PHP 5.5.
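To make "lazy evaluation" and "not losing keys" concrete, here is a minimal sketch of generator-based where and select in plain PHP. The names lazyWhere and lazySelect are hypothetical and this is not YaLinqo's code; it only shows the technique: building the query runs nothing, and keys survive until the result is materialized.

```php
<?php
// Hypothetical helpers (not YaLinqo's API): lazy, key-preserving
// "where" and "select" built on generators (PHP 5.5+).
function lazyWhere($source, callable $predicate)
{
    foreach ($source as $key => $value) {
        if ($predicate($value, $key)) {
            yield $key => $value; // keep the original key
        }
    }
}

function lazySelect($source, callable $selector)
{
    foreach ($source as $key => $value) {
        yield $key => $selector($value, $key);
    }
}

$calls = 0;
$query = lazySelect(
    lazyWhere(['apple' => 3, 'pear' => 8, 'plum' => 12], function ($price) use (&$calls) {
        $calls++; // count how often the predicate actually runs
        return $price > 5;
    }),
    function ($price) { return $price * 2; }
);
$callsBeforeIteration = $calls;      // generator bodies have not started yet
$result = iterator_to_array($query); // iteration runs the pipeline; keys are kept
```

Nothing is computed until iterator_to_array pulls values through both generators, which is what separates this from chaining array_filter and array_map.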

Since then, two libraries have appeared which rival YaLinqo. The first one is Ginq. Unlike YaLinqo, it relies on manually implemented iterators. In a way, it is closer to the "PHP way" than the first version of YaLinqo, which relied on "hackish" iterators inspired by LINQ.js. Instead of "string lambdas", it supports property paths via Symfony's PropertyAccess component, which come in handy when sorting, grouping and joining. Many methods have aliases borrowed from functional programming, for example "map" in addition to "select". Its documentation is not very detailed.
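"String lambdas" appear in most tests below, so here is a rough idea of what they are: a short string such as '$o ==> count($o["items"]) > 5' gets compiled into a closure. The sketch below does this with eval(). The function stringLambda and its implicit parameter names ($v, $k) are assumptions for illustration. This is not YaLinqo's actual implementation.

```php
<?php
// Hypothetical "string lambda" compiler, for illustration only.
function stringLambda($expr, $implicitParams = '$v = null, $k = null')
{
    $arrow = strpos($expr, '==>');
    if ($arrow !== false) {
        // Explicit form: '$o ==> body' or '($o, $u) ==> body'.
        $params = trim(substr($expr, 0, $arrow), ' ()');
        $body = substr($expr, $arrow + 3);
    } else {
        // Implicit form: just 'body', using conventional names such as $v.
        $params = $implicitParams;
        $body = $expr;
    }
    // eval() compiles arbitrary code: only ever pass trusted, literal strings.
    return eval("return function ($params) { return $body; };");
}

$isBig = stringLambda('$o ==> count($o["items"]) > 5');
$quantity = stringLambda('$v["quantity"]');
$multiply = stringLambda('$a * $v["quantity"]', '$a, $v');
```

Compiling a string costs some time on top of a normal closure, which fits the slightly slower "string lambda" rows in the results below, although the benchmarks do not isolate that cost.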

The other library is Pinq. It is potentially the most powerful of the three, as it supports both objects and databases: it parses PHP code using PHP-Parser and can generate SQL. Unfortunately, at the time of writing, the only query provider is for MySQL, and its status is "demonstration". I suspect there is a lot of work to do before it becomes production-ready and supports multiple DBMSs. Another drawback is that, surprisingly, it contains fewer functions, and its functions are less featureful.

All three libraries have permissive open-source licenses, good test coverage and documentation, support lots of functions, are available on Packagist, and overall are ready to use in any project that doesn't require heavy optimization. If you count every microsecond, keep in mind that these libraries add considerable overhead: in a high-load project where LINQ queries are a substantial part of the executed code, you may prefer to stick with good old for and foreach. However, I don't consider scripting languages a good option for high-load projects, and most heavy logic is usually done in the database, so in most cases the improved readability and maintainability are worth some performance loss.

It is interesting to note that the three libraries are very different in size: YaLinqo contains 4 classes and has zero dependencies, Ginq contains more than 70 classes and depends on Symfony's PropertyAccess component, and Pinq contains more than 500 classes and depends on PHP-Parser. The difference lies in their architecture. YaLinqo uses only PHP arrays and callbacks. Ginq includes iterator classes for every transformation, collections, comparers etc. Pinq contains even more classes and interfaces inspired by LINQ in .NET and includes all the plumbing necessary for supporting databases: repositories, parsing etc. (I haven't thoroughly investigated the sources of Pinq.)

About tests

I have very little experience in performance testing, so the tests are quick and dirty, without much thought put into getting precise results, and memory usage isn't considered at all. However, the differences in performance are so large that I don't think precision really matters. If you find a bug in the code or can improve the tests, the project is available on GitHub; pull requests are welcome.

All the following tests call the benchmark_linq_groups function, which accepts four arrays of functions, one per group: implementations in plain PHP, YaLinqo, Ginq and Pinq. It consumes the produced collections using foreach and makes sure that all implementations return the same results.

Tests are performed on PHP 5.5.14, Windows 7 SP1 64-bit.
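The source of benchmark_linq_groups is not reproduced here. To give a rough idea of what such a harness does, here is a much-simplified, hypothetical stand-in (benchmarkSketch). It times each implementation with microtime(), consumes lazy results inside the timed loop, and checks that all implementations return the same values. Unlike the original, it takes a single flat array of implementations and ignores the keys of lazy results.

```php
<?php
// Hypothetical, much-simplified stand-in for benchmark_linq_groups.
function benchmarkSketch($iterations, array $implementations)
{
    $times = [];
    $expected = null;
    foreach ($implementations as $name => $implementation) {
        $result = null;
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $result = $implementation();
            // Lazy queries do no work until iterated, so consume them inside the timed loop.
            if ($result instanceof Traversable) {
                $result = iterator_to_array($result, false);
            }
        }
        // Average seconds per run.
        $times[$name] = (microtime(true) - $start) / $iterations;
        // The first implementation defines the reference result.
        if ($expected === null) {
            $expected = $result;
        } elseif ($result !== $expected) {
            throw new RuntimeException("Result mismatch in '$name'");
        }
    }
    return $times;
}
```

Consuming inside the timed loop matters. Otherwise a lazy library would look nearly free, because it only builds a query object.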

Tests

Let's start with pure overhead:

PHP
benchmark_linq_groups("Iterating over $ITER_MAX ints", 100, null,
    [
        "for" => function () use ($ITER_MAX) {
            $j = null;
            for ($i = 0; $i < $ITER_MAX; $i++)
                $j = $i;
            return $j;
        },
        "array functions" => function () use ($ITER_MAX) {
            $j = null;
            foreach (range(0, $ITER_MAX - 1) as $i)
                $j = $i;
            return $j;
        },
    ],
    [
        function () use ($ITER_MAX) {
            $j = null;
            foreach (E::range(0, $ITER_MAX) as $i)
                $j = $i;
            return $j;
        },
    ],
    [
        function () use ($ITER_MAX) {
            $j = null;
            foreach (G::range(0, $ITER_MAX - 1) as $i)
                $j = $i;
            return $j;
        },
    ],
    [
        function () use ($ITER_MAX) {
            $j = null;
            foreach (P::from(range(0, $ITER_MAX - 1)) as $i)
                $j = $i;
            return $j;
        },
    ]);

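The generator function range is not available in Pinq, so I use the built-in range function instead, as suggested in its documentation. Note also that, judging by the calls above, E::range takes a start and a count, while G::range, like PHP's range, takes inclusive bounds; hence $ITER_MAX versus $ITER_MAX - 1.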
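As a sketch of what a lazy (start, count) range looks like, here is a hypothetical lazyRange generator. It follows the argument convention the E::range calls above appear to use and is not any library's code. Every value is produced by resuming the generator, so no array is allocated. That per-element resume is part of the overhead iterator-based libraries pay compared with a plain for loop.

```php
<?php
// Hypothetical lazy range taking (start, count), unlike PHP's inclusive range(start, end).
function lazyRange($start, $count)
{
    for ($i = 0; $i < $count; $i++) {
        yield $start + $i; // one value per resume; nothing is stored
    }
}

$j = null;
foreach (lazyRange(0, 1000) as $i) {
    $j = $i; // same loop body as in the benchmark above
}
```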

Here are the results:

Iterating over 1000 ints
------------------------
  PHP     [for]               0.00006 sec   x1.0 (100%)
  PHP     [array functions]   0.00011 sec   x1.8 (+83%)
  YaLinqo                     0.00041 sec   x6.8 (+583%)
  Ginq                        0.00075 sec   x12.5 (+1150%)
  Pinq                        0.00169 sec   x28.2 (+2717%)

Iterators waste lots of time. Pinq is the most surprising here: almost 30 times slower than for. However, as you will see, this is far from the most surprising result.

Let's generate an array instead of just iterating:

PHP
benchmark_linq_groups("Generating array of $ITER_MAX integers", 100, 'consume',
    [
        "for" =>
            function () use ($ITER_MAX) {
                $a = [ ];
                for ($i = 0; $i < $ITER_MAX; $i++)
                    $a[] = $i;
                return $a;
            },
        "array functions" =>
            function () use ($ITER_MAX) {
                return range(0, $ITER_MAX - 1);
            },
    ],
    [
        function () use ($ITER_MAX) {
            return E::range(0, $ITER_MAX)->toArray();
        },
    ],
    [
        function () use ($ITER_MAX) {
            return G::range(0, $ITER_MAX - 1)->toArray();
        },
    ],
    [
        function () use ($ITER_MAX) {
            return P::from(range(0, $ITER_MAX - 1))->asArray();
        },
    ]);

And the results:

Generating array of 1000 integers
---------------------------------
  PHP     [for]               0.00025 sec   x1.3 (+32%)
  PHP     [array functions]   0.00019 sec   x1.0 (100%)
  YaLinqo                     0.00060 sec   x3.2 (+216%)
  Ginq                        0.00107 sec   x5.6 (+463%)
  Pinq                        0.00183 sec   x9.6 (+863%)

YaLinqo is now only about 2.4 times slower than the solution with for. The other libraries performed worse, but still acceptably.

Let's count items in the test data: the number of orders with more than 5 order items, and the number of orders with more than 2 order items whose quantity exceeds 5.

PHP
benchmark_linq_groups("Counting values in arrays", 100, null,
    [
        "for" => function () use ($DATA) {
            $numberOrders = 0;
            foreach ($DATA->orders as $order) {
                if (count($order['items']) > 5)
                    $numberOrders++;
            }
            return $numberOrders;
        },
        "array functions" => function () use ($DATA) {
            return count(
                array_filter(
                    $DATA->orders,
                    function ($order) { return count($order['items']) > 5; }
                )
            );
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->orders)
                ->count(function ($order) { return count($order['items']) > 5; });
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->orders)
                ->count('$o ==> count($o["items"]) > 5');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->orders)
                ->count(function ($order) { return count($order['items']) > 5; });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->orders)
                ->where(function ($order) { return count($order['items']) > 5; })
                ->count();
        },
    ]);

benchmark_linq_groups("Counting values in arrays deep", 100, null,
    [
        "for" => function () use ($DATA) {
            $numberOrders = 0;
            foreach ($DATA->orders as $order) {
                $numberItems = 0;
                foreach ($order['items'] as $item) {
                    if ($item['quantity'] > 5)
                        $numberItems++;
                }
                if ($numberItems > 2)
                    $numberOrders++;
            }
            return $numberOrders;
        },
        "array functions" => function () use ($DATA) {
            return count(
                array_filter(
                    $DATA->orders,
                    function ($order) {
                        return count(
                            array_filter(
                                $order['items'],
                                function ($item) { return $item['quantity'] > 5; }
                            )
                        ) > 2;
                    })
            );
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->orders)
                ->count(function ($order) {
                    return E::from($order['items'])
                        ->count(function ($item) { return $item['quantity'] > 5; }) > 2;
                });
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->orders)
                ->count(function ($order) {
                    return G::from($order['items'])
                        ->count(function ($item) { return $item['quantity'] > 5; }) > 2;
                });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->orders)
                ->where(function ($order) {
                    return P::from($order['items'])
                        ->where(function ($item) { return $item['quantity'] > 5; })
                        ->count() > 2;
                })
                ->count();
        },
    ]);

Points to note: first, the functional style with standard array functions turns the code into a barely readable staircase. Second, "string lambdas" don't help here, because nesting escaped code inside escaped code is incomprehensible, so only the first test has a string lambda version. Third, Pinq does not provide an overload of count that accepts a predicate, so a method chain is necessary. Results:

Counting values in arrays
-------------------------
  PHP     [for]               0.00023 sec   x1.0 (100%)
  PHP     [array functions]   0.00052 sec   x2.3 (+126%)
  YaLinqo                     0.00056 sec   x2.4 (+143%)
  YaLinqo [string lambda]     0.00059 sec   x2.6 (+157%)
  Ginq                        0.00129 sec   x5.6 (+461%)
  Pinq                        0.00382 sec   x16.6 (+1561%)

Counting values in arrays deep
------------------------------
  PHP     [for]               0.00064 sec   x1.0 (100%)
  PHP     [array functions]   0.00323 sec   x5.0 (+405%)
  YaLinqo                     0.00798 sec   x12.5 (+1147%)
  Ginq                        0.01416 sec   x22.1 (+2113%)
  Pinq                        0.04928 sec   x77.0 (+7600%)

The results are more or less predictable, except for Pinq's scary result. I've looked at the code: it seems to build a complete collection and then call the built-in count on it...
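The difference between the two strategies can be sketched in plain PHP. The helpers below are hypothetical. countWhere streams through the source once and keeps only a counter. countByMaterializing first builds the filtered collection and then counts it, which is roughly what Pinq's where()->count() seems to do if my reading of its code is right. Both return the same number, but the second allocates an intermediate array on every call. In the "deep" test that happens once per order.

```php
<?php
// Streaming: one pass, only a counter is kept.
function countWhere($source, callable $predicate)
{
    $count = 0;
    foreach ($source as $key => $value) {
        if ($predicate($value, $key)) {
            $count++;
        }
    }
    return $count;
}

// Materializing: build the filtered collection first, then count it.
function countByMaterializing($source, callable $predicate)
{
    $filtered = [];
    foreach ($source as $key => $value) {
        if ($predicate($value, $key)) {
            $filtered[$key] = $value;
        }
    }
    return count($filtered);
}

$orders = [['items' => range(1, 6)], ['items' => [1]], ['items' => range(1, 9)]];
$hasManyItems = function ($order) { return count($order['items']) > 5; };
```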

Now let's filter arrays. The conditions are similar to last time, but instead of counting, we build the filtered collections.

PHP
benchmark_linq_groups("Filtering values in arrays", 100, 'consume',
    [
        "for" => function () use ($DATA) {
            $filteredOrders = [ ];
            foreach ($DATA->orders as $order) {
                if (count($order['items']) > 5)
                    $filteredOrders[] = $order;
            }
            return $filteredOrders;
        },
        "array functions" => function () use ($DATA) {
            return array_filter(
                $DATA->orders,
                function ($order) { return count($order['items']) > 5; }
            );
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->orders)
                ->where(function ($order) { return count($order['items']) > 5; });
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->orders)
                ->where('$order ==> count($order["items"]) > 5');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->orders)
                ->where(function ($order) { return count($order['items']) > 5; });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->orders)
                ->where(function ($order) { return count($order['items']) > 5; });
        },
    ]);

benchmark_linq_groups("Filtering values in arrays deep", 100,
    function ($e) { consume($e, [ 'items' => null ]); },
    [
        "for" => function () use ($DATA) {
            $filteredOrders = [ ];
            foreach ($DATA->orders as $order) {
                $filteredItems = [ ];
                foreach ($order['items'] as $item) {
                    if ($item['quantity'] > 5)
                        $filteredItems[] = $item;
                }
                if (count($filteredItems) > 0) {
                    $order['items'] = $filteredItems;
                    $filteredOrders[] = [
                        'id' => $order['id'],
                        'items' => $filteredItems,
                    ];
                }
            }
            return $filteredOrders;
        },
        "array functions" => function () use ($DATA) {
            return array_filter(
                array_map(
                    function ($order) {
                        return [
                            'id' => $order['id'],
                            'items' => array_filter(
                                $order['items'],
                                function ($item) { return $item['quantity'] > 5; }
                            )
                        ];
                    },
                    $DATA->orders
                ),
                function ($order) {
                    return count($order['items']) > 0;
                }
            );
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->orders)
                ->select(function ($order) {
                    return [
                        'id' => $order['id'],
                        'items' => E::from($order['items'])
                            ->where(function ($item) { return $item['quantity'] > 5; })
                            ->toArray()
                    ];
                })
                ->where(function ($order) {
                    return count($order['items']) > 0;
                });
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->orders)
                ->select(function ($order) {
                    return [
                        'id' => $order['id'],
                        'items' => E::from($order['items'])->where('$v["quantity"] > 5')->toArray()
                    ];
                })
                ->where('count($v["items"]) > 0');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->orders)
                ->select(function ($order) {
                    return [
                        'id' => $order['id'],
                        'items' => G::from($order['items'])
                            ->where(function ($item) { return $item['quantity'] > 5; })
                            ->toArray()
                    ];
                })
                ->where(function ($order) {
                    return count($order['items']) > 0;
                });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->orders)
                ->select(function ($order) {
                    return [
                        'id' => $order['id'],
                        'items' => P::from($order['items'])
                            ->where(function ($item) { return $item['quantity'] > 5; })
                            ->asArray()
                    ];
                })
                ->where(function ($order) {
                    return count($order['items']) > 0;
                });
        },
    ]);

Code using standard array functions becomes very hard to comprehend, mostly due to the inconsistent argument order of array_map (callback first) and array_filter (array first).

The LINQ code is intentionally not optimal: intermediate arrays are created even for orders that are discarded later. It is traditional in LINQ to rely on "anonymous objects" to pass data between transformations.
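To show how much of LINQ's readability gain comes from uniform method chains alone, here is a minimal sketch of a hypothetical Pipe class. It is not any library's API. It wraps array_filter and array_map so that the callback always goes in the same place and the steps read left to right. Unlike the LINQ libraries, it is eager: every step builds a full array.

```php
<?php
// Hypothetical minimal fluent wrapper over built-in array functions (eager, not lazy).
final class Pipe
{
    private $items;

    public function __construct(array $items) { $this->items = $items; }

    // array_filter takes (array, callback) and preserves keys.
    public function where(callable $predicate) { return new self(array_filter($this->items, $predicate)); }

    // array_map takes (callback, array); with a single array it preserves keys.
    public function select(callable $selector) { return new self(array_map($selector, $this->items)); }

    public function toArray() { return $this->items; }
}

$orders = [
    ['id' => 1, 'items' => [['quantity' => 7], ['quantity' => 1]]],
    ['id' => 2, 'items' => [['quantity' => 2]]],
];
// Same query as "Filtering values in arrays deep".
$filtered = (new Pipe($orders))
    ->select(function ($order) {
        return [
            'id' => $order['id'],
            'items' => (new Pipe($order['items']))
                ->where(function ($item) { return $item['quantity'] > 5; })
                ->toArray(),
        ];
    })
    ->where(function ($order) { return count($order['items']) > 0; })
    ->toArray();
```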

Compared to previous results, these results are unusually even:

Filtering values in arrays
--------------------------
  PHP     [for]               0.00049 sec   x1.0 (100%)
  PHP     [array functions]   0.00072 sec   x1.5 (+47%)
  YaLinqo                     0.00094 sec   x1.9 (+92%)
  YaLinqo [string lambda]     0.00094 sec   x1.9 (+92%)
  Ginq                        0.00295 sec   x6.0 (+502%)
  Pinq                        0.00328 sec   x6.7 (+569%)

Filtering values in arrays deep
-------------------------------
  PHP     [for]               0.00514 sec   x1.0 (100%)
  PHP     [array functions]   0.00739 sec   x1.4 (+44%)
  YaLinqo                     0.01556 sec   x3.0 (+203%)
  YaLinqo [string lambda]     0.01750 sec   x3.4 (+240%)
  Ginq                        0.03101 sec   x6.0 (+503%)
  Pinq                        0.05435 sec   x10.6 (+957%)

Let's get to sorting:

PHP
benchmark_linq_groups("Sorting arrays", 100, 'consume',
    [
        function () use ($DATA) {
            $orderedUsers = $DATA->users;
            usort(
                $orderedUsers,
                function ($a, $b) {
                    $diff = $a['rating'] - $b['rating'];
                    if ($diff !== 0)
                        return -$diff;
                    $diff = strcmp($a['name'], $b['name']);
                    if ($diff !== 0)
                        return $diff;
                    $diff = $a['id'] - $b['id'];
                    return $diff;
                });
            return $orderedUsers;
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->users)
                ->orderByDescending(function ($u) { return $u['rating']; })
                ->thenBy(function ($u) { return $u['name']; })
                ->thenBy(function ($u) { return $u['id']; });
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->users)
                ->orderByDescending('$v["rating"]')->thenBy('$v["name"]')->thenBy('$v["id"]');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->users)
                ->orderByDesc(function ($u) { return $u['rating']; })
                ->thenBy(function ($u) { return $u['name']; })
                ->thenBy(function ($u) { return $u['id']; });
        },
        "property path" => function () use ($DATA) {
            return G::from($DATA->users)
                ->orderByDesc('[rating]')->thenBy('[name]')->thenBy('[id]');
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->users)
                ->orderByDescending(function ($u) { return $u['rating']; })
                ->thenByAscending(function ($u) { return $u['name']; })
                ->thenByAscending(function ($u) { return $u['id']; });
        },
    ]);

The code for usort's callback is a little scary, but with some practice, writing comparers becomes pretty easy. The LINQ code is very clean, especially with Ginq, where property paths make it beautiful.
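On PHP 7 and later (not available on the PHP 5.5 used for these tests), the spaceship operator makes multi-key comparers much shorter. Arrays of equal size are compared element by element, so one comparison covers all three sort keys. <=> also always returns an integer, whereas usort truncates float return values such as a fractional rating difference to int. A sketch of the same ordering as above:

```php
<?php
// Sort users by rating (descending), then name, then id (both ascending). Requires PHP 7+.
function sortUsers(array $users)
{
    usort($users, function ($a, $b) {
        // Swapping $a and $b in the first position makes rating descending.
        // Note: numeric-looking strings compare numerically here, unlike strcmp.
        return [$b['rating'], $a['name'], $a['id']]
           <=> [$a['rating'], $b['name'], $b['id']];
    });
    return $users;
}

$sorted = sortUsers([
    ['id' => 3, 'name' => 'bob', 'rating' => 5],
    ['id' => 1, 'name' => 'alice', 'rating' => 5],
    ['id' => 2, 'name' => 'alice', 'rating' => 7],
    ['id' => 0, 'name' => 'alice', 'rating' => 5],
]);
```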

The results are unexpected:

Sorting arrays
--------------
  PHP                         0.00037 sec   x1.0 (100%)
  YaLinqo                     0.00161 sec   x4.4 (+335%)
  YaLinqo [string lambda]     0.00163 sec   x4.4 (+341%)
  Ginq                        0.00402 sec   x10.9 (+986%)
  Ginq    [property path]     0.01998 sec   x54.0 (+5300%)
  Pinq                        0.00132 sec   x3.6 (+257%)

First, for the first time, Pinq is the fastest of the LINQ libraries (spoiler: also for the last time).

Second, Ginq's property paths are incredibly slow. I would call them unusable: the convenience is not worth a 50-fold increase in time.

Now we get to the interesting part: joining two arrays on equal keys.

PHP
benchmark_linq_groups("Joining arrays", 100, 'consume',
    [
        function () use ($DATA) {
            $ordersByCustomerId = [ ];
            foreach ($DATA->orders as $order)
                $ordersByCustomerId[$order['customerId']][] = $order;
            $pairs = [ ];
            foreach ($DATA->users as $user) {
                $userId = $user['id'];
                if (isset($ordersByCustomerId[$userId])) {
                    foreach ($ordersByCustomerId[$userId] as $order) {
                        $pairs[] = [
                            'order' => $order,
                            'user' => $user,
                        ];
                    }
                }
            }
            return $pairs;
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->orders)
                ->join($DATA->users,
                    function ($o) { return $o['customerId']; },
                    function ($u) { return $u['id']; },
                    function ($o, $u) {
                        return [
                            'order' => $o,
                            'user' => $u,
                        ];
                    });
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->orders)
                ->join($DATA->users,
                    '$o ==> $o["customerId"]', '$u ==> $u["id"]',
                    '($o, $u) ==> [
                        "order" => $o,
                        "user" => $u,
                    ]');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->orders)
                ->join($DATA->users,
                    function ($o) { return $o['customerId']; },
                    function ($u) { return $u['id']; },
                    function ($o, $u) {
                        return [
                            'order' => $o,
                            'user' => $u,
                        ];
                    });
        },
        "property path" => function () use ($DATA) {
            return G::from($DATA->orders)
                ->join($DATA->users,
                    '[customerId]', '[id]',
                    function ($o, $u) {
                        return [
                            'order' => $o,
                            'user' => $u,
                        ];
                    });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->orders)
                ->join($DATA->users)
                ->onEquality(
                    function ($o) { return $o['customerId']; },
                    function ($u) { return $u['id']; }
                )
                ->to(function ($o, $u) {
                    return [
                        'order' => $o,
                        'user' => $u,
                    ];
                });
        },
    ]);

Pinq's code differs from the others: it splits a single method call into a chain. This improves readability, but may look unusual to those used to LINQ method chains in .NET.

And the results:

Joining arrays
--------------
  PHP                         0.00021 sec   x1.0 (100%)
  YaLinqo                     0.00065 sec   x3.1 (+210%)
  YaLinqo [string lambda]     0.00070 sec   x3.3 (+233%)
  Ginq                        0.00103 sec   x4.9 (+390%)
  Ginq    [property path]     0.00200 sec   x9.5 (+852%)
  Pinq                        1.24155 sec   x5,911.8 (+591084%)

Wow. Just wow. No, it is not a joke. I thought that the script had hung, but eventually it returned this startling result: Pinq is 5,912 times slower than raw PHP. I could not find where exactly this happens in Pinq's code, but it looks like it is basically for-for-if (a nested loop over both arrays) with no lookups. I totally didn't expect this from a developer who implemented 500 classes.
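The plain PHP version above is essentially a hash join: it indexes orders by customerId once and then does one lookup per user. The sketch below contrasts that approach with a for-for-if nested loop using two hypothetical helpers, hashJoin and nestedLoopJoin, with LINQ-style key and result selectors. They are not any library's API. The hash join does work roughly proportional to n + m plus the size of the output. The nested loop always performs n * m key comparisons. If Pinq really works the way I suspect, that alone would explain a large part of the gap.

```php
<?php
// Hash join: index the inner array once, then one lookup per outer element.
function hashJoin(array $outer, array $inner, callable $outerKey, callable $innerKey, callable $result)
{
    $lookup = [];
    foreach ($inner as $i) {
        // Keys must be ints or strings to serve as PHP array keys.
        $lookup[$innerKey($i)][] = $i;
    }
    $pairs = [];
    foreach ($outer as $o) {
        $key = $outerKey($o);
        if (isset($lookup[$key])) {
            foreach ($lookup[$key] as $i) {
                $pairs[] = $result($o, $i);
            }
        }
    }
    return $pairs;
}

// Nested-loop join: compares every outer/inner pair.
function nestedLoopJoin(array $outer, array $inner, callable $outerKey, callable $innerKey, callable $result)
{
    $pairs = [];
    foreach ($outer as $o) {
        foreach ($inner as $i) {
            if ($outerKey($o) === $innerKey($i)) {
                $pairs[] = $result($o, $i);
            }
        }
    }
    return $pairs;
}

$orders = [
    ['id' => 10, 'customerId' => 1], ['id' => 11, 'customerId' => 2],
    ['id' => 12, 'customerId' => 1], ['id' => 13, 'customerId' => 9],
];
$users = [['id' => 1], ['id' => 2]];
$orderKey = function ($o) { return $o['customerId']; };
$userKey = function ($u) { return $u['id']; };
$pair = function ($o, $u) { return [$o['id'], $u['id']]; };
```

Both helpers yield pairs in outer order, like LINQ's join. The article's plain PHP baseline iterates users in the outer loop instead, so its output order differs.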

Okay, let's look at a simpler test: aggregating (also known as accumulating or folding).

PHP
benchmark_linq_groups("Aggregating arrays", 100, null,
    [
        "for" => function () use ($DATA) {
            $sum = 0;
            foreach ($DATA->products as $p)
                $sum += $p['quantity'];
            $avg = 0;
            foreach ($DATA->products as $p)
                $avg += $p['quantity'];
            $avg /= count($DATA->products);
            $min = PHP_INT_MAX;
            foreach ($DATA->products as $p)
                $min = min($min, $p['quantity']);
            $max = -PHP_INT_MAX;
            foreach ($DATA->products as $p)
                $max = max($max, $p['quantity']);
            return "$sum-$avg-$min-$max";
        },
        "array functions" => function () use ($DATA) {
            $sum = array_sum(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
            $avg = array_sum(array_map(function ($p) { return $p['quantity']; }, $DATA->products)) / count($DATA->products);
            $min = min(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
            $max = max(array_map(function ($p) { return $p['quantity']; }, $DATA->products));
            return "$sum-$avg-$min-$max";
        },
    ],
    [
        function () use ($DATA) {
            $sum = E::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
            $avg = E::from($DATA->products)->average(function ($p) { return $p['quantity']; });
            $min = E::from($DATA->products)->min(function ($p) { return $p['quantity']; });
            $max = E::from($DATA->products)->max(function ($p) { return $p['quantity']; });
            return "$sum-$avg-$min-$max";
        },
        "string lambda" => function () use ($DATA) {
            $sum = E::from($DATA->products)->sum('$v["quantity"]');
            $avg = E::from($DATA->products)->average('$v["quantity"]');
            $min = E::from($DATA->products)->min('$v["quantity"]');
            $max = E::from($DATA->products)->max('$v["quantity"]');
            return "$sum-$avg-$min-$max";
        },
    ],
    [
        function () use ($DATA) {
            $sum = G::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
            $avg = G::from($DATA->products)->average(function ($p) { return $p['quantity']; });
            $min = G::from($DATA->products)->min(function ($p) { return $p['quantity']; });
            $max = G::from($DATA->products)->max(function ($p) { return $p['quantity']; });
            return "$sum-$avg-$min-$max";
        },
        "property path" => function () use ($DATA) {
            $sum = G::from($DATA->products)->sum('[quantity]');
            $avg = G::from($DATA->products)->average('[quantity]');
            $min = G::from($DATA->products)->min('[quantity]');
            $max = G::from($DATA->products)->max('[quantity]');
            return "$sum-$avg-$min-$max";
        },
    ],
    [
        function () use ($DATA) {
            $sum = P::from($DATA->products)->sum(function ($p) { return $p['quantity']; });
            $avg = P::from($DATA->products)->average(function ($p) { return $p['quantity']; });
            $min = P::from($DATA->products)->minimum(function ($p) { return $p['quantity']; });
            $max = P::from($DATA->products)->maximum(function ($p) { return $p['quantity']; });
            return "$sum-$avg-$min-$max";
        },
    ]);

benchmark_linq_groups("Aggregating arrays custom", 100, null,
    [
        function () use ($DATA) {
            $mult = 1;
            foreach ($DATA->products as $p)
                $mult *= $p['quantity'];
            return $mult;
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->products)->aggregate(function ($a, $p) { return $a * $p['quantity']; }, 1);
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->products)->aggregate('$a * $v["quantity"]', 1);
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->products)->aggregate(1, function ($a, $p) { return $a * $p['quantity']; });
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->products)
                ->select(function ($p) { return $p['quantity']; })
                ->aggregate(function ($a, $q) { return $a * $q; });
        },
    ]);

There is not much to explain in the first group of functions.

In the second group, I calculate the product of quantities (yes, multiplying product quantities does not make much sense, but who cares). Pinq has no overload of aggregate that accepts a seed: it always uses the first element as the seed (and silently returns null if there are no elements...). So I had to select the quantities first and aggregate them with a method chain, again.
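To illustrate the seed issue in plain PHP: array_reduce accepts the seed as its third argument, so the product of an empty array is simply the seed. The hypothetical aggregateNoSeed helper uses the first element as the seed instead. Following .NET, where Aggregate without a seed throws on an empty sequence, it throws rather than silently returning null.

```php
<?php
// Seeded fold: an empty array yields the seed (1 for a product).
function productOfQuantities(array $products)
{
    return array_reduce($products, function ($acc, $p) { return $acc * $p['quantity']; }, 1);
}

// Seedless fold (hypothetical helper): the first element becomes the seed,
// and an empty sequence is an error rather than a silent null.
function aggregateNoSeed($source, callable $func)
{
    $hasValue = false;
    $acc = null;
    foreach ($source as $value) {
        if ($hasValue) {
            $acc = $func($acc, $value);
        } else {
            $acc = $value;
            $hasValue = true;
        }
    }
    if (!$hasValue) {
        throw new UnderflowException('Sequence contains no elements');
    }
    return $acc;
}
```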

Results:

Aggregating arrays
------------------
  PHP     [for]               0.00059 sec   x1.0 (100%)
  PHP     [array functions]   0.00193 sec   x3.3 (+227%)
  YaLinqo                     0.00475 sec   x8.1 (+705%)
  YaLinqo [string lambda]     0.00515 sec   x8.7 (+773%)
  Ginq                        0.00669 sec   x11.3 (+1034%)
  Ginq    [property path]     0.03955 sec   x67.0 (+6603%)
  Pinq                        0.03226 sec   x54.7 (+5368%)

Aggregating arrays custom
-------------------------
  PHP                         0.00007 sec   x1.0 (100%)
  YaLinqo                     0.00046 sec   x6.6 (+557%)
  YaLinqo [string lambda]     0.00057 sec   x8.1 (+714%)
  Ginq                        0.00046 sec   x6.6 (+557%)
  Pinq                        0.00610 sec   x87.1 (+8615%)

All LINQ libraries performed badly, and Ginq with property paths and Pinq performed remarkably badly. Even the built-in array functions turned out to be far from performant. Plain for rules.
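The "for" implementation above still walks the products four times, one loop per aggregate, to mirror the four separate LINQ calls. When raw speed matters, a single pass can compute all four values. Here is a sketch assuming a non-empty array of products with numeric quantities; quantityStats is a hypothetical name, and this variant was not part of the benchmark.

```php
<?php
// One pass computes sum, average, minimum and maximum.
function quantityStats(array $products)
{
    if (count($products) === 0) {
        throw new InvalidArgumentException('No products');
    }
    $sum = 0;
    $min = null;
    $max = null;
    foreach ($products as $p) {
        $q = $p['quantity'];
        $sum += $q;
        if ($min === null || $q < $min) { $min = $q; }
        if ($max === null || $q > $max) { $max = $q; }
    }
    return ['sum' => $sum, 'avg' => $sum / count($products), 'min' => $min, 'max' => $max];
}

$stats = quantityStats([['quantity' => 2], ['quantity' => 4], ['quantity' => 9]]);
```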

And finally, the last test with a complex query from YaLinqo's ReadMe, which uses several functions and subqueries:

PHP
benchmark_linq_groups("Process data from ReadMe example", 5,
    function ($e) { consume($e, [ 'products' => null ]); },
    [
        function () use ($DATA) {
            $productsSorted = [ ];
            foreach ($DATA->products as $product) {
                if ($product['quantity'] > 0) {
                    if (empty($productsSorted[$product['catId']]))
                        $productsSorted[$product['catId']] = [ ];
                    $productsSorted[$product['catId']][] = $product;
                }
            }
            foreach ($productsSorted as $catId => $products) {
                usort($productsSorted[$catId], function ($a, $b) {
                    $diff = $a['quantity'] - $b['quantity'];
                    if ($diff != 0)
                        return -$diff;
                    $diff = strcmp($a['name'], $b['name']);
                    return $diff;
                });
            }
            $result = [ ];
            $categoriesSorted = $DATA->categories;
            usort($categoriesSorted, function ($a, $b) {
                return strcmp($a['name'], $b['name']);
            });
            foreach ($categoriesSorted as $category) {
                $categoryId = $category['id'];
                $result[$category['id']] = [
                    'name' => $category['name'],
                    'products' => isset($productsSorted[$categoryId]) ? $productsSorted[$categoryId] : [ ],
                ];
            }
            return $result;
        },
    ],
    [
        function () use ($DATA) {
            return E::from($DATA->categories)
                ->orderBy(function ($cat) { return $cat['name']; })
                ->groupJoin(
                    from($DATA->products)
                        ->where(function ($prod) { return $prod['quantity'] > 0; })
                        ->orderByDescending(function ($prod) { return $prod['quantity']; })
                        ->thenBy(function ($prod) { return $prod['name']; }),
                    function ($cat) { return $cat['id']; },
                    function ($prod) { return $prod['catId']; },
                    function ($cat, $prods) {
                        return array(
                            'name' => $cat['name'],
                            'products' => $prods
                        );
                    }
                );
        },
        "string lambda" => function () use ($DATA) {
            return E::from($DATA->categories)
                ->orderBy('$cat ==> $cat["name"]')
                ->groupJoin(
                    from($DATA->products)
                        ->where('$prod ==> $prod["quantity"] > 0')
                        ->orderByDescending('$prod ==> $prod["quantity"]')
                        ->thenBy('$prod ==> $prod["name"]'),
                    '$cat ==> $cat["id"]', '$prod ==> $prod["catId"]',
                    '($cat, $prods) ==> [
                            "name" => $cat["name"],
                            "products" => $prods
                        ]');
        },
    ],
    [
        function () use ($DATA) {
            return G::from($DATA->categories)
                ->orderBy(function ($cat) { return $cat['name']; })
                ->groupJoin(
                    G::from($DATA->products)
                        ->where(function ($prod) { return $prod['quantity'] > 0; })
                        ->orderByDesc(function ($prod) { return $prod['quantity']; })
                        ->thenBy(function ($prod) { return $prod['name']; }),
                    function ($cat) { return $cat['id']; },
                    function ($prod) { return $prod['catId']; },
                    function ($cat, $prods) {
                        return array(
                            'name' => $cat['name'],
                            'products' => $prods
                        );
                    }
                );
        },
    ],
    [
        function () use ($DATA) {
            return P::from($DATA->categories)
                ->orderByAscending(function ($cat) { return $cat['name']; })
                ->groupJoin(
                    P::from($DATA->products)
                        ->where(function ($prod) { return $prod['quantity'] > 0; })
                        ->orderByDescending(function ($prod) { return $prod['quantity']; })
                        ->thenByAscending(function ($prod) { return $prod['name']; })
                )
                ->onEquality(
                    function ($cat) { return $cat['id']; },
                    function ($prod) { return $prod['catId']; }
                )
                ->to(function ($cat, $prods) {
                    return array(
                        'name' => $cat['name'],
                        'products' => $prods
                    );
                });
        },
    ]);

Results:

Process data from ReadMe example
--------------------------------
  PHP                         0.00620 sec   x1.0 (100%)
  YaLinqo                     0.02840 sec   x4.6 (+358%)
  YaLinqo [string lambda]     0.02920 sec   x4.7 (+371%)
  Ginq                        0.07720 sec   x12.5 (+1145%)
  Pinq                        2.71616 sec   x438.1 (+43707%)

GroupJoin killed the performance of Pinq. I guess the reason is the same as in the test with join.
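For reference, a group join does not need nested loops. You can index the inner sequence by key once, and then each outer element looks up its group in a hash table. That is roughly O(n + m) work instead of O(n * m). The sketch below is a plain-PHP illustration with a hypothetical group_join function, whose parameter order mirrors the YaLinqo and Ginq calls above. It says nothing definitive about how Pinq is actually implemented.

```php
<?php
// Hypothetical helper for illustration; not the API of any library discussed here.
function group_join(array $outer, array $inner, callable $outerKey, callable $innerKey, callable $result)
{
    // One pass over $inner: key => list of inner elements, in their original order.
    $lookup = [ ];
    foreach ($inner as $item)
        $lookup[$innerKey($item)][] = $item;

    // One pass over $outer: each group is fetched with a single hash-table access.
    $joined = [ ];
    foreach ($outer as $key => $item) {
        $k = $outerKey($item);
        $group = isset($lookup[$k]) ? $lookup[$k] : [ ];
        $joined[$key] = $result($item, $group);
    }
    return $joined;
}

$categories = [
    [ 'id' => 1, 'name' => 'Books' ],
    [ 'id' => 2, 'name' => 'Games' ],
];
$products = [
    [ 'name' => 'Novel', 'catId' => 1, 'quantity' => 5 ],
    [ 'name' => 'Atlas', 'catId' => 1, 'quantity' => 2 ],
];

$result = group_join($categories, $products,
    function ($cat) { return $cat['id']; },
    function ($prod) { return $prod['catId']; },
    function ($cat, $prods) { return [ 'name' => $cat['name'], 'products' => $prods ]; });

// Games has no products, so it gets an empty group: a group join keeps every outer element.
echo count($result[0]['products']), ' ', count($result[1]['products']), "\n"; // prints "2 0"
```

Keeping the inner elements in their original order inside each group matters here, because the ReadMe query sorts the products before joining them.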

All results

Iterating over 1000 ints
------------------------
  PHP     [for]               0.00006 sec   x1.0 (100%)
  PHP     [array functions]   0.00011 sec   x1.8 (+83%)
  YaLinqo                     0.00041 sec   x6.8 (+583%)
  Ginq                        0.00075 sec   x12.5 (+1150%)
  Pinq                        0.00169 sec   x28.2 (+2717%)

Generating array of 1000 integers
---------------------------------
  PHP     [for]               0.00025 sec   x1.3 (+32%)
  PHP     [array functions]   0.00019 sec   x1.0 (100%)
  YaLinqo                     0.00060 sec   x3.2 (+216%)
  Ginq                        0.00107 sec   x5.6 (+463%)
  Pinq                        0.00183 sec   x9.6 (+863%)

Generating lookup of 1000 floats, calculate sum
-----------------------------------------------
  PHP                         0.00124 sec   x1.0 (100%)
  YaLinqo                     0.00381 sec   x3.1 (+207%)
  YaLinqo [string lambda]     0.00403 sec   x3.3 (+225%)
  Ginq                        0.01390 sec   x11.2 (+1021%)
  Pinq                        * Not implemented

Counting values in arrays
-------------------------
  PHP     [for]               0.00023 sec   x1.0 (100%)
  PHP     [array functions]   0.00052 sec   x2.3 (+126%)
  YaLinqo                     0.00056 sec   x2.4 (+143%)
  YaLinqo [string lambda]     0.00059 sec   x2.6 (+157%)
  Ginq                        0.00129 sec   x5.6 (+461%)
  Pinq                        0.00382 sec   x16.6 (+1561%)

Counting values in arrays deep
------------------------------
  PHP     [for]               0.00064 sec   x1.0 (100%)
  PHP     [array functions]   0.00323 sec   x5.0 (+405%)
  YaLinqo                     0.00798 sec   x12.5 (+1147%)
  Ginq                        0.01416 sec   x22.1 (+2113%)
  Pinq                        0.04928 sec   x77.0 (+7600%)

Filtering values in arrays
--------------------------
  PHP     [for]               0.00049 sec   x1.0 (100%)
  PHP     [array functions]   0.00072 sec   x1.5 (+47%)
  YaLinqo                     0.00094 sec   x1.9 (+92%)
  YaLinqo [string lambda]     0.00094 sec   x1.9 (+92%)
  Ginq                        0.00295 sec   x6.0 (+502%)
  Pinq                        0.00328 sec   x6.7 (+569%)

Filtering values in arrays deep
-------------------------------
  PHP     [for]               0.00514 sec   x1.0 (100%)
  PHP     [array functions]   0.00739 sec   x1.4 (+44%)
  YaLinqo                     0.01556 sec   x3.0 (+203%)
  YaLinqo [string lambda]     0.01750 sec   x3.4 (+240%)
  Ginq                        0.03101 sec   x6.0 (+503%)
  Pinq                        0.05435 sec   x10.6 (+957%)

Sorting arrays
--------------
  PHP                         0.00037 sec   x1.0 (100%)
  YaLinqo                     0.00161 sec   x4.4 (+335%)
  YaLinqo [string lambda]     0.00163 sec   x4.4 (+341%)
  Ginq                        0.00402 sec   x10.9 (+986%)
  Ginq    [property path]     0.01998 sec   x54.0 (+5300%)
  Pinq                        0.00132 sec   x3.6 (+257%)

Joining arrays
--------------
  PHP                         0.00016 sec   x1.0 (100%)
  YaLinqo                     0.00065 sec   x4.1 (+306%)
  YaLinqo [string lambda]     0.00070 sec   x4.4 (+337%)
  Ginq                        0.00105 sec   x6.6 (+556%)
  Ginq    [property path]     0.00194 sec   x12.1 (+1112%)
  Pinq                        1.21249 sec   x7,577.5 (+757648%)

Aggregating arrays
------------------
  PHP     [for]               0.00059 sec   x1.0 (100%)
  PHP     [array functions]   0.00193 sec   x3.3 (+227%)
  YaLinqo                     0.00475 sec   x8.1 (+705%)
  YaLinqo [string lambda]     0.00515 sec   x8.7 (+773%)
  Ginq                        0.00669 sec   x11.3 (+1034%)
  Ginq    [property path]     0.03955 sec   x67.0 (+6603%)
  Pinq                        0.03226 sec   x54.7 (+5368%)

Aggregating arrays custom
-------------------------
  PHP                         0.00007 sec   x1.0 (100%)
  YaLinqo                     0.00046 sec   x6.6 (+557%)
  YaLinqo [string lambda]     0.00057 sec   x8.1 (+714%)
  Ginq                        0.00046 sec   x6.6 (+557%)
  Pinq                        0.00610 sec   x87.1 (+8615%)

Process data from ReadMe example
--------------------------------
  PHP                         0.00620 sec   x1.0 (100%)
  YaLinqo                     0.02840 sec   x4.6 (+358%)
  YaLinqo [string lambda]     0.02920 sec   x4.7 (+371%)
  Ginq                        0.07720 sec   x12.5 (+1145%)
  Pinq                        2.71616 sec   x438.1 (+43707%)
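The relative columns in these tables can be read as follows. "xN" is the time divided by the fastest time in the group, and the percentage is how much slower that is. The formatter below is inferred from the published numbers (for example, 0.00193 / 0.00059 ≈ 3.27, shown as x3.3 and +227%). It is a sketch and not code taken from YaLinqoPerf. The published times are rounded, so recomputing a row from them may occasionally differ in the last digits from the original output.

```php
<?php
// Formats a time relative to the fastest time in its group, in the style of the tables above.
// Inferred from the published numbers; not taken from YaLinqoPerf.
function format_relative($time, $best)
{
    if ($time == $best)
        return 'x1.0 (100%)'; // the baseline row
    $ratio = $time / $best;
    return sprintf('x%.1f (+%d%%)', $ratio, (int) round(($ratio - 1) * 100));
}

echo format_relative(0.00059, 0.00059), "\n"; // x1.0 (100%)
echo format_relative(0.00193, 0.00059), "\n"; // x3.3 (+227%)
```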

Conclusion

If you need to perform queries on relatively small sets of data (for example, data returned from web services), you can use either YaLinqo or Ginq.

YaLinqo has better performance, more functions and much better documentation. It is a minimalistic library which relies on modern PHP features. It supports both anonymous functions and string lambdas (in all varieties). It does not contain any classes besides a wrapper around an iterator and relies on good old PHP arrays, so it is easy to learn.
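The following sketch makes this design concrete. A single wrapper around a generator can provide chainable, lazy operations while the data itself stays in ordinary PHP arrays. The Seq class below is a hypothetical, heavily reduced sketch of that idea. It is not YaLinqo's Enumerable, and its method names are chosen only for illustration. Like the libraries discussed here, it preserves keys.

```php
<?php
// Hypothetical minimal sketch of a generator-based query wrapper; not YaLinqo's Enumerable.
class Seq implements IteratorAggregate
{
    // Callable that returns a fresh generator, so the sequence can be iterated more than once.
    private $factory;

    private function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public static function from(array $source)
    {
        return new self(function () use ($source) {
            foreach ($source as $k => $v)
                yield $k => $v;
        });
    }

    // Lazy filter: nothing runs until the result is iterated. Keys are preserved.
    public function where(callable $predicate)
    {
        $source = $this;
        return new self(function () use ($source, $predicate) {
            foreach ($source as $k => $v)
                if ($predicate($v))
                    yield $k => $v;
        });
    }

    // Lazy projection. Keys are preserved.
    public function select(callable $selector)
    {
        $source = $this;
        return new self(function () use ($source, $selector) {
            foreach ($source as $k => $v)
                yield $k => $selector($v);
        });
    }

    // Materializes the query into a plain PHP array, keeping keys.
    public function toArray()
    {
        return iterator_to_array($this, true);
    }

    #[\ReturnTypeWillChange]
    public function getIterator()
    {
        return call_user_func($this->factory);
    }
}

$result = Seq::from([ 'a' => 1, 'b' => 2, 'c' => 3, 'd' => 4 ])
    ->where(function ($v) { return $v % 2 == 0; })
    ->select(function ($v) { return $v * 10; })
    ->toArray();
// $result is [ 'b' => 20, 'd' => 40 ]
```

This sketch stores a factory rather than a generator on purpose. A PHP generator cannot be rewound once it has advanced, so each iteration asks for a new one. The #[\ReturnTypeWillChange] line only silences a deprecation notice on PHP 8.1+, and older versions treat it as a comment.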

Ginq uses multiple classes of iterators, collections and comparers. Thanks to this, it more closely resembles LINQ from .NET. However, this comes at a price: unlike in .NET, custom dictionaries implemented in PHP are much slower than native arrays. Public iterator classes, on the other hand, are alien to .NET developers, but PHP developers who use SPL are used to seeing them. They come at a price too: iterating with an SPL iterator is much slower than iterating with yield. Overall, Ginq is 1.5 to 3 times slower than YaLinqo.
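The two approaches have quite different shapes, as the sketch below shows. It writes the same lazy filter once as an SPL Iterator class and once as a generator function. It is for comparison only, and the names WhereIterator and where_gen are hypothetical. (SPL itself ships CallbackFilterIterator for this exact job.) One likely reason for the speed gap is that foreach over an Iterator object calls several PHP-level methods (valid, current, key, next) per element, whereas a generator simply resumes one function body.

```php
<?php
// Hypothetical example: a lazy "where" as a hand-written SPL Iterator.
class WhereIterator implements Iterator
{
    private $inner;
    private $predicate;

    public function __construct(Iterator $inner, callable $predicate)
    {
        $this->inner = $inner;
        $this->predicate = $predicate;
    }

    // Advance the inner iterator until it points at an accepted element (or ends).
    private function skipRejected()
    {
        while ($this->inner->valid() && !call_user_func($this->predicate, $this->inner->current()))
            $this->inner->next();
    }

    #[\ReturnTypeWillChange]
    public function rewind() { $this->inner->rewind(); $this->skipRejected(); }

    #[\ReturnTypeWillChange]
    public function valid() { return $this->inner->valid(); }

    #[\ReturnTypeWillChange]
    public function current() { return $this->inner->current(); }

    #[\ReturnTypeWillChange]
    public function key() { return $this->inner->key(); }

    #[\ReturnTypeWillChange]
    public function next() { $this->inner->next(); $this->skipRejected(); }
}

// The same filter as a generator: the iteration state lives in the suspended function.
function where_gen($source, callable $predicate)
{
    foreach ($source as $k => $v)
        if ($predicate($v))
            yield $k => $v;
}

$data = [ 1, 2, 3, 4, 5, 6 ];
$isEven = function ($v) { return $v % 2 == 0; };

$viaClass = iterator_to_array(new WhereIterator(new ArrayIterator($data), $isEven));
$viaGenerator = iterator_to_array(where_gen($data, $isEven));
// Both give [ 1 => 2, 3 => 4, 5 => 6 ]: keys of the source array are preserved.
```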

Pinq is unbelievably slow. No amount of architecture can justify slowing an application down 6000 times because of a simple query. The library has a pretty website, the unique feature of supporting databases and a complex architecture, and it is already at version 3, so I am very sad to come to the conclusion that the library is absolutely unusable. I hope the developer improves the performance and implements at least one full-featured query provider. Once that is done, the library may become the library of choice when LINQ to databases is needed.

Another library to consider is Underscore.php. It is not LINQ, it is not lazy, but it follows the same functional idea and its methods may look familiar if you have used functional languages or various Underscore.* libraries in other languages.

Other libraries

I have written an extensive article in Russian which compares the older "LINQ" libraries: LINQ for PHP, Phinq, PHPLinq and Plinq. However, I cannot recommend using any of them. They are incomplete, untested and undocumented, and above all, they are not LINQ: none of them supports lazy evaluation. Discussing them in detail would be a waste of time now that the newer libraries exist.
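Lazy evaluation is the property that matters here. The sketch below uses plain PHP with no library, and its helper names lazy_map and take are hypothetical. The eager version calls the mapping function for every element even though only the first two results are used. The generator version stops after two calls.

```php
<?php
// Counts how many times the mapping function actually runs.
$calls = 0;
$double = function ($v) use (&$calls) { $calls++; return $v * 2; };

// Lazy map: the generator computes each value only when the consumer asks for it.
function lazy_map($source, callable $fn)
{
    foreach ($source as $k => $v)
        yield $k => $fn($v);
}

// Takes the first $n values from an array or Traversable, then stops iterating.
function take($source, $n)
{
    $result = [ ];
    if ($n <= 0)
        return $result;
    foreach ($source as $k => $v) {
        $result[$k] = $v;
        if (count($result) >= $n)
            break;
    }
    return $result;
}

$data = range(1, 1000);

$calls = 0;
$eager = take(array_map($double, $data), 2); // array_map transforms all 1000 elements first
$eagerCalls = $calls;

$calls = 0;
$lazy = take(lazy_map($data, $double), 2);   // the generator runs $double only twice
$lazyCalls = $calls;

echo "$eagerCalls $lazyCalls\n"; // prints "1000 2"
```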

The only library among them worth mentioning is PHPLinq. It supports querying databases, in fact a lot of them. However, keep in mind that the library is almost untested, the order of function calls is fixed (it is more like a DAL for generating SQL), single and first are treated as the same, etc. I would never use code like this in production, but you can decide for yourself.
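This distinction matters. In .NET LINQ, First returns the first element of a sequence. Single additionally checks that the sequence contains exactly one element, and throws if it is empty or has more. Below is a sketch of the two semantics in plain PHP. The function names first_of and single_of are hypothetical and are not PHPLinq's API.

```php
<?php
// Hypothetical helpers illustrating .NET LINQ semantics; not PHPLinq's API.

// First: the first element; an empty input is an error.
function first_of(array $items)
{
    foreach ($items as $item)
        return $item;
    throw new UnexpectedValueException('Sequence contains no elements.');
}

// Single: the only element; an empty input or more than one element is an error.
function single_of(array $items)
{
    if (count($items) !== 1)
        throw new UnexpectedValueException(count($items) === 0
            ? 'Sequence contains no elements.'
            : 'Sequence contains more than one element.');
    return reset($items);
}

$users = [ 'alice', 'bob' ];
echo first_of($users), "\n"; // prints "alice"

try {
    single_of($users); // two elements: Single must not silently return 'alice'
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n"; // prints "Sequence contains more than one element."
}
```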

Licenses

  • YaLinqoPerf — WTFPL* License
  • YaLinqo — Simplified BSD License
  • Ginq — MIT License
  • Pinq — MIT License + BSD 3-clause License (dependencies)

History

  • 2015-05-30: first version

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-Share Alike 3.0 Unported License


About the Author

Athari
Software Developer
Russian Federation


C#, JavaScript, PHP developer.



