11,930,788 members (54,206 online)
alternative version

16K views
29 bookmarked
Posted

# A lazy stream implementation in C++11

, 19 May 2014 CPOL
 Rate this:
A lazy stream has been implemented in C++11, so as to highlight the functional capabilities of this new specification.

## Introduction

After attending a free introductory course on functional programming at Coursera I got fascinated by the power of lazy (non-strict) collections. I was taught the basic functioning of the `Stream` class in Scala (basically a simply linked list with non-strict evaluation). This was a radical change in my view on how collections could be structured and how their elements could be evaluated.

Coming from a mostly imperative world (as a C++ hobbyist), functional programming was an exotic mythical beast I was yearning to tame into a language I felt much more comfortable with.

Browsing in the Internet, my first inspiration came from Example 2 in this Louis Botterill’s techy blog insightful post. In this example, Louis depicts a home-made but very inspiring imitation of the Scala `Stream` class. I took his design and tried to poured it into an equivalent C++03 class, with no success at all after several days. It was not until I moved to C++11 when I began to see the light at the end of the tunnel… But it is not yet time to show the innards of the resulting implementation.

## The fascination of infinite streams

Let´s have a look at this simple instruction:

`lazy_stream<int> stream_a = lazy_stream<int>::from(3);    `

To put it simply, `stream_a` is the set of all integers bigger than 2, i.e.: {3, 4, 5,…}. It represents an infinite set on which we can apply some intriguing operations we may not be used to. The first one is filtering:

```lazy_stream<int> stream_b = stream_a.filter(
[](int value){
return value % 4 == 0;
});    ```

We have just created another infinite set called `stream_b`. In this case it is the subset of all multiples of 4 in `stream_a`. The filtering is achieved by means of an anonymous (lambda) function, in the above example the pattern: `[](...){...}`. The operation represents the set {4, 8, 12,…}.

Ok. Nice, but how can I hold an infinite set in memory? In fact you can’t, but you do not need to. At the point `stream_a` is created, the only known element is the head of the stream, i.e.: the number 3. The rest (called the tail) is only an embryonic, not yet born stream that can be created by invoking a function we might call tail generator. To invoke the tail generator we use the `tail` method. Thus:

`stream_a.tail();  `

creates a brand new lazy stream whose head is the number 4 and whose tail is another tail generator (just a function).

But not all the operations are equally lazy. `from` is pure lazy, but `filter` is not so. In fact it internally calls `tail` until the first filter-compliant occurrence is found (number 4 in our example). It is easy to see that if no such occurrence is found we will get into trouble (remember: the stream is infinite).

Let’s have a look at another pure lazy operation, `map`:

```typedef std::pair<int, int> pair_type;
lazy_stream<pair_type> stream_c = stream_b.map<pair_type>(
[](int value){
return std::pair<int, int>(value, value + 1);
});   ```

We have created a new lazy stream `stream_c` by mapping every value element of `stream_b` into the pair (value, value+1). `stream_c` is then: {(4, 5), (8, 9), (12, 13),…}.

Once an infinite stream is created, we may be soon interested in exploring its elements. To take the (say) first 10 elements of `stream_c`, we write:

`lazy_stream<pair_type> stream_d = stream_c.take(10); `

Observe that `stream_d` is a finite stream. In fact, we can take its size:

`std::cout << stream_d.size() << std::endl; // Prints 10. `

As you may imagine, `size` is a strict (non-lazy) operation. You have to evaluate all the elements of a lazy stream in order to know how many they are.

If we are only interested in getting a single element, we can use the `get` method. It just retrieves the element of a specified index n {n=0, 1, 2,…}. Unfortunately, it must evaluate the prior n-1 elements as well.

`std::cout << stream_b.get(2) << std::endl; // Prints 12. `

A worn-out but very illustrative example of a lazy stream is the extraction of the first n prime number by means of the so called Sieve of Eratosthenes.

Without entering in more explanations, this could be the code:

```// The sieve function object does the hard work.
std::function<lazy_stream<int> (const lazy_stream<int>&)> sieve =
[&sieve](const lazy_stream<int>& start){
// We filter the primes with respect to head.
return value % head > 0;
});
// We return a new lazy stream with its own head and tail generator.
return sieve(temp);
});
};
// prime_numbers holds all the prime numbers.
lazy_stream<int> prime_numbers = sieve(lazy_stream<int>::from(2));
// Let’s take the first 100 prime numbers and pour them into a std::list.
std::list<int> prime_number_list = prime_numbers.take(100).to_list();   ```

## The not so fancy finite lazy streams

Not so gorgeous, but they do not look that bad after all. To compose a finite lazy stream, we just prepend elements to an already existing finite stream. The simplest finite lazy stream is the empty one, also called `nil`.

`lazy_stream<int> stream_e = 1 & (2 & lazy_stream<int>::nil);`

The use of parentheses is a pity, but all the right associative operators in C++ can be overloaded only as members of a class, so I have been compelled to use a left associative operator &. The equivalent Scala operator `#:: `needs no parentheses, for instance.

Some interesting notes:

• The size of `stream_e` is 2.
• `lazy_stream<int>::nil` is equivalent to `lazy_stream<int>()`.
• We can always prepend an element to an already lazy stream, no matter whether this is finite or infinite.

Save the generation function `from`, the rest of operations (`filter`, `map`, `take`, etc.) are equally available for finite streams. For integral types we can create, for instance, finite ranges:

```// This is the lazy finite set {100, 99, 98,… ,2}.
lazy_stream<int> stream_f = lazy_stream<int>::range(100, 1);```

There are other operations it is not worth mentioning. I just would like to highlight one very important reducing function in functional languages: `fold_left`. This operation first takes a starting value of type `U` (let’s call it `start`). Then, with each one of the elements in the stream it operates a given function `op` which takes the `start` value plus the element value and stores the result in `start`. It then proceeds with the next element an so on. The final result is a single value of type `U`.

Let’s add the `double` 0.5 plus all the elements in `stream_f`:

```// Observe that stream_f is a lazy stream of type int.
// The result is of type double instead.
// (Types of start and result must be the same).
[](double accum, int element){
return accum + element;
});                ```

## Purpose of the lazy_stream<> class

With the writing of `lazy_stream<>` I did not intend to make a full-fledged, optimized and efficient piece of code. I just wanted to realized how the new C++11 specification could significantly move the scope of the language to a more functional (less imperative) view.

In fact, one of features of `lazy_stream<>` class is that all its objects are immutable, i.e.: once created, they cannot be modified. This is a common trait of functional programming objects, generally disregarded by imperative languages, because immutability sometimes leads to a lesser memory efficiency; many times it is much cheaper to modify an object than copying it.

On the other hand, I think this design would have been very tricky if not impossible with previous specifications of C++. I will discuss this idea later.

## Implementation details

Both finite and infinite streams keep three smart pointers to dynamically allocated information in the heap, namely:

1. A `std::shared_ptr` smart pointer, called `head_ptr_`, to the head (a value of generic type `T`).
2. A `std::shared_ptr` smart pointer, called `tail_ptr_`, to the stream tail (another `lazy_stream<T>`).
3. A `std::shared_ptr` smart pointer, called `tail_gen_ptr_`, to the stream tail generator (a function class of type `std::function<lazy_stream<T> ()>`).

Pointers [2] and [3] are mutually exclusive in order to keep the design consistent. Thus, one of them is directly set to `nullptr` since object creation.

The notation `std::function<lazy_stream<T> ()>`, aliased in the code as `tail_gen_type`, is quite weird to a C++03 developer eyes. The above type can wrap any function with no input parameters and returning a lazy stream. That includes:

1. Plain standalone functions
2. Member functions (static or non-static)
3. Lambda (anonymous) functions, with or without closures.

The use of `tail_gen_type` is the keystone of the whole design. The use of lambdas (with or without closures) representing tail generators is widespread along the code. All those functions can be wrapped (or represented) by the same type: `tail_gen_type`.

As an illustrative example, let’s have a look at `take` method implementation. Remember that this method takes an input stream and returns a new stream with the first n elements of the former.

```// take method implementation
template <typename T>
lazy_stream<T> lazy_stream<T>::take(size_t n) const
{
if(empty_ || n<1)
return lazy_stream<T>();         // [1]
const tail_type& tail_ = tail();     // [2]
return lazy_stream<T>(*head_ptr_, [tail_, n]() -> lazy_stream<T> {return tail_.take(n-1);}) // [3]
}   ```

This method, as well as the rest of methods in `lazy_stream<>`, is `const`.

If `this` stream is empty or if we ask for less than one element, we just return an empty stream [1]. Otherwise we evaluate the tail stream, invoking `tail()` [2]. Eventually we create and return a brand new lazy stream [3] whose head is `this`’s head and whose tail generator is given by the lambda:

```[tail_, n]() -> lazy_stream<T> {
return tail_.take(n-1);
});  ```

Observe that once the new lazy stream is created and returned, no further actions are taken. At this point we only know the head of this second stream (`*head_ptr_`). If we now invoke `tail()` on this newborn stream, a third lazy stream will be created by evaluating:

`tail_.take(n-1); `

in the lambda, where `tail_` is the tail of the first stream.

We have been lazy, but in fact we could have been even more. Consider this alternative (but not implemented) code:

```// Alternative take method implementation
template <typename T>
lazy_stream<T> lazy_stream<T>::take(size_t n) const
{
if(empty_ || n<1)
return lazy_stream<T>();
const tail_ptr_type& tail_ptr = tail_ptr_;
const tail_gen_ptr_type& tail_gen_ptr = tail_gen_ptr_;
[tail_ptr, tail_gen_ptr, n]() -> lazy_stream<T> {
// This cannot happen
assert(!(tail_ptr && tail_gen_ptr));
if(tail_ptr)
return tail_ptr->take(n-1);
return (*tail_gen_ptr)().take(n-1);
});
} ```

The above implementation is indeed a little more efficient than the original one: we do not pass the stream head into the lambda. Besides, it is a little lazier: the first invocation of `tail()` is inside the lambda. I have disregarded it only for a subjective reason: it loses clarity with respect to the original one and it is a little more verbose. Feel free to choose what you like more.

Because the lazy streams herein developed are immutable, they cannot be assigned (assignment operation is disabled). Copy-construction, on the other hand, is available at a reasonable cost. As we already know, a lazy stream object stores its information in the heap, keeping only two active smart pointers pointing to it. When copying, we are actually sharing resources ownership.

## Implementation pitfalls in C++03

My first attempt after the Scala course was to write a lazy stream class in C++03. After racking my brains for a long while, I gave up. In C++03 there are no lambdas, so in order to approach them I tried to use function classes (functors). For instance, let’s turn the following lambda into a functor:

```[tail_, n]() ->lazy_stream<T> {
return tail_.take(n-1);
});  ```

The equivalent functor could be:

```template <typename T>
class take_functor
{
typedef lazy_stream<T> tail_type;
T n_;            // Closure parameter
tail_type tail_; // Closure parameter
public:
take_functor(const T& n, const tail_type& tail) : n_(n), tail_(tail) {}
tail_type operator()() const
{
return tail_.take(n_-1);
}
}; ```

The essential difference lies in the fact that every lambda function with symbolic: type `()=>lazy_stream<T>` (in Scala notation), no matter how it may be, can be represented by a `std::function<lazy_stream<T> ()>`. As far I know an analogous representation for functors is not possible in C++03. Moreover, `std::function` does not even exist in C++ specifications prior to 2011. Thus, when using C++03 I was not able to give all the functors I wrote a common representation. What unique type in C++03 can hold a functor with any closure parameters representing a function of type: `()=>lazy_stream<T>`? I think none. So the key problem was I could not figure out how to write a tail generator in C++03.

## Source code

The `lazy_stream<>` implementation code has been attached, along with a main sample code. Only a subset of the corresponding Scala methods (the most insightful, from my viewpoint) has been translated. The class allows extension and further optimization.

The code has been written with Code::Blocks and compiled with MinGw (mingw-get-inst-20120426).

## History

• December 2012. Initial version.
• May 2014. Version 1.1. Main changes:
• Constructors have been templatized in order to support rvalues arguments. (Thanks to Lukasz Gwizdz, another codeproject member, for his insight).
• New member functions added: `filter_not`, `find_first`, `reduce_left`, etc.

## References

1. Coursera. Functional Programming Principles in Scala. Martin Odersky. (www.coursera.org).
2. Louis Botterill’s mostly software and techy Blog. Scala, lazy evaluation and the Sieve of Eratosthenes. (http://louisbotterill.blogspot.com.es/2009/07/scala-lazy-evaluation-and-sieve-of.html).
3. Wikipedia. Sieve of Eratosthenes. (http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).
4. C/C++ Operator Precedence and Associativity. (http://www.anne-gert.nl/techcorner/cpp/cpp-operators.html).
5. Functors: Function Objects in C++. Alex Allain. (http://www.cprogramming.com/tutorial/functors-function-objects-in-c++.html).

## Share

 Systems Engineer Spain
I work as a senior industrial engineer for Public Administration in Spain. I have experience in developing software for ballistic computations. I like maths and programming and, above all, riding my mountain bike. I run my own website on computation and numerical calculus at rumigaculum.com. Contact me here.

## You may also be interested in...

 First Prev Next
 Fajny artykuł Member 115785053-Apr-15 3:43 Member 11578505 3-Apr-15 3:43
 Some related code (~100 LOCs) and paper - using ANSI C++ (circa 2005) ucbearcat10-Jun-14 6:38 ucbearcat 10-Jun-14 6:38
 Re: Some related code (~100 LOCs) and paper - using ANSI C++ (circa 2005) David Serrano Martínez10-Jun-14 22:04 David Serrano Martínez 10-Jun-14 22:04
 Use boost::function to hold your functor and boost::bind to capture your variables for C++03 Wong Shao Voon20-May-14 22:42 Wong Shao Voon 20-May-14 22:42
 Vote of 5 GanesanSenthilvel19-May-14 8:50 GanesanSenthilvel 19-May-14 8:50
 My vote of 5 hatemtaleb28-Aug-13 8:48 hatemtaleb 28-Aug-13 8:48
 My vote of 5 Mihai MOGA15-Jan-13 7:51 Mihai MOGA 15-Jan-13 7:51
 Re: My vote of 5 David Serrano Martínez19-Feb-13 4:43 David Serrano Martínez 19-Feb-13 4:43
 My vote of 4 ahlav21-Dec-12 7:25 ahlav 21-Dec-12 7:25
 Last Visit: 31-Dec-99 19:00     Last Update: 29-Nov-15 18:49 Refresh 1