Download steps_loop_tests.zip

Changes From the Previous Version

Since the previous version of this article the following changes have been made:

a section with more formal definitions of the proposed loop constructs have been added;
the implementation pays more attention to the corner cases (when step is zero, when the loop iterates over the whole range of the corresponding integral type);
the benchmarks contain results for compilers under Windows and under Linux.
the section 'Tests for Correctness' has been added.

Introduction

In this article, the following loops are proposed:

for (auto&& x : step(start, count, step-width)) { ... }

for (auto x : step(count)) {...}

for (auto x : step_backward(count)) {...}

for (auto x : step_to(start, finish, step-width)) {...}

for (auto x : step_until(start, finish, step-width)) {...}

They are based on the C++11 for-range loop. In order to write such loops, the appropriate functions step, step_backward, step_to and step_until have to be implemented. The article shows their implementation, discusses advantages of such loops and includes the benchmark results.

Background

There are some cases when we need to produce values sequentially, for instance with a fixed step. If we need to use iterators, which is a natural thing to do, we have to store the values them in a container. This can be a problem if a sequence consists of several million elements. It will take much space if we store it in a container; we will have to provide a move constructor and move assignment operator, and, in general, be careful how to pass the containers between functions.

Using standard for-loops may be inconvenient: writing descending sequences using unsigned integers that stop at 0 is not staightforward. The following will not work:

for (std::size_t  j =  N-1;  j >= 0; --j) // infinite loop

{

. . .

}

We have to write the following loop, which is not obvious for beginners:

for (std::size_t  j =  N;  j-- != 0;)

{

      . . .

}

Floating-point loops, which step through values of floating-point types, have also some issues. The following loop

for (double x = 0.5; x < 1.0; x += 0.1)

{

    std::cout << std::setprecision(17) << x << std::endl;

}

will print (on most computers) something like that:

0.5

0.59999999999999998

0.69999999999999996

0.79999999999999993

0.89999999999999991

0.99999999999999989

We get six values instead of the "expected" five. In order to achieve the desired effect we can have two options. One of them is to use integer steps and write the following code:

double x = 0.5;

for (int i = 0; i < 5; ++i)

{  

     std::cout << std::setprecision(17) << x << std::endl;

     x += 0.1;

}

Another option is to subtract a small epsilon from the upper bound:

double epsilon = 0.05;

double right_bound = 1.0 – epsilon;

for (double x = 0.5; x < right_bound; x += 0.1)

{

    std::cout << std::setprecision(17) << x << std::endl;

}

In other programming languages (like Python [1], Pascal, etc) it is possible to use ranges with steps.

The implementation proposed in this article has been tested in VC++ CTP 2014, GNU C++ 4.9, GNU C++ 4.9.2 and Clang 3.6.

Existing Implementations in C++ and Constructs in Other Languages

In Boost [2], there is a class template, called irange, which is defined as follows:

template<class Integer>

iterator_range< range_detail::integer_iterator<Integer> >

irange(Integer first, Integer  last);


template<class Integer, class StepSize>

iterator_range< range_detail::integer_iterator_with_step<Integer, StepSize> >

irange(Integer first, Integer last, StepSize step_size);

It can be used to iterate over a range [first, last) with a fixed step:

    std::size_t m = 5;

    for (auto x : boost::irange(0U, m)) // OK in the 32-bit code

    {

        std::cout << x << " ";

    }

std::cout << std::endl;

This loop will output:

0 1 2 3 4

The last parameter is always excluded from the range. Negative steps are allowed:

std::vector<int> v = { 1, 2, 3, 4 };

for (auto i : boost::irange(static_cast<int>(v.size()) - 1, -1, -1))

{

        std::cout << v[i] << std::endl;

}

The irange template allows to write loops for integer ranges. It is similar to the range function in Python [1].

Here are some issues. First of all the first and the last bounds of the loop should be of the same type. The following expressions will not compile:

boost::irange(0, m), // the first parameter is not unsigned

boost::irange(v.size() - 1, -1, -1) // v.size() is unsigned

Even boost::irange(0U, m) would fail in 64-bit code on most compilers. It should be written as follows:

boost::irange(0ULL, m)

or even better

boost::irange(static_cast<std::size_t>(0), m) // will work in any code, including 32-bit and 64-bit

The second issue is that we prefer to use indice of std::size_t and it is not convenient to use irange for loops with descending ranges using unsigned values. It would be better if the descending range excluded the first bound and included the last one.

For floating-point number, there is the linspace function, defined in the Nympy library for Python [3], which allows to write loops as follows:

for x in np.linspace(2.0, 3.0, num=5, endpoint=True):

print(x)

This loop will output the following values:

If, on the other hand, endpoint is defined as False, the output will be:

In MATLAB [4], there is a function called linspace as well, but it always includes the last bound.

Proposed Loop Constructs

The following four types of loop constructs are proposed.

1. The step loop

1.1. The three-parameter step loop

The following loop

for(auto&& control_variable : step(start_value, step_count, step_width)) statement

is equivalent to the following statement

{
    auto control_variable = start_value;   
    auto __counter = step_count;
    auto __stepw = step_width;
    while (__counter-- != 0)
    {
        statement;
        control_variable += __stepw;
    }   
}

where __counter and __stepw are defined for exposition only (which means that they are not explicitly available, and the actual implementation may use different variables to achieve the same effect). The type of step_count must be integral.

Example 1

Here is a sample step loop:

for (auto&& i : step(10, 5, -2))
{
    std::cout << i << std::endl;
}

It will output the following values:

Example 2

The following loop will not output anything because step_width is zero and the body will not be executed.

for (auto&& i : step(10, 10, 0))
{
    std::cout << i << std::endl;
}

Example 3

The steps loop can be used for generation not only arithmetic values:

for (auto c : step(std::string("!"), 6, std::string("*")))
{
    std::cout << c << std::endl;
}

This code will print:

!
!*
!**
!***
!****
!*****

This means that the steps loop can also be used, for instance, to iterate over various user-defined Big Numbers.

1.2. The two-parameter step loop

If the last parameter of step is absent, it is assumed to be of int type with the value 1. So, the following
loop

for(auto&& control_variable : step(start_value, step_count)) statement

is equivalent to

for(auto&& control_variable : step(start_value, step_count, 1)) statement

Example

Here is sample two-parameter step loop:

for (auto&& i : step(3, 5)) 
{
    std::cout << i << std::endl;   
}

It will output this:

1.3. The single-parameter step loop

The following loop

for(auto control_variable : step(step_count)) statement

is equivalent to this statement

for(auto&& control-variable : step(0,step-count, 1)) statement

Example

This is a sample single-parameter step loop:

for (auto i : step(5))
{
    std::cout << i << std::endl;
}

It will output the following values:

2. The step_backward loop

This loop

for(auto control_variable : step_backward(step_count) statement

is equivalent to the following statement

{
     auto __counter = step_count;
     for(auto&& control_variable = step(__counter-1, __counter, -1)  statement
}

where __counter is defined for exposition only.

Example

This is a sample single-parameter step loop:

for (auto i : step_backward(5))
{
    std::cout << i << std::endl;
}

It will output the following values:

3. The step_to loop

3.1. The three-parameter step_to loop

This loop

for (auto control_variable : step_to(start_value, target_value, step_width) statement

is equivalent to the following statement

{
    typename template CommonType<decltype(start_value),decltype(target_value)>::type __common_type;
    __common_type control_variable = start_value;
    __common_type __target = target_value;
    auto __stepw = step_width;
    if (__stepw > 0)
    {
        if (control_variable <= __target)
        {
            while (true)
            {
                statement;
                if (control_variable > __target - __stepw)
                    break;
                control_variable += __stepw;
            }
        }
    }
    else if (__stepw < 0)
    {
        if (control_variable >= __target)
        {
            while (true)
            {
                statement;
                if (control_variable < __target - __stepw)
                    break;
                control_variable += __stepw;
            }
        }
    }
}

where __common_type, __target and __stepw are defined for exposition only.

The type CommonType<T,U>::type is defined as follows:

if T and U represent the same type, then this type is used;
otherwise, decltype(T()+U()) is used.

The types of all the expressions start_value, target_value and step_width must be integral. The value of target_value might not be the final value of control_variable. If __common_type is int or of the same or higher rank (unsigned, long, unsigned long, long long, unsigned long long, etc.) then the effect of the loop over the whole range of values of this type are implementation-dependent, which means you should not rely on the results and avoid writing such loops.

Example 1

Here is a sample step_to loop:

for (auto i : step_to(10, 2, -2))
{
    std::cout << i << std::endl;
}

It will output the following values:

Example 2

The effect of the following loop is implementation-dependent:

unsigned long long counter = 0;

for (auto i : step_to(std::numeric_limits<int>::max(), std::numeric_limits<int>::min(), -1))
{
    counter++;
}

std::cout << counter << std::endl; // implementation-dependent output

Example 3

The effect of this loop is properly defined (the loop counter i will be of type long long):

unsigned long long counter = 0;
long long max1 = std::numeric_limits<int>::max();
long long min1 = std::numeric_limits<int>::min();

for (auto i : step_to(max1, min1, -1))
{
    counter++;
}

std::cout << counter << std::endl;

The result will be (a-b+1), which is in most implementations:

4294967296

3.2. The two-parameter step_to loop

In step_to, the last parameter can be absent. In this case it is assumed to of of int type with the value 1.
This loop

for (auto control_variable : step_to(start_value, target_value) statement

produces the same results as the following statement:

for (auto control_variable : step_to(start_value, target_value, 1) statement

4. The step_until loop

4.1. The three-parameter step_until loop

The following loop

for (auto control_variable : step_until(start_value, target_value, step_width) statement

is equivalent to the following statement

{
    typename template CommonType<decltype(start_value),decltype(target_value)>::type __common_type;
    __common_type control_variable = start_value;
    __common_type __target = target_value;
    auto __stepw = step_width;
    if (__stepw > 0)
    {
        if (control_variable < __target)
        {
            while (true)
            {
                statement;
                if (control_variable >= __target - __stepw)
                    break;
                control_variable += __stepw;
            }
        }
    }
    else if (__stepw < 0)
    {
        if (control_variable > __target)
        {
            while (true)
            {
                statement;
                if (control_variable <= __target - __stepw)
                    break;
                control_variable += __stepw;
            }
        }
    }
}

where __common_type, __target and __stepw are defined for exposition only. The types of all the expressions start_value, target_value and step_width must be integral. The value of target-value will never be assigned to control_variable.

Example 1

Here is a sample step_until loop:

for (auto i : step_until(10, 2, -2))
{
    std::cout << i << std::endl;
}

It will output the following values:

Example 2

Here is another step_until loop:

unsigned long long counter = 0;

for (auto i : step_to(std::numeric_limits<int>::max(), std::numeric_limits<int>::min(), -1))
{
    counter++;
}

std::cout << counter << std::endl;

This code will output the value of the expression

static_cast<long long>(std::numeric_limits<int>::max())-static_cast<long long>(std::numeric_limits<int>::min())

On most 32-bit and 64-bit systems the output will be:

4294967295

4.2. The two-parameter step_until loop

In step_until, the last parameter can be absent. In this case it is assumed to of of int type with the value 1.
This loop

for (auto control_variable : step_until(start_value, target_value) statement

produces the same results as the following statement:

for (auto control_variable : step_until(start_value, target_value, 1) statement

Implementation

Creating a Class to be Used in a For-Range Loop

In C++11 there is a for-range loop which allows to iterate over elements of a container. Here is an example:

std::vector<unsigned> a{ 5, 6, 7, 8 };
for (auto x : a)
{
   std::cout << x << std::endl;
}

This loop is equivalent to the following loop:

for (auto it = a.begin(), itStop = a.end(); it != itStop; ++it)
{
    auto x = *it;
   
    std::cout << x << std::endl;   
}

In the for-range loop, the class a does not have to be a container. It can be a class, which does not store values: it can generate them through the iterator provided.

We need a starting value, a number of steps and a step width. The exact types of these values can vary depending on the loop we would like to write. In this case we have to create a template. Here is a general structure of the template:

template<class T, class N, class S>
class steps_class
{
public:

  // types

    typedef to-be-defined iterator;

// constructor

    steps_class(T start, N step_count, S step);

// iterators

    iterator begin() const;

    iterator end() const;

// property functions

    T first_value() const;
    N size() const;
    S step_length() const;
};

We can use this class as follows:

std::string str = "apples";
for (auto i : steps_class<std::size_t, std::size_t,int>(str.size()-1, str.size(), -1))
{
   std::cout << str[i] << std::endl;
}

This will print:

s
e
l
p
p
a

The loop over floating-point values, discussed in the introduction, can be re-written as follows:

for (auto x : steps_class<double, unsigned, double>(0.5, 5, 0.1))
{
    std::cout << x << std::endl;
}

Here is the full definition of steps_class:

template<class T, class N, class S>
class steps_class
{
public:

    class steps_iterator : public std::iterator<std::forward_iterator_tag, T>
    {
        N m_step_counter;
        T m_counter;
        S m_step;

    public:
        steps_iterator() : m_step_counter(N()) {}
        steps_iterator(N steps_count, T start, S step) :m_step_counter(steps_count), m_counter(start), m_step(step)
        {
        }

        steps_iterator& operator++()
        {
            --m_step_counter;
            m_counter += m_step;
            return *this;
        }

        steps_iterator operator++(int)
        {
            steps_iterator  it = *this;
            --m_step_counter;
            m_counter += m_step;
            return it;
        }

        const T& operator*() const
        {
            return m_counter;
        }

        const T* operator->() const
        {
            return &m_counter;
        }

        bool operator== (const steps_iterator& r) const
        {
            return m_step_counter == r.m_step_counter;
        }

        bool operator!= (const steps_iterator& r) const
        {
            return !(m_step_counter == r.m_step_counter);
        }
    };

    typedef steps_iterator iterator;

private:
    iterator m_end_iterator;
    iterator m_start_iterator;

public:
    steps_class(T start, N step_count, S step) : m_start_iterator(step_count, start, step)
    {
    }

    iterator begin() const
    {
        return m_start_iterator;
    }

    iterator end() const { return m_end_iterator; }
};

Implementation of Loop Constructs

1. Implementation of the step loop

It is not really convenient to use steps_class: it requires to explicitly specify all the types of the parameters. It is much easier to provide convenient functions for writing various loops. The three- and two-parameter step loop can be implemented as follows:

template<class T, class N, class S = int>
inline auto step(T a, N n, S s = 1)->steps_class<decltype(a + s), N, S>
{
    static_assert(std::is_integral<N>::value, "step: the second parameter should be of integral type");
    return steps_class<decltype(a + s), N, S>(a, n, s);
}

The single-parameter step loop implementation can be as follows :

template<class N>
inline steps_class<N, N, int> step(N n)
{
    static_assert(std::is_integral<N>::value, "step: parameter should be of integral type");
    return steps_class<N, N, int>(N(0), n, 1);
}

2. Implementation of the step_backward loop

The step_backward loop can be implemented as follows:

template<class N>
inline steps_class<N, N, int> step_backward(N n)
{
    static_assert(std::is_integral<N>::value, "step_backward: parameter should be of integral type");
    return steps_class<N, N, int>(n + int(-1), n, int(-1));
}

3. Implementation of the step_to and step_until loops

First of all, we have to define the CommonType template definition. Here is a simple definition of this template:

template<class T, class U>
class CommonType
{
    typedef typename std::remove_reference<T>::type T1;
    typedef typename std::remove_reference<U>::type U1;
 
public: 
    typedef decltype(T1() + U1()) type;
};

template<class T>
class CommonType<T,T>
{
    typedef typename std::remove_reference<T>::type T1;

public: 
    typedef T1 type;
};

Since we may pass varaible as parameters to the step_to and step_until loops, the types T and U may be, for instance, int&, char& instead of simple: int and char. That's why we have to use remove_reference.

In this implementation we will be using steps_class to count the number of iterations. It is convenient to use std::size_t and the counter type, unless its rank is less than the common type of the first two parameters (start_value and target_value). In this case we can start the definition of the step_to as follows:

template<class T, class U, class S = int,
         class StartType = typename CommonType<T, U>::type,
         class CounterType = typename CommonType<std::size_t, StartType>::type>
inline steps_class<StartType, CounterType, S> step_to(T a, U b, S s = 1)

Two extra types have been added to the template parameters: StartType and CounterType. The former will be the type of the values produced in the loop, the latter will be used to count the number of steps.

The full definition of those loop will be as follows:

template<class T, class U, class S = int,
            class StartType = typename CommonType<T, U>::type,
            class CounterType = typename CommonType<std::size_t, StartType>::type>
inline steps_class<StartType, CounterType, S> step_to(T a, U b, S s = 1)
{                
    static_assert(std::is_integral<StartType>::value, "step_to requires integral parameters");       
    CounterType  n = 0;

    if (s == 0 || b != a && (b < a) != (s < 0))
    {
        n = 0;           
    }
    else
    {
        S abs_s = s > 0 ? s : -s;
        CounterType diff = b > a ? (long long)b - (long long)a : (long long)a - (long long)b;

        if (abs_s == 1)
            n = diff;
        else
            n = diff / abs_s;

        n++;           
    }
    return steps_class<StartType, CounterType, S>(a, n, s);
}

template<class T, class U, class S = int,
            class StartType = typename CommonType<T, U>::type,
            class CounterType = typename CommonType<std::size_t, StartType>::type>
inline steps_class<StartType, CounterType, S> step_until(T a, U b, S s = 1)
{       
    static_assert(std::is_integral<StartType>::value, "step_until requires integral parameters");       
    CounterType  n;

    if (b == a || s == 0 || (b < a) != (s < 0))
    {
        n = 0;
    }
    else
    {
        S abs_s = s > 0 ? s : -s;           
        CounterType diff = b > a ? (long long)b - (long long)a : (long long)a - (long long)b;

        if (abs_s == 1)
            n = diff;
        else
            n = diff / abs_s;

        CounterType finish = a + n*s;
        if (finish != b)
        {
            n++;
        }
    }
    return steps_class<StartType, CounterType, S>(a, n, s);
}

Other Examples

The steps function can be used for generation not only arithmetic values:

for (auto c : step(std::string("!"), 6, std::string("*")))
{
    std::cout << c << std::endl;
}

This code will print:

!
!*
!**
!***
!****
!*****

This means that the steps function can also be used for various user-defined Big Numbers.

Benchmarks

The following loops were tested for benchmarks:

T1 - two-level loops with single-parameter steps;
T2 - two-level loops, stepping through vector indices with the step width of 2;
T3 - two-level loops, stepping through vector indices with the step width of 5;
T4 - integral calculation using the 7-point open Newton-Cotes rule.

I ran tests under Windows 8.1 and under Linux Mint 17 Qiana. The computers were different:

Windows ran on a computer with 8GB RAM and a Quad Core i7-4790, 3.6 GHz processor;
Linux ran on a computer with 6GB RAM and a Dual Core i5-2410M, 2.3 GHz processor.

The following compilers were used:

VC++ 2014;
GCC C++ 4.9.0;
GCC C++ 4.9.2;
Clang 3.6

For VC++ the following optimizations were set up:

For the other compilers the option -O3 was used.

The results slightly differ in various compilers and there was some difference between the 32-bit and 64-bit models. The results for Windows are shown in Figure 1.

Figure 1. Benchmark results under Windows.

The timings are in milliseconds; percentage was calculated as follows:

100*(timing_steps - timing_standard)/timing_standard

The negative percentage means that the steps constructs performed better. I have also highlighted in yellow the results which were close (differed within 1% from one another), in green -- if the steps code peformed better by more than 1%, and in red if the steps perform worse by more than 1%.

The Linux results are shown in Figure 2.

Figure 2. Benchmark results under Linux

Overall, the results for the steps constructs showed a very good performance.

Tests for Correctness

The tests contain various loops with defferent ranges. The implementation-dependent features for stop_to loops have been included as well. In general, the following issues in different implemetations have been observed (they are well-known issues):

in VC++ in both 32-bit and 64-bit code, the constant -2147483648 is of type unsigned and has the value 2147483648, which is diffrent from std::numeric_limits<int>::min(), which has correct value; at the same time -2147483648LL is correct;
in GCC and Clang the constant -2147483648 takes 8 bytes in memory.

Both these issues make step_to loop behave differently in different compilers. That is the reason I mentioned some implementation-dependent issues. In the provided tests, the implementation depended issues are categorized as "acceptable errors": they do not produce results that we want, but they are acceptable. We should not rely on the implementation-dependent code anyway and avoid such loops.

Other issues include floating-point calculations. The provided tests compared like with like and produced exactly the same results using standard loops and the step loops. But it is possible to slightly modify those tests so that, with aggressive optimization, they may produce different results, although the the calculations used look the same. There are several reasons for this. I will name two main ones:

possible change in the order of calculations (for instance, summation of floating-point numbers, due two rounding, may produce different results);
the Intel floating-point arithmetic traditionally uses 80-bit floating-point numbers, but calculations on xmm registers (which usually produce faster code) use 64-bit numbers (there is an obvious difference in precision).

References

[1]. Python 3.2 Build-in Function.: https://docs.python.org/3.2/library/functions.html

[2]. Boost irange: http://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/ranges/irange.html

[3]. NumPy linspace: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html

[4]. MATLAB linspace: http://www.mathworks.co.uk/help/matlab/ref/linspace.html