Snake Two Ways: Exploring Circular Arrays and Queuing

Michael Sydney Balloni

5.00/5 (3 votes)

Mar 7, 2024

CPOL

4 min read

7610

A "new" queue data structure is described and compared with existing collection types

Download source code - 12.6 KB

Introduction

So I thought I had a great idea, something new to offer the canon of computer science. Hubris, I think that's the word.

I thought I could create a class that served the function of a double-linked list, but based on an array, resulting in amazing performance. It would have a couple "markers" that pointed into the array to track the head and tail of the list, and you would be able to add or remove at either end with ease and low cost. I called the data structure a Snake, as it had a head and a tail and it could wrap around itself.

I'd heard of circular arrays, ring buffers I think they were called, but I somehow forgot about that. Alas.

So I coded up my novel idea. And it was fun to code! I wrote unit tests, worked out all the bugs, and felt great about what I'd created. I then set out to measure performance. But first, on the matter of misconceptions...

Misconceptions

My first misconception was that .NET's Queue class is based on LinkedList or some internal such thing. It is not. According to the documentation, Queue uses circular arrays, which are simpler than Snakes, but all the tech they need to quickly add to one end and remove from the other without a lot of memory allocations like linked lists do. The Queue class supports a wide variety of operations and accessors; it's a great class.

Another couple misconceptions I had are that C++'s std::queue class was based on the double-linked std::list class, and that the std::deque class is inefficient. C++'s queue class is by default built on std::deque. And std::deque is quite efficient as you'll see in the Performance Comparo section below.

My final misconception - likely not the last I'll ever have, just the last here - is that .NET's ConcurrentQueue class has bad performance.

Wrong on all accounts!

Circular Arrays Described

A circular array uses a normal array to store a sequence of elements. Unlike a normal array, a circular array allows for adding elements to the front of the sequence without shifting existing elements down. It does this by way of markers of a conceptual list: one marker for the head, another for the tail. As you add elements at the head, with decreasing index values in the array, when you reach the beginning of the array (index 0), the marker goes around to the end (index size - 1), and you continue adding elements from there, still with decreasing index values, until you bump against the tail marker. At this point, you grow the array and reset the markers and start back up again with room at the end for more elements.

Adding elements to the end is a better idea. Reading and writing use forward data access, making for better use of processor caching. Adding at the tail, you add until you reach the end of the array or the head marker, then go around to the front of the array. You keep adding until you bump into the head marker, when you grow the array and start again.

That may seem like a lot of bookkeeping for code to chase itself around and grow a lot. Where this data structure makes sense is when you are reading from the head in one part of the code while other code writes to the tail. Then the head chases the tail reading the data. A queue.

Circular arrays seem to be great for implementing queue data structures.

Here is what a circular buffer looks like in memory with three elements 
and free spaces around the head and tail:
[ ][ ][ ][1][2][3][ ][ ]
       ^              ^
       head marker    tail marker

Add element 4:
[ ][ ][ ][1][2][3][4][ ]
       ^              ^
       head marker    tail marker

Add 5:
[ ][ ][ ][1][2][3][4][5]
 ^     ^                 
 tail  head marker 
      
Add 6:
[6][ ][ ][1][2][3][4][5]
    ^  ^                 
 tail  head marker

Try to add 7, tail marker hits head marker, array is grown, element added, 
plenty of room to grow
[ ][1][2][3][4][5][6][7][ ][ ][ ][ ][ ][ ]...
 ^                       ^
 head                    tail

At this point, or perhaps while all that adding was going on, 
you could remove forward from the head until you reach the tail.

The head chases the tail forward in memory.

Snake Code

I wrote Snake in C# first because it was faster to develop and get right than C++. Once I was happy with my C# class, porting to C++ was easy. I had the tests to validate it, and it was a line-for-line port, no surprises.

Here's the C++ code for Snake:

#pragma once

#pragma warning(push)
#pragma warning(disable : 4365)
#include <exception>
#include <string>
#include <sstream>
#pragma warning(pop)

template <typename T>
class Snake
{
public:
    Snake()
        : m_data(new T[2]) // this must always have two array spots 
                           // for the two end markers
        , m_capacity(2)    // how big m_data is
        , m_size(0)        // size is never the length of the array, 
                           // always at least two less
        , m_startIdx(0)    // point the end markers...
        , m_endIdx(1)      // ...into the initial array
    {
    }

    ~Snake()
    {
        delete[] m_data;
        m_data = nullptr;
    }

    size_t size() const { return m_size; }
    
    size_t capacity() const { return m_capacity; }

    T front() const 
    { 
        if (m_size == 0)
            throw std::exception("No first element to get");
        else
            return m_data[(m_startIdx + 1) % m_capacity];
    }
    
    T back() const 
    { 
        if (m_size == 0)
            throw std::exception("No last element to get");
        else
            return m_data[(m_startIdx + m_size) % m_capacity];
    }

    void push_front(const T& elem)
    {
        size_t new_start_idx, data_idx;
        if (m_startIdx == 0)
        {
            new_start_idx = m_capacity - 1;
            data_idx = 0;
        }
        else
        {
            new_start_idx = m_startIdx - 1;
            data_idx = m_startIdx;
        }

        if (new_start_idx == m_endIdx)
        {
            grow();
            push_front(elem);
            return;
        }

        m_startIdx = new_start_idx;
        m_data[data_idx] = elem;
        ++m_size;
    }

    void push_back(const T& elem)
    {
        size_t new_end_idx, data_idx;
        if (m_endIdx == m_capacity - 1)
        {
            new_end_idx = 0;
            data_idx = m_capacity - 1;
        }
        else
        {
            data_idx = m_endIdx;
            new_end_idx = m_endIdx + 1;
        }

        if (new_end_idx == m_startIdx)
        {
            grow();
            push_back(elem);
            return;
        }

        m_endIdx = new_end_idx;
        m_data[data_idx] = elem;
        ++m_size;
    }

    void pop_front()
    {
        size_t proposed_idx = (m_startIdx + 1) % m_capacity;
        if (proposed_idx == m_endIdx)
            throw std::exception("No first element to remove");
        else
            m_startIdx = proposed_idx;
        --m_size;
    }

    void pop_back()
    {
        size_t proposed_idx = m_endIdx == 0 ? m_capacity - 1 : m_endIdx - 1;
        if (proposed_idx == m_startIdx)
            throw std::exception("No last element to remove");
        else
            m_endIdx = proposed_idx;
        --m_size;
    }

    T operator[](size_t idx) const
    {
        if (idx >= m_size)
            throw std::exception("The index is out of range");
        else
            return m_data[(m_startIdx + idx + 1) % m_capacity];
    }

    std::string diagnostic_dump()
    {
        std::stringstream sb;
        
        sb
        << "size: " << m_size
        << " - "
        << "start: " << m_startIdx
        << " - "
        << "end: " << m_endIdx
        << "\n";

        for (size_t idx = 0; idx < m_capacity; ++idx)
        {
            sb << "[" << idx << "]";

            if (idx == m_startIdx)
                sb << " (start)";
            if (idx == m_endIdx)
                sb << " (end)";

            sb << " " << m_data[idx] << "\n";
        }

        return sb.str();
    }

private:
    void grow()
    {
        // create the new larger array
        size_t new_capacity = m_capacity * 2;
        T* new_data = new T[new_capacity];

        // copy data from the original array into the front of the new array, 
        // sort of compacting it
        // start idx at 1 so it is after the start marker
#pragma warning(push)
#pragma warning(disable : 6386)
        for (size_t idx = 1; idx <= m_size; ++idx)
            new_data[idx] = m_data[(m_startIdx + idx) % m_capacity];
#pragma warning(pop)

        // reset state
        delete[] m_data;
        m_data = new_data;
        m_capacity = new_capacity;
        m_startIdx = 0;
        m_endIdx = m_size + 1;
    }

private:
    // The raw array and the number of elements in it
    T* m_data;
    size_t m_capacity;
    size_t m_size;

    // These indexes point to the head and tail markers in the array
    size_t m_startIdx;
    size_t m_endIdx;
};

Layout of Attached Code

The attached ZIP has the source code:

snakenet - C# class library containing the Snake class, and BufferQueue, a thread-safe wrapper queue class
snaketest - C# unit tests
snakeperf - C# cmd line app for comparing performance of various queuing approaches
snakeplus - C++ project with Snake class, unit tests, and performance test driver program, all-in-one

Performance Comparo

Now for what we've all been waiting for, let's stack them up and see where the chips fall. The tests involve enqueuing and dequeing ten million ints.

Here are the .NET test results:

Array (.NET List): 87 ms // no FIFO on this, just LIFO to measure 
                         // general performance of working with the 10M elements

// these "Easy" tests simply add the elements, then remove them all
Linked List Easy: 1279 ms
Snake Easy: 380 ms
Queue Easy: 140 ms       // wow, Queue is much faster than Snake!

// these tests create a Task for dequeuing, then enqueue the elements, 
// then wait on the Task
Linked List: 846 ms
Snake: 845 ms
BufferQueue: 744 ms      // a thread-safe class I created based on Snake
Queue: 675 ms

// Oh my, the clear winner!  It's not even close.
ConcurrentQueue: 162 ms

Here are the C++ test results:

std::vector: 46 ms       // not FIFO, LIFO: push_back the ints, pop_back until empty
Snake Easy: 204 ms
std::list (Linked List) Easy: 673 ms
std::queue (std::deque) Easy: 267 ms

Conclusion and Points of Interest

Writing collection types and getting them right and making them efficient is fun.
Just don't expect to "change the world": There's very rarely anything new under the sun.
.NET's Queue class is very efficient and a great choice when thread-safety is not a concern.
.NET's ConcurrentQueue class provides by far the fastest inter-thread queuing among tested types.
C++ data structures can be 2X faster than their line-for-line C# equivalents.
C++ std::queue is based on std::deque, not std::list.
C++ std::deque is efficient.
In all tests done here, it should be clear that there is no good reason to use a linked list for queuing.
There's no compelling reason to switch to using C++ solely for an increase in the performance of queuing code.

History

7^th March, 2024: Initial version