Treaps in C

Tomasz_Wisniewski

4.83/5 (11 votes)

May 4, 2015

CPOL

4 min read

20163

402

A treap implementation in C

Introduction

A treap is a data structure combining both the binary heap and the BST tree. It's a self organizing, self-balancing data structure. It's main advantage when confronted with other self-balancing trees like AVL tree or Red-Black Tree is the simplicity of implementation and quite good performance. It can be used as an efficient backend when implementing other data structures like i.e.. priority queues. Time complexity for all of the operations on a Treap is O(log n) which makes it a very attractive data structure.

For more details regarding this data-structure I encourage the reader to have a look on a great and concise description in the Wikipedia

In the rest of the article I will assume that the reader has the basic understanding of principles behind how the data structure is built and maintained.

Background

Recently I needed to implement a priority queue in C. I was thinking for a while what data structure to use ? My first thought was of course heap. Accessing the minimum priority element is O(1) which is exactly what I needed. But my other requirement was to be able to quickly find any element, not only the one with the minimum priority. Now, that's a problem. Considering an array heap implementation, looking up the minimum priority element would be O(1), but looking any other (random-access) would be O(n). That's acceptable if your heap is small, but along with the size, the performance deteriorates quickly.

After doing some research I finally decided to use a treap. Using a treap makes the minimum priority element lookup slower. It's O(log n), but random priority element lookup is O(log n) as well, which seems like a fair trade off.

Of course the lookup time in the treap may deteriorate and be even O(n) for any operation if your random number generator is very poor. In most of the cases though, using standard POSIX rand(), I was getting pretty satisfying results.

Implementation details

I introduced two data types in order to implement the treap. The essential one, represents the treap's node. It holds the heap's priority, the actual data, and the BST pointers to children nodes. This is a sufficient minimum, however most of the time I like to have a wrapper, representing the data-structure as a separate entity. This allows me to store some additional information along.

/**
 * @brief leaf enumeration
 */
typedef enum _treap_leaf {
    L_LEFT = 0,
    L_RIGHT,
    L_MAX,
} treap_leaf;


/**
 * @brief treap node declaration
 */
struct treap_node {
    struct treap_node *l[L_MAX];

    /* random priority */
    int p;

    /* the data */
    treap_data_t d;
};


/**
 * @brief treap data type declaration
 */
struct treap {
    struct treap_node *root;
    treap_cmp_t cmp;
    size_t n;
};

Using the code

There are some prerequisites. The implementation provides a naive type obliviousness. As it's visible in the struct treap_node declaration, treap_data_t type definition is required. Another thing is the treap_cmp_t visible in the struct treap definition. This is a comparison function which operates on the encapsulated data type. The treap.h header file requires a presence of treap_data_type.h header within the inclusion path. This header should define the treap_data_t and declare the mentioned comparison function. Here's an example for int:

typedef int treap_data_t;
int treap_int_cmp(treap_data_t *a, treap_data_t *b);

The comparison function has more or less the same format as the one used by standard library's qsort. It should do the same thing as well. Sample implementation may look like so:

int treap_int_cmp(treap_data_t *a, treap_data_t *b) {
    return (*a > *b ? 1 : (*a == *b ? 0 : -1));
}

Those are all of the pre-requisites. In order to use the treap, it needs to be allocated. The implementation provides a set of uniform API's to operate on it. Example usage is given bellow:

void treap_usage() {
    struct treap *t = treap_alloc();
    struct treap_node *n = NULL;
    int min, max;

    t->cmp = treap_int_cmp;

    treap_insert(t, 123);
    treap_insert(t, 456);
    treap_insert(t, 789);

    n = treap_find(t, 456);
    if (n) treap_delete(t, n);

    min = treap_min(t)->d;
    max = treap_max(t)->d;

    treap_dealloc(t);
}

The following set of operation is provided (most of the function names is self explaining, Doxygen documentation contains the rest of necessary details):

treap_alloc
treap_dealloc
treap_insert
treap_delete
treap_modify
treap_traverse_in
treap_traverse_level
treap_max_height_get
treap_balance_factor_get
treap_parent_find
treap_find
treap_min
treap_max

Points of Interest

The priority queue implemented on top of that data structure behaves quite good. Performance in my use case; where a need to have a random access as well as minimum priority element was essential, is very satisfying. It's a lot better than an array heap implementation and doing a linear search.

Regarding the self-balancing capability of this data structure, I wrote a set of simple tests for that implementation as well. One test-case (it doesn't actually do any code verification) inserts 10000 elements in sorted order into the treap, then removes one third of that and inserts another 10000. The purpose was only to measure the balance factor of the tree. Just as a reminder the balance factor is the difference between root's left subtree and right subtree. The closer it is to zero, the better balance we have. I did some tests. There are occurrences where rand() doesn't really provide a uniformly distributed range of random numbers and that has it's impact on the balance obviously. However, most of the time the balance is pretty good I would say, especially when taking into account the fact that the incoming data is always inserted in the sorted order. Just to have some reference, I ran my tests a couple of times and here are some example numbers:

After 32 runs.

Average balance factor:

#6667 elements: -1,44

#10000 elements: 0,59

#16667 elements: -0,13

Best noticed balance factor:

#6667 elements: 0

#10000 elements: 1

#16667 elements: 0

Worst noticed balance factor:

#6667 elements: 13

#10000 elements: 17

#16667 elements: 14