Click here to Skip to main content
Click here to Skip to main content

Number crunching: Why you should never, ever, EVER use linked-list in your code again

By , 21 Aug 2012
 

 

Mea Culpa - Please read before continuing

Let us look at the title again. It contains a very important prefix. The title says:

Number crunching: Why you should never, ever, EVER use linked-list in your code again.

The number crunching prefix is key to this whole article. If the "Number crunching:" part of the title is ignored it might be understood as a general advice to always stay away from linked-list, that is not the purpose of this article.

The point of the article is to explain why linked-list is so badly suited for number crunching tasks. What number crunching is will soon be explained. Even so there are of course always exceptions to the rule and so the title cannot be 100 percent true.

The title above is a hyperbole. It strongly emphasizes a purpose but being deliberately over the top. Since I do have mixed feelings about it, and obviously so do some readers then a possible title change could be:

Why you should never, ever, EVER use linked-list for number crunching  

However I decided to keep it since it could be more confusing to change it. A few readers have also commented that they think the title is just right. So let's just leave it with that.

Now, why on earth did I decide in the first place for such a CodeProject unusual act as to put a hyperbole title on the article? Honestly, because the message needed to get out. My impression is that the performance impact of using a linked-list for number crunching is not commonly known and that is why I decided to write this article. 

I hope with this Mea Culpa  you can see beyond any initial confusion which is all my fault and see to the core issue: the cons of linked-list for a set of number crunching problems.

Table of Contents

Introduction

This article will explain why the linked-list is a less then ideal choice for dealing with number crunching.

Number crunching  is the term used here for describing the crunching, sorting, inserting, removing and simply juggling of raw numbers or POD data. This article is about why linked-list is not well suited for these tasks and should in fact be avoided. It is written with an attitude but backed up with facts and proofs. You are welcome, even encouraged, to verify these claims and all code and proofs are either accessible online or are available to download.

The intention is to clearly show some obvious examples to prove a point. The article is not written to cover every special case and exception to its recommendation.

These obvious examples are for some readers edge cases, for other readers it is common programming tasks. Some readers think the point of the article is common sense, others that it is a joke and are shocked when they realize that it is in fact reality. I especially recommend the latter readers to not easily dismiss this article but in fact take time to study it. You might just get a new insight.

In Real Life 

Number crunching tasks are sometimes similar to my examples below, sometimes not. They can be found in a variety of areas such as avionics technology, heart surgery equipment, surveillance technology and [...] well you name it. What to remember as you continue is that number crunching as used in this article is referring to numbers and POD data that is stored, retrieved, removed and sorted. Nothing weird, and very common and are always types that are cheap to copy

A few times in my professional career I have encountered performance issues where some time critical areas got bad performance just because of number crunching by linked-list. I have witnessed these linked-list related performance issues in a range of areas from real-time, embedded, small memory constrained PowerPC systems to x86 PC applications.

These times it came as a complete surprise to those involved. It was a surprise because to them it was proven by mathematical complexity models that it was a very efficient data structure for this or that use. That it was slow was considered bad, but OK, because other data structures would be slower according to the same mathematical line of reasoning.  

Unfortunately for a given problem set these mathematical models are likely to lead you astray. Following blindly the Big-O complexity of the data structures and algorithms might lead you to a conclusion that is completely and utterly false

Big-O notations tells nothing about how one (data structure with algorithm) fare against another. Big-O will only tell you how performance will degrade as n increases. So comparing one a data structure that is RAM intensive to another data structure that is cache friendly from an abstract Big-O point-of-view is just pointless.  This is nothing new, but since the books and online resources rarely, or ever, mention this not many know about it, or at least tend to forget.

 

Disclaimer 

Just in case my Mea Culpa and Introduction did not cover it. Let me be clear: This article is not disqualifying the linked-list for other types of usages, such as when it contains types that have pointers to other types, large and expensive-to-copy types (and yes, POD can also be too large).

This article is not about specific usage patterns where linked-list makes extra sense - because there sure are several of those. Take a peek later in the comments section and you can find several more, or less, good examples of such cases. This article is an opinionated article but backed up with hard facts. It does not discredit linked-list overall except in one specific area: number crunching.

Other types of containers are used to compare with the linked-list. The purpose is not to hype these containers but to show the areas where linked-list is lacking. It even goes as far as using these containers in clearly suboptimal ways which still by far outperforms the linked-list.

The code examples and container names in this article are from C++. std::vector would in Java be called ArrayList and the Linked-list in C++ is called std::list, which in Java it would be called LinkedList

OK, enough with the disclaimer, lets get going.

Scope

What data structure should you use? What data structure should you avoid?

Imagine that you have to use a data structure. It is nothing fancy, only a storage for raw numbers, or POD data carriers that you will use now and then and some later on.

These items will be inserted, then removed intermittently. They will have some internal order, or significance, to them and be arranged accordingly. Sometimes insertions and removal will be at the beginning or the end but most of the times you will have to find the location and then do an insertion or removal somewhere in between.
An absolute requirement is that the insertion and removal of an element is efficient, hopefully even O(1).

Now, what data structure should you use?



On-line resources

At a time like this it is good advice to double check with books and/or on-line resources to verify that your initial hunch was correct. In this situation maybe you recollect that vectors are generally fast for accessing an item but can be slow in modifying operators since items not at the end that are inserted/removed will cause part of the contents in the array to be shifted. Of course you also know that an insertion when the vector is full will trigger a re-size. A new array storage will be created and all the items in the vector will be copied or moved to the new storage. This seems intuitively slow.

A linked list on the other hand might have slow O(n) access to the item but the insertion/removal is basically just switching of pointers so this O(1) operations are very much appealing. I double check with a few different online resources and make my decision. Linked-list it is


BEEEEEEEEP. ERROR. A RED GNOME JUMPS DOWN IN FRONT OF ME AND FLAGS ME DOWN. HE TELLS ME IN NO UNCERTAIN WAYS THAT I AM WRONG. DEAD WRONG.

Wrong? How can this be wrong? This is what the online resources say:

www.cplusplus.com/reference/stl/list Regarding std::list[...] Compared to other base standard sequence containers (vector and deque), lists perform generally better in inserting, extracting and moving elements in any position within the container, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

www.cplusplus.com/reference/stl/vector Regarding std::vector[...] vectors are generally the most efficient in time for accessing elements and to add or remove elements from the end of the sequence. [...] For operations that involve inserting or removing elements at positions other than the end, they perform worse than deques and lists.

http://en.wikipedia.org/wiki/Sequence_container_(C%2B%2B)#List Vectors are inefficient at removing or inserting elements other than at the end. Such operations have O(n) (see Big-O notation) complexity compared with O(1) for linked-lists. This is offset by the speed of access — access to a random element in a vector is of complexity O(1) compared with O(n) for general linked-lists and O(log n) for link-trees.

In this example most insertions/removals will definitely not be at the end, this is already established. So that should mean that the linked-list would be more effective than the vector, right? I decide to double check with Wikipedia, Wiki.Answers and Wikibooks on Algorithms. They all seem to agree and I cannot understand what the RED GNOME is complaining about.

I take a break to watch Bjarne Stroustrup’s Going Native 2012 keynote. At time 01:44 in the video-cast and slide 43, something interesting happens. I recognize what the RED GNOME has tried to to tell me. It all falls into place. Of course. I should have known. Duh. Locality of Reference.



Important Stuff comes now

  1. It does not matter that the linked-list is faster than vector for inserting/removing an element. In the slightly larger scope that is completely irrelevant. Before the element can be inserted/removed the right location must be found. And finding that location is extremely slow compared to a vector. In fact, if both linear search is done for vector and for linked-list, then the linear search for vector completely, utterly and with no-contest beats the list.

  2. The extra shuffling, copying overhead on the vector is time-wise cheap. It is dirt cheap and can be completely ignored if compared to the huge overhead for traversing a linked-list.

Why? you may ask? Why is the linear search so extremely efficient for vector compared to the supposedly oh-so-slow linked-list linear search? Is not O(n) for linked-list comparable to O(n) for vector?

In a perfect world perhaps, but in reality it is NOT! It is here that Wiki.Answers and the other online resources are wrong! as the advice from them seem to suggest to use the linked-list whenever non-end insertion/removals are common. This is bad advice as they completely seem to disregard the impact of locality of reference.



Locality of Reference I

The linked-list have items in disjoint areas of memory. To traverse the list the cache lines cannot be utilized effectively. One could say that the linked-list is cache line hostile, or that the linked-list maximizes cache line misses. The disjoint memory will make traversing of the linked-list slow because RAM fetching will be used extensively.

A vector on the other hand is all about having data in adjacent memory. An insertion or removal of an item might mean that data must be shuffled, but this is cheap for the vector. Dirt cheap (yes I like that term). The vector, with its adjacent memory layout maximizes cache utilization and minimizes cache line misses. This makes the whole difference, and that difference is huge as I will show you soon.

Let us test this by doing insertions of random values. We will keep the data structure sorted and to get to the right position we will use linear search for both vector and the linked-list. Of course this is silly way to use the vector but I want to show how effective the adjacent memory vector is compared to the disjoint memory linked-list.

The Code

/* NumbersInVector &randoms is pre created random values that are stored in a 
   std::vector. This way the same random values can be used for testing and comparing the 
   vector to the linked-list.
    
    Example:
    0, 1, 8, 4, 1     will get sorted to:
    0, 1, 1, 4, 8                                                           */
template <typename Container>
void linearInsertion(const NumbersInVector &randoms, Container &container)
{
 std::for_each(randoms.begin(), randoms.end(),
 [&](const Number &n)
 {
    auto itr = container.begin();
    for (; itr!= container.end(); ++itr)
    {
       if ((*itr) >= n) { break; }
    }
    container.insert(itr, n); // sorted insert
 });
}
 
/* Measure time in milliseconds for linear insert in a std container */
template <typename Container>
TimeValue linearInsertPerformance(const NumbersInVector &randoms, Container &container)
{
 g2::StopWatch watch;
 linearInsertion(std::cref(randoms), container);
 auto time = watch.elapsedMs().count();
 return time;
}
  

The time tracking (StopWatch that I wrote about here) is easy with C++11... Now all that is needed is the creation of the random values and keeping track of the measured time for a few sets of random numbers. We measure this from short sets of 10 numbers all the way up to 500,000. This will give a nice perspective

 
void listVsVectorLinearPerformance(size_t nbr_of_randoms)
{
  std::uniform_int_distribution distribution(0, nbr_of_randoms);
  std::mt19937 engine((unsigned int)time(0)); // Mersenne twister MT19937
  auto generator = std::bind(distribution, engine);
  NumbersInVector     values(nbr_of_randoms);
 
  // Generate n random values and store them
  std::for_each(values.begin(), values.end(), [&](Number &n)
                { n = generator();});
 
  NumbersInList      list;
  TimeValue list_time = linearInsertPerformance(values, list);
  NumbersInVector    vector;
  TimeValue vector_time = linearInsertPerformance(values, vector);
 
  std::cout << list_time << ", " << vector_time << std::endl;
}
  

Calling this for values going all the way up to 500,000 gives us enough values to plot them in a graph and which shows exactly what Bjarne Stroustrup told us at the Going Native 2012. For removal of items we get a similar curve but that is not as time intensive. I have google doc collected some timing results that I made on IdeOne.com and on a x64 Windows7 and Ubuntu Linux laptop for you to see.

Can you see the red fuzz at the bottom? The graph is not displayed incorrectly. That red fuzz is the time measurements for vector. 500,000 sorted insertions in a linked-list took some 1 hour and 47 minutes. The same number of elements for the vector takes 1 minute and 18 seconds.

You can test this yourself. The code is attached with this article. For a quick test you can try out the smaller (max 15 seconds) test of up to 40.000 elements at the pastebin and online compiler IdeOne: http://ideone.com/62Emz

 

Locality of Reference II

The processor and cache architecture has a lot of influence on the result obviously. On a Linux OS the cache level information can be retrieved by 

grep . /sys/devices/system/cpu/cpu*/cache/index*/*  

On my x64 quad-core laptop that gives each core-pair 2x L1 cache a 32 KBytes and with a cache line size of 64 bytes. 2x L2 cache a 256 KBytes. All cores share a 3 MBytes L3 cache.

Conceptual drawing of quad core Intel i5

Conceptual drawing of quad core Intel i5

I thought it would be interesting to compare my quad-core against a simpler and older computer. Using the CPU-Z freeware software I got from my old dual-core Windows7 PC that it has 2xL1 caches a 32KBytes and 1xL2 cache a 2048 KBytes. Testing both with a smaller sets of values shows the cache significance clearly (disregarding different bus bandwidth and CPU Speed )

x86 old dual-core PC

x64 quad-core Intel i5

 

Devils Advocate

Of course there are always exceptions, special cases, and objections to either using vector or the advice of not using linked-list. If we can ignore the obvious non-related cases  and focus on just the POD or number crunching then there are still some valid concerns that should be addressed. 

Devil's Advocate I  - Cost of copying

For vector the traditionally perceived overhead for copying, shuffling and resizing (with memory allocations and copying) is sufficient reasons for some (ref: [2] to [7]) for choosing the linked-list over the vector.

Above was shown how the perceived inefficiency of the vector was in fact dirt cheap compared to the cost of traversing a linked-list. The linked-list item insertion is cheap but to get to the insertion place a very expansive traversal in disjoint memory must be done.

The examples above are somewhat extreme. The integer examples brings out another bad quality of linked-list. The linked-list need 3 * size of integer to represent one item when the vector only need 1. So for small data types it makes even less sense to use a linked-list.

However, what would happen if the cost of copying was not so cheap? If the cost for modifying the vector would increase?

For larger types you might see the same behavior as shown above but obviously the times would differ. As the types grow in size and becoming more expensive then finally the linked-list would outperform the vector which has to shuffle and copy a lot of objects. Using the C++ move operator would perhaps change this. I have not tested with move operator so that is up to you to try out for yourself. 

As PODs grow larger in size, the copying cost will also grow. The overhead in copying and shuffling will take its toll. The cache architecture and the cache's line size will also make an impact. If the POD size is larger than the cache-line size then the cache usage will be suboptimal compared to when used for smaller POD size.

At some POD size this extra overhead will even be more time consuming then the linked-list slow traversal. At this point choosing to use a vector is a worse choice then choosing the linked-list, even for the scenario used earlier in this article.

This is shown in the three graphs below. The tests were done for 200, 4000 and 10.000 elements. Each graph represents a fixed number of elements, where the byte size of the PODs is increasing.

Observe the intersection between the linked-list and the vector performance. It is at the intersection that the performance goes from being in favor of vector to being in favor of linked-list for that number of elements

From the graphs it is obvious that for the same number of elements the POD size has virtually no impact on the linked-list performance. For vector the POD size is of massive importance. The cost of copying increase proportionally with the size.

Efficiency when using the linked-list is limited by the cost of RAM access and thus almost POD size ignorant. However as the number of items increase the total cost for RAM access will be more expensive and substantially larger PODs (bytes) are needed before linked-list will outperform the vector.

The data is collected in a google document and also in the download area in excel format. A quick look using ideone.com can be done at ideone.com/W9vpT and also available with the downloads.

Devil's Advocate II : Linear, sorted insert with smart linked-list 

A common objection to the linear, sorted insert comparison shown above is that since the linked-list linear is known to be slow the iterator to the last inserted location should be kept. That way on average the linear search could benefit from O(n/2) instead of O(n).

When testing this it showed what huge impact the cache architecture has on the performance on adjacent memory structures. The first test run was made at ideon.com/XprUU. The ideone online compiler has unknown cache architecture and since it is serving multiple runs simultaneously it is likely that the caches utilization is unpredictible and RAM usage is heavy.

With this likely cache under utilization the smart linked-list was clearly the winner. At least up to a certain number of elements. It took up to 6000 elements before the vector had caught up to the smart linked-list. This was a huge surprise to me! but it made sense when I tested it with a known cache architecture.

The same test was run on a known cache architecture, my x64 laptop, with L1, L2 and L3 cache. It immediately showed the benefits of cache utilization for adjacent memory structures. The smart linked-list and the vector kept even pace up to about 100 elements, at 500 elements the difference was around 20 percent and steadily increasing.

If you take a peek at its data sheet you can see the performance data both when running at ideone.com/XprUU and on the x64 laptop.

On other computers with different cache architectures and when using other languages (Java, C#, etc) this will give different results..

Devil's Advocate III : Fragmented memory makes vector a big no-no  

For fragmented memory, or systems where the memory can be suspected to be fragmented, or for huge vectors, then it might be beneficial to use list-of-vectors instead. There are a few different types to choose from: unrolled linked-lists [12] or the std::deque are two.

I have not tested the fragmented memory scenario as I simply could not come up with a way to force the memory to be easily fragmented. On 32-bit dual-core PC and 64-bit quad-core PC I never once encountered allocation exception, even for multiples of the 500.000 items comparison testing.    

Devil's Advocate IV : Algorithms with merge or sort 

Yes, what about it? The online resources pointed out earlier are stating that merge or sort of the containers is where the linked-list will outperform the vector. I am sorry to say but this is not the case for a computer with a modern cache architecture. At least not in the area of number crunching, and is it not there that sort are most commonly used?

Once again these resources are giving you bad advice.

From a mathematical perspective YES it does makes sense when calculating big-O complexity. The problem is that the mathematical models (at least the ones I have seen) are flawed. At least flawed in that aspect that making a decision for what to use from the big-O complexity value is just not good enough.

Computer cache, RAM and memory architecture are completely disregarded and only the mathematical, SIMPLE, complexity is regarded.

To show this some simple sort testing is laid out below.

void listVsVectorSort(size_t nbr_of_randoms)
{
  std::uniform_int_distribution<int> distribution(0, nbr_of_randoms);
  std::mt19937 engine((unsigned int)time(0)); // Mersenne twister MT19937
  auto generator = std::bind(distribution, engine);
  NumbersInVector  vector(nbr_of_randoms);
  NumbersInList list;
 
  std::for_each(vector.begin(), vector.end(), [&](Number& n) 
                                              { n = generator(); list.push_back(n); }    );
 
  TimeValue list_time;
  {  // list measure sort
    g2::StopWatch watch;
    list.sort();
    list_time = watch.elapsedUs().count();
  }
 
    TimeValue vector_time;    
  {  // vector measure sort
    g2::StopWatch watch;
    std::sort(vector.begin(), vector.end());
    vector_time = watch.elapsedUs().count();
  }
 
  std::cout <<  nbr_of_randoms << "\t\t, " << list_time << "\t\t, " << vector_time << std::endl;
} 

The sort testing I made is available on the google spreadsheet on tab “sort comparison“. I used std::sort (vector) vs std::list::sort.

Sure, it is two different algorithms who likely use two different algorithms so this test might be a leap of faith. Either way it can be interesting since both algorithms are at their best and this is where the linked-list supposedly should be the winner. No big surprise that it was not.

The std::sort (vector) beats the std::list::sort hands-down. The complete code is attached and available here http://ideone.com/tLUeK.

The graph below show the sort performance from 10 up to 1.000 elements. The cost is always higher for the linked-list. Both show a proportional cost increase as the number of elements increase but the linked-list has a steeper incline. The linked-list sorting will compared to the vector be obviously more expensive per item as the elements increase.

Testing this on a modern x64 computer with better CPU-cache architecture would increase the difference between std::list and vector even more. 

As the number of elements continue to grow this initial performance perspective does not change. The graph below shows from 10 to 100.000 elements but has virtually the same proportional properties as when compared 10 to 1.000.

 The corresponding sort test but for merge was not tested. However,  for merge of two sorted structures, that after the merge should still be sorted, then that involves traversal again and once again the linked-list is no good.  

 

Devil's Advocate V : Insert at first position 

  A reader wrote in the comments below:
“[If] Your insertion is always done [at] the index 0 or somwhere in the middle. No matter how large list grows the operation perfomed in constant time O(1). With vectors it is O(log(n)). The larger the array the slower the insertion.” 

Another reader wrote in my blog
“[A] very common example when adding elements to the beginning a list. Using a LinkedList this would be a fairly simple operation (just swapping some pointers) while the ArrayList  [Java version of dynamic array] would struggle to make such insertion (moving the entire array)” 

Copy and pasting from my own article comment answer to the first reader:

“What you wrote above is completely accurate. You are not the first to mention this and while it is so right it is also so wrong. Let me explain. 

The scenario you outlined is very believable. In fact putting one piece of data in front of an older piece of data and so forth is perhaps the most common task for data collecting?

In this scenario it makes sense to do insert at “index 0” for the linked-list. This does not makes sense for the vector. 

The default insertion on vector should be done by push_back. Insertion in the direction of growth. An obvious choice to avoid shifting the whole previous section. This is made abundantly clear in C++ where the std::vector[^] have push_back, and insert but lacks completely the function push_front

So, let us not be silly [naive] here, right? Doing push_back on the vector instead and comparing that to insert at index 0 on the linked-list that is probably the valid options a programmer would choose from. Don’t you agree?

push_back on the vector does not need to shuffle everything for every new element. The overhead when the vector is full and need to be re-sized still applies.“

A code and performance comparison between these three insertions can be seen at http://ideone.com/DDEJF. It compares between the common linked-list operation : insert at “index 0” vs std::vector push_back vs a naive std::vectorpush_front ".  

 

From: http://ideone.com/DDEJF. Times are given in microseconds [us]
elements, list, vector, vector (naive)
10,       3,     6,     1 
100,      7,     3,     13 
500,      43,    7,     83 
1000,     70,    10,    285 
2000,     149,   27,    986 
10000,    928,   159,   22613 
20000,    1578,  284,   97330 
40000,    3138,  582,   553636 
100000,   7802,  1179,  3555756  
Exiting test,. the whole measuring took 4266 milliseconds (4seconds or 0 minutes) 

What to Do Now? 

Donald Knuth made the following statement regarding optimization: 

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil” .

Meaning that “Premature Optimization” is when the software developer thinks he is doing performance improvements to the code without knowing for a fact that it is needed or that it will greatly improve performance. The changes made leads to a design that is not as clean as before, incorrect or hard to read. The code is overly complicated or obfuscated. This is clearly an anti-pattern.

On the other hand we have the inverse of the extreme Premature Optimization anti-pattern, this is called Premature Pessimization. Premature pessimization is when the programmer chose techniques that are known to have worse performance than other set of techniques. Usually this is done out of habit or just because.

Example 1: Always copy arguments instead of using const reference.

Example 2: Using the post increment ++ operator instead of pre increment ++. I.e value++ instead of ++value.

The cost for incrementing native types pre- or post-increment is inconsequential,but for class types the performance difference can have an impact. So using value++ always is a bad habit that when used for another set of types actually is degrading performance. For this reason it is always better to just do pre increments. The code is just as clear as post increments and performance will never be worse

Using linked-list in a scenario when it has little performance benefit (say 10 elements only touched rarely) is still worse compared to using the vector. It is a bad habit that you should just stay away from.

Why use a bad performance container when other options are available? Options that are just as clean and easy to work with and have no impact on code clarity.

Stay away from Premature Optimization but be sure to not fall into the trap of Premature Pessimization! You should follow the best (optimal) solution, and, of course, common sense without trying to optimize and obfuscate the code and design. Sub-optimal techniques such as linked-list should be avoided since there are clearly better alternatives out there.

Java and C#

One comment got when this article was published was something in the line of

This might be true for C++ but for Java or C# this is not so important.
Those languages are not chosen for speed but for faster development time [etc, etc
].  

This might be true but actually since the speed aspect will likely hit Java and C# just as fast and hard as C++ code it might still be important. 

I got help from a friendly reader for a Java try. After some changes proposed by other readers (ref  comments) I had a similar linear search, then insert example at  http://ideone.com/JOJ05.

In Java the C++ std::list equivalent is called LinkedList. The C++ dynamic array std::vector is in Java called ArrayList.  There is one big difference between these libraries, both the Java LinkedList and ArrayList can only contain objects. This means that the boost of adjacent memory will be somewhat less for an ArrayList containing Integer objects.  To access and compare values for the search there is one extra indirection, from reference to value.

To see what difference that made for ArrayList I threw in a DynamicIntArray to get a truly cache friendly structure to compare with. The DynamicIntArray use int, not Integer objects.

 

 For Java the LinkedList showed immediately that it was performing worse than the ArrayList. For the test on ideone.com (http://ideone.com/JOJ05) the LinkedList was performing from a factor of 1.5 to a factor of 3 slower then the best array container.

On my x64 ultrabook the cache architecture is different and the cache friendly structures got a nice boost. The LinkedList was then a factor  1.5 to factor 15 slower then the DynamicIntArray for the same range of elements.

Integer object slow down for ArrayList
On ideone.com the extra indirection overhead of using Integer objects instead of int was roughly a 30% slow down for ArrayList in comparison to the faster DynamicIntArray.  On my x64 laptop the ArrayList took roughly twice (200%) as long time to finish as the DynamicIntArray (still faster than at ideone.com)

Much more Java specific testing was presented at my  blog but will not fit here. If you would like to read more about it (filtering, sorting, big-O discussion) you can read about it here.

Conclusion

This article is not to discriminate the linked-list for all intents and purposes. It does show however the shortcomings of the linked-list in the area of number crunching.

Traditionally, yes even today, books and online resources almost always completely disregard computer architecture and how this impacts with the concept of locality of reference. These resources calculate algorithm big-O efficiency and makes recommendations that are from just plain wrong as there are huge difference between RAM intense and cache friendly structures. In reality the cache architecture has an enormous impact. 

Above I showed some silly usage of C++ std::vector. Instead of using the fast [] operator or some kind of tree structure the scenarious played out equal to not discredit the std::list unnecessarily, in fact some were made to favor the linked-list. It showed that for small, cheap to copy data, raw numbers, or POD an adjacent memory structure is to prefer to a disjoint memory structure. The perceived overhead for new allocations, copying, shuffling that goes with a dynamic array like the Vector is next-to-nothing compared to the very, very slow RAM fetching inolved whenever accessing elements in a linked-list.

When it comes to number crunching then the vector or other adjacent memory structure should be the default and not the linked-list. Of course every new situation needs to be analyzed and data structure and algorithm chosen appropriately. Apply the knowledge you got reading this wisely, only use the linked-list if it will give benefits not easily retrieved from an array-based structure.

Remember that the performance issues with linked-list is usually not the main performance issues for your software. But why using a clearly sub-optimal data structure when the dynamic vector can just as easily be used? Code readability is the same and using linked-list for number crunching is to me an obvious case of premature pessimization.  

When we are in the business of number crunching why not use your understanding of the linked-list disadvantages and stay away from the linked-list for the time being? At least until you are sure you need it. Your code will be better from it and performance will be the same or better, sometimes even much better.

Feedback
I recommend to check out the comments below, maybe leaving one yourself. There are lots of opinions and even some interesting exceptions to my advice and in general very smart comments by equal smart coders. Thanks to them this article got a makeover when I addressed their concerns.

Last, a little plea. Please, before I get flamed to pieces for this obstinate article, why not run some sample code on ideone.com or similar, prove me wrong and educate me. I live to learn =)

Oh, and yes. I know, thank you for pointing it out. There are other containers out there, but this time I focused on linked-list vs vector 

References

  1. Keynote by Bjarne Stroustrup, GoingNative 2012
  2. cplusplus.com std::list reference
  3. cplusplus.com std::vector reference
  4. Wikipedia on sequence containers 
  5. Wiki.Answers: Difference between [Linked-] List and Vector
  6. WikiBooks: Advantages / Disadvantages [List vs Vector]
  7. Wikipedia: Linked-List vs Dynamic Array
  8. Wikipedia: On optimization and Donald Knuth
  9. CodeProject: Pre-allocated Arrayed List, by Vinod Vijayan 
  10. Blog response by George Aristy Performance comparison: Linked-List vs. Vector [Java] 
  11. Future Chips: Should you ever use Linked-Lists?
  12. MSDN blogs: Unrolled linked lists
  13. KjellKod.Wordpress Initial blog version of this article:  now mirrored
  14. KjellKod.Wordpress Java follow up after improvement suggestions from readers

History 

  • 05/03/2012 Re-published from blog-article http://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again/
  • 06/03/2012 Title update & initial problem scenario clearified... face & fog-lift in process to clear some confusion and better prove the point.
  • 19/04/2012 Long overdue makeover. Mea Culpa, Introduction, Disclaimer, Conclusions and general "fog lift". This should remove many earlier misunderstandings... Code examples were added and updated as well. 
  • 18-21/08/2012 "Devil's Advocate V" + Java section update and minor changes to layout and text. Java was updated due to improvement suggestions and "blog reply" from KjellKod.WordPress (Java Battle Royal

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication

About the Author

KjellKod.cc
Software Developer (Senior) Looking for my next Software Challenge
United States United States
Member
Kjell Hedström is currently looking for his next software challenge!
 
As a recent immigrant to America and brand new Bostonian he is looking for his next software challenge, a position at a company that share his passion for developing cutting edge products that make a difference.
 
Kjell is a driven developer with a passion for software development. His focus is on coding, boosting software development, improving development efficiency and turning bad projects into good projects.
 
Kjell was twice national champion in WUKO and semi-contact Karate and now he focuses his kime towards software engineering, everyday challenges and moose hunting while not being (almost constantly) amazed by his daughter and sons.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 2memberJoe Woodbury30 Mar '13 - 8:46 
Making a performance argument while using STL is absurd. In all seriousness, if high performance is required, you will almost always be better off writing your own custom collection.
 
Almost any template-based linked list is particularly horrible.
 
Rather than ramble dogmatically, it would have been better to explore the impact of the CPU cache, paging and so forth AND various algorithms to leverage this, almost none of which will involve STL (except, perhaps, at the periphery.)
 
Instead, you manufactured a data scenario to fit your preconceived notion that linked lists are bad, vectors good and then threw out cliches to support your position, including quoting Knuth.
 
(And, BTW, memory fragmentation and allocation exceptions are two entirely different things.)
GeneralRe: My vote of 2memberKjellKod.cc30 Mar '13 - 10:02 
Hi Joe and thank you for your vote and especially your comment.
 
Yes, this article is a little silly and it does simplify things, maybe too much?
 
I do not think linked-list are bad, far from it. I certainly state this several times in the article.
 

The purpose of the article is to make coders aware that disjoint memory structures might come at a price compared to if using a contiguous memory structure. To show this it was the easiest to use std::list and vector in C++ and in Java LinkedList and ArrayList since these structures are widely used and recognized.
 
A few common scenarios were picked, even scenarios where many coders commonly think that the linked-list would outperform the contiguous dynamic array structure.
 
Judging from the comments the difference in performance surprised many readers so from their reaction I am happy that I could educate them.
 
It was not the purpose of this article to show what data structures and algorithm were best suited for different tasks, only to make coders aware of cache efficient vs non cache efficient data structures.
 
It is a great suggestion to showcase various algorithms (with different data structures) and how they can have an impact. Maybe this will be material for another article.
 
Thank you for your input.
Cheers
Kjell
GeneralRe: My vote of 2memberJoe Woodbury30 Mar '13 - 12:08 
I don't think it's silly; I think it misses the point. If you want to talk about the performance implications of the CPU caches and demand paging, do so, but one you introduce STL or any generic algorithm, you implicitly state that performance is secondary. This is compounded by creating test conditions which favor vectors. Even assuming you are willing to trade performance for writing code faster, the problem still dictates the solution.
 
(Incidentally, I've never seen anyone use std::list. For me personally, the reason is simple; it's a terrible class. On the other hand, terrible apparently isn't everything since I recently saw someone use a dictionary where a static array would have sufficed.)
 
Finally:
 
KjellKod.cc wrote:
I do not think linked-list are bad, far from it

 
"Why you should never, ever, EVER use linked-list in your code again."
GeneralMy vote of 4memberpeterchen30 Oct '12 - 22:56 
Kudos for exploring your problem in that detail. The tabloid title wasn't necessary, though.
GeneralRe: My vote of 4memberKjellKod.cc30 Oct '12 - 23:24 
Thank you
QuestionInitial ConditionsmemberRobert Ludwig27 Aug '12 - 10:03 
What I didn't see in the article was the initial conditions in the test. If you know in advance the size of the array needed to contain the data, then certainly I can pre-allocate the array and manipulate it more quickly than a linked list. But if you don't know the amount of data you will be working with in advance (almost universally the case) then you will be forced to either:
 
1) Over-allocate the array in the hopes of being right, or
2) Reallocate and rebuild the array when it fills, leading to fragmented storage.
 
I must admit that my need to "number crunch" (sum?, average?, mean?, median?, mode?, linear regression?) has been limited in my career. More often than not I'm relating a table to other data in storage to spare me the overhead of SQL. That, typically, involves searching possibly large amounts of entries - a task that a linked list is poor at.
 
One technique that I have used with reasonable efficiency to handle significant amounts of data arriving in random sequence, of unknown quantity is to accumulate the data in a linked list, then construct a sorted list of pointers to the entries once the size of the list has been determined. It saves me the problem of overallocation, reallocation and fragmentation, while allowing me to search and manipulate the data with the efficiency of an array.
AnswerRe: Initial ConditionsmemberKjellKod.cc28 Aug '12 - 0:42 
Hi Robert.
 
Quick replies:
1) The arrays are never preallocated in the testing above. So all the overhead for reallocating, copying over contents when the array is increasing in size is seen in the graphs.
 
2) std::vector does this (allocate, copy, etc) *automatically* so your 1+2 are not so important to worry about, unless you are on a system where memory is very expensive (embedded) these are non-issues.
 
3) Fragmentet storage? Eh? That is not how std::vector, ArrayList or similar structures work. Are you thinking of some specific or handmade datastructure?
 

Robert Ludwig wrote:
One technique that I have used with reasonable efficiency to handle significant amounts of data arriving in random sequence, of unknown quantity is to accumulate the data in a linked list, then construct a sorted list of pointers to the entries once the size of the list has been determined. It saves me the problem of overallocation, reallocation and fragmentation, while allowing me to search and manipulate the data with the efficiency of an array.

I am glad to hear that it worked out great for you. Linked-List is of course a structure you can use exactly for this. I am not saying it is more efficient one-or-the-other : it all depends on the "search and manipulate data" content.
 
* This fragmentation issue sounds very strange to me. elaborate?
* Search is normally faster on a contiguious memory : list is definitely slower
* Sort is normally faster on contigous memory: list is normally slower - algorithm for this is of course a factor.
 
* Manipulation of data *can* be faster on linked-list. It all depends on what type of manipulation. The search time however is likely to remove any chance of linked-list being faster then the contiguous data structure.
---- Please note that I do not write array or vector : It could well be that a binary heap (or something else) is more of what would be suitable.
---- If it is time critical your operations I would suggest you to try out what difference it makes using contigous memory structures instead of disjoint memory structure like the linked-list
 
Thank you for reading the article and commenting!
Cheers
Kjell
 
PS: Dont' forget to reply regarding the *fragmentation*. I am curious.
GeneralRe: Initial ConditionsmemberRobert Ludwig28 Aug '12 - 6:47 
Thank you for your clarification on the management of the storage techniques. It does make for a fairer comparison. I agree that the overhead of allocating individual nodes for each entry will be greater than allocating a block of nodes and manipulating the storage.
 
As for the fragmentation consider that, even in a virtual memory environment, storage is allocated sequentially logically if not physically. Each time the array / vector has to grow, the system / program must first allocate a new array, copy the old one to the new one, then free the old one. Regardless as to how you size the new allocation, you will eventually end up with a dead zone of storage which, while both free and large, is not a large enough contiguous space to contain the new array. No matter how much memory you have on your system, given a sufficiently large enough collection of data, the array will fail long before the linked list does.
GeneralRe: Initial ConditionsmemberKjellKod.cc28 Aug '12 - 11:37 
Good. Then I know what you meant. The "fragmented storage" threw me off since I did not know if you meant RAM or fragmented within the data-structure.
 
OK. Before I address that. The overhead of memory allocations is a common *cost* that usually can just be ignored. Another reader asked me to remove the allocation time from linked-list as he was sure this would show a different picture. It did not. Not surprising since it is the frequent cache misses and cache lading from RAM that goes with traversal of disjoint memory that is so much more time consuming.
 
You can read his question here. And you can read my answer here [^]. As you see if you compare the timing, there is almost no difference in time between linked-list [LL] with/without allocations. So rest assured, performance issues from allocation overhead must be extremely rare.
 

 
Robert Ludwig wrote:
Regardless as to how you size the new allocation, you will eventually end up with a dead zone of storage which, while both free and large, is not a large enough contiguous space to contain the new array. No matter how much memory you have on your system, given a sufficiently large enough collection of data, the array will fail long before the linked list does.

True that! Depending on your system and application this could be a real issue. But unless on very limited OS and embedded systems it is very unlikely, more of an academic issue then a real issue. At least from what I have understood when I tested this (hours and hours without managing to get bad alloc due to fragmentation).
 
If it is an issue or the structures are so humongous that it could be an issue (on x86-x64 we're probably talking many millions of integers) then you can use the "unrolled list" or std::deque : basically a list of dynamic arrays. Almost all of the cache benefit from the dynamic array and minimum risk for the fragmentation *danger*.
 
Se this section of my article for more information
GeneralMy vote of 5memberwind surf27 Aug '12 - 8:39 
Quite consistent with my observations.
GeneralRe: My vote of 5memberKjellKod.cc28 Aug '12 - 0:21 
Good to hear Smile | :)
and thank you for your vote
GeneralMy vote of 5memberbssrini20 Aug '12 - 17:25 
An eye opener
GeneralRe: My vote of 5memberKjellKod.cc21 Aug '12 - 9:39 
Thank you. It was my pleasure
GeneralMy vote of 5memberpasztorpisti20 Aug '12 - 6:56 
The article says the simple truth. Performance critical solutions often have to employ a linked-listness solution or a special linked list using suitable data structure inside.
GeneralRe: My vote of 5memberKjellKod.cc21 Aug '12 - 9:40 
It is late night so maybe that is why? but I am not exactly sure what you refer to ...
 
Either way I am glad you liked my article =)
GeneralRe: My vote of 5memberpasztorpisti21 Aug '12 - 10:49 
I refer for example to gamez among the others, and cache/prefetch friendliness.
GeneralMy vote of 5memberHal Angseesing21 May '12 - 6:11 
I agree with your main conclusion that the modelling of the algorithms doesn't take into account the complexities of modern hardware.
What is really interesting is that there are a whole bunch of modern engineers who would recognise this problem - people who write 3D rendering engines. Cache misses, when you are talking about polygon vertices on hundreds of models can have a huge impact. There are a number of tools, within this specialism designed because of the lessons learnt.
Obviously all your caveats are fine and I do also like your obstinate writing style for this article. Smile | :)
GeneralRe: My vote of 5memberKjellKod.cc22 May '12 - 4:05 
Hal Angseesing wrote:
I do also like your obstinate writing style for this article. Smile | :)

Yeah. Sometimes obstinate makes more fun Poke tongue | ;-P
 
Thank you for your vote and your comment! The 3D rendering engineering is very interesting field. Hopefully these pros will continue to make a positive impact on online/books programming resources.
QuestionColoursmember.dan.g.20 May '12 - 19:41 
Great article, but your use of colours for array vs linked-list was not consistent throughout the article which made it more difficult to follow.
.dan.g.
 
AbstractSpoon Software
Facebook
email: abstractspoon2_at_optusnet_dot_com_dot_au


AnswerRe: ColoursmemberKjellKod.cc20 May '12 - 20:45 
Hi @.dan.g
 
I tried to be consistent that when mentioning specific library implementations like std::vector or std::list which then followed the CodeProject tags. I did not follow the code tagging when the article described vector or linked-list in general terms.
 
I am sure I messed up this along the road on more then a few places. Maybe as you commented it is better to follow the code highlighting for every time list, dynamic array or vector is mentioned?
 
Thank you @.dan.g. I will make sure to keep this in mind when/if I update the article.
QuestionAnother test you can domemberwaleri17 May '12 - 1:10 
You measure cache access but yet you tend to forget the fact, that std::vector preallocates a *lot* of memory! This perhaps is compiler dependent, for Microsoft it is 50% of the vector size - that'a a lot of overhead. Now for test to be fair you should exclude memory allocation from the equation, as it is an expensive operation. Simplest way to do so is to preallocate all list elements in a second list, then simply pull noedes from it and insert into "main" list. I am more than certain, that timing will be *completly* different then! When memory pool is empty you should allocate another 50% elements, but this timing should be excluded from overall performance.
 
Another approach would be to preallocate a list *and* the vector thus eliminating memory allocation per se from the measurement.
 
Yet I agree, that in perfect conditions linear iteration of an array would be a fraction faster, since "add reg,const" is faster than "mov reg,[reg]", but only just.
AnswerRe: Another test you can do [modified]memberKjellKod.cc18 May '12 - 3:20 
Good to see you again @waleri
 
You have an eye for details and are correct in that the article should show a test with a pre-allocated linked-list.
 
It was partly discussed in a previous comment[^] but I decided to cut if from the scope of the "Devils Advocate" special cases[^]. The reason for this was that Bjarne Stroustrup shows exactly this example in his keynote at Going Native 2012[^], see from time 01:44 in the video-cast and slide 43 (pdf)[^].
 
You are the third person who comments on this or related topics so it definitely merits a place in the article. I will put it immediately in my TODO list[^], for if/when this article is updated.
 

Your comment above is from a slightly different point of view. It focuses not on linked-list in pre-allocated *contiguous* space but on the overhead of individual memory allocations vs the block-sized allocations done for vector. This is an important issue as memory allocation is often brought up to be a costly factor.
 

 
In the meantime let me break down your comment in a couple of sub-comments and try to answer them.
waleri wrote:
[...] for test to be fair you should exclude memory allocation from the equation, as it is an expensive operation

System, OS, even hardware makes allocation cost vary from one machine to another. It is interesting to find out how expensive memory allocation is. Especially in comparison to the cost for traversing the linked-list (disjoint memory) vs traversing vector (adjacent memory)
 
I followed your advice to use a pre-allocated list as a memory pool, grabbing the items as needed and inserting them into the *main list*. This way the time for node memory allocation is removed from the timing.
 
To get an appreciation of how this is compared to earlier tests. The same random values for insertion are also used for a non pre-allocated linked-list and a std::vector. The code is available here http://ideone.com/4hetC[^]
 

Pre-allocated linked-list vs linked-list vs vector.
Timing is made in microseconds (us). Testing was done on ideone.com, ubuntu x64 quad-core and windows7 quad-core computer. See conclusion in the end
http://ideone.com/4hetC
#items, LL (us), LL pre-allocated (us) , vector (us)
10,     2,       1,       3
100,    11,      7,       13
500,    118,     112,     116
1000,   412,     456,     396
2000,   1471,    1780,    1461
10000,  95107,   98522,   33494
20000,  588622,  603722,  138968
40000,  3434945, 3556232, 626431
 
Windows7, Visual Studio 2011 Beta
#items, LL (us), LL pre-allocated (us) , vector (us)
10,      0,        0,        0
100,     0,        0,        0
500,     1000,     0,        0
1000,    1000,     1000,     1000
2000,    5000,     5000,     3000
10000,   123012,   103010,   25002
20000,   810081,   799079,   110011
40000,   5188518,  5173517,  450045
100000,  43447344, 44180417, 2874287
 

Ubuntu Linux x64
#items, LL (us), LL pre-allocated (us) , vector (us)
10,      4,        2,        10
100,     21,       14,       19
500,     308,      266,      176
1000,    663,      404,      225
2000,    2500,     2382,     858
10000,   138137,   139279,   18801
20000,   1059487,  1012656,  78463
40000,   5704195,  5666202,  318634
100000,  55417680, 57198783, 1995051
 
Conclusions
Graphs of the result is available at a google doc[^]. The linked-list vs the pre-allocated linked list are both in disjoint memory. The pre-allocated does not *suffer* from the extra overhead of memory allocations when inserting a new node, however using the memory pool causes one extra pointer indirection traversal when using splice for node insertion.
 
Disjoint memory access is more expensive then allocations : ergo any potential gains for the avoided memory allocation is thereby lost.
 
In comparison the traversal cost for visiting node-to-node in disjoint memory is overwhelming compared to any potential gains (or non-gains) for using the memory pool.
 
However, a memory pool in *adjacent* memory will give better result since at least some nodes can benefit from that. Cache misses will be less. This is not shown here but shown in Bjarne Sutters keynote speech (see above). This potential gain is also mititaged as the linked-list will eventually become fragmented[^].
 
waleri wrote:
linear iteration of an array would be a fraction faster, since "add reg,const" is faster than "mov reg,[reg]", but only just.

I hope this article and my answer above has shown that the difference is not fractions faster but in fact the difference is colossal . What matters is how often RAM access is needed for traversal. Disjoint memory access, with constant cache misses, overshadows any possible gains with avoiding individual memory allocations.
 
This reply took some time @waleri. Thank you for bringing it up. When/if updating the article I will make sure to add your contribution to give extra clarity to how cost for memory allocations compare to memory traversal.

modified 18 May '12 - 9:30.

GeneralGood pointsmvpEspen Harlinn16 May '12 - 10:18 
Good points Kjell.
 
Most number crunching libraries, like blas and blitz++, use some kind of array internally - even for sparse vectors and matrices.

GeneralRe: Good pointsmemberKjellKod.cc17 May '12 - 1:34 
Thank you Espen
QuestionRed/black treememberMember 869768715 May '12 - 7:51 
After reading the Scope section, it seems obvious to me that a red/black search tree is the right choice of data structure, but this is never mentioned in the article.
 
Maintaining the sorted sequence and being able to insert and delete elements efficiently - that is exactly what the red/black search tree (implemented in C++ as std::set) implements.
 
In theory, insertion and deletion takes time O(lg n), and reporting the sorted sequence takes O(n) (which is optimal - O(1) per element). In practice, the red/black tree works just as well and easily outperforms the sorted array, since we completely avoid having to copy the entire array all the time.
AnswerRe: Red/black tree [modified]memberKjellKod.cc15 May '12 - 12:54 
True that @"Member 8697687".
 
Worth mentioning perhaps is that in the Conclusions[^] I state that trees were avoided. The reason for avoiding trees was that I wanted to make a point regarding disjoint memory linked-list vs adjacent memory structures.
 
The article was never about showing the ultimate data structure for the scope. Only to show that one particular data structure was to be avoided.
 
It is interesting that you mention the red/black search tree. The first time I encountered the problems with linked-list vs adjacent memory structure we ended up switching out the linked-list for a binary search tree with a c-type wrapped array in its core.
 
The binary tree with its wrapped array ended up being lots, and lots faster then the linked-list with much less cache misses. So in a way, it was that binary tree experience that gave me the inspiration for this article

modified 16 May '12 - 0:16.

QuestionMemory PoolingmemberMehuge25 Apr '12 - 0:10 
I wonder how lists vs vector would compare if using a memory allocator that supports memory pooling, which is a technique for keeping related memory allocations local to each other, so in effect your list data would be stored in local memory just like a vector.
AnswerRe: Memory PoolingmemberKjellKod.cc25 Apr '12 - 0:26 
Hi Mehuge,
 
You can take a look at the slides[^] from Bjarne Stroustrups presentation.
 
If I remember correctly he brings up exactly that scenario!

It will be more effective than not doing it. After a while of using the data structure with inserts, removal, maybe sort and so on the effect will be similar as the "normal" disjoint memory Linked-List (L-L). I.e. due to "fragmentation" between the items in the list itself it cannot possibly keep the pieces in adjacent memory.
 
My conclusion: Better than "normal" linked-list but still a poor choice for "number crunching",. unless there are some special reasons of course to use the Linked-list.
GeneralRe: Memory PoolingmemberMehuge25 Apr '12 - 0:53 
Ah ofc, when the list gets large enough, because your jumping all over the list, adjacent element references are no longer memory local to each other simple because they could be at opposite ends of the memory pool.
QuestionYup, that's basically right. [modified]memberQwertie24 Apr '12 - 20:20 
I'm the author of a mapping package and I spent around 6 months optimizing my code. Since I'm programming for an ARM chip with a mere 32 KB data cache, linked lists cause obvious performance problems. First, there's the obvious 8-byte per-node overhead for "next" and "prev" pointers. A linked list of 8-byte items would take 16 bytes instead. But not only that, my OS was Windows CE and it apparently has some overhead per heap block. Assuming 8 bytes of "bookkeeping" per heap block, 8-byte items actually would require 24 bytes of memory.
 
The 32 KB cache is not nearly big enough for my application, so it would be unacceptable to double the size of much of my data. And even though a vector can waste up to 1/2 of its space, the unused space does not tend to occupy space in the CPU cache (anyway, although the maximum waste is 99.9%, the average waste is 50%).
 
I actually wrote my own memory manager to handle small heap blocks efficiently, so a 16 byte block only uses 16.1 bytes, and the speed of my custom Alloc and Free is much faster than the OS routines. Even so, some work is still required to allocate and free every heap block. vector<T> therefore tends to be faster because it avoids the cost of Alloc and Free when adding/removing single items. Yes, when size() exceeds capacity() it must copy all elements to a new memory area, but like the author says, this is a relatively fast operation that isn't done very frequently. Adding an item to a linked list is certainly faster, but allocation work much be done for every single insert.
 
For some reason the Microsoft vector<T> is 24 bytes empty. My app uses a large number of small vectors, so I wrote a mini_vector<T,N=2> class. This vector uses only 8 bytes for bookkeeping, and it holds all elements inside the vector if size()<=N. So mini_vector<double,2> is 24 bytes in total if size()<=2 and this 24 bytes often fits in one cache line (but its memory consumption is identical to vector<double> if size() > 2). This design proved slightly faster than vector<T> in general. In particular, it's good for the performance of lists that are SOMETIMES large but are MORE OFTEN very short. Specifically, I wrote mapping software that deals with roads. Some roads are long and winding, but around half of all roads have exactly 2 points, so mini_vector<Point> is perfect.
 
If anyone is interested, mini_vector is included in my C++ benchmark (though it is not benchmarked directly).
 
I do use a linked list once in awhile, though. It is useful when your "next" and "previous" pointers are "intrusive", i.e. embedded in the object that is in the linked list. This configuration guarantees an item can be removed from the linked list in O(1) time because an item always "knows" its own location in the list. So linked lists are still useful if they hold large objects and if you rarely/never need to do a search (linear scan) to find an item. In my app, I use linked lists in this manner to manage an object cache; when an object is used, it is moved to the front of the list in O(1) time. But I use a linked list in one place, and almost everywhere else I use vector, mini_vector, or other data structures (hashtables, heaps). I don't hold large objects in a vector or hashtable, instead I use reference counting (e.g. Boost's intrusive_ptr).
 
As Kjell noted, it may make sense to put large objects in a linked list in C++, I just personally prefer refcounting + vector. In C#/Java, only intrusive linked lists ever make sense; standard-library linked lists (which store the next and previous pointers separately) are always inefficient because a separate heap object is allocated to hold the next pointer, the previous pointer and a pointer to the actual object (and don't forget the object header that all heap objects in C#/Java have).
 
I am not sure why I just wrote a short version of Kjell's article. Smile | :)

modified 25 Apr '12 - 10:11.

AnswerRe: Yup, that's basically right.memberKjellKod.cc24 Apr '12 - 20:46 
Thank you Qwertie, good to get som I.R.L experience and reflections, especially from one who is as seasoned coder as you are. Say, you have not planned to CodeProject publish this mini_vector library? It sounds very interesting for whenever I will be doing some embedded work in the future.
 
Cheers,
Kjell
 
PS: Almost unrelated but....
 
I really liked your C# vs C++ comparison[^] article. Awesome job on that one!
It was very interesting to read about Dimkumware's bad performance,. and nice to read someone else's tests for array vs list Wink | ;)
GeneralRe: Yup, that's basically right.memberQwertie25 Apr '12 - 4:19 
No, I haven't written about mini_vector before; it is merely an "implementation detail" in the benchmark. My software designs often seem to end up with a lot of small vectors so mini_vector is handy for that; and when I use .NET I sometimes optimize code using my own InternalList<T> value type (a struct version of List<T>).
Questiongreat article - my vote of 5memberGabrielWF23 Apr '12 - 4:46 
Hi,
Just to add that you are right: never take something as the "ultimate truth" before running some tests... Wink | ;-)
We all know what are the normal cases and when to use each container. The big problem is that the special cases are not easy to see and demand testing before saying that something is the best way of doing it.
great article in showing this!
 
Gabriel
Gabrielwf

AnswerRe: great article - my vote of 5memberKjellKod.cc23 Apr '12 - 5:27 
Thank you Gabriel for your vote and your comment. You are so right: testing, and confirming should come before believing. It was good to read that, I wish more coders would practice that I.R.L
Answer6 good reasons to always use linked lists [modified]membersteveb20 Apr '12 - 3:55 
1. When traversing collection from 0 to n is a requirement.
 
2. Random access to elements never required and performed.
 
3. You store address of the objects inserted in the list. Since the reallocation never happens, your stored address is never wiped out. This is not the case with vectors where during insertion, memory moved up, thus invalidating the address you stored, which happened be above insertion point. I know it is not good idea but sometimes no way around it.
 
4. Your insertion is always done and the index 0 or somwhere in the middle. No matter how large list grows the operation perfomed in constant time O(1). With vectors it is O(log(n)). The larger the array the slower the insertion.
 
5. You perform very large number of erases. For linked lists it is constant time O(1). Same operation on vectors(arrays) is exponentially slower O(log(n)). Again because vector required move (reallocate) memory back.
 
6. If you do huge number of inserts and erases in multiple adjacent arrays or vectors you will eventually fragment the crap out of available global heap. Not with the linked lists.
 
Good informative article.

modified 20 Apr '12 - 10:25.

AnswerRe: 4 good reasons to always use linked list list [modified]memberKjellKod.cc20 Apr '12 - 4:24 
Good objections there steveb. Some valid some not. I actually addressed some of them in this article I hope you did not miss them
 
steveb wrote:
1. When traversing collection from 0 to n is a requirement.

Not a valid objection. Here the linked-list is absolutely not to prefer. At least as long as we are in the domain of number crunching. This was proved and clearly stated and showed in the article. Plenty of code examples to prove it. Traversing std::list is extremely slow compared to traversing std::vector. Disjoint memory with RAM fetching is just too slow compared to adjacent memory structures. Cache L1-L3 rule over RAM.
 

steveb wrote:
2. Random access to elements never required and performed.

Good objection. This article is not about random access but number crunching which is where a linked-list is clearly unsuitable.

An example, common enough, but not with random access involved is to sum all elements in a collection. Try doing that in a collection of 10, 100, 1.000, 10.000 for list and for vector then draw your conclusions.
 

steveb wrote:
3. You store address of the objects inserted in the list [...]

Valid objection, and outside the scope of the article. And actually I wonder if not C++11 move operator could address this?
 
Just to be clear, this article only deals with linked-list in the area of raw numbers or POD or similar, not pointers, objects with pointers or references or expensive to copy objects/PODs.
 

steveb wrote:
4. Your insertion is always done and the index 0 or somwhere in the middle. No matter how large list grows the operation perfomed in constant time O(1). With vectors it is O(log(n)). The large the array the slower the insertion.

Definitely a valid objection! If this is the time critical point and you keep the insertion iterator then yes. However,. at some time you need to do something with the numbers. If that access operations involves iterating over the collection then maybe the list choice would not be as good as it first seemed?
 

steveb wrote:
And finally: if you still need fast random access to the list - first sort it, then use binary search algorithm instead of the linear traversal. The access time will be slightly worse than std::map.

For raw numbers, POD and such this is not a valid objection. You can do the same with a vector, have it sorted, use it binary search algorithms etc. This is so much faster doing it with adjacent memory then disjoint memory.
 
Did you see my "sort" section? Why not take the ideone.com code from it and tweak it yourself to try it out.
 

steveb wrote:
5. You perform very large number of erases. For linked lists it is constant time O(1). Same operation on vectors(arrays) is exponentially slower O(log(n)). Again because vector required move (reallocate) memory back.

For number crunching, not a valid objection. I actually tested this and erase is much faster on vector than linked-list. The Big-O notation you quote is from the same resources that are clearly wrong otherwise.
 

steveb wrote:
6. If you do huge number of inserts and erases in multiple adjacent arrays or vectors you will eventually fragment the crap out of available global heap. Not with the linked lists.

That is why people created unrolled lists or std::deque. Basically a list of vectors. So that fragmented memory would not be an issue. This was addressed in the article, did you miss it?

modified 20 Apr '12 - 11:33.

GeneralRe: 4 good reasons to always use linked list listmembersteveb20 Apr '12 - 4:27 
Thanks.
I actually was editing my reply while you answered. I removed some points which I think not important and added couple more I could think of from my past projects.
 
Great job anyway. You got my 5
GeneralRe: 4 good reasons to always use linked list listmemberKjellKod.cc20 Apr '12 - 4:30 
On you last. 5)Erase. I thought I hade posted that in the article but I forgot so it was good that you commented on that one
 
The "erase" test you can see at http://ideone.com/62Emz[^] go to the bottom of the page to see the timing results.
GeneralRe: 4 good reasons to always use linked list listmembersteveb20 Apr '12 - 4:37 
That is interesting. I used to know internal implementation of STL vector. They require like any C array continuous memory so the moves and reallocs happen every time something inserted, added or erased. Those API used to be inherently slow proportionally to the amount of memory.
 
I suspect that vectors continuous memory is being moved into internal CPU L1 cache. Usiually lists do not deserve such treatment because they are not continous. But even vectors request will be sooner or later rejected if it grown past the cache size.
GeneralRe: 4 good reasons to always use linked list list [modified]memberKjellKod.cc20 Apr '12 - 4:40 
updated: I had to change my response since I missed your last two lines
 
steveb wrote:
I suspect that vectors continuous memory is being moved into internal CPU L1 cache.
[...]
But even vectors request will be sooner or later rejected if it grown past the cache size.

 
From my understanding, and from what I tested it is no problem that the vectors continuous memory is larger than the L1 cache. The L1 cache will load what it can from memory. In a good scenario that memory will involve first loading from L2 and it in turn from L3 before involving RAM (which loads to L3, and so forth).
 
On my older PC with L1 cache of 32Kbyte I have tested with 500.000 integers without any problems. The performance was similar to the 1st picture in the article.

modified 20 Apr '12 - 11:27.

GeneralRe: 6 good reasons to always use linked listsmemberKjellKod.cc20 Apr '12 - 13:19 
steveb wrote:
4. Your insertion is always done and the index 0 [...] No matter how large list grows the operation perfomed in constant time O(1). With vectors it is O(log(n)). The larger the array the slower the insertion

 
I just had to add this little tidbit. I am really glad you brought up the insert at first position objection. This is a perfect opportunity to address that.
 
What you wrote above is completely accurate. You are not the first to mention this and while it is so right it is also so wrong. Let me explain.
 
The scenario you outlined is very believable. In fact putting one piece of data in front of an older piece of data and so forth is perhaps the most common task for data collecting?
 
In this scenario it makes sense to do insert at index 0 for the linked-list. This does not makes sense for the vector.
 
The default insertion on vector should be done by push_back. Insertion in the direction of growth. An obvious choice to avoid shifting the whole previous section. This is made abundantly clear in C++ where the std::vector[^] have push_back, and insert but lacks completely the function push_front
 
So, let us not be silly here, right? Doing push_back on the vector instead and comparing that to insert at index 0 on the linked-list that is probably the valid options a programmer would choose from. Don't you agree?
 
push_back on the vector does not need to shuffle everything for every new element. The overhead when the vector is full and need to be re-sized still applies.
 
I just tested this on my x64 laptop and at ideone.com[^]. Times are in microseconds but running on Windows7 which is not so exact so that is why all the zeroes are in the beginning
 
elements, list,    vector::push_back, vector::push_front
10,          0,       0,                0
100,         0,       0,                0
500,         0,       0,                0
1000,        0,       0,             1000
2000,        0,       0,             1000
10000,       0,       0,            24001
20000,    1000,       0,            90005
40000,    2000,    1000,           381021
100000,   6000,    1000,          2461140
500000,  31001,    5000,         65182728
1000000, 60003,    8000,        315747059
Exiting test,. the whole measuring took 384133 milliseconds (384seconds
or 6 minutes)
 
Clearly the worst case as you described it applies for the unbelievable push_front. More realistically and in line with the reasoning of this article the aim is to try to use the cache line architecture in a smart way (not in a dumb way). Doing push_back will ... well, just look at the numbers. I stick with vector Wink | ;)
 
Here's the code: http://ideone.com/DDEJF[^]
QuestionLists are quite good, in factmemberCristian Amarie9 Apr '12 - 9:43 
When the vast majority of a vector are 0 (ok, the neutral element). Rare matrices is one of the first applications for lists (over vectors) I knew, where M[i, j] = xij (!= 0) and implement operations over rare matrices.
Nuclear launch detected

AnswerRe: Lists are quite good, in fact [modified]memberKjellKod.cc9 Apr '12 - 11:35 
Hi Christian,
 
Thanks for the feedback. Honestly I am a bit rusty with my matrices operations. Apart from some simple geometry transformations and rotations I have not had the luxury to work with that since my university days.
.
What I try to say with this article is more or less the following:
1. In spite of the article title[modified] I do not say linked-list is bad always, only for "number crunching"
2. Linked-list uses disjoint memory access. Traversal on the linked-list is extremely slow since the cache architecture cannot be utilized effectively. RAM access is very slow and thus disjoint memory type data structures will (almost) always be way slower than adjacent memory data structures. This is especially evident when managing raw numbers.

Here I use vector to exemplify that but it could be whatever structure really as long as it is adjacent memory vs disjoint memory.
 

I have a couple of questions for you if you do not mind.
 
1. Could you explain more clearly when this would be faster then vector access?
2. Have you tried that yourself or seen it in reference materials?
3. If it is easy to demonstrate, do you mind posting it on a http://ideone.com snippet?

modified 10 Apr '12 - 2:32.

GeneralRe: Lists are quite good, in factmemberCristian Amarie10 Apr '12 - 8:10 
Ah, I didn't wanted to say that lists beats vectors on speed. Vector will always be faster, not for any other reason that memory contiguous access.
 
That will answer to your questions Smile | :)
 
What will do a list is, apart from regular things (remove/insert vs vector add/pop/push), is to be optimal on space.
On a vector, if you allocate 100000 elements but only 20 are relevant-not zero, you're essentially waste 99980 elements.
 
This is what I wanted to say. Let's say that I wanted to point out for the impatient that might jump into conclusions that "vector good, list bad" that also lists have their uses.
 
I mentioned matrices because for sure linear storage of v[i, j] or even vector of vectors have speed access better than lists (most likely O(n), maybe sometimes O(n log n), but I am not sure), but rare matrices can use lists better by avoiding enumerating a ton of zero elements. But perhaps here a map[ pair< row, col > ] = value would do the trick better than the list.
Nuclear launch detected

GeneralRe: Lists are quite good, in factmemberKjellKod.cc10 Apr '12 - 19:58 
OK now I see your point. Yes that seems to be a good case for unecessary allocations that are just zeroes - there for the math but really just skipped over
 
... although if you skip the zeroes in the list, why can't you just skip them in the vectors too? No need to answer that. I am just showing how little I know of the subject Wink | ;)
 
In other areas, not matrices, it varies from case to case whether or not to allocate all the space at once in the vectors. In the examples in this article the vector is dynamically growing as it needs to. This gives some memory allocation overhead on purpose - it is there to show that in spite of this overhead, and extra copying, and extra shuffling it is still faster than traversing the linked-list.
AnswerRe: Lists are quite good, in factmemberAescleal12 Apr '12 - 21:21 
I was just about to post "what about massive sparse matrices? You wouldn't want to use a list for that" but luckily read your post first.
 
So instead of explaining it I can just say "me too!" in traditional internet style.
 
Cheers,
 
Ash
QuestioninterestingmemberCIDev9 Apr '12 - 5:09 
You make a number of good points. The linked list does have its place, but it is over used.
Just because the code works, it doesn't mean that it is good code.

AnswerRe: interestingmemberKjellKod.cc9 Apr '12 - 11:15 
Thank you CIDev. and I agree - the linked-list does have its uses. ETA for the article update where this will be clearly stated is approx 1 week.
GeneralRe: interestingmemberCIDev9 Apr '12 - 13:00 
Looking forward to reading the update. Smile | :)
Just because the code works, it doesn't mean that it is good code.

GeneralRe: interestingmemberCIDev23 Apr '12 - 12:38 
I really like the update. Smile | :)
Just because the code works, it doesn't mean that it is good code.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 21 Aug 2012
Article Copyright 2012 by KjellKod.cc
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid