Click here to Skip to main content
Click here to Skip to main content

Fast and Then Even Faster String Concatenation With JScript

, 25 Mar 2007 CPOL
Rate this:
Please Sign up or sign in to vote.
Concatenating strings is critical in many dynamic content situations. Here is how to make it scale and then be fast and scale.

Introduction

Let's build a scalable string concatenator, and then optimize its performance.

As we all use JavaScript to build dynamic content more and more, concatenating lots of little strings together becomes increasingly common. This can lead to nasty scaling problems; here, I show how to overcome them using simple linked lists.

What is the problem?

myString+='some string'; looks very innocent, but it is not. The problem is that a new string is created and the entire contents of myString and 'some string' are then copied into it. Although += looks like it just adds one string to another, it actually returns a third string which contains both the originals.

Most of the time, we do not notice this copying going on. This is especially true in scripting where string functions are typically very much faster than the script itself. However, as the concatenated strings get longer, there comes a time when the copying starts to take a noticeable amount of time.

The real killer comes where we repeatedly concatenate strings in a loop. In this case, the time taken to finish the loop eventually scales as the square of iterations, because the length of the string being copied inside each iteration is proportional to the number of pre-ceding iterations. For example, if we add ten characters to a string in a loop and the loop runs one hundred times, in the last iteration of the loop, we have to copy one thousand characters. Yes - we copy one thousand characters just to add ten.

What is a linked list?

One way around this problem is to store the concatenated strings in a linked list and then concatenate them all at the last minute using a more efficient algorithm than just adding them together one at a time. To take this approach means that we have to store the sub-strings somehow. We could store them in an array. However, in JavaScript, you have to access the elements of an array by index. I once got my fingers badly burnt programming in PERL with exactly this issue. In some implementations of arrays, where you can very easily add elements to the end of an array, indexed lookup is not zero scaling. For example, if you have a 1 million element array, how long will it take to access the five hundred thousandth element compared to the time taken to access the fifth element of a ten element array?

For what we want to do, we are only interested in accessing the elements of the storage system in sequence, not indexed. A linked list is ideal for this. A linked list consists of list elements. Each element contains a datum and a link to the next element in the list. You can have doubly linked list and other things, but we only need a singly linked list. To allow us to start 'walking' through the elements of the list, we need to maintain a link to the first element of the list, and to allow us to add to the end of the list, we need to maintain a link to the end. That is it, nice and simple.

The implementation I have put together uses arrays as the elements of the list. Each element is an array with two indices; the first being the datum and the second being a reference (link) to the next element in the list. The last element in the list is denoted by the forward reference (link) being undefined.

Compound concatenation

OK, so now, we have a way of storing the sub strings which scales nicely, but we still need a concatenation algorithm which scales better than M.N (where M is the number of concatenations and N the final length of the string). There are very good concatenation algorithms out there. However, we are living in the world of scripts, and so we are a little more limited in what we can do, and we don't really want to have thousands of lines of code. So, the algorithm I have chosen is what I call 'compound concatenation'. The trick is to progressively halve the number of elements in the linked list by concatenating each odd datum with its next even neighbour until there is only one element left.

Compound concatenation scales as N.Log(M) (where M is the number of concatenations and N is the final length of the string). If we start with 10 elements in the list, the first time we concatenate, we make 5 copies, which leaves us with 5 elements and requires copying N characters. The second time through, we use 2 copies to get us to 3 elements, then one more copy, then a final copy. In all these sessions, we have only copied 4N bytes (actually, slightly less). This is less than half the amount of copying - 10N - required for simple concatenation. This relative benefit increases with the number of elements being concatenated.

It is not all gain though. There is a much more code running to achieve the same thing using the linked list/compound concatenation approach. This means that for small numbers of concatenations, the simple approach is faster and for large numbers of concatenations, the compound approach is faster.

Some real world scaling tests!

To prove the point, I wrote a little test script:

var sc=new StringCat();
var st='';
for(var i=0;i<2500;++i)
{
    st+='Count='+i+'\r\n';
}
    //sc.push('Count='+i+'\r\n');
    //var tmp=sc.toString();

I ran this simple piece of code with either the sc.push line or the st+= line inside the loop. It then measured the amount of time for the entire script to run. The results were:

Iterations StringCat Normal

2500

0.156

0.11

5000

0.25

0.187

10000

0.547

0.437

20000

0.938

1.61

30000

1.55

13.391

40000

2.28

30.969

50000

3.05

53.562

60000

4.06

80.89

70000

5.235

114.187

80000

6.25

152.469

90000

7.42

198.5

100000

8.8

242.016

From this, you can see that at 2500 iterations, the traditional concatenation method was in the lead, but very quickly, it became ridiculously slow compared to the compound approach. The following graphs show the same results, first with a linear time axis, and then with a logarithmic time axis. From the latter, you can see that the compound method will continue to scale nicely whereas the traditional method is just hopeless once the cost of all that character copying outweighs the cost of the rest of the operations.

Time taken in seconds against the number of iterations

Log of time taken in seconds against the number of iterations

The Code!

function StringCat()
{
    var sp;
    var ep;
    var l=0;
    
    this.push=function(what)
    {
        if(typeof(sp)=='undefined')
        {
            ep=new Array();
            sp=ep;
        }
        else
        {
            var oep=ep;
            ep=new Array();
            oep[1]=ep;
        }
        ep[0]=what;
        ++l;
    }
    
    this.toString=function()
    {
        if(l==0)return ;

        while(l>1)
        {
            var ptr=sp;
            var nsp=new Array();
            var nep=nsp;
            var nl=0;
            
            while(typeof(ptr)!='undefined')
            {
                if(typeof(nep[0])=='undefined')
                {
                    nep[0]=ptr[0];
                    ++nl;
                }
                else
                {
                    if(typeof(ptr[0])!='undefined')nep[0]+=ptr[0];
                    nep[1]=new Array();
                    nep=nep[1];
                }
                ptr=ptr[1];
            }
            sp=nsp;
            ep=nep;
            l=nl;
        }
        return sp[0];
    }
}

Note how the toString method converts the linked list into a single element list containing the concatenated string. This means that toString can be called repeatedly with only the minimum necessary number of concatenations occurring.

Not Fast Enough!

The problem with StringCat was that it did not take advantage of the very high speed of the built-in string concatenation method for short strings in JScript. With a small change, this can be corrected with a huge speed up.

The trick is to put an accumulator in the concatenator. The built-in concatenation is used to add to the accumulator until it hits a 'magic' size, then the accumulator's contents are added to the linked list and the accumulator reset:

var accum='';
this.push=function(what)
{
    accum+=what;
    if(accum.length>2800)
    {
        if(typeof(sp)=='undefined')
        {
            ep=new Array();
            sp=ep;
        }
        else
        {
            var oep=ep;
            ep=new Array();
            oep[1]=ep;
        }
        ep[0]=accum;
        accum='';
        ++l;
    }
}

The first question with this set-up is what size to set the accumulator maximum to (the point beyond which it is reset). To find a good approximation for this, I set up a little test. The test is specific to the hardware, JScript version etc., but it is better than nothing. I timed the concatenation of one hundred thousand short strings with different accumulator sizes. The results are in the following graph:

From this graph, we can see that the optimum is between 10^3.3 and 10^3.6; I therefore chose a value of 2800. This gives a very large performance increase indeed, compared to the original StringCat:

As always, here is the source (you can also get it from www.nerds-central.com):

function FStringCat()
{
    var sp;
    var ep;
    var l=0;
    var accum='';
    
    this.push=function(what)
    {
        accum+=what;
        if(accum.length>2800)
        {
            if(typeof(sp)=='undefined')
            {
                ep=new Array();
                sp=ep;
            }
            else
            {
                var oep=ep;
                ep=new Array();
                oep[1]=ep;
            }
            ep[0]=accum;
            accum='';
            ++l;
        }
    }
    
    this.toString=function()
    {
        if(l==0)return ;

        while(l>1)
        {
            var ptr=sp;
            var nsp=new Array();
            var nep=nsp;
            var nl=0;
            
            while(typeof(ptr)!='undefined')
            {
                if(typeof(nep[0])=='undefined')
                {
                    nep[0]=ptr[0];
                    ++nl;
                }
                else
                {
                    if(typeof(ptr[0])!='undefined')nep[0]+=ptr[0];
                    nep[1]=new Array();
                    nep=nep[1];
                }
                ptr=ptr[1];
            }
            sp=nsp;
            ep=nep;
            l=nl;
        }
        return sp[0];
    }
}

For more like this and other topics, why not look up my free tech site: Nerds-Central.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

alex turner
Web Developer
United Kingdom United Kingdom
I am now a Software Systems Developer - Senior Principal at Micro Focus Plc. I am honoured to work in a team developing new compiler and runtime technology for Micro Focus.
 
My past includes a Ph.D. in computational quantum mechanics, software consultancy and several/various software development and architecture positions.
 
For more - see
 
blog: http://nerds-central.blogspot.com
 
twitter: http://twitter.com/alexturner

Comments and Discussions

 
GeneralMore optimal PinmemberMontago7-Sep-08 9:05 
QuestionWah about array push/join? Pinmembermzeo7725-Mar-07 4:48 
AnswerRe: Wah about array push/join? Pinmemberalex turner25-Mar-07 5:28 
AnswerRe: Wah about array push/join? Pinmemberalex turner25-Mar-07 5:39 
GeneralRe: Wah about array push/join? Pinmembermzeo7725-Mar-07 5:57 
GeneralRe: Wah about array push/join? Pinmemberalex turner25-Mar-07 21:31 
GeneralRe: Wah about array push/join? PinmemberRalf D.27-Mar-07 7:41 
GeneralRe: Wah about array push/join? Pinmemberalex turner27-Mar-07 8:21 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.141029.1 | Last Updated 25 Mar 2007
Article Copyright 2007 by alex turner
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid