Numeric String Sort in C#

vcepa

Rate me:

4.86/5 (74 votes)

8 Aug 20059 min read

417.9K

9.4K

106

An article on sorting strings in C# the way Windows Explorer does with file names.

Download source - 5.67 Kb

Introduction

In this short article, I will explain how to sort strings in the natural numeric order. The .NET System.IO.Directory class returns file and directory names ordered in alphabetic order (Win32 API functions for dealing with files order the strings alphabetically), whereas Windows Explorer shows them in natural numeric order. The table below shows the difference between these two orderings:

Alphabetic sort	Natural numeric sort
DOS (CMD prompt) style	Windows Explorer Style
1.txt 10.txt 3.txt a10b1.txt a1b1.txt a2b1.txt a2b11.txt a2b2.txt b1.txt b10.txt b2.txt	1.txt 3.txt 10.txt a1b1.txt a2b1.txt a2b2.txt a2b11.txt a10b1.txt b1.txt b2.txt b10.txt

In the alphabetic order '3.txt' comes before '10.txt' whereas in the natural numeric order '10.txt' comes after '3.txt', which is what we would expect. Windows Explorer uses the natural numeric order for files.

Thanks to Richard Deeming (see comments) I now know that there is a similar function in shlwapi.dll, called StrCmpLogicalW used by Windows Explorer that works only in XP. My current .NET implementation emulates StrCmpLogicalW in pure C# code so it can be used with the .NET version in any version of Windows where .NET runs. My implementation is, however, not fully compatible with the one in Windows Explorer. There are some very slight differences which I will explain after the examples.

The class StringLogicalComparer in my C# code emulates StrCmpLogicalW, and NumericComparer is a class implementing the System.Collections.IComparer interface to be used to sort collections.

Using the code

The natural numeric order comparer for strings is defined in a class named NumericComparer : IComparer and can be found in the source code of this article. I will give several examples here of how the NumericComparer class can be used in code.

Example 1 - Ordering an Array of Strings

This example shows how to use the NumericComparator class to order strings. I will suppose, the strings are in a string[] array as shown next:

string[] files = System.IO.Directory.GetFiles();

NumericComparer ns = new NumericComparer();
Array.Sort(files, ns);

// files will now sorted in the numeric order
// we can do the same for directories

string[] dirs = System.IO.Directory.GetDirectories();
ns = new NumericComparer(); // new object
Array.Sort(dirs, ns);

While we can reuse the same NumericComparer object instance more than once, it is more efficient to create a new object when the set of strings to order changes (I will explain the implementation of NumericComparer later, after the examples).

Example 2 - Ordering Items in a ListView Control

There are several ways to order elements in a ListView control. I will show only how to order the elements responding to a column head click. To define a generic custom way to order rows of a ListView control, when we click the ListView headers, we need to respond to the ColumnClick event and set the ListViewItemSorter property of the ListView control to a class that implements IComparer. The custom ListComparer comparer will usually take other arguments in the constructor, e.g. the column header being clicked that will serve as the index for sorting the ListView elements, as demonstrated by the code snippet below:

private void lstFiles_ColumnClick(object sender,
              System.Windows.Forms.ColumnClickEventArgs e)
{
    ...
    ListComparer lc = new ListComparer(e.Column, ...);
    lstFiles.ListViewItemSorter = lc;
    ...
}

Inside the custom ListComparer, we will have code to order the ListView elements depending on the clicked column header. If we suppose the first column (index 0) contains the strings that need to be ordered in the natural numeric order, then we can use the StringLogicalComparer class that comes with NumericComparer directly inside ListComparer as follows (the code is simplified and has no error checking):

internal class ListComparer : IComparer
{
    ...

    public int Compare(object x, object y)
    {
        ListViewItem lx = (ListViewItem)x;
        ListViewItem ly = (ListViewItem)y;

        switch(column) // set in the constructor
        {
            case 0: // first column
            int c = StringLogicalComparer.Compare(lx.SubItems[0].Text, 
                                                 ly.SubItems[0].Text);
            // take care of the other columns if needed
            // modify c to respond to ascending
            // or descending order, as needed
            ...
            return c;
            ...
        }
    }
}

The strings (file names) in the first column of the ListView control will now be in the natural numeric order.

Example 3 - Ordering Dictionary Entries

Sometimes we need to associate a data object with a key string and we may need to order the string keys and then access the data objects ordered by the keys.

If the keys are unique, we can use a Hashtable to keep them and the associated data objects. .NET offers the possibility to order the keys of a Hashtable and then retrieve the data objects with it:

Hashtable hash = new Hashtable();
// fill with data string key, object data
...
hash.Add(key, data); // key must be unique
...

// numeric sort
NumericComparer nc = new NumericComparer();
SortedList list = new SortedList(hash, nc);
// now use the sorted list
foreach(DictionaryEntry de in list)
{
    // use de.Key and de.Value
    ...
}

A more interesting situation arises when the keys are not unique, that is when we can have different data objects that map to the same key. In this case, we have two choices.

We can keep the data objects as ArrayLists associated with the string keys in a Hashtable. The order of data objects inside the ArrayLists does not matter because they have the same key.

We can build a simple data structure to keep the key and the value data objects or we can use a System.Collections.DictionaryEntry structure. We can now store our data as DictionaryEntry elements of an ArrayList:

ArrayList list = new ArrayList();

// populate the list, possibly from some other structure as part of a loop
...
list.Add(new DictionaryEntry(fileName, data));

To order the elements in this case, we need also to create a custom (generic) comparer:

public class DictionaryEntryComparer : IComparer
{
    private IComparer nc = null;

    public DictionaryEntryComparer(IComparer nc)
    {
        if(nc == null) throw new Exception("null IComparer");
        this.nc = nc;
    }

    public int Compare(object x, object y)
    {
        if((x is DictionaryEntry) && (y is DictionaryEntry))
        {
            return nc.Compare(((DictionaryEntry)x).Key,
                              ((DictionaryEntry)y).Key);
        }
        return -1;
    }
}

We can now order the items of list according to the keys, in the natural numeric order, using:

list.Sort(new DictionaryEntryComparer(new
              NumericComparer()));

Of course, we can use any other IComparer with the DictionaryEntryComparer class we created.

Points of Interest

The complete code can be found in files StringLogicalComparer.cs and NumericComparer.cs in the source code files of this article.

A small difference with Windows Explorer

There is currently a small difference with Windows Explorer. My code will order files that start with special characters based on the code table order. Windows Explorer uses another order. For example:

Windows Explorer:	(1.txt, [1.txt, _1.txt, =1.txt
My code:	(1.txt, =1.txt, [1.txt, _1.txt

I cannot think of any reason why one order can be better than the other. So I see no reason why my code should emulate this specific order. My code uses the current profile to find the order of chars in the code page. Note that, the difference exists only in the first character. If such a special character is inside the file name, Windows Explorer gives the same order as my code.

A practical implication of this special behavior for the first character is that both StrCmpLogicalW and my code work better with file names not with full paths. Use code similar to Example 3 to order file names and keep the directory information.

Implementation

If you are a developer and want just to use the code, then there is no need to read beyond this point. If you are a student in an introductory computer science course, then the following could be interesting.

There are several ways to order strings in the numeric natural order. The problem is when a list of N items is sorted using quick sort then the Compare function will be called more than N times which means that it would be nice to optimize the implementation if any.

The first version of my code used another implementation (see below). I thought the code was nicely optimized and the results of computation were cached. However, the comments of Richard Deeming below, made me wonder if I had it right. In the beginning I thought that the problem was with RegEx, and the Hashtable may be because of deadline :), and even Richard did not do better :) to guess it right (see comments). To understand the problem with my first solution, I will list several possible implementations very shortly:

A simple technique is to use padding with a special character ‘/’. This character has several nice properties. It is not used in file paths in Windows and its ASCII code is smaller than the one for digits. It can be used to pad the numeric parts of two strings so that they have the same length. Example: a10.txt and a1.txt will become a10.txt and a/1.txt. Then the alphabetical order can be used and a/1.txt will be smaller than a10.txt. Finally the '/' padding needs to be removed. This method works, but has some serious limitations. The '/' can be used only for file paths in Windows. If the strings contain ‘/’, this method will not work. This method requires also too many passes over the string and cannot be implemented with fixed char arrays. The method, however, treats numbers somehow uniformly and is different from the rest so it is interesting per se.
The two strings to be compared are split into lists with alphabetical and numeric parts, the parts are then compared one by one. One optimization of this technique would be to remember the split in parts (cache it in a Hashtable). Numeric parts can be converted to numbers and compared. The number conversions can also be cached. My first implementation used this technique.
The implementation is, however, slower than StrCmpLogicalW despite the caching (it would be even slower without dynamic programming). As I read the Richard Deeming comments about the speed of code compared to StrCmpLogicalW, I did not understand the problem at first. The technique is, however, naïve for two reasons. First, it does eager evaluation. The split of the strings is complete and so is the numeric conversion. When two strings are compared the comparison will be often interrupted before all parts are needed. So the eager evaluation consumes a lot of time. The second problem is that numeric parts are explicitly converted to numbers (long). This not only consumes time, it also is an error-prone method because numeric parts that are longer than a long number will throw an exception. (So if you used the previous code it is time to replace it with the new implementation.)
One solution to the problems above is that numeric parts should be compared as special strings not as numbers. Second, using lazy evaluation would also remove the cost of over splitting. The lazy evaluation code for splitting can, however, be complicated.
The full splitting is, however, rarely needed so we can be optimistic and avoid caching. This is similar to using StrCmpLogicalW. The current implementation of StringLogicalComparer only parses the two strings at the same time and stops parsing at the moment the result of the comparison in known. The technique is also very fast (I hope Richard will test the code again) because it works using fixed-size char arrays. The only look-ahead is to find the end index of the current numerical parts in both strings.

History

03 August 2005
1. New version. Several bugs corrected.
02 August 2005
1. New version. Replaced the old one.
15 July 2005
1. Modified NumericComparer to allow cache size initialization.
2. Modified the article to show a Hashlist example and use DictionaryEntry.
3. Several minor article corrections.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Written By

vcepa

Germany

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.