In this short article, I will explain how to sort strings in the natural numeric order. The .NET
System.IO.Directory class returns file and directory names ordered in alphabetic order (Win32 API functions for dealing with files order the strings alphabetically), whereas Windows Explorer shows them in natural numeric order. The table below shows the difference between these two orderings:
<TH bgColor=#e1e1e1>Alphabetic sort
<TH bgColor=#e1e1e1>Natural numeric sort
|DOS (CMD prompt) style||Windows Explorer Style|
In the alphabetic order '3.txt' comes before '10.txt' whereas in the natural numeric order '10.txt' comes after '3.txt', which is what we would expect. Windows Explorer uses the natural numeric order for files.
Thanks to Richard Deeming (see comments) I now know that there is a similar function in shlwapi.dll, called
StrCmpLogicalW used by Windows Explorer that works only in XP. My current .NET implementation emulates
StrCmpLogicalW in pure C# code so it can be used with the .NET version in any version of Windows where .NET runs. My implementation is, however, not fully compatible with the one in Windows Explorer. There are some very slight differences which I will explain after the examples.
StringLogicalComparer in my C# code emulates
NumericComparer is a class implementing the
System.Collections.IComparer interface to be used to sort collections.
Using the code
The natural numeric order comparer for strings is defined in a class named
NumericComparer : IComparer and can be found in the source code of this article. I will give several examples here of how the
NumericComparer class can be used in code.
Example 1 - Ordering an Array of Strings
This example shows how to use the
NumericComparator class to order strings. I will suppose, the strings are in a
string array as shown next:
string files = System.IO.Directory.GetFiles();
NumericComparer ns = new NumericComparer();
string dirs = System.IO.Directory.GetDirectories();
ns = new NumericComparer();
While we can reuse the same
NumericComparer object instance more than once, it is more efficient to create a new object when the set of strings to order changes (I will explain the implementation of
NumericComparer later, after the examples).
Example 2 - Ordering Items in a ListView Control
There are several ways to order elements in a
ListView control. I will show only how to order the elements responding to a column head click. To define a generic custom way to order rows of a
ListView control, when we click the
ListView headers, we need to respond to the
ColumnClick event and set the
ListViewItemSorter property of the
ListView control to a class that implements
IComparer. The custom
ListComparer comparer will usually take other arguments in the constructor, e.g. the column header being clicked that will serve as the index for sorting the
ListView elements, as demonstrated by the code snippet below:
private void lstFiles_ColumnClick(object sender,
ListComparer lc = new ListComparer(e.Column, ...);
lstFiles.ListViewItemSorter = lc;
Inside the custom
ListComparer, we will have code to order the
ListView elements depending on the clicked column header. If we suppose the first column (index 0) contains the strings that need to be ordered in the natural numeric order, then we can use the
StringLogicalComparer class that comes with
NumericComparer directly inside
ListComparer as follows (the code is simplified and has no error checking):
internal class ListComparer : IComparer
public int Compare(object x, object y)
ListViewItem lx = (ListViewItem)x;
ListViewItem ly = (ListViewItem)y;
int c = StringLogicalComparer.Compare(lx.SubItems.Text,
The strings (file names) in the first column of the
ListView control will now be in the natural numeric order.
Example 3 - Ordering Dictionary Entries
Sometimes we need to associate a data object with a key string and we may need to order the string keys and then access the data objects ordered by the keys.
If the keys are unique, we can use a
Hashtable to keep them and the associated data objects. .NET offers the possibility to order the keys of a
Hashtable and then retrieve the data objects with it:
Hashtable hash = new Hashtable();
NumericComparer nc = new NumericComparer();
SortedList list = new SortedList(hash, nc);
foreach(DictionaryEntry de in list)
A more interesting situation arises when the keys are not unique, that is when we can have different data objects that map to the same key. In this case, we have two choices.
- We can keep the data objects as
ArrayLists associated with the string keys in a
Hashtable. The order of data objects inside the
ArrayLists does not matter because they have the same key.
- We can build a simple data structure to keep the key and the value data objects or we can use a
System.Collections.DictionaryEntry structure. We can now store our data as
DictionaryEntry elements of an
ArrayList list = new ArrayList();
list.Add(new DictionaryEntry(fileName, data));
To order the elements in this case, we need also to create a custom (generic) comparer:
public class DictionaryEntryComparer : IComparer
private IComparer nc = null;
public DictionaryEntryComparer(IComparer nc)
if(nc == null) throw new Exception("null IComparer");
this.nc = nc;
public int Compare(object x, object y)
if((x is DictionaryEntry) && (y is DictionaryEntry))
We can now order the items of
list according to the keys, in the natural numeric order, using:
Of course, we can use any other
IComparer with the
DictionaryEntryComparer class we created.
Points of Interest
The complete code can be found in files StringLogicalComparer.cs and NumericComparer.cs in the source code files of this article.
A small difference with Windows Explorer
There is currently a small difference with Windows Explorer. My code will order files that start with special characters based on the code table order. Windows Explorer uses another order. For example:
<TD bgColor=#e1e1e1>Windows Explorer:
<TD bgColor=#e1e1e1>My code:
|(1.txt, [1.txt, _1.txt, =1.txt|
|(1.txt, =1.txt, [1.txt, _1.txt|
I cannot think of any reason why one order can be better than the other. So I see no reason why my code should emulate this specific order. My code uses the current profile to find the order of chars in the code page. Note that, the difference exists only in the first character. If such a special character is inside the file name, Windows Explorer gives the same order as my code.
A practical implication of this special behavior for the first character is that both
StrCmpLogicalW and my code work better with file names not with full paths. Use code similar to Example 3 to order file names and keep the directory information.
If you are a developer and want just to use the code, then there is no need to read beyond this point. If you are a student in an introductory computer science course, then the following could be interesting.
There are several ways to order strings in the numeric natural order. The problem is when a list of N items is sorted using quick sort then the
Compare function will be called more than N times which means that it would be nice to optimize the implementation if any.
The first version of my code used another implementation (see below). I thought the code was nicely optimized and the results of computation were cached. However, the comments of Richard Deeming below, made me wonder if I had it right. In the beginning I thought that the problem was with RegEx, and the
Hashtable may be because of deadline :), and even Richard did not do better :) to guess it right (see comments). To understand the problem with my first solution, I will list several possible implementations very shortly:
- A simple technique is to use padding with a special character ‘/’. This character has several nice properties. It is not used in file paths in Windows and its ASCII code is smaller than the one for digits. It can be used to pad the numeric parts of two strings so that they have the same length. Example: a10.txt and a1.txt will become a10.txt and a/1.txt. Then the alphabetical order can be used and a/1.txt will be smaller than a10.txt. Finally the '/' padding needs to be removed. This method works, but has some serious limitations. The '/' can be used only for file paths in Windows. If the strings contain ‘/’, this method will not work. This method requires also too many passes over the string and cannot be implemented with fixed char arrays. The method, however, treats numbers somehow uniformly and is different from the rest so it is interesting per se.
- The two strings to be compared are split into lists with alphabetical and numeric parts, the parts are then compared one by one. One optimization of this technique would be to remember the split in parts (cache it in a
Hashtable). Numeric parts can be converted to numbers and compared. The number conversions can also be cached. My first implementation used this technique.
The implementation is, however, slower than
StrCmpLogicalW despite the caching (it would be even slower without dynamic programming). As I read the Richard Deeming comments about the speed of code compared to
StrCmpLogicalW, I did not understand the problem at first. The technique is, however, naïve for two reasons. First, it does eager evaluation. The split of the strings is complete and so is the numeric conversion. When two strings are compared the comparison will be often interrupted before all parts are needed. So the eager evaluation consumes a lot of time. The second problem is that numeric parts are explicitly converted to numbers (
long). This not only consumes time, it also is an error-prone method because numeric parts that are longer than a
long number will throw an exception. (So if you used the previous code it is time to replace it with the new implementation.)
- One solution to the problems above is that numeric parts should be compared as special strings not as numbers. Second, using lazy evaluation would also remove the cost of over splitting. The lazy evaluation code for splitting can, however, be complicated.
- The full splitting is, however, rarely needed so we can be optimistic and avoid caching. This is similar to using
StrCmpLogicalW. The current implementation of
StringLogicalComparer only parses the two strings at the same time and stops parsing at the moment the result of the comparison in known. The technique is also very fast (I hope Richard will test the code again) because it works using fixed-size char arrays. The only look-ahead is to find the end index of the current numerical parts in both strings.
- 03 August 2005
- New version. Several bugs corrected.
- 02 August 2005
- New version. Replaced the old one.
- 15 July 2005
NumericComparer to allow cache size initialization.
- Modified the article to show a Hashlist example and use
- Several minor article corrections.