Introduction
In this short article, I will explain how to sort strings in the natural numeric order. The .NET System.IO.Directory class returns file and directory names ordered in alphabetic order (Win32 API functions for dealing with files order the strings alphabetically), whereas Windows Explorer shows them in natural numeric order. The table below shows the difference between these two orderings:
| Alphabetic sort (DOS / CMD prompt style) | Natural numeric sort (Windows Explorer style) |
| 1.txt | 1.txt |
| 10.txt | 3.txt |
| 3.txt | 10.txt |
| a10b1.txt | a1b1.txt |
| a1b1.txt | a2b1.txt |
| a2b1.txt | a2b2.txt |
| a2b11.txt | a2b11.txt |
| a2b2.txt | a10b1.txt |
| b1.txt | b1.txt |
| b10.txt | b2.txt |
| b2.txt | b10.txt |
In the alphabetic order '10.txt' comes before '3.txt', whereas in the natural numeric order '10.txt' comes after '3.txt', which is what we would expect. Windows Explorer uses the natural numeric order for files.
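The alphabetic column of the table can be reproduced with a plain character-by-character sort. The following is only a minimal sketch: it assumes that an ordinal comparison (StringComparer.Ordinal) is a fair stand-in for the alphabetic order discussed here, and the class and method names are made up for illustration.

```csharp
using System;

public static class AlphabeticOrderDemo
{
    // Returns a sorted copy of the input, compared character by character
    // (ordinal order). This is an assumed stand-in for the "alphabetic" order.
    public static string[] SortAlphabetically(string[] names)
    {
        string[] copy = (string[])names.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }
}
```

Calling SortAlphabetically on "3.txt", "10.txt" and "1.txt" places "10.txt" before "3.txt", because the character '1' sorts before '3' regardless of the digits that follow.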
Thanks to Richard Deeming (see comments) I now know that there is a similar function in shlwapi.dll, called StrCmpLogicalW, which is used by Windows Explorer and is available only in Windows XP and later. My current .NET implementation emulates StrCmpLogicalW in pure C# code, so it can be used in any version of Windows where .NET runs. My implementation is, however, not fully compatible with the one in Windows Explorer. There are some very slight differences which I will explain after the examples.
The class StringLogicalComparer in my C# code emulates StrCmpLogicalW, and NumericComparer is a class implementing the System.Collections.IComparer interface to be used to sort collections.
Using the code
The natural numeric order comparer for strings is defined in a class named NumericComparer : IComparer and can be found in the source code of this article. I will give several examples here of how the NumericComparer class can be used in code.
Example 1 - Ordering an Array of Strings
This example shows how to use the NumericComparer class to order strings. I will suppose the strings are in a string[] array, as shown next:
// path is a placeholder for the directory to list
string[] files = System.IO.Directory.GetFiles(path);
NumericComparer ns = new NumericComparer();
Array.Sort(files, ns);
string[] dirs = System.IO.Directory.GetDirectories(path);
ns = new NumericComparer();
Array.Sort(dirs, ns);
While we can reuse the same NumericComparer object instance more than once, it is more efficient to create a new object when the set of strings to order changes (I will explain the implementation of NumericComparer later, after the examples).
Example 2 - Ordering Items in a ListView Control
There are several ways to order elements in a ListView control. I will show only how to order the elements in response to a column header click. To define a generic custom way to order the rows of a ListView control when we click the ListView headers, we need to respond to the ColumnClick event and set the ListViewItemSorter property of the ListView control to an instance of a class that implements IComparer. The custom ListComparer comparer will usually take other arguments in the constructor, e.g. the index of the clicked column header, which selects the column used for sorting the ListView elements, as demonstrated by the code snippet below:
private void lstFiles_ColumnClick(object sender,
System.Windows.Forms.ColumnClickEventArgs e)
{
...
ListComparer lc = new ListComparer(e.Column, ...);
lstFiles.ListViewItemSorter = lc;
...
}
Inside the custom ListComparer, we will have code to order the ListView elements depending on the clicked column header. If we suppose the first column (index 0) contains the strings that need to be ordered in the natural numeric order, then we can use the StringLogicalComparer class that comes with NumericComparer directly inside ListComparer as follows (the code is simplified and has no error checking):
internal class ListComparer : IComparer
{
...
public int Compare(object x, object y)
{
ListViewItem lx = (ListViewItem)x;
ListViewItem ly = (ListViewItem)y;
switch(column)
{
case 0:
int c = StringLogicalComparer.Compare(lx.SubItems[0].Text,
ly.SubItems[0].Text);
...
return c;
...
}
}
}
The strings (file names) in the first column of the ListView control will now be in the natural numeric order.
Example 3 - Ordering Dictionary Entries
Sometimes we need to associate a data object with a key string and we may need to order the string keys and then access the data objects ordered by the keys.
If the keys are unique, we can use a Hashtable to keep them and the associated data objects. .NET offers the possibility to order the keys of a Hashtable and then retrieve the data objects with it:
Hashtable hash = new Hashtable();
...
hash.Add(key, data);
...
NumericComparer nc = new NumericComparer();
SortedList list = new SortedList(hash, nc);
foreach(DictionaryEntry de in list)
{
...
}
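For readers who want something they can compile directly, here is a small sketch of the same idea with the elided parts filled in. The class and method names are hypothetical and not part of the article's source code; any IComparer can be passed in, for example an instance of the article's NumericComparer.

```csharp
using System;
using System.Collections;

public static class SortedKeysDemo
{
    // Copies the hashtable into a SortedList ordered by the given comparer
    // and returns the keys in that order. Keys are assumed to be strings.
    public static string[] KeysInOrder(Hashtable hash, IComparer comparer)
    {
        SortedList list = new SortedList(hash, comparer);
        string[] keys = new string[list.Count];
        int i = 0;
        foreach (DictionaryEntry de in list)
        {
            // de.Value holds the data object associated with the key.
            keys[i++] = (string)de.Key;
        }
        return keys;
    }
}
```

The SortedList copy is built once, so this suits the case where the keys are sorted after the Hashtable has been filled.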
A more interesting situation arises when the keys are not unique, that is when we can have different data objects that map to the same key. In this case, we have two choices.
- We can keep the data objects as ArrayLists associated with the string keys in a Hashtable. The order of data objects inside the ArrayLists does not matter because they have the same key.
- We can build a simple data structure to keep the key and the value data objects, or we can use a System.Collections.DictionaryEntry structure. We can now store our data as DictionaryEntry elements of an ArrayList:
ArrayList list = new ArrayList();
...
list.Add(new DictionaryEntry(fileName, data));
To order the elements in this case, we also need to create a custom (generic) comparer:
public class DictionaryEntryComparer : IComparer
{
private IComparer nc = null;
public DictionaryEntryComparer(IComparer nc)
{
if(nc == null) throw new ArgumentNullException("nc");
this.nc = nc;
}
public int Compare(object x, object y)
{
if((x is DictionaryEntry) && (y is DictionaryEntry))
{
return nc.Compare(((DictionaryEntry)x).Key,
((DictionaryEntry)y).Key);
}
return -1;
}
}
We can now order the items of list according to the keys, in the natural numeric order, using:
list.Sort(new DictionaryEntryComparer(new NumericComparer()));
Of course, we can use any other IComparer with the DictionaryEntryComparer class we created.
Points of Interest
The complete code can be found in files StringLogicalComparer.cs and NumericComparer.cs in the source code files of this article.
A small difference with Windows Explorer
There is currently a small difference with Windows Explorer. My code will order files that start with special characters based on the code table order. Windows Explorer uses another order. For example:
| Windows Explorer | (1.txt, [1.txt, _1.txt, =1.txt |
| My code | (1.txt, =1.txt, [1.txt, _1.txt |
I cannot think of any reason why one order would be better than the other, so I see no reason why my code should emulate this specific order. My code uses the current profile to find the order of chars in the code page. Note that the difference exists only in the first character. If such a special character is inside the file name, Windows Explorer gives the same order as my code.
A practical implication of this special behavior for the first character is that both StrCmpLogicalW and my code work better with file names than with full paths. Use code similar to Example 3 to order file names and keep the directory information.
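As an alternative to the DictionaryEntry approach of Example 3, the full paths can be kept as they are and compared by their file-name part only. The wrapper below is a sketch of that idea, not part of the article's source code: FileNameComparer is a hypothetical name, and the inner comparer is whatever IComparer you want applied to the file names (for instance the article's NumericComparer).

```csharp
using System;
using System.Collections;
using System.IO;

// Compares full paths by their file-name part only,
// delegating the actual comparison to an inner comparer.
public sealed class FileNameComparer : IComparer
{
    private readonly IComparer inner;

    public FileNameComparer(IComparer inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public int Compare(object x, object y)
    {
        // Path.GetFileName drops the directory part of a path.
        string nameX = Path.GetFileName((string)x);
        string nameY = Path.GetFileName((string)y);
        return inner.Compare(nameX, nameY);
    }
}
```

With this wrapper, a call such as Array.Sort(paths, new FileNameComparer(new NumericComparer())) orders the paths by file name while each element still carries its directory.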
Implementation
If you are a developer and want just to use the code, then there is no need to read beyond this point. If you are a student in an introductory computer science course, then the following could be interesting.
There are several ways to order strings in the natural numeric order. The difficulty is that when a list of N items is sorted using quicksort, the Compare function is called many more than N times (on the order of N log N on average), so it pays to optimize its implementation.
The first version of my code used another implementation (see below). I thought the code was nicely optimized, since the results of computation were cached. However, the comments of Richard Deeming below made me wonder if I had it right. In the beginning I thought that the problem was with RegEx and the Hashtable, maybe because of the deadline :), and even Richard did not guess the cause right :) (see comments). To understand the problem with my first solution, I will briefly list several possible implementations:
- A simple technique is to use padding with a special character ‘/’. This character has several nice properties. It is not used in file paths in Windows and its ASCII code is smaller than the one for digits. It can be used to pad the numeric parts of two strings so that they have the same length. Example: a10.txt and a1.txt will become a10.txt and a/1.txt. Then the alphabetical order can be used and a/1.txt will be smaller than a10.txt. Finally the '/' padding needs to be removed. This method works, but has some serious limitations. The '/' can be used only for file paths in Windows. If the strings contain ‘/’, this method will not work. This method requires also too many passes over the string and cannot be implemented with fixed char arrays. The method, however, treats numbers somehow uniformly and is different from the rest so it is interesting per se.
- The two strings to be compared are split into lists with alphabetical and numeric parts, the parts are then compared one by one. One optimization of this technique would be to remember the split in parts (cache it in a Hashtable). Numeric parts can be converted to numbers and compared. The number conversions can also be cached. My first implementation used this technique.
The implementation is, however, slower than StrCmpLogicalW despite the caching (it would be even slower without dynamic programming). As I read the Richard Deeming comments about the speed of code compared to StrCmpLogicalW, I did not understand the problem at first. The technique is, however, naïve for two reasons. First, it does eager evaluation. The split of the strings is complete and so is the numeric conversion. When two strings are compared the comparison will be often interrupted before all parts are needed. So the eager evaluation consumes a lot of time. The second problem is that numeric parts are explicitly converted to numbers (long). This not only consumes time, it also is an error-prone method because numeric parts that are longer than a long number will throw an exception. (So if you used the previous code it is time to replace it with the new implementation.)
- One solution to the problems above is that numeric parts should be compared as special strings not as numbers. Second, using lazy evaluation would also remove the cost of over splitting. The lazy evaluation code for splitting can, however, be complicated.
- The full splitting is, however, rarely needed so we can be optimistic and avoid caching. This is similar to using StrCmpLogicalW. The current implementation of StringLogicalComparer only parses the two strings at the same time and stops parsing at the moment the result of the comparison is known. The technique is also very fast (I hope Richard will test the code again) because it works using fixed-size char arrays. The only look-ahead is to find the end index of the current numerical parts in both strings.
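The last two points (compare numeric parts as digit strings, and parse both strings lazily in a single pass) can be illustrated with a short sketch. This is not the StringLogicalComparer from the source files; it is a simplified illustration with a hypothetical class name, it handles only ASCII digits, it ignores case, and it does not break ties between "01" and "1".

```csharp
using System;

public static class NaturalCompareSketch
{
    private static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Compares two non-null strings in one pass. Digit runs are compared as
    // digit strings, never converted to numbers, so long runs cannot overflow.
    public static int Compare(string a, string b)
    {
        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (IsAsciiDigit(a[i]) && IsAsciiDigit(b[j]))
            {
                // The only look-ahead: find where each digit run ends.
                int endA = i, endB = j;
                while (endA < a.Length && IsAsciiDigit(a[endA])) endA++;
                while (endB < b.Length && IsAsciiDigit(b[endB])) endB++;

                // Skip leading zeros so that "007" and "7" have the same value.
                int startA = i, startB = j;
                while (startA < endA - 1 && a[startA] == '0') startA++;
                while (startB < endB - 1 && b[startB] == '0') startB++;

                // Without leading zeros, the longer run is the larger number.
                int lenA = endA - startA, lenB = endB - startB;
                if (lenA != lenB) return lenA < lenB ? -1 : 1;

                // Same length: the first differing digit decides.
                for (int k = 0; k < lenA; k++)
                {
                    int d = a[startA + k] - b[startB + k];
                    if (d != 0) return d < 0 ? -1 : 1;
                }
                i = endA;
                j = endB;
            }
            else
            {
                char ca = char.ToUpperInvariant(a[i]);
                char cb = char.ToUpperInvariant(b[j]);
                if (ca != cb) return ca < cb ? -1 : 1;
                i++;
                j++;
            }
        }
        // One string is exhausted: the one with characters left sorts last.
        return (a.Length - i).CompareTo(b.Length - j);
    }
}
```

Because Compare matches the Comparison<string> delegate shape, it can be passed directly to Array.Sort(names, NaturalCompareSketch.Compare). The comparison returns as soon as the first difference is found, which is the "optimistic" behavior described above.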
History
- 03 August 2005
- New version. Several bugs corrected.
- 02 August 2005
- New version. Replaced the old one.
- 15 July 2005
- Modified NumericComparer to allow cache size initialization.
- Modified the article to show a Hashlist example and use DictionaryEntry.
- Several minor article corrections.
Thank you for sharing this code, first of ALL. IT was really the pain in the .... for me to do it and honestly, I never succeeded. But Now, I am.
I was just wondering that WHY did you code that a LOT if you know the API? Why NOT just use the API then?
VB Declaration for the API:
<DllImport("shlwapi.dll", CharSet:=CharSet.Unicode, ExactSpelling:=True)> _
Private Shared Function StrCmpLogicalW(ByVal x As String, ByVal y As String) As Integer
End Function
C# Declaration for the API:
[DllImport("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
static extern int StrCmpLogicalW(String x, String y);
And you finish job in just one line:
Return StrCmpLogicalW(x, y) (add terminator for C# :->)
For sure, it will be VERY Fast as compared to manual code and lot of checks etc. And secondly, there will be no difference in Explorer sort and our code. No?
Thanks again for sharing the damn hidden thing. Sameers
If you read the article carefully, the reasons not to use the shell function are:
a) it is supported in XP and up, not in older systems
b) it is slower than the .NET code given here
c) it makes your app depend on the shell
If all of these are ok for you, you can use the shell version, there is nothing wrong with it.
How can I apply it to a DataGridView whose DataSource is set to a BindingSource? It seems the sorting should be applied to the BindingSource, but how do I do it?
Hi,
This code is awesome and I am glad I found it. It makes my work for the sort implementation simple. But, I have a quick question. When I sort a set of string numbers, the order which I get back is: 15235.1 15235.02
I was expecting that the order would be other way around: 15235.02 15235.1
I am hoping that you would be able to point to the changes in the code to accomplish this.
Most people need it like below (and this also how Windows Explorer and my code order them):
001 01 1 002 02 2
and you need it like this:
001 002 01 02 1 2
This is of course possible, but no matter what order I choose some people will not like it. So my code does it by default the same as Windows Explorer does.
I added now ns.StringLogicalComparer.DefaultZeroesFirst that does what you expect. The latest version can be found at http://madebits.com/articles/numsort/index.php
I recently used your StringLogicalComparer class to sort an ArrayList which contained Custom Objects for example:-
public class Person: IComparable
{
    public string Name;
    public string Email;
    public string Reference;
    public int CompareTo(object obj)
    {
        if( !(obj is Person) ) throw new InvalidCastException("Not a valid Person object.");
        Person person = (Person)obj;
        return StringLogicalComparer.Compare(this.Reference,person.Reference);
    }
}
Then of course when you wanted to sort this ArrayList all I had to do was:- people.Sort();
Now to the question. Is there a way of specifying what we sort by? Name, Email or Reference?
Cheers
Andy M
-- modified at 12:22 Thursday 22nd March, 2007
Sorry I was being a bit dumb when I asked this question. What I would need would be an IComparer for each of the three values.
EXAMPLE:-
public class PersonNameComparer: IComparer
{
    public int Compare(object x, object y)
    {
        if((x is Person) && (y is Person))
        {
            return StringLogicalComparer.Compare(((Person)x).Name,((Person)y).Name);
        }
        return -1;
    }
}
people.Sort(new PersonNameComparer());
Thanks for the original Code anyway!!!
Hi,
Great job.
Was looking for this algorithm since two days. Finally your code saved me and it is working fine. Thank you.
Ramkumar
Hi, just wanted to say: Thank You!!
I've been searching for such a sorting class, google returned nothing but Code Project once again saved me from doing all the stuff myself.... Great one, works excellent, thanks!!
I tried this class and works great with the exception of Icelandic special characters (haven't tried other non-english characters). To correct this I changed this line
r = Char.ToLower( s1[ i1 ] ).CompareTo( Char.ToLower( s2[ i2 ] ) );
to this...
r = ( Char.ToLower( s1[ i1 ] ) ).ToString( ).CompareTo( ( Char.ToLower( s2[ i2 ] ) ).ToString( ) );
The reason the first line gets this wrong is that it uses a char comparison, which compares the numeric character codes, but by making it a string of one character instead it does a string comparison and gets it right, providing the correct locale is used of course.
Hope this helps.
Dear Vasian,
In the StringLogicalComparer I think that the second empty string comparison in the Compare method:
else if(s2.Equals(string.Empty)) return -1;
should be
else if(s2.Equals(string.Empty)) return 1;
otherwise when sorting a list containing empty strings the sort doesn't work properly. You can test this with input such as:
"Flat 11", "", "Flat 14", "Flat 17", "Flat 20", "Flat 2", "Flat 4",
The output is (using your supplied code):
Out: Flat 11
Out:
Out: Flat 2
Out: Flat 4
Out: Flat 14
Out: Flat 17
Out: Flat 20
But as follows:
Out:
Out: Flat 2
Out: Flat 4
Out: Flat 11
Out: Flat 14
Out: Flat 17
Out: Flat 20
when using the line that returns 1 for the second empty string.
I hope that's useful.
Claire
Yes, it is surely a bug made in a hurry. One of the two alternatives (the second) should return 1 and the other -1 (the same as for the null case). It slipped past me, given that I never tested it with empty strings. Thanks for pointing it out. The optimized code below, which is the recommended version to use, does not have this bug.
I am not sure if I really understood this. Can you elaborate on the problem that you see?
The code will compare characters based on the current locale, whatever it is.
Regarding the numeric parts, I do not see any reason why anyone would like to globalize them. E.g., US 1000, DE 1,000, but nevertheless one would name a file a1000.txt and not a1,000.txt or a1.000.txt. The only problem is if names need to be compared in the reverse order (a file named txt.0001a ???). In this case the code needs to be modified to compare strings in the reverse order (that is, from max length to 0).
Maybe you could compare the two bytes in the case of [, (, _ and =: just convert each char to a byte and compare the two. I don't know if the .NET Framework already does this.
The question is not how to do it, but why to do it. I see no reason why one order is better than the other.
Richard Deeming was kind enough to optimize the new version of the code I wrote, so it now performs better than StrCmpLogicalW (see the performance data from Richard below). The new code is given below with permission from Richard. It can be used as 'StringLogicalComparer.Default' in every place where an IComparer is required.
// Optimized by Richard Deeming
// Original code by Vasian Cepa
using System;
using System.Collections;
namespace ns
{
    public sealed class StringLogicalComparer : IComparer
    {
        private static readonly IComparer _default = new StringLogicalComparer();
        private StringLogicalComparer() { }
        public static IComparer Default { get { return _default; } }
        public int Compare(object x, object y)
        {
            if (null == x && null == y) return 0;
            if (null == x) return -1;
            if (null == y) return 1;
            if (x is string && y is string) return Compare((string)x, (string)y);
            return Comparer.Default.Compare(x, y);
        }
        public static int Compare(string s1, string s2)
        {
            if (null == s1 || 0 == s1.Length)
            {
                if (null == s2 || 0 == s2.Length) return 0;
                return -1;
            }
            else if (null == s2 || 0 == s2.Length)
            {
                return 1;
            }
            int s1Length = s1.Length;
            int s2Length = s2.Length;
            bool sp1 = char.IsLetterOrDigit(s1[0]);
            bool sp2 = char.IsLetterOrDigit(s2[0]);
            if (sp1 && !sp2) return 1;
            if (!sp1 && sp2) return -1;
            char c1, c2;
            int i1 = 0, i2 = 0;
            int r = 0;
            bool letter1, letter2;
            while(true)
            {
                c1 = s1[i1];
                c2 = s2[i2];
                sp1 = char.IsDigit(c1);
                sp2 = char.IsDigit(c2);
                if (!sp1 && !sp2)
                {
                    if (c1 != c2)
                    {
                        letter1 = char.IsLetter(c1);
                        letter2 = char.IsLetter(c2);
                        if (letter1 && letter2)
                        {
                            c1 = char.ToUpper(c1);
                            c2 = char.ToUpper(c2);
                            r = c1 - c2;
                            if (0 != r) return r;
                        }
                        else if (!letter1 && !letter2)
                        {
                            r = c1 - c2;
                            if (0 != r) return r;
                        }
                        else if (letter1)
                        {
                            return 1;
                        }
                        else if (letter2)
                        {
                            return -1;
                        }
                    }
                }
                else if (sp1 && sp2)
                {
                    r = CompareNumbers(s1, s1Length, ref i1, s2, s2Length, ref i2);
                    if (0 != r) return r;
                }
                else if (sp1)
                {
                    return -1;
                }
                else if (sp2)
                {
                    return 1;
                }
                i1++;
                i2++;
                if (i1 >= s1Length)
                {
                    if (i2 >= s2Length) return 0;
                    return -1;
                }
                else if (i2 >= s2Length)
                {
                    return 1;
                }
            }
        }
        private static int CompareNumbers(
            string s1, int s1Length, ref int i1,
            string s2, int s2Length, ref int i2)
        {
            int nzStart1 = i1, nzStart2 = i2;
            int end1 = i1, end2 = i2;
            ScanNumber(s1, s1Length, i1, ref nzStart1, ref end1);
            ScanNumber(s2, s2Length, i2, ref nzStart2, ref end2);
            int start1 = i1; i1 = end1 - 1;
            int start2 = i2; i2 = end2 - 1;
            // Note: length1 holds the significant length of the number in s2 and
            // length2 that of s1, so the final test below returns -1 when the
            // number in s1 is shorter (smaller).
            int length1 = end2 - nzStart2;
            int length2 = end1 - nzStart1;
            if (length1 == length2)
            {
                int r;
                for(int j1 = nzStart1, j2 = nzStart2; j1 <= i1; j1++, j2++)
                {
                    r = s1[j1] - s2[j2];
                    if (0 != r) return r;
                }
                // Equal values: the one with more leading zeros sorts first.
                length1 = end1 - start1;
                length2 = end2 - start2;
                if (length1 == length2) return 0;
            }
            if (length1 > length2) return -1;
            return 1;
        }
        private static void ScanNumber(string s, int length, int start, ref int nzStart, ref int end)
        {
            nzStart = start;
            end = start;
            bool countZeros = true;
            char c = s[end];
            while(true)
            {
                if (countZeros)
                {
                    if ('0' == c) { nzStart++; }
                    else { countZeros = false; }
                }
                end++;
                if (end >= length) break;
                c = s[end];
                if (!char.IsDigit(c)) break;
            }
        }
    }
}
The performance data are (elapsed time for sorting a list of the given size):
| Items | Default sort | NumericComparer | StringLogicalComparer | StrCmpLogicalW |
| 11 | 0.000140088219287944 | 0.00254989330621831 | 0.00202851230200785 | 0.00430522405694154 |
| 256 | 0.000434439893449431 | 0.00485548184057514 | 0.00275860460286999 | 0.00530583128061181 |
| 5222 | 0.0198101999971357 | 0.136936525076619 | 0.0361899879700971 | 0.0720836894566494 |
| 8511 | 0.0327280901813078 | 0.226701231990949 | 0.0595433327127431 | 0.126432379772578 |
In Windows XP or higher, you can use the same function that Explorer uses by P/Invoking the StrCmpLogicalW function.
[DllImport("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true, SetLastError=true)]
private static extern int StrCmpLogicalW(string strA, string strB);
You can even check the version of the ShlWapi library, and revert to a different method for older versions:
using System;
using System.Collections;
using System.Runtime.InteropServices;
public sealed class StringLogicalComparer : IComparer
{
    [StructLayout(LayoutKind.Sequential)]
    private struct DllVersionInfo
    {
        public int cbSize;
        public int dwMajorVersion;
        public int dwMinorVersion;
        public int dwBuildNumber;
        public int dwPlatformID;
    }
    [DllImport("shlwapi.dll", EntryPoint="DllGetVersion", SetLastError=true)]
    private static extern int GetShlWapiVersion(ref DllVersionInfo version);
    [DllImport("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true, SetLastError=true)]
    private static extern int StrCmpLogicalW(string strA, string strB);
    private static readonly bool _isSupported;
    private static readonly IComparer _default;
    static StringLogicalComparer()
    {
        DllVersionInfo ver = new DllVersionInfo();
        ver.cbSize = Marshal.SizeOf(ver);
        try { GetShlWapiVersion(ref ver); } catch { }
        _isSupported = 5 < ver.dwMajorVersion || (5 == ver.dwMajorVersion && 5 <= ver.dwMinorVersion);
        if (_isSupported) _default = new StringLogicalComparer();
        else _default = CaseInsensitiveComparer.DefaultInvariant;
    }
    private StringLogicalComparer() { }
    public static IComparer Default { get { return _default; } }
    public static bool IsSupported { get { return _isSupported; } }
    public int Compare(object x, object y)
    {
        string left = x as string;
        if (null != left && 0 != left.Length)
        {
            string right = y as string;
            if (null != right && 0 != right.Length)
            {
                try { return StrCmpLogicalW(left, right); } catch { }
            }
        }
        return Comparer.Default.Compare(x, y);
    }
}
http://msdn.microsoft.com/library/en-us/shellcc/platform/shell/reference/shlwapi/string/strcmplogicalw.asp
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer