Sort titles using a TitleComparer





Apr 9, 2004
An implementation of the IComparer interface that allows strings to be sorted as Titles.
Introduction
Sooner or later, we all run into a situation where the order in which our human readers expect to see information is not the order we get from the built-in string sorting of SQL or the .NET Framework. When listing street addresses, an end user probably expects "20 Main St." to appear after "9 Main St.", and not between "1 Main St." and "3 Main St.". Basic string sorting also produces less-than-desirable results when displaying titles. Most end users expect to see "The Evil Dead" listed between "Event Horizon" and "Evil Dead II", rather than near the bottom of the list between "Swamp Thing" and "They Live".
Full-blown indexing systems often rely on sophisticated language parsers or on some form of coding to work around this issue (for example, using "<The >Bells of {Saint }<St. >Mary's" to display "The Bells of St. Mary's" and sort by "Bells of Saint Mary's"). Unless indexing is a core feature of your project, however, that type of solution is probably overkill (not to mention over budget!).
The traditional "simple" solution to this problem is to manipulate the titles at the time they are entered into the system. One often-used approach is to modify the title, either by placing the problem words at the end of the title or by dropping them altogether (e.g. "Beautiful Mind, A" or just "Beautiful Mind"). Another approach is to use multiple fields to store the title, either splitting it into First and Last parts or storing both a Display value and a Sort value.
This article presents a third option: using the .NET Framework's IComparer interface to create a custom TitleComparer that will sort strings according to the following rules:
- Articles such as "a", "an" and "the" will be ignored wherever they occur.
- Other words such as "of" will be ignored only when they are not the first word of the title.
- Punctuation will be replaced according to NISO rules.
- Numbers will be sorted by their numeric value.
- Titles will be compared word by word. If the end of one of the words is reached before a difference has been found, the longer word is considered "greater"; similarly, if the end of one of the titles is reached before a difference is found, the longer title is considered "greater".
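The word-level rules above can be sketched roughly as follows. This is not the code from TitleComparer.vb: the class name TitleComparerSketch, its helper names, and the two short word lists are placeholders chosen for illustration. The sketch also skips the punctuation rule, assumes both arguments are non-null strings, and treats "first word" as "first word that is kept", which may differ from the real implementation.

```vb
Imports System
Imports System.Collections

' Illustrative sketch only; names and word lists are assumptions.
Public Class TitleComparerSketch
    Implements IComparer

    Private Shared ReadOnly Articles As String() = {"a", "an", "the"}
    Private Shared ReadOnly NonInitialWords As String() = {"of"}

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer _
        Implements IComparer.Compare
        Dim wordsX As ArrayList = SignificantWords(CStr(x))
        Dim wordsY As ArrayList = SignificantWords(CStr(y))
        Dim common As Integer = Math.Min(wordsX.Count, wordsY.Count)

        ' Compare word by word until a difference is found.
        For i As Integer = 0 To common - 1
            Dim result As Integer = CompareWords(CStr(wordsX(i)), CStr(wordsY(i)))
            If result <> 0 Then Return result
        Next

        ' One title ran out of words first: the longer title is "greater".
        Return wordsX.Count.CompareTo(wordsY.Count)
    End Function

    ' Lower-cases the title, splits it on spaces and drops the ignored words.
    Private Shared Function SignificantWords(ByVal title As String) As ArrayList
        Dim result As New ArrayList()
        For Each word As String In title.ToLower().Split(" "c)
            Dim isArticle As Boolean = Array.IndexOf(Articles, word) >= 0
            ' Words such as "of" are dropped only after a word has been kept.
            Dim isNonInitial As Boolean = result.Count > 0 AndAlso _
                Array.IndexOf(NonInitialWords, word) >= 0
            If word.Length > 0 AndAlso Not isArticle AndAlso Not isNonInitial Then
                result.Add(word)
            End If
        Next
        Return result
    End Function

    ' Two all-digit words compare by numeric value; anything else compares
    ' character by character, so a word that is a prefix of another sorts first.
    Private Shared Function CompareWords(ByVal a As String, ByVal b As String) As Integer
        If IsNumber(a) AndAlso IsNumber(b) Then
            Return Long.Parse(a).CompareTo(Long.Parse(b))
        End If
        Return String.CompareOrdinal(a, b)
    End Function

    ' True for 1 to 18 ASCII digits, which always fits in a Long.
    Private Shared Function IsNumber(ByVal word As String) As Boolean
        If word.Length = 0 OrElse word.Length > 18 Then Return False
        For Each c As Char In word
            If c < "0"c OrElse c > "9"c Then Return False
        Next
        Return True
    End Function
End Class
```

With this sketch, comparing "The Evil Dead" to "Evil Dead II" returns a negative number, because "the" is dropped and the shorter remaining title sorts first. Comparing "9 Main St." to "20 Main St." also returns a negative number, because 9 and 20 are compared as numbers.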
Using the code
Using the code "as is" is straightforward. First, include TitleComparer.vb in your project (or include it in a class library and reference the library instead). To compare one string to another using the title rules:
    ' sTitle1 and sTitle2 are the two String values being compared.
    Dim oComparer As New TitleComparer()
    Dim sMsg As String
    Select Case oComparer.Compare(sTitle1, sTitle2)
        Case 0
            sMsg = "The two strings sort the same"
        Case Is > 0
            sMsg = String.Format("'{0}' is greater than '{1}'", sTitle1, sTitle2)
        Case Is < 0
            ' A negative result means the second title sorts after the first.
            sMsg = String.Format("'{0}' is greater than '{1}'", sTitle2, sTitle1)
    End Select
    MsgBox(sMsg, MsgBoxStyle.Information)
To sort an ArrayList of Strings:
    Dim oList As New ArrayList()
    PopulateArrayList(oList)
    oList.Sort(New TitleComparer())
ArrayLists aren't the ideal collection to use because they can contain any type of object, not just strings; unfortunately, the Specialized.StringCollection won't work for us either in this case, because it doesn't expose a Sort method. Although it isn't used in the demo, the demo project does contain a simple StringList class that you can use as a sortable, type-safe collection in place of the ArrayList in the above example.
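For readers who do not have the demo project at hand, a sortable, type-safe string collection might look something like the sketch below. It is not the StringList class from the demo: the name StringListSketch and its members are assumptions, and it simply wraps the ArrayList that CollectionBase already provides.

```vb
Imports System.Collections

' Illustrative sketch only; not the StringList class from the demo project.
Public Class StringListSketch
    Inherits CollectionBase

    ' Typed indexer: callers read and write Strings, never Objects.
    Default Public Property Item(ByVal index As Integer) As String
        Get
            Return CStr(Me.List(index))
        End Get
        Set(ByVal value As String)
            Me.List(index) = value
        End Set
    End Property

    ' Only Strings can be added through this method.
    Public Function Add(ByVal value As String) As Integer
        Return Me.List.Add(value)
    End Function

    ' Delegates to the Sort method of the underlying ArrayList.
    Public Sub Sort(ByVal comparer As IComparer)
        Me.InnerList.Sort(comparer)
    End Sub
End Class
```

A list built this way can be sorted with oList.Sort(New TitleComparer()), just like the ArrayList above. On .NET Framework 2.0 or later, List(Of String) offers the same type safety and a Sort method without any custom class.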
Points of Interest
To handle the requirement of ignoring certain words wherever they occur, and others only if they're not at the start of the title, the TitleComparer class contains two private Shared properties: IngoredWords and IgnoredNonInitialWords. Because these are shared properties backed by shared StringCollections, the lists are initialized only once, no matter how many instances of TitleComparer are created.
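The "initialize once, share across instances" pattern described above could be sketched as follows. The class, field, and property names here are placeholders rather than the identifiers used in TitleComparer.vb, and the word list is only the three articles mentioned in the introduction. The sketch makes no attempt at locking, so it assumes the list is first touched from a single thread.

```vb
Imports System.Collections.Specialized

' Illustrative sketch only; names and word list are assumptions.
Public Class IgnoredWordsSketch
    ' One backing field for the whole class, not one per instance.
    Private Shared _ignoredWords As StringCollection

    Private Shared ReadOnly Property IgnoredWords() As StringCollection
        Get
            ' Build the list on first use; later calls reuse it.
            If _ignoredWords Is Nothing Then
                Dim words As New StringCollection()
                words.AddRange(New String() {"a", "an", "the"})
                _ignoredWords = words
            End If
            Return _ignoredWords
        End Get
    End Property

    ' Example consumer: True when the word is on the shared list.
    Public Shared Function IsIgnored(ByVal word As String) As Boolean
        Return IgnoredWords.Contains(word.ToLower())
    End Function
End Class
```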
The punctuation requirement is implemented using the shared Replace method of the Regex class. For clarity in the demo, I constructed the regular expression using a StringBuilder. If you have not worked with regular expressions before, this should be a rather simple introduction to one of their uses.
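As a rough illustration of that approach, the sketch below assembles a character class piece by piece with a StringBuilder and passes it to the shared Regex.Replace method. The set of characters and the choice to replace each run of punctuation with a single space are assumptions made for the example; they are not taken from the NISO rules or from the demo code.

```vb
Imports System.Text
Imports System.Text.RegularExpressions

' Illustrative sketch only; the character set is an assumption.
Module PunctuationSketch
    Function StripPunctuation(ByVal title As String) As String
        ' Build the pattern in labelled pieces so each part is easy to read.
        Dim pattern As New StringBuilder()
        pattern.Append("[")            ' open a character class
        pattern.Append("\.,;:!\?")     ' sentence punctuation
        pattern.Append("""'")          ' double and single quotes
        pattern.Append("\(\)\[\]")     ' parentheses and square brackets
        pattern.Append("\-/")          ' hyphen and slash
        pattern.Append("]+")           ' close the class; match one or more

        ' Shared (static) Replace: no Regex instance is needed.
        Return Regex.Replace(title, pattern.ToString(), " ")
    End Function
End Module
```

Once the pattern is settled, the pieces can be collapsed into one string literal, which avoids rebuilding the pattern on every call.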
The demo project contains examples of other techniques, not directly related to implementing the TitleComparer. The Movie.vb file demonstrates creating both a type-safe collection and a custom comparer to operate on a specific type (class). The main WinForm contains a thread-safe shared declaration of Sub Main to provide the application with a Windows XP-style UI when running on that operating system.
Enhancement Ideas
The following list presents some ideas about how you might want to enhance/alter the provided code to meet the needs of your project:
- The most obvious adjustment you could make is to add or remove words from the IngoredWords and IgnoredNonInitialWords lists. Or, if you don't require its functionality, you can remove the IgnoredNonInitialWords list entirely.
- As noted in the code, the two lists of "Ignore" words are currently hard-coded. Placing them in an XML file or database would allow you to make adjustments in the field, without having to recompile the application.
- The code currently sorts Roman numerals alphabetically, so nine (IX) will appear before five (V). If you add code to process Roman numerals numerically, be aware that the word "mix" is also a valid Roman numeral (1009).
- To get the full performance benefit from using regular expressions, you'll probably want to use a string literal in place of the code that assembles the expression with a StringBuilder.
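For the Roman numeral idea, a conversion routine along the following lines could serve as a starting point. It is a sketch, not part of the provided code: the module and function names are hypothetical, and the validation is deliberately loose, so malformed sequences such as "IIII" are accepted rather than rejected.

```vb
' Illustrative sketch only; names are hypothetical and validation is loose.
Module RomanNumeralSketch
    ' Returns the numeric value of a Roman numeral, or -1 if the text
    ' is empty or contains a character that is not a Roman digit.
    Function RomanToInteger(ByVal numeral As String) As Integer
        If numeral Is Nothing OrElse numeral.Length = 0 Then Return -1

        Dim upper As String = numeral.ToUpper()
        Dim total As Integer = 0
        For i As Integer = 0 To upper.Length - 1
            Dim current As Integer = DigitValue(upper.Chars(i))
            If current = 0 Then Return -1

            Dim nextValue As Integer = 0
            If i < upper.Length - 1 Then nextValue = DigitValue(upper.Chars(i + 1))

            ' A smaller digit in front of a larger one is subtracted (IX = 9).
            If current < nextValue Then
                total -= current
            Else
                total += current
            End If
        Next
        Return total
    End Function

    ' Value of a single Roman digit, or 0 for any other character.
    Private Function DigitValue(ByVal c As Char) As Integer
        Select Case c
            Case "I"c : Return 1
            Case "V"c : Return 5
            Case "X"c : Return 10
            Case "L"c : Return 50
            Case "C"c : Return 100
            Case "D"c : Return 500
            Case "M"c : Return 1000
            Case Else : Return 0
        End Select
    End Function
End Module
```

RomanToInteger("IX") returns 9 and RomanToInteger("V") returns 5, so comparing the two values puts five before nine. RomanToInteger("mix") returns 1009, which is exactly the ambiguity noted in the list: the comparer would still need some way to decide whether a given word is meant as a numeral.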
History
2004-04-09 – Released to Code Project.