
Sort titles using a TitleComparer


Apr 9, 2004


An implementation of the IComparer interface that allows strings to be sorted as Titles.

Sample Image - TitleComparer.png

Introduction

At some point, we all run into a situation where the order in which our human readers expect to see information is not what we get from the built-in string sorting of SQL or the .NET Framework. When listing street addresses, an end user probably expects "20 Main St." to appear after "9 Main St.", and not between "1 Main St." and "3 Main St.". Another situation in which basic string sorting produces less-than-desirable results is the display of titles. Most end users expect to see "The Evil Dead" listed between "Event Horizon" and "Evil Dead II", rather than at the bottom of the list between "Swamp Thing" and "They Live".

Full-blown indexing systems often rely on sophisticated language parsers or on some form of coding to work around this issue (for example, using "<The >Bells of {Saint }<St. >Mary's" to display "The Bells of St. Mary's" and sort by "Bells of Saint Mary's"). Unless indexing is a core feature of your project, however, that type of solution is probably overkill (not to mention over budget!).

The traditional "simple" solution to this problem is to manipulate the titles at the time they are entered into the system. One often-used approach is to modify the title, either by placing the problem words at the end of the title or by dropping them altogether (e.g. "Beautiful Mind, A" or just "Beautiful Mind"). Another approach is to use multiple fields to store the title, either splitting it into First and Last parts or storing both a Display value and a Sort value.

This article presents a third option: using the .NET Framework's IComparer interface to create a custom TitleComparer that will sort strings according to the following rules:

  • Articles such as "a", "an" and "the" will be ignored wherever they occur.
  • Other words such as "of" will be ignored only when they are not the first word of the title.
  • Punctuation will be replaced according to NISO rules.
  • Numbers will be sorted by their numeric value.
  • Titles will be compared word by word. If the end of one of the words is reached before a difference has been found, the longer word is considered "greater"; similarly, if the end of one of the titles is reached before a difference is found, the longer title is considered "greater".
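
The rules above can be approximated in a few dozen lines. What follows is a minimal sketch, not the TitleComparer that ships with this article: the class name SimpleTitleComparer and its two word lists are hypothetical, the punctuation rule is left out, and the syntax assumes a more recent VB.NET compiler than the one available in 2004.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

' Hypothetical, simplified comparer; NOT the article's TitleComparer.
Public Class SimpleTitleComparer
    Implements IComparer

    ' Illustrative word lists (assumptions, not the article's actual lists).
    Private Shared ReadOnly Articles As String() = {"a", "an", "the"}
    Private Shared ReadOnly NonInitialWords As String() = {"of"}

    Public Function Compare(x As Object, y As Object) As Integer _
        Implements IComparer.Compare

        Dim aWords As List(Of String) = SortWords(CStr(x))
        Dim bWords As List(Of String) = SortWords(CStr(y))
        Dim count As Integer = Math.Min(aWords.Count, bWords.Count)

        ' Compare word by word until a difference is found.
        For i As Integer = 0 To count - 1
            Dim result As Integer = CompareWords(aWords(i), bWords(i))
            If result <> 0 Then Return result
        Next

        ' Every shared word matched: the longer title is "greater".
        Return aWords.Count.CompareTo(bWords.Count)
    End Function

    ' Splits a title on spaces and drops the ignored words.
    Private Shared Function SortWords(title As String) As List(Of String)
        Dim kept As New List(Of String)
        Dim separators As Char() = {" "c}
        Dim isFirst As Boolean = True
        For Each word As String In title.ToLowerInvariant().Split(
                separators, StringSplitOptions.RemoveEmptyEntries)
            ' Articles are always skipped; other words only when not first.
            Dim skip As Boolean = Array.IndexOf(Articles, word) >= 0 OrElse
                (Not isFirst AndAlso Array.IndexOf(NonInitialWords, word) >= 0)
            If Not skip Then kept.Add(word)
            isFirst = False
        Next
        Return kept
    End Function

    ' Compares numerically when both words are integers, otherwise ordinally.
    ' An ordinal comparison already treats the longer word as "greater"
    ' when one word is a prefix of the other.
    Private Shared Function CompareWords(a As String, b As String) As Integer
        Dim na, nb As Long
        If Long.TryParse(a, na) AndAlso Long.TryParse(b, nb) Then
            Return na.CompareTo(nb)
        End If
        Return String.CompareOrdinal(a, b)
    End Function
End Class
```

With this sketch, "The Evil Dead" compares greater than "Event Horizon" and less than "Evil Dead II", and "9 Main St." compares less than "20 Main St.".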

Using the code

Using the code "as is" is straightforward. First, include TitleComparer.vb in your project (or include it in a class library and reference the library instead). To compare one string to another using the title rules:

Dim oComparer As New TitleComparer()
Dim sMsg As String

Select Case oComparer.Compare(sTitle1, sTitle2)
    Case 0
        sMsg = "The two strings sort the same"
    Case Is > 0
        sMsg = String.Format("'{0}' is greater than '{1}'", sTitle1, sTitle2)
    Case Is < 0
        sMsg = String.Format("'{0}' is greater than '{1}'", sTitle2, sTitle1)
End Select

MsgBox(sMsg, MsgBoxStyle.Information)

To sort an ArrayList of Strings:

Dim oList As New ArrayList()

PopulateArrayList(oList)

oList.Sort(New TitleComparer())

ArrayLists aren't the ideal collection to use because they can contain any type of object, not just strings; unfortunately, the Specialized.StringCollection won't work for us either in this case, because it doesn't expose a Sort method. Although it isn't used in the demo, the demo project does contain a simple StringList class that you can use as a sortable, type-safe collection in place of the ArrayList in the above example.
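
For readers who want to see the general shape of such a collection, here is one possible sketch. It is not the StringList from the demo project (which may be written differently); the name StringListSketch is hypothetical, and it relies only on the standard CollectionBase class and its protected List and InnerList members.

```vbnet
Imports System.Collections

' Hypothetical sketch; the demo project's StringList may differ.
Public Class StringListSketch
    Inherits CollectionBase

    ' Typed indexer: callers read and write Strings, never Objects.
    Default Public Property Item(index As Integer) As String
        Get
            Return CStr(List(index))
        End Get
        Set(value As String)
            List(index) = value
        End Set
    End Property

    ' Only Strings can be added through the public interface.
    Public Function Add(value As String) As Integer
        Return List.Add(value)
    End Function

    ' Delegates to the underlying ArrayList, which does expose Sort.
    Public Sub Sort(comparer As IComparer)
        InnerList.Sort(comparer)
    End Sub
End Class
```

A collection like this can be sorted the same way as the ArrayList above, by passing New TitleComparer() to its Sort method.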

Points of Interest

To handle the requirement of ignoring certain words wherever they occur, and others only if they're not at the start of the title, the TitleComparer class contains two private Shared properties: IngoredWords and IgnoredNonInitialWords. Because these are shared properties backed by shared StringCollections, the lists are initialized only once, no matter how many instances of TitleComparer are created.
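
The initialize-once pattern can be sketched roughly as follows. This is an illustration under assumptions rather than the actual TitleComparer source: the class name SharedListSketch, the backing field, and the IsIgnored helper are all hypothetical, and the simple lazy check shown here is not thread-safe.

```vbnet
Imports System.Collections.Specialized

' Hypothetical sketch of a shared, lazily initialized word list.
Public Class SharedListSketch
    ' One backing collection for all instances of the class.
    Private Shared _ignoredWords As StringCollection

    Private Shared ReadOnly Property IgnoredWords As StringCollection
        Get
            ' Populate the list on first use only (not thread-safe).
            If _ignoredWords Is Nothing Then
                _ignoredWords = New StringCollection()
                _ignoredWords.AddRange(New String() {"a", "an", "the"})
            End If
            Return _ignoredWords
        End Get
    End Property

    ' Illustrative helper: case-insensitive lookup in the shared list.
    Public Function IsIgnored(word As String) As Boolean
        Return IgnoredWords.Contains(word.ToLowerInvariant())
    End Function
End Class
```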

The punctuation requirement is implemented using the shared Replace method of the Regex class. For clarity in the demo, I constructed the regular expression using a StringBuilder. If you have not worked with regular expressions before, this should be a rather simple introduction to one of their uses.
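
As a rough illustration of the technique, the sketch below assembles a pattern with a StringBuilder and passes it to the shared Regex.Replace method. The module name, the function name, and above all the pattern are assumptions: it simply deletes every punctuation and symbol character, which is a simplification and not the NISO replacement rules used by the demo.

```vbnet
Imports System.Text
Imports System.Text.RegularExpressions

' Hypothetical sketch; the pattern is NOT the article's NISO rule set.
Public Module PunctuationSketch
    Public Function StripPunctuation(title As String) As String
        ' Build the character class piece by piece so each part
        ' can carry its own comment.
        Dim pattern As New StringBuilder()
        pattern.Append("[")
        pattern.Append("\p{P}")   ' any Unicode punctuation character
        pattern.Append("\p{S}")   ' any Unicode symbol character
        pattern.Append("]")

        ' Shared overload: Regex.Replace(input, pattern, replacement).
        Return Regex.Replace(title, pattern.ToString(), "")
    End Function
End Module
```

For example, StripPunctuation("St. Mary's") yields "St Marys".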

The demo project contains examples of other techniques, not directly related to implementing the TitleComparer. The Movie.vb file demonstrates creating both a type-safe collection and a custom comparer to operate on a specific type (class). The main WinForm contains a thread-safe shared declaration of Sub Main to provide the application with a Windows XP-style UI when running on that operating system.

Enhancement Ideas

The following list presents some ideas for enhancing or altering the provided code to meet the needs of your project:

  • The most obvious adjustment you could make is to add or remove words from the IngoredWords and IgnoredNonInitialWords lists. Or, if you don't require its functionality, you can remove the IgnoredNonInitialWords list entirely.
  • As noted in the code, the two lists of "Ignore" words are currently hard-coded. Placing them in an XML file or database would allow you to make adjustments in the field, without having to recompile the application.
  • The code currently sorts Roman numerals alphabetically, so nine (IX) will appear before five (V). If you add code to process Roman numerals numerically, be aware that the word "mix" is also a valid Roman numeral (1009).
  • To get the full performance benefit from using regular expressions, you'll probably want to use a string literal in place of the code that assembles the expression with a StringBuilder.
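
As a starting point for the Roman numeral idea, a conversion routine might look like the sketch below. It is not part of the provided code; the module and function names are hypothetical, and it does no validation beyond rejecting characters that are not Roman digits, so malformed strings such as "IIII" are still accepted. It also shows the pitfall mentioned in the list: "mix" converts cleanly to 1009.

```vbnet
' Hypothetical sketch; not part of the article's TitleComparer.
Public Module RomanNumeralSketch
    ' Returns the numeric value of a Roman numeral, or -1 if the text is
    ' empty or contains a character that is not a Roman digit.
    Public Function RomanToInteger(text As String) As Integer
        If text.Length = 0 Then Return -1

        Dim total As Integer = 0
        Dim previous As Integer = 0

        ' Walk right to left: a digit smaller than its right-hand
        ' neighbor is subtracted (the I in IX), otherwise it is added.
        For i As Integer = text.Length - 1 To 0 Step -1
            Dim value As Integer = DigitValue(Char.ToUpperInvariant(text(i)))
            If value = 0 Then Return -1
            If value < previous Then
                total -= value
            Else
                total += value
            End If
            previous = value
        Next
        Return total
    End Function

    ' Maps one Roman digit to its value; 0 means "not a Roman digit".
    Private Function DigitValue(c As Char) As Integer
        Select Case c
            Case "I"c : Return 1
            Case "V"c : Return 5
            Case "X"c : Return 10
            Case "L"c : Return 50
            Case "C"c : Return 100
            Case "D"c : Return 500
            Case "M"c : Return 1000
            Case Else : Return 0
        End Select
    End Function
End Module
```

A comparer using something like this would need a way to decide when a word should be treated as a numeral at all, for example only when it is the last word of the title.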

History

2004-04-09 – Released to Code Project.