|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionThe following article presents a generic solution that mimics SQL's BackgroundI am currently working on a Web project designed to help a large organization keep track of membership, publish fuzzy-searchable online directories, print mailing labels, manage online music clips for student auditions, and plan and schedule events. The back-end database used to store our data consists of approximately 40 tables. Since we will eventually be handing this project over to a possibly less-experienced programmer when we are finished, one of our goals is to keep data access simple and minimize needless programmer fiddling at the SQL/table level. With this in mind, we decided to take a multi-tiered approach to data access. Skip ahead several months, and a fairly elegant data abstraction layer (DAL) has been built and is working nicely, serving as the go-between for database tables and in-memory business objects. One of the many bonuses of having a data abstraction layer is the ability to cache query results in-memory, cutting down significantly on the number of database round-trips and general database load. However, for obvious reasons, lists (of people, locations, etc.) must be returned in different sort orders on different Web pages and for different users. For example, one user may want to sort by Last Name, First Name, while another user wants to sort by ID. These sort requirements could easily counteract any benefit of using cached data if the programmer must re-issue new SQL queries with changed I wasn't thrilled with either of these options, so (neither wanting to rewrite the business objects nor to fiddle with existing class definitions), I decided to implement a generic solution that would mimic SQL's Before I continue, I would like to give due credit to Marc Gravell for his article on HyperPropertyDescriptors. Without his code, retrieval of Property values would be excruciatingly slow, and the reflection used in this article would not be a reasonable means to quickly sort large lists. So, thank you Marc! Using the CodeThe solution is wrapped in our company's namespace: Example 1: In-Place Sorting by Property Name// Sort a list of People by last name
List<People> rgPeople = RetrieveListOfAllPeople();
// Sort the list by the values in the "LastName" property of the "People" class
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection("LastName"));
Example 2: Sort Using the Values of Multiple Properties// Sort a list of people by Birth date (descending), then SSN (ascending)
// Note: the sort will automatically compare dates by examining
// the "BirthDate" property type
List<SortPropOrFieldAndDirection> sortCriteria =
new List<SortPropOrFieldAndDirection>();
// Set up the sort criteria
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));
// Perform the multi-property sort
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);
Example 3: Force a Numeric Sort on a String Property (Manual Sort Type Override)// Force a string field that contains numbers (AuditionRanking)
// to sort using number-comparison criteria
// (Example: 10 comes after 9 using numeric comparison,
// but "10" comes before "9" using string comparison)
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection
("AuditionRanking", SortType.eInteger));
How Does it Work?One easy way to sort a list of objects of type The To obtain an appropriate /// <summary>
/// Retrieves an IComparer of type T, depending on whether the instance
/// of this class (or a derived class) references a property or field
/// </summary>
public IComparer<T> GetComparer<T>() where T : class
{
if (NameIsPropertyName)
{
CompareProperties<T> comp = new CompareProperties<T>
(sPropertyOrFieldName, fSortDescending, sortType);
comp.pi = pi;
comp.Initialize();
return comp;
}
else
{
CompareFields<T> comp = new CompareFields<T>
(sPropertyOrFieldName, fSortDescending, sortType);
comp.fi = fi;
comp.Initialize();
return comp;
}
}
Note:
/// <summary>
/// Sets up cached PropertyInfo and determines the best delegate to use
/// to compare values retrieved from that property.
/// </summary>
public void Initialize()
{
if (fFoundProperty == false)
{
// Only look up the Property TypeDescriptor once (the first time)
fFoundProperty = true;
if (pi == null)
{
// Find the TypeDescriptor for the property (and confirm it exists)
// Use this format to take advantage of HyperPropertyDescriptors
// (Marc Gravell)
PropertyDescriptorCollection props = TypeDescriptor.GetProperties(typeof(T));
property = props[sPropertyName];
pi = typeof(T).GetProperty(sPropertyName);
if (pi == null)
{
throw new Exception("Property name " + sPropertyName +
" not found while trying to compare objects of type " +
typeof(T).Name);
}
}
typ = pi.PropertyType;
// Set up the property comparison delegate to use based on the type of
// values we will be comparing
if (sortType == SortType.eUsePropertyOrFieldType)
{
// If we are using the reflected property type, we know that
// no type conversion is needed prior to comparison.
// Use fast direct-cast comparison routines
sortType = Sorting.GetSortTypeEnumForType(typ);
if (typ == typeof(string))
{
if (stringComparisonToUse == StringComparison.Ordinal)
DoCompare = StringCompareOrdinal;
else
DoCompare = StringCompare;
}
else if (typ == typeof(int) && !fSortDescending) DoCompare = CompareInt;
else if (typ == typeof(int)) DoCompare = CompareIntDesc;
else if (typ == typeof(DateTime)) DoCompare = CompareDates;
else if (typ == typeof(long)) DoCompare = CompareTypeSensitive<long>;
else if (typ == typeof(double)) DoCompare = CompareTypeSensitive<double>;
else if (typ == typeof(float)) DoCompare = CompareTypeSensitive<float>;
else if (typ == typeof(short)) DoCompare = CompareTypeSensitive<short>;
else if (typ == typeof(byte)) DoCompare = CompareTypeSensitive<byte>>;
else if (typ == typeof(bool)) DoCompare = CompareTypeSensitive<bool>>;
else DoCompare = CompareUsingToString;
}
else
{
// We are comparing using a user-specified type.
// A type conversion may be necessary
// and we may run into invalid cast exceptions -
// use more robust comparison routines
if (sortType == SortType.eString) DoCompare = CompareUsingToString;
else if (sortType == SortType.eByte) DoCompare = CompareUsingToByte;
else if (sortType == SortType.eDateTime) DoCompare = CompareUsingToDate;
else if (sortType == SortType.eInteger) DoCompare = CompareUsingToInt;
else if (sortType == SortType.eLong) DoCompare = CompareUsingToInt64;
else if (sortType ==
SortType.eDoubleOrFloat) DoCompare = CompareUsingToDouble;
else DoCompare = CompareUsingToString;
}
}
}
Each property comparison delegate is fairly compact and contains code similar to the following (the below code compares two integers): public int CompareInt(T x, T y)
{
int oX = (int)property.GetValue(x);
int oY = (int)property.GetValue(y);
return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}
Summary: Sorting Using a Single Field/PropertyStep #1: Creation of SortPropertyAndDirection sortBy = new SortPropertyAndDirection("AssignedTo");
Step #2: After we have specified criteria by which to sort and wish to do an in-place, single-property sort, simply call the Sorting.SortInPlace<WorkItem>(AllWorkItems, sortBy);
Under the covers, the
These steps are executed by the following code: // Retrieve an IComparer that contains logic for sorting
// this specific business object
// type by the specified criteria
IComparer<T> compare = sortBy.GetComparer<T>();
// lock the list for thread safety and then call
// the .NET Sort(IComparer<T>) method
lock (ListToBeSorted)
{
ListToBeSorted.Sort(compare);
}
"ORDER-BY"-like Sorting (Multi-Property Sorting)In brief, multi-property sorting first sorts a list using the primary sort criterion, and then sorts each sorted sub-list using the secondary criterion, and then sorts each sorted sub-sub-list using the ternary criterion, etc, etc, etc. Consider the following code: // Sort a list of people by Birth date (descending), then SSN (ascending)
// Note: the sort will automatically compare dates by examining the
// "BirthDate" property type
List<SortPropOrFieldAndDirection> sortCriteria = new List<SortPropOrFieldAndDirection>();
// Set up the sort criteria
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));
// Perform the multi-property sort
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);
The above code is similar to adding an We've already gone over the
This is not the most memory-efficient solution, nor is the process of breaking a list into sublists as optimized as it could be. However, adding these optimizations significantly increases the complexity of the code while offering only minimal performance gain under normal use conditions (sorting lists of 20,000-50,000 objects on 2-3 properties). Simply put, the code is pretty darn speedy as it is. Without further ado, the code for multi-sort is as follows: /// <Summary>
/// Sorts a given list using multiple sort (comparison) criteria, similar
/// to an SQL "ORDER BY" clause
/// Example: Specifying the sort/comparison criteria
/// (1) "State" and (2) "Zipcode" using a list of
/// address entries will result in a sorted list containing
/// address entries FIRST sorted by state
/// (starting with the A* states - Alabama, Alaska, etc).
/// Then, the address entries
/// associated with each respective state will be further sorted
/// according to Zipcode.
/// </Summary>
/// <Typeparam name="T">The type of elements in the list to sort<Typeparam>
/// <param name="ListToBeSorted">The original list.
/// A new, sorted list will be returned.</param>
/// <param name="rgSortBy">A list of ordered criteria specifying
/// how the list should be sorted.</param>
/// <Returns>A new list containing all elements of ListToBeSorted,
/// sorted by the criteria specified in rgSortBy.
/// </Returns>
public static List<T> MultiSort<T>(List<T> ListToBeSorted,
List<SortPropOrFieldAndDirection> rgSortBy) where T : class
{
List<T> results;
// For thread safety, make a copy of the list up front.
// Note that you still need to
// lock the same instance of the list when/if it is modified
// by other threads.
lock (ListToBeSorted)
{
results = new List<T>(ListToBeSorted);
}
if (rgSortBy.Count == 1)
{
// if only sorting a single time,
// just call the basic in-place sorting on our copied "results" list
SortInPlace<T>(results, rgSortBy[0]);
return results;
}
try
{
List<List<T>> rgCopies = new List<List<T>>(1);
rgCopies.Add(results);
int sortByCount = rgSortBy.Count;
// For each criterion in the list of comparison criteria,
// one or more lists must be sorted.
// Each time a list is sorted, one or more sublists may be created.
// Each sublist contains items that were deemed
// to be "equivalent" according to the comparison criterion.
// Example: After sorting addresses entries by state
// you may have multiple sublists, each containing all of
// the address entries associated with a given state.
// Note: this is not the most efficient method
// (especially in terms of memory!), but it
// is sufficient in most scenarios and is easier to understand
// than many other methods of sorting a list using multiple criteria.
for (int i = 0; i < sortByCount; i++)
{
SortPropOrFieldAndDirection sortBy = rgSortBy[i];
if (string.IsNullOrEmpty(sortBy.sPropertyOrFieldName))
throw new Exception("MultiSort parameter rgSortBy was passed
an empty field name in rgSortBy [" + i.ToString() + "]");
// Retrieve an IComparer that contains logic for sorting
// this specific business object
// type by the specified criteria
IComparer<T> compare = sortBy.GetComparer<T>();
// Sort each sublist using the created IComparer<T>
foreach (List<T> lst in rgCopies) { lst.Sort(compare); }
if (i < sortByCount - 1)
{
// Create new sublists by searching for the
// sorted-by value boundaries/changes
// Our "best guess" (i.e. shot in the dark)
// is that we will create at least 8 sublists
// from the original list. NOT terribly efficient,
// but often sufficient.
// Some advanced methods involve tracking
// duplicate values DURING the sort itself
List<List<T>> rgNewCopies = new List<List<T>>(rgCopies.Count * 8);
for (int n = 0; n < rgCopies.Count; n++)
{
List<T> rgList = rgCopies[n];
// Be conservative and set the initial sublist capacity
// to a small number, but still honor the original
// list's item count. (Example: If you are sorting a list
// of "Address information" by Zipcode and the
// list has 32,000 entries, then initialize
// each sublist (each of which store all Address
// information entries with the same Zipcode)
// with a capacity of 1000. 32,000 / 32 = 1000
List<T> rgSublist = new List<T>(rgList.Count / 32);
// Compare items to the item that preceeded it
// to determine where the "value boundaries"
// are located. If you will be sorting frequently and
// do not have CPU cycles to burn :),
// a smarter boundary-finding algorithm should be used.
// (e.g. determine boundary locations
// when comparing elements during the sort routine).
// Another alternative is to take advantage of the fact
// that the list is sorted and to
// use a O(LogN) binary search rather than the
// (currently) linear O(N) search.
for (int j = 0; j < rgList.Count; j++)
{
T item = rgList[j];
if (j > 0)
{
// Compare the item to the preceding item
// using the same comparison criterion
// used during the sort
T itemprev = rgList[j - 1];
if (compare.Compare(item, itemprev) == 0)
{
// The item had the same property or
// field value as the preceding item.
// Add it on to the same sublist.
rgSublist.Add(item);
}
else
{
// The item did NOT have the same property
// or field value as the preceding item.
// "Close up" the previous sublist and
// start a new one.
rgNewCopies.Add(rgSublist);
rgSublist = new List<T>(rgList.Count / 32);
rgSublist.Add(item);
}
}
else
{
// The first item has no predecessor -
// just add the item to the first sublist
rgSublist.Add(item);
}
} // END: for (int j = 0; j < rgList.Count; j++) ...
// Add the last created sublist to our
// "master list of sublists" :P
// It may be that this list has 0 elements in some cases,
// but this is not a problem
rgNewCopies.Add(rgSublist);
} // END: for (int n = 0; n < rgCopies.Count; n++) ...
// Move to the next "level" of sublists in
// preparation for further sorting using the next
// sort/comparison criterion
rgCopies = rgNewCopies;
}
} // END: for (int i = 0; i < sortByCount; i++) ...
// reconstruct all resorted sub-sub-sub-sub-sublists into a single,
// final (flat) results list
results.Clear();
foreach (List<T> rgList in rgCopies) { results.AddRange(rgList); }
return results;
}
catch (Exception ex)
{
throw new Exception
("Exception in MultiSort while sorting a list of " + typeof(T).Name, ex);
}
}
Isn't it Slow Using Reflection?Using regular reflection ( public int CompareInt(T x, T y)
{
int oX = (int)property.GetValue(x);
int oY = (int)property.GetValue(y);
return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}
This code may be called thousands or hundreds of thousands of times to compare property values of two objects. To alleviate the performance bottleneck, I am using Marc Gravell's The graph below shows time taken to sort lists of various length using either value-retrieval by Field (
As expected, sorting on a single field or property is faster than sorting by multiple fields or properties. The above graph also shows that retrieving property values via Marc's The attached code also contains a simple demo project. The demo project generates from 64 to 100,000 fictional "Work Items" with dummy property values. To retrieve sort timing information, sort by clicking on the column headers or by using the multi-sort combo boxes on the right. Number of milliseconds elapsed to sort the item list is displayed in the lower right corner. You can also choose (via radio button) to sort by retrieving Field values or Property values - useful if you're interested in relative performance. Points of InterestThis solution was fairly easy to design and implement, but it took some time with the profiler before I realized just how slow property access was when retrieving values via History08/28/07 - Per request, some additions to the demo project:
Any comments/suggestions are welcome - I'd be happy to enhance and publish future versions to CodeProject.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||