Sorting Multiple Word Strings by Keyword

12 Apr 2015

Introduction

This tip describes a method for sorting multiple word strings by keyword. Sometimes, the sorting objective is not to sort alphabetically or numerically, but by some keyword. For example, the list below is a partial list of sensor names from a crash test dummy; each name contains several “words” to sort by.

Femur R Fx
Femur R Fy
Femur L Mx
Femur L My
Femur L Fz
Femur L Fx
Femur L Fy
Upper Neck Fy
Upper Neck Fx

For the following discussion, a word is a sub-string surrounded by white space or delimiting characters, R and L refer to right and left, M and F are moment and force sensors, and x, y, and z refer to the direction, or axis, of motion for the sensor. Assume the objective is to list the sensors anatomically, head to foot; within each body part, left sensors before right; within each side, force sensors before moment sensors; and within each sensor type, axes in the order x, y, z. The desired sort of the list above would be:

Upper Neck Fx
Upper Neck Fy
Femur L Fx
Femur L Fy
Femur L Fz
Femur L Mx
Femur L My
Femur R Fx
Femur R Fy

The problem requires a series of sorting steps, each based on a different keyword in the string.

Using the Code

The code, written in C#, is shown in the function below. It’s written for string arrays, but could easily be changed to a List<string> type. The function can be called more than once passing different arrays of keywords.

C#
//--------------------------------------------------------------------------
// Returns a new string array sorted by the passed SortArr keywords.
// Keywords are expected in upper case; input strings are upper-cased
// before matching. The caller's array is not modified.
//--------------------------------------------------------------------------
public string[] SortByArr(string[] inArr, string[] SortArr) {
    inArr = (string[])inArr.Clone(); // Work on a copy of the caller's array
    string[] outarr = new string[inArr.Length];

    int i = 0; // Index of output array
    foreach (string sa in SortArr) { // Sort by SortArr list
        for (int j = 0; j < inArr.Length; j++) {
            if (inArr[j].ToUpperInvariant().Contains(sa)) {
                outarr[i++] = inArr[j]; // Add to out list
                inArr[j] = "";          // Flag - remove from in-list
            }
        }
    }
    // i at next index

    foreach (string s in inArr) {   // Pick up leftovers
        if (s != "") outarr[i++] = s;
    }

    return outarr;
}

The function takes the string[] array inArr (the unsorted list), and a list of keywords in string[] SortArr, and outputs the list sorted. The keywords used in SortArr are listed in the desired sort order. For the example list of strings above, the function could be called first to list by sensor axis, passing { “FX”, “FY”, “FZ”, “MX”, “MY”, “MZ”} as the SortArr list of keywords. This will sort the list so that sensors will be listed in order: Fx, Fy, Fz, Mx, My, Mz. It could be called again with the desired left-right sort keywords: {“LEFT”, “RIGHT”, “L”, “R”} to make strings containing a “left” come before strings containing “right”.

Since the original string list may have variations in the wording, this can cause misses in matching. This can be remedied by passing extra keywords to match. For example, adding “LEFT” to the list of keywords matches if “LEFT” was used in the string instead of “L”, and doesn’t cause a problem if no match is found (except adding some additional search time). For the same reason, the input strings are converted to upper case before matching, so matching is not case-sensitive as long as the keywords are supplied in upper case.
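The two points above (extra keywords for wording variations, and upper-case matching) can be tried with a minimal sketch like the one below. It assumes a static copy of the function that works on a clone of the input; the class name KeywordSortDemo and the sample sensor names are made up for illustration.

```csharp
using System;

public static class KeywordSortDemo
{
    // Static copy of SortByArr for this sketch. It clones the input so the
    // caller's array is left untouched (an assumption of this example).
    public static string[] SortByArr(string[] inArr, string[] sortArr)
    {
        string[] work = (string[])inArr.Clone();
        string[] outArr = new string[work.Length];
        int i = 0;
        foreach (string sa in sortArr)
        {
            for (int j = 0; j < work.Length; j++)
            {
                if (work[j].ToUpperInvariant().Contains(sa))
                {
                    outArr[i++] = work[j];
                    work[j] = ""; // flag as already placed
                }
            }
        }
        foreach (string s in work)
        {
            if (s != "") outArr[i++] = s; // leftovers keep their order
        }
        return outArr;
    }

    public static void Main()
    {
        // Hypothetical names with mixed wording ("left" vs "L") and case.
        string[] sensors = { "Tibia Right Fz", "tibia left fz", "Tibia L Fx" };
        string[] sorted = SortByArr(sensors, new string[] { "LEFT", "L" });
        Console.WriteLine(string.Join(", ", sorted));
        // prints: tibia left fz, Tibia L Fx, Tibia Right Fz
    }
}
```

Both spellings of "left" end up ahead of the right-hand sensor, and the lower-case name is matched because it is upper-cased before the comparison.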

As can be seen in the code, the outer loop goes through each keyword in SortArr, while the inner loop checks if the current keyword is found in any of the unsorted input strings. Matching is done using the .Contains() method here, so a short keyword such as “L” also matches any string that contains that letter anywhere (for example, “PELVIS”). To avoid such accidental matches, each string could instead be split into separate words using .Split(), with comparison done by direct (boolean “==”) word matches in an additional inner loop.
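The whole-word alternative mentioned above might look like the following sketch. It is not the article's function: the name SortByWords, the delimiter set, and the use of a bool[] to flag placed strings are all assumptions made for this example.

```csharp
using System;

public static class WordMatchSort
{
    // Hypothetical variant of SortByArr that matches whole words only.
    public static string[] SortByWords(string[] inArr, string[] sortArr)
    {
        char[] delims = { ' ', '\t', '_', '-' }; // assumed delimiting characters
        string[] outArr = new string[inArr.Length];
        bool[] used = new bool[inArr.Length];    // true once a string is placed
        int i = 0;
        foreach (string sa in sortArr)
        {
            for (int j = 0; j < inArr.Length; j++)
            {
                if (used[j]) continue;
                string[] words = inArr[j].ToUpperInvariant()
                    .Split(delims, StringSplitOptions.RemoveEmptyEntries);
                foreach (string w in words)
                {
                    if (w == sa) // direct word comparison
                    {
                        outArr[i++] = inArr[j];
                        used[j] = true;
                        break;
                    }
                }
            }
        }
        for (int j = 0; j < inArr.Length; j++)
        {
            if (!used[j]) outArr[i++] = inArr[j]; // leftovers
        }
        return outArr;
    }

    public static void Main()
    {
        string[] sensors = { "Pelvis R Fx", "Femur L Fx" };
        string[] sorted = SortByWords(sensors, new[] { "L", "R" });
        Console.WriteLine(string.Join(", ", sorted));
        // prints: Femur L Fx, Pelvis R Fx
    }
}
```

With substring matching, "PELVIS R FX" would match the keyword "L" and be listed first; with whole-word matching only the standalone "L" counts.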

When a match is found, the string is saved to a new string array in the order in which it was found. The input strings in inArr that are matched are changed to “” (the empty string), which effectively flags them for removal from further searching (with a List<string> type, this action could be done with the .Remove() method). The last foreach loop picks up any leftover strings that weren’t matched and adds them to the end of the new string array. The function then returns the new, sorted string array.
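For the List<string> case mentioned above, one possible sketch is shown below. It uses List<string>.FindAll and RemoveAll rather than Remove, which is a substitution made here for brevity; the name SortByList is hypothetical, and the input list is copied so the caller's list is not changed.

```csharp
using System;
using System.Collections.Generic;

public static class ListKeywordSort
{
    // Hypothetical List<string> counterpart of SortByArr.
    public static List<string> SortByList(List<string> input, IEnumerable<string> keywords)
    {
        List<string> remaining = new List<string>(input); // working copy
        List<string> result = new List<string>(input.Count);
        foreach (string key in keywords)
        {
            Predicate<string> matches = s => s.ToUpperInvariant().Contains(key);
            result.AddRange(remaining.FindAll(matches)); // keeps list order
            remaining.RemoveAll(matches);                // drop matched strings
        }
        result.AddRange(remaining); // leftovers go to the end
        return result;
    }

    public static void Main()
    {
        List<string> sensors = new List<string> { "Femur R Fx", "Femur L Fx", "Upper Neck Fx" };
        List<string> sorted = SortByList(sensors, new[] { "NECK", "FEMUR" });
        Console.WriteLine(string.Join(", ", sorted));
        // prints: Upper Neck Fx, Femur R Fx, Femur L Fx
    }
}
```

Because matched strings are physically removed from the working list, no flag value such as the empty string is needed.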

As an example, the function below calls SortByArr several times with various sort keywords to get the desired sorted string array listed head to tibia, upper before lower, left before right, by force axis, then by moment axis, and ordered x, y, and z. Each subsequent call passes, as input, the output from the previous call, so that each pass refines the order left by the previous one.

C#
//--------------------------------------------------------------------------
// Returns the passed string array anatomically sorted (head first...).
//--------------------------------------------------------------------------
public string[] AnatomicSort(string[] inArr) {
    CStr cs = new CStr(); // Instantiate the class containing SortByArr

    string[] outArr = cs.SortByArr(inArr, new string[] {
                "ACX", "AX", "FX",
                "ACY", "AY", "FY",
                "ACZ", "AZ", "FZ",
                "MX", "MY", "MZ" });
    outArr = cs.SortByArr(outArr, new string[] { "LEFT", "LE", "L" });
    outArr = cs.SortByArr(outArr, new string[] { "UPPER", "UP" });
    outArr = cs.SortByArr(outArr, new string[] {
                "HEAD", "NECK", "CHST", "CHEST", "THORAX", "LUSP", "LUMBAR",
                "PELV", "PELVIS", "FEMR", "FEMUR", "TIBI", "TIBIA" });

    return outArr;
}

Note again that adding extra sort keywords like “CHST” or “THORAX”, as shown above, will catch strings that used those words instead of “CHEST” and sort them correctly, although this does add some processing time. Note also that the last sort keyword may sometimes be omitted, as shown in the code. For example, when sorting by “left” and “right”, “right” may be omitted, since unmatched strings are kept after the matched ones: once the “left” strings have been moved to the front, the “right” strings already follow them (along with any strings that name neither side).
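As a check, the four calls can be run on the sample list from the Introduction. The sketch below assumes static copies of both functions gathered in one made-up class, AnatomicSortDemo, with SortByArr working on a clone of its input; the keyword lists are the ones used in AnatomicSort.

```csharp
using System;

public static class AnatomicSortDemo
{
    // Static copy of SortByArr that leaves the caller's array untouched.
    public static string[] SortByArr(string[] inArr, string[] sortArr)
    {
        string[] work = (string[])inArr.Clone();
        string[] outArr = new string[work.Length];
        int i = 0;
        foreach (string sa in sortArr)
            for (int j = 0; j < work.Length; j++)
                if (work[j].ToUpperInvariant().Contains(sa))
                {
                    outArr[i++] = work[j];
                    work[j] = ""; // flag as already placed
                }
        foreach (string s in work)
            if (s != "") outArr[i++] = s;
        return outArr;
    }

    // Same call sequence as AnatomicSort: axis, side, upper, body part.
    public static string[] AnatomicSort(string[] inArr)
    {
        string[] outArr = SortByArr(inArr, new string[] {
            "ACX", "AX", "FX", "ACY", "AY", "FY",
            "ACZ", "AZ", "FZ", "MX", "MY", "MZ" });
        outArr = SortByArr(outArr, new string[] { "LEFT", "LE", "L" });
        outArr = SortByArr(outArr, new string[] { "UPPER", "UP" });
        outArr = SortByArr(outArr, new string[] {
            "HEAD", "NECK", "CHST", "CHEST", "THORAX", "LUSP", "LUMBAR",
            "PELV", "PELVIS", "FEMR", "FEMUR", "TIBI", "TIBIA" });
        return outArr;
    }

    public static void Main()
    {
        string[] sensors = {
            "Femur R Fx", "Femur R Fy", "Femur L Mx", "Femur L My", "Femur L Fz",
            "Femur L Fx", "Femur L Fy", "Upper Neck Fy", "Upper Neck Fx" };
        foreach (string s in AnatomicSort(sensors))
            Console.WriteLine(s);
        // Prints, one per line: Upper Neck Fx, Upper Neck Fy, Femur L Fx,
        // Femur L Fy, Femur L Fz, Femur L Mx, Femur L My, Femur R Fx, Femur R Fy
    }
}
```

The output matches the desired order listed in the Introduction.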

Points of Interest

The disadvantage to this sort method is that you need to know what the keywords are in advance of calling the function, but sometimes this may be remedied by searching the list for keywords. For instance, suppose it’s desired to sort by the crash test dummy’s identification (ID) code. Although the IDs could be written in various ways and may not be exactly known in advance, an assumption could be made, for example, that they usually appear as the first word in the string and begin with the character ‘H’. Code could be written to go through the list to find these ID codes, saving only the unique ones in a keyword list, and then passing this list to the sort function as keywords.
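A rough sketch of that keyword search is shown below. It relies entirely on the assumption stated above (the ID is the first word and begins with 'H'); the function name FindIdKeywords and the sample ID codes are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IdKeywords
{
    // Collects the unique first words that begin with 'H', upper-cased,
    // in order of first appearance. Hypothetical helper, not from the article.
    public static string[] FindIdKeywords(string[] names)
    {
        List<string> ids = new List<string>();
        foreach (string name in names)
        {
            string[] words = name.ToUpperInvariant()
                .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length > 0
                && words[0].StartsWith("H", StringComparison.Ordinal)
                && !ids.Contains(words[0]))
            {
                ids.Add(words[0]); // save each ID only once
            }
        }
        return ids.ToArray();
    }

    public static void Main()
    {
        // Made-up sensor names; "H3-50" and "H3-05" stand in for dummy IDs.
        string[] names = {
            "H3-50 Femur L Fx", "H3-05 Femur L Fx", "h3-50 Upper Neck Fx", "Sled Ax" };
        Console.WriteLine(string.Join(", ", FindIdKeywords(names)));
        // prints: H3-50, H3-05
    }
}
```

The returned array is already upper case, so it can be passed to the sort function as its keyword list. Note that this heuristic would also pick up a leading word such as "HEAD", so a real list may need a stricter rule.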

It’s interesting to note that when sorting multiple times with different keyword lists, the order of sorting is very important. For example, if the “HEAD… TIBIA” sort was done first, any of the subsequent sorts would disorder the list again. At first, it didn’t seem possible that the list could be sorted by this method as required, but the correct sort order happened to be found. In fact, the sort order shown above appears to be the unique solution (for the type of order required by the user).

In general, the sorts should be applied in order from least to greatest “precedence”, so the keyword list that matters most is passed last. For example, sorting by “UPPER/LOWER” first and then by “LEFT/RIGHT” (switched from above) places higher precedence on “LEFT/RIGHT”, so that it lists left-uppers, then left-lowers, then right-uppers, then right-lowers. However, the desired order of left-uppers, then right-uppers, then left-lowers, then right-lowers requires the “LEFT/RIGHT” sort to be done first and the “UPPER/LOWER” sort after it, as in the code above. This works because each call keeps the strings within a keyword group in the order they arrived in, which is the behavior of a stable sort; applying stable sorts from the least to the most significant key is a standard way to build a multi-key ordering. Some experimentation may still be required in practical usage, and further discussion would be welcome.
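The effect of pass order can be seen on a four-item list. The sketch below uses a static copy of SortByArr and made-up sensor names; whichever keyword list is applied last ends up with the highest precedence.

```csharp
using System;

public static class PassOrderDemo
{
    // Static copy of SortByArr that leaves the caller's array untouched.
    public static string[] SortByArr(string[] inArr, string[] sortArr)
    {
        string[] work = (string[])inArr.Clone();
        string[] outArr = new string[work.Length];
        int i = 0;
        foreach (string sa in sortArr)
            for (int j = 0; j < work.Length; j++)
                if (work[j].ToUpperInvariant().Contains(sa))
                {
                    outArr[i++] = work[j];
                    work[j] = ""; // flag as already placed
                }
        foreach (string s in work)
            if (s != "") outArr[i++] = s;
        return outArr;
    }

    public static void Main()
    {
        string[] sensors = {
            "Lower Tibia Right", "Upper Tibia Right",
            "Lower Tibia Left", "Upper Tibia Left" };
        string[] side = { "LEFT" };   // "RIGHT" omitted: leftovers follow
        string[] level = { "UPPER" }; // "LOWER" omitted: leftovers follow

        // Side first, level last: upper/lower has the highest precedence.
        string[] a = SortByArr(SortByArr(sensors, side), level);
        Console.WriteLine(string.Join(", ", a));
        // prints: Upper Tibia Left, Upper Tibia Right, Lower Tibia Left, Lower Tibia Right

        // Level first, side last: left/right has the highest precedence.
        string[] b = SortByArr(SortByArr(sensors, level), side);
        Console.WriteLine(string.Join(", ", b));
        // prints: Upper Tibia Left, Lower Tibia Left, Upper Tibia Right, Lower Tibia Right
    }
}
```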

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


