Click here to Skip to main content
Click here to Skip to main content

An Explanation of My High Capacity List Alphabetizing Computer Program Algorithm

By , 10 Sep 2012
Rate this:
Please Sign up or sign in to vote.

Introduction

Have you ever wondered about the software programs that alphabetize large lists of items? We take them for granted to do our everyday tasks on the computer, but what exactly makes them function? Popular database and many other software packages have devised their own algorithms for taking care of this job. I have developed my own approach for handling this important task. I will present a detailed explanation here of how it works.

An Overview of my Problem

In 1996 I was working on an inventory system for a customer here in my hometown of Cleveland, Ohio USA. This software package I was making in procedural C programming language had to alphabetize a large number of items…about 8,000 to 10,000. The sorting program I had at the time was something I created in the early 1990s and could only sort up to 1,500 items. It was taken from an actual program that alphabetized a customer list. This customer list was a binary fixed record width format (SDF) text file with each record equal to 140 characters in length. This low capacity procedural C alphabetizing code is listed on my website.

Back in the mid nineteen nineties, most IBM PC based computers were running Intel 486, Intel Pentium, AMD K-5 and other processors that were comparable in terms of speed and power. However, their capability and that of the hard drives at the time seemed like they had to struggle to handle a high capacity alphabetizing task like the one my application required.

I had to start with the basic programming idea behind the low capacity procedural C sorting code and somehow expand it so it could process much larger data files of list items. If I tried to design the new sorting code so it did most of the work on the mechanical disk, that would create a whole new problem even though the alphabetizing algorithm itself may work fine. Trying to sort a large amount of data on a disk drive would create a very big reduction in speed because of slowness of the mechanical moving parts of the hard drive. The customer would certainly object to the lower speed of operation and I would be sent back to the drawing board to start over.

Performing the sorting on the hard disk was obviously a road to nowhere with a data sort operation of this size. The only other option I could think of was to do the bulk of the work in the memory. By concentrating the data manipulation in solid state electronic memory, I could escape the slower world of the mechanical disk drive and pick up much more speed. This was especially important at the time because of the less powerful processors as mentioned above. Another compelling reason for shifting the work into memory was that doing much of the work on a disk that could potentially have any number of sector errors on it could create catastrophic problems. This would likely throw a wrench into the sorting process and create a corrupted data file at the end. Of course this is also possible with concentrating the work in memory, but it’s less likely to occur.

Moving Forward

I will begin discussing the "nuts and bolts" of how my algorithm works shortly. This new and improved alphabetizing code for high capacity sorting jobs was later adapted to C++ for Windows and I have included pieces of the C++ code along with diagrams to help illustrate the logic flow. Please note that some of the C++ variables are referred to as non-persistent variables while the "top" and "bott" variables are called persistent variables. This is because non-persistent variables are completely reset to new values during the processing while persistent ones are incremented or decremented at various times, but never reset. Also, you will notice I refer to various data structures I use such as "grid", "name", and "stor" as conventional data structures. They are allocated within the boundaries of the 64K data segment as prescribed by the small memory model I used in the programming readout this article links to. This is to differentiate them from the far memory data structures. This algorithm was performed on binary fixed record width format (SDF) text files. I use these in my application development because they are easy to work with. The algorithm can easily be adjusted to work with variable width delimited (CSV) text files also.

The Main Objective: Larger Sorting Capacity

Now that I had decided to focus most of the processing in the solid state electronic memory, I had to come up with a way to do this so it could allocate the capacity for large data files of list items. In the Borland procedural C programming platform I was using, there were 6 memory models to choose from: tiny, small, medium, compact, large and huge. I always used the small memory model since it was the default and I just became accustomed to dealing with it since I started with C programming in 1990. In the small memory model, the code and data each have their own 64K memory segments. In order to manipulate large lists of items, I would need a much larger space of memory than a 64K data segment that also had to hold a variety of other data structures.

I decided to use the far side of the heap, or what is known as "far memory". To set this up, I first included a necessary C header file for allocating far memory:

// alloc.h is needed for far memory data structures.
#include <alloc.h>

Then I declared 3 far memory pointers like this at the top of the sorting module:

// declare far memory pointers.
unsigned long int far *s;
unsigned long int far *s1;
unsigned long int far *s2;

I allocated them like this to handle up to 16,000 list items before the sorting code commenced operation:

// allocate far memory.
s = ( unsigned long int far * ) farmalloc(65000L);
s1 = ( unsigned long int far * ) farmalloc(65000L);
s2 = ( unsigned long int far * ) farmalloc(65000L);

The reason I set up 3 far memory data structures is because all of them are needed to manipulate the data with the new sorting algorithm I created. This gave me the space to manipulate up to 16,000 list items. I could have allocated for a larger number of data records, but this was more than enough to do the job at hand.

Assigning a Numerical Weight to Each Item in the Data File

The processing starts with applying a mathematical formula to the first four characters of each list element in the binary fixed record width format (SDF) text file. Consider the following numerical succession of powers of "10":

10,000,000    1,000,000     100,000     10,000     1,000     100     10     1

Next remove the following powers of "10" in the above numerical succession:

1,000,000
10,000
100
10

This is what is left with these powers of "10" in the updated numerical succession:

10,000,000     100,000     1,000     1

The ASCII codes of each character in a given list element can range from 32 to 126. Each of these ASCII codes has been "mapped" to numerical values ranging from 0 to 94. The numerical values for each of the first four characters starting from the beginning in a given list element will each be multiplied by the updated numerical succession in a left to right fashion.

This is the math formula I use in the programming to assign numerical weights to each list element:

(10,000,000 X numerical value of character 1) + 
(100,000 X numerical value of character 2) + 
(1,000 X numerical value of character 3) + 
(1 X numerical value of character 4)

is equal to the numerical weight for this list element in the data file. Consider the following list element example:

"SMITHSON"

"S" = Character 1
"M" = Character 2
"I" = Character 3
"T" = Character 4
"H" = Character 5
"S" = Character 6
"O" = Character 7
"N" = Character 8

ASCII code for Character 1: S  = 83 which corresponds to numerical value 51 per the algorithm.
ASCII code for Character 2: M = 77 which corresponds to numerical value 45 per the algorithm.
ASCII code for Character 3: I   = 73 which corresponds to numerical value 41 per the algorithm.
ASCII code for Character 4: T  = 84 which corresponds to numerical value 52 per the algorithm.

Now let’s plug in the numerical values from this example to the math formula to yield the numerical weight for the above list element as it is used in my algorithm:

(10,000,000 X 51) + (100,000 X 45) + (1,000 X 41) + (1 X 52)  = 514,541,052

This math formula is something I came up with that I believed would be a good way to assign a numerical weight to each list element. Here is a partial piece of the code that performs this task in the program:

// assign a numerical value to variable ‘var1’ based on compared ascii code for char.
if(kh<=32) var1=0;
if(kh==33) var1=1;
if(kh==34) var1=2;
if(kh==35) var1=3;
if(kh==36) var1=4;
if(kh==37) var1=5;
if(kh==38) var1=6;
.
.
.
.
.
if(kh==119) var1=87;
if(kh==120) var1=88;
if(kh==121) var1=89;
if(kh==122) var1=90;
if(kh==123) var1=91;
if(kh==124) var1=92;
if(kh==125) var1=93;
if(kh==126) var1=94;

    // multiply the numeric variable "var1" for first character of element by 10,000,000 and
    // add to total for this element in far memory data structure "s".
    if(d==0) *(s+aa) = *(s+aa) + (var1 * 10000000);

    // multiply the numeric variable "var1" for second character of element by 100,000 and
    // add to total for this element in far memory data structure "s".
    if(d==1) *(s+aa) = *(s+aa) + (var1 * 100000);

    // multiply the numeric variable "var1" for third character of element by 1,000 and add
    // to total for this element in far memory data structure "s".
    if(d==2) *(s+aa) = *(s+aa) + (var1 * 1000); 

    // multiply the numeric variable "var1" for fourth character of element by 1 and add to
    // total for this element in far memory data structure "s".
    if(d==3) *(s+aa) = *(s+aa) + (var1 * 1);

        // ( first element character numerical value X 10,000,000 ) + 
        // ( second element character numerical value X 100,000 ) + 
        // ( third element character numerical value X 1,000 ) + 
        // ( fourth element character numerical value X 1 ) = numerical 
        // weighted value for the list element. this accumulated numerical value is stored in far
        // memory data structure "s" for the corresponding list element's sequential position in
        // the data file.

// the first four characters of each element are subjected to the above math formula.
// they are given a numerical weighting which will later be used to segment the actual
// alphabetizing process into "top1" and "bott1" processing regions. in other words, the
// alphabetizing process is performed in a compartmentalized fashion.

// end do-while loop for generating numerical weights for the list elements.

d++; 

} while(d<4);


    // as we are moving from the top to the bottom of the data file, keep track of the
    // lowest primary and highest primary accumulated numerical values as they are
    // stored in far memory data structure "s". the numercial value extremes
    // (highest and lowest) are assigned to the primary high "up1" and primary low "low1"
    // numercial value variables, respectively.

    // if it is the first sequential record in the file, then assign numerical weights to 
    // primary high "up1" and primary low "low1" variables.
    if(aa==0) { 
    low1 = *(s+aa);                                     
    up1 = *(s+aa);                                   
    }
    // if not the first sequential record in the file, then do this:
    if( *(s+aa) < low1 ) low1 = *(s+aa); 

        if( *(s+aa) > up1 ) up1 = *(s+aa);

        // move to next record while not at the end of the data file. the constant record length
        // in this case is "RECORDLENGTH" and fixed field width (SDF format).                                                     
        aa++;
        max14 = max14 + RECORDLENGTH;

        } while(aa<g);

        // close the data file. end do-while loop for calculating highest and lowest numerical
        // weights ("low1" and "up1" variables) for the list elements.
        ifile.close();

The lowest and highest numerical weights are now known after we have applied this math formula to all the list elements in the data file. All numerical weights will be stored in the far memory data structure "s" in positions that correspond to their sequential positions in the non-alphabetized list (See Figure 1).

 

// if the lowest primary "low1" and highest primary "up1" variables are not equal
// (meaning there are records to be alphabetized), then proceed with processing.
if ( low1 != up1 ) { 

// initialize high "main" processing loop exit variable "qqq" to stay within the processing loop.
qqq=0;
// initialize low "main" processing loop exit variable "sss" to stay within the processing loop.
sss=0;
// begin "main" processing loop and redefine "top1" and "bott1" pairs of processing regions
// until all elements are sorted.
do { 


    // assign primary high numerical variable "up1" to secondary low numerical variable "low2".
    low2 = up1;
    // assign primary low numerical variable "low1" to secondary high numerical variable "up2".
    up2 = low1;

        // the loop that follows will set boundaries and numerical values for processing regions
        // "top1" and "bott1". each of these processing regions can handle up to 150 elements with
        // the same numerical weighting as calculated above on the first four characters of the
        // list element. i need to mention something here. there is a highly unlikely possibility
        // that the algorithm could malfunction if a specific numerical weight that has been assigned
        // to a "top1" or "bott1" processing region exceeds 150 elements. this is because it would
        // exceed the row depth of the 2 dimensional "grid" conventional data structure which
        // is heavily used for sorting and insertion operations in the "top1" and "bott1"
        // processing regions. i have used this algorithm in many of my programs and 
        // have never seen this happen. 

        // initialize list element loop counting variable "ii" to 0.
        ii=0;
        // initialize upper processing region "top1" non-persistent processing region.
        top1=0;
        // initialize lower processing region "bott1" non-persistent processing region.
        bott1=0;
        // initialize the start of upper processing region "start" non-persistent
        // list element marker with "top" persistent element marker.
        start=top;
        // initialize the start of lower processing region "finish" non-persistent
        // list element marker with "bott" persistent element marker.
        finish=bott;

            // <----- start of do-while loop.
            do {

            // if the numerically weighted value for the given list element in far memory data structure
            // "s" for index "ii" is equal to the lowest primary variable "low1" and the high "main"
            // processing loop exit variable "qqq" is set to stay within the "main" processing loop,
            if( *(s+ii) == low1 && qqq == 0) {  

            // then assign the sequential numerical file position value for the given list
            // element "ii" to far memory data structure "s1" in the array position denoted by
            // "top" persistent element marker.
            *(s1+top) = ii; 

            // next increment the "top" persistent list element marker and "top1" non-persistent list element
            // marker by 1 each.
            top++; 
            top1++;                              
            }

            // if the numerically weighted value for the given list element in far memory data structure
            // "s" for index "ii" is equal to the highest primary variable "up1" and the low "main"
            // processing loop exit variable "sss" is set to stay within the processing loop,
            if( *(s+ii) == up1 && sss == 0) { 

            // then assign the sequential numerical file position value for the given list
            // element "ii" to far memory data structure "s1" in the array position denoted by
            // "bott" persistent element marker.
            *(s1+bott) = ii; 

            // next decrement the "bott" persisten element marker and increment "bott1"
            // non-persistent list element marker by 1 each.
            bott--;  
            bott1++;                              
                                       	}

            // if the numerically weighted value for the given list element in far memory data 
            // structure "s" is greater than the lowest primary variable "low1" and is less than 
            // the lowest secondary variable "low2" then assign numerically weighted value 
            // for the given list element for index "ii" in far memory data structure "s" to the 
            // lowest secondary variable "low2". 
            if( (*(s+ii) > low1) && (*(s+ii) < low2) ) low2 = *(s+ii);                                                                
                                                                

            // if the numerically weighted value for the given list element in far memory data 
            // structure "s" is less than the highest primary variable "up1" and is greater than 
            // the highest secondary variable "up2" then assign numerically weighted value 
            // for the given list element for index "ii" in far memory data structure "s" to the 
            // highest secondary variable "up2". 
            if( (*(s+ii) < up1)  && (*(s+ii) > up2) ) up2 = *(s+ii);                                                                      
                                                
            // increment list element counting variable "ii" by 1. loop sequentially through the list until  
            // the list element counting variable is at the end. this looping routine will set
            // the records of the same numerically weighted value to be processed for alphabetizing
            // in processing region "top1" and the same for processing region "bott1". the "start"
            // non-persistent element marker is the beginning of the "top1" processing region and
            // "start+top1" is the end. the "finish" non-persistent element marker is the end
            // of the "bott1" processing region and "finish-bott1+1" is the beginning.

            // <----- end of do-while loop.
            Ii++;
            } while(ii<g);

// first, set the integer variable "r" equal to 0. if the secondary low variable "low2" and the
// secondary high variable "up2" are equal and the low "main" processing loop exit variable "sss"
// is set to stay within the "main" processing loop then set the low "main" processing loop exit
r=0;
if(low2 == up2 && sss == 0) { 
// variable "sss" to attempt to exit the "main" processing loop. also, set the integer variable "r" 
// equal to 1 so the very next logic structure will not be evaluated.
sss=1;
r=1;
}

    // if the integer variable "r" is still set to 0 then evaluate the next logic structure which is
    // described as follows: if the secondary low variable "low2" and the secondary high
    // variable "up2" are equal and the low "main" processing loop exit variable "sss" is set to attempt
    // to exit the "main" processing loop then set the high "main" processing loop exit variable "qqq"
    // to attempt to exit the "main" processing loop.
    if(low2 == up2 && r == 0 && sss == 1) qqq=1; 

        // if the secondary low numerically weighted variable "low2" is greater than the secondary
        // high numerically weighted variable "up2" then set the low "main" processing loop exit variable
        // "sss" and the high "main" processing loop exit variable "qqq" to both exit the "main"
        // processing loop which will cease  the redefinition of successive "top1" and "bott1" processing regions.
        if(low2 > up2) { 
        qqq=1;
        sss=1;
        }

            // next assign secondary low numerically weighted variable "low2" to primary low
            // numerically weighted variable "low1". next assign secondary high numerically
            // weighted variable "up2" to primary high numerically weighted variable "up1".
            low1 = low2;                			 
            up1 = up2;

In the above code segment, the first thing that occurs is to see whether or not the lowest and highest numerical weights are equal. This compares the lowest primary variable "low1" to the highest primary variable "up1". If they are equal, the start of processing will be aborted because all list elements will have the same numerical weight. This means the first four characters of all list elements are the same. This would be highly unusual because they would already be nearly alphabetized to begin with and the probability of ever encountering a list like this would be remote. In the end, the original data file to be alphabetized would be left intact and not be reconstructed at the end. If they are unequal, the lowest primary variable "low1" and the highest primary variable "up1" would represent two different sets of numerically weighted list elements and therefore processing would continue with the commencement of the "main" processing loop.

A Tale of Two Far Memory Processing Regions: "TOP1" and "BOTT1"

The C++ program cycles around a do-while loop which I call the "main" processing loop. I use two segments of far memory to facilitate the sorting process which I call the "top1" and "bott1" processing regions. The "top1" and "bott1" processing regions will be repeatedly redefined with each iteration through the "main" processing loop. This is the "segmented mechanism" which drives the sorting process.

Both of these processing regions actually begin as numerical variables. They later evolve into processing regions. First, they are both initialized to 0. Then the "top1" variable is incremented by 1 for each list element in the far memory data structure "s" that corresponds to the lowest primary variable "low1" (lowest current numerical weight). Next, the "bott1" variable is incremented by 1 for each list element in the far memory data structure "s" that corresponds to the highest primary variable "up1" (highest current numerical weight). This is done in the previous code snippet above. Also, the "main" processing loop exit variables "qqq" and "sss" cannot be set to exit the "main" processing loop while both processing regions need to be redefined to process unsorted list elements. In other words, "qqq" must be set to 0 for "top1" to include the lowest current numerical weight in its processing region that is being defined. And "sss" must be set to 0 for "bott1" to include the highest current numerical weight in its processing region which is also being defined.

One other thing to notice in the previous code snippet are 2 markers I use for the list elements, "start" and "finish". "start" is assigned the value in "top", and "finish" is assigned the value in "bott". "start" is a non-persistent element marker used to denote the list item count or depth of the "top1" processing region. "finish" is a non-persistent element marker used to denote the list item count or depth of the "bott1" processing region. Both "top" and "bott" are persistent element markers that are incremented along with "top1" and "bott1". (See Figures 7 and 8 to see a visual representation of the "top1" and "bott1" processing regions.)

After the redefinition process is completed, the "top1" processing region will encompass list elements corresponding to the lowest current numerical weight. The same is true for the "bott1" processing region, but with a numerical weight that corresponds to the highest current numerical weight. The algorithm will use both processing regions to facilitate the actual sorting process, the specifics of which I will not get into with this article. To view that, you can refer to the "improved alphabetizing code" hyperlink near the beginning of the article. After the sorting has been performed, the program will iterate around the "main" processing loop and proceed to redefine new pairs of "top1" and "bott1" processing regions. (See Figure 2).

Both processing regions will approach each other in spatial proximity as they move toward the center of the far memory data structure "s" from being redefined with each pass through the "main" processing loop. Each new "top1" processing region will have a higher numerical weight than its predecessor "top1" region. Each new "bott1" processing region will have a lower numerical weight than its predecessor "bott1" region. Please refer to figures 3, 4, 5, and 6 for a visual illustration of the progression of the algorithm as successive "top1" and "bott1" processing regions are redefined with each pass through the "main" processing loop.

Notice what happens in Figure 6 after the processing in successive "top1" and "bott1" processing regions reaches the middle of far memory in data structure "s". The "top1" processing region with the least lowest numerical weight is adjacent to the "bott1" processing region with the least highest numerical weight. The processing will cease at this point because there will be no more list elements left to sort. The "main" processing loop will then be exited and the new alphabetized array of list element positions stored in far memory data structure "s1" will be written to a new data file. (See Figures 9 and 10).

Here I want to talk about ways the "main" processing loop could be exited before the data is written back to a new sorted data file. As the processing draws to a close in the middle of the far memory data structure "s", it will not necessarily end with an even pair of final "top1" and "bott1" processing regions. It can also near completion with either of the "top1" or "bott1" processing regions having its "main" processing loop exit variable set to attempt to exit the "main" processing loop. To be more specific, the "top1" processing region could have its "main" loop exit variable "qqq" set to 1 which means there are no more "top1" regions to be redefined while the "bott1" processing region could have its "main" loop exit variable "sss" set to 0 meaning there is another "bott1" processing region to be redefined and alphabetized. The opposite of this can also occur.

An Analogy that May Help Clarify the Logic Flow

Knowing this narrative may be overwhelming for some readers, I would like to take a page from American history that may be helpful in creating a better understanding of how my algorithm works.

During the latter part of the 19th century, the United States turned its attention to nation building. Connecting the vast expanse of North America by way of a coast-to-coast railroad became a national priority. This was the start of America’s first Transcontinental Railroad.

Two railroad companies, the Union Pacific and the Central Pacific, spearheaded this ambitious and daunting task. The Central Pacific began building its railway eastward from Sacramento, California, while the Union Pacific began construction work heading westward from Omaha, Nebraska.

Both crews in the east and west worked relentlessly for seven years. On April 28, 1868 the Union Pacific’s construction gang of Chinese and Irish workers laid ten miles of railroad track in a single day as a result of a $10,000 bet. This astounding feat has never been repeated by a human railroad construction crew since.

Finally, on May 10, 1869 the construction was completed at Promontory Point in the Utah territory. The Union Pacific’s No. 119 engine and the Central Pacific’s No. 60 engine (aka the Jupiter) were drawn up face-to-face separated by the width of a single railroad tie. At the Golden Spike ceremony, three spikes were driven in to connect the two railways: one gold, one silver and one made of gold, silver and iron. Travel time between the east and west coasts was reduced from between four to six months to only six days by railroad!

Now, the progression of my algorithm is quite similar to the construction of America’s first Transcontinental Railroad when you take a moment to really think about it. The solid state electronic memory, or allocated far memory segment, is like a long stretch of terrain awaiting the arrival of "alphabetizing construction of list elements" so to speak. The "top1" and "bott1" processing regions are like "two construction gangs" that commence sorting work beginning at opposite ends of the far memory segment. They each work hard to alphabetize list elements of the same numerical weight as previously described. After they finish the sorting in their respective processing regions, the sorted results are "laid down as a sort of alphabetized railroad track length" in each of their "top1" and "bott1" processing regions, respectively. After the program loops around the "main" processing loop and new "top1" and "bott1" processing regions have been defined, the process repeats itself.

As the algorithm moves along, it begins to resemble two work crews gradually progressing towards a conclusion in the middle of far memory. In reality, it happens much more quickly in the world of solid state electronic memory, but I think you get the idea.

Finally, The "Golden Spike ceremony" happens in a metaphorical sense when the "top1" and "bott1" processing regions are adjacent to one another in far memory as I discussed earlier.

A Potential Problem and a Remedy

Here I would like to expand on a potential problem with the algorithm and a recommended solution that should take care of it. The 2 dimensional "grid" conventional data structure is used extensively to manipulate list elements in the "top1" and "bott1" processing regions. It is designed to hold up to 150 list elements of the same numerical weight. You need to be conscious about how much row depth you give the 2 dimensional "grid" conventional data structure so it and other conventional data structures taken together don’t breach the 64K data segment of the small memory model that is used. The problem arises if there are more than 150 elements in a "top1" or "bott1" processing region. The algorithm won’t abort or malfunction, but rather it will include only the first 150 list elements in a processing region and no more. I never really tried to address this potential snafu because it is highly unlikely to occur in the first place. There would have to be more than 150 "Smiths" or "Joneses" to trigger it. This could happen in an application like a voter registration verification list that could include a lot of the same last names.

A good way to correct this is to declare a fourth far memory data structure that is the same size as each of the first three. It would replace and perform the job of the 2 dimensional "grid" conventional data structure, yet it would always be large enough to hold all the list elements for a particular numerical weight. This is because it would be allocated to hold as many list elements as are contained in the entire data file.

Just Say "No" to Redundant, Speed Robbing Code

Many of you may be wondering by now about the speed of the algorithm. I tested it with a binary fixed record width format (SDF) text file containing 10,959 part numbers. On a Gateway Pentium 4 tower CPU using an old 6 GB Quantum Bigfoot hard drive the processing took a little over 3 seconds. When it was run on a Dell M5030 laptop with an AMD V160 Processor at 2.4 GHz it took about 1 second. There are some areas in the repetitive loop processing that could be redesigned or eliminated that should further increase the processing speed since less work is required to achieve the same result. After I finished this in 1996 it seemed to work in a reasonable amount of time so I didn’t go back and try to optimize it some more. Here I will elaborate with some selected areas in the code that could be improved to yield more processing speed.

// start do-while loop for generating numerical weights for the list elements. get a
// character from the current file pointer position and prepare for further processing.
// I also capitalized the character for some reason... it doesn't affect processing.
inn -> seekpos(max14, ios::in);
d=0;
do {                                                 		
kh=ifile.readByte();                        		
khchar[0]=kh;                                		
strupr(khchar);                               		
kh=khchar[0];
if(kh<=32) var1=0;
if(kh==33) var1=1;
if(kh==34) var1=2;
if(kh==35) var1=3;
if(kh==36) var1=4;
if(kh==37) var1=5;
if(kh==38) var1=6;
if(kh==39) var1=7;

This block of code that tests for ASCII characters 32 through 126 could be replaced with the C++ function atoi(). It would eliminate much of the repetitive conditional "if" logic structure comparisons and convert the character to an integer. This new integer value could then be used in the math formula that computes numerical weights for each list element in the processing. Here is another place for adding some speed:

// set do-while loop counter variable “ii” to the non-persistent "finish-bott1+1" list element
// marker denoting the beginning of the "bott1" processing region.
ii=finish - bott1 + 1;

// <----- start of do-while loop.
do {                                      			

    // retrieve and calculate file stream position offset of the list element for do-while loop
    // counter "ii" in the far memory data structure “s1”.
    xxx1 = *(s1+ii);                                       	
    xxx1 = xxx1 * RECORDLENGTH;       		 

        // use calculated file stream position offset "xxx1" to retrieve all characters of the list element
        // except the first four characters into the "name" conventional data structure. compare this to
        // the 2 dimensional "grid" conventional data structure set at lower index counter "lowx". if they
        // are equal and high processing loop “1” exit variable "qqqx" is set to stay within processing
        // loop "1" then assign the sequential list element position in far memory data structure "s1" for
        // do-while loop counter variable “ii” to far memory data structure "s2" for non-persistent index
        // marker "topx". next increment non-persistent index marker "topx" by 1.
        r=0;
        inn -> seekpos(xxx1+4, ios::in);                                         		
        for(v=0; v<FIELDLENGTH-4; v++) name[v]=ifile.readByte();      	
        strupr(name);                                                                                     	
                                                                                                             	 

        for(v=0; v<FIELDLENGTH-4; v++) {                                             	
        if(name[v]!=grid[v][lowx]) r=1;                                                        	
        }                                                                                                         	
                                                                                                             	
        if(r == 0 && qqqx == 0) {                                                                	
        *(s2+topx) = *(s1+ii);                                        
        topx++;
        }

    // retrieve and calculate file stream position offset of the list element for do-while loop
    // counter "ii" in the far memory data structure “s1”.
    xxx1 = *(s1+ii);
    xx1 = xxx1 * RECORDLENGTH;

        // use calculated file stream position offset "xxx1" to retrieve all characters of the list element
        // except the first four characters into the "name" conventional data structure. compare this to
        // the 2 dimensional "grid" conventional data structure set at higher index counter "upx". if they
        // are equal and low processing loop “1” exit variable "sssx" is set to stay within processing
        // loop "1" then assign the sequential list element position in far memory data structure "s1" for
        // do-while loop counter variable “ii” to far memory data structure "s2" for non-persistent index
        // marker "bottx". next decrement non-persistent index marker "bottx" by 1.
        r=0;
        inn -> seekpos(xxx1+4, ios::in);                                                   	
        for(v=0; v<FIELDLENGTH-4; v++) name[v]=ifile.readByte();     	
        strupr(name);                                                                                    	
        for(v=0; v<FIELDLENGTH-4; v++) {                                           	
        if(name[v]!=grid[v][upx]) r=1;                                                        	
        }                                                                                                       	
                                                                                                           	
        if(r == 0 && sssx == 0) {
        *(s2+bottx) = *(s1+ii);                                                                    	
        bottx--;
        }

// increment do-while loop counter variable “ii” by 1.
// <----- end of do-while loop. do-while loop counter variable “ii”
ii++;                                   			
} while(ii<finish+1);

In the "top1" and "bott1" processing sections of the code, there is a patch of code enclosed by processing loop "2". There are two places where the "xxx1" file stream position offset is calculated twice. It is then used to retrieve data into the "name" conventional data structure for comparison operations in two different rows in the 2 dimensional "grid" conventional data structure. It only needs to be calculated once to achieve the same result. In fact, the "name" conventional data structure only needs to retrieve the data once with each processing loop "2" iteration instead of twice.

Conclusion

I have used this sorting algorithm in many C++ applications, typically for sorting lists that need to be previewed and printed as reports. It has proven itself to be reliable as well as fast. I have also adapted it for sorting numbers and dates. I have worked almost entirely in the Cleveland, Ohio USA area since 1990. If you would like to learn more about my developer skills then please visit my software developer website.

References

  • http://www (dot) accelerationwatch (dot) com/promontorypoint (dot) html
  • http://en (dot) wikipedia (dot) org/wiki/Promontory,_Utah
  • http://www (dot) history (dot) com/topics/transcontinental-railroad

NOTE: This algorithm and its associated C++ programming code have been submitted to the United States Copyright Office. The programming code associated with this article is presented "AS IS" for the limited purpose of discussing a computer programming technique created by the developer, Douglas B. Miller – Computer Program Designer. Additionally, Douglas B. Miller – Computer Program Designer offers no implied warranties or guarantees for its performance on any desktop or laptop computer. Douglas B. Miller – Computer Program Designer shall not be held responsible or liable for malfunctions of the programming code associated with this article due to the user’s defective hardware such as memory modules, hard drives, networks, power supplies, system boards, etc. In no way shall Douglas B. Miller – Computer Program Designer, the developer of this algorithm and programming code, be held responsible or liable for any mishaps or damage from its usage in general.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

doug433
Software Developer DOUGLAS B. MILLER, COMPUTER PROGRAM DESIGNER
United States United States
No Biography provided
Follow on   Twitter

Comments and Discussions

 
QuestionMy vote of "0" PinmemberBill Gord17-Sep-12 11:30 
GeneralMy vote of 1 PinmemberBill Gord17-Sep-12 11:28 
GeneralMy vote of 1 PinmemberJames Hurburgh8-May-12 17:38 
SuggestionSuggestion Pinmvpthatraja4-May-12 20:26 
GeneralRe: Suggestion Pinmemberdoug4333-Sep-12 15:41 
GeneralMy vote of 1 Pinmemberwb17-Apr-12 21:14 
Question[My vote of 1] seems to demonstrate a complete lack of understanding of the art of programming PinmemberMehuge17-Apr-12 1:06 
GeneralMy vote of 2 PinmemberCorvus Corax16-Apr-12 16:14 
GeneralMy vote of 2 PinmemberJasmine250116-Apr-12 9:31 
QuestionSwitch statements PinmemberOscar016-Apr-12 9:04 
General[My vote of 2] OMG PinmemberGalatei16-Apr-12 8:45 
GeneralRe: [My vote of 2] OMG PingroupPaul @ The Computer Station17-Apr-12 6:31 
QuestionStill dont get it Pinmemberwb15-Apr-12 23:20 
AnswerRe: Still dont get it Pinmemberdoug43316-Apr-12 2:09 
SuggestionRe: Still dont get it PingroupPaul @ The Computer Station17-Apr-12 6:38 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140415.2 | Last Updated 10 Sep 2012
Article Copyright 2012 by doug433
Everything else Copyright © CodeProject, 1999-2014
Terms of Use
Layout: fixed | fluid