Introduction
Have you ever
wondered about the software programs that alphabetize large lists of items? We
take them for granted to do our everyday tasks on the computer, but what
exactly makes them function? Popular database and many other software packages
have devised their own algorithms for taking care of this job. I have developed
my own approach for handling this important task. I will present a detailed
explanation here of how it works.
An Overview of my Problem
In 1996 I was
working on an inventory system for a customer here in my hometown of Cleveland,
Ohio USA. This software package I was making in procedural C programming
language had to alphabetize a large number of items…about 8,000 to 10,000. The
sorting program I had at the time was something I created in the early 1990s
and could only sort up to 1,500 items. It was taken from an actual program that
alphabetized a customer list. This customer list was a binary fixed record
width format (SDF) text file with each record equal to 140 characters in
length. This low capacity procedural C alphabetizing
code is listed on my website.
Back in the
mid nineteen nineties, most IBM PC based computers were running Intel 486,
Intel Pentium, AMD K-5 and other processors that were comparable in terms of
speed and power. However, their capability and that of the hard drives at the
time seemed like they had to struggle to handle a high capacity alphabetizing
task like the one my application required.
I had to
start with the basic programming idea behind the low capacity procedural C
sorting code and somehow expand it so it could process much larger data files
of list items. If I tried to design the new sorting code so it did most of the
work on the mechanical disk, that would create a whole new problem even though
the alphabetizing algorithm itself may work fine. Trying to sort a large amount
of data on a disk drive would create a very big reduction in speed because
of slowness of the mechanical moving parts of the hard drive. The customer
would certainly object to the lower speed of operation and I would be sent back
to the drawing board to start over.
Performing
the sorting on the hard disk was obviously a road to nowhere with a data sort
operation of this size. The only other option I could think of was to do the
bulk of the work in the memory. By concentrating the data manipulation in solid
state electronic memory, I could escape the slower world of the mechanical disk
drive and pick up much more speed. This was especially important at the time
because of the less powerful processors as mentioned above. Another compelling
reason for shifting the work into memory was that doing much of the work on a
disk that could potentially have any number of sector errors on it could create
catastrophic problems. This would likely throw a wrench into the sorting
process and create a corrupted data file at the end. Of course this is also
possible with concentrating the work in memory, but it’s less likely to occur.
Moving Forward
I will begin
discussing the "nuts and bolts" of how my algorithm works shortly. This
new and improved
alphabetizing code for high capacity sorting jobs was later adapted to C++
for Windows and I have included pieces of the C++ code along with diagrams to
help illustrate the logic flow. Please note that some of the C++ variables are
referred to as non-persistent variables while the "top" and "bott" variables
are called persistent variables. This is because non-persistent variables are
completely reset to new values during the processing while persistent ones are
incremented or decremented at various times, but never reset. Also, you will
notice I refer to various data structures I use such as "grid", "name", and
"stor" as conventional data structures. They are allocated within the boundaries
of the 64K data segment as prescribed by the small memory model I used in the
programming readout this article links to. This is to differentiate them from
the far memory data structures. This algorithm was performed on binary fixed
record width format (SDF) text files. I use these in my application
development because they are easy to work with. The algorithm can easily be
adjusted to work with variable width delimited (CSV) text files also.
The Main Objective: Larger Sorting Capacity
Now that I
had decided to focus most of the processing in the solid state electronic
memory, I had to come up with a way to do this so it could allocate the
capacity for large data files of list items. In the Borland procedural C
programming platform I was using, there were 6 memory models to choose from:
tiny, small, medium, compact, large and huge. I always used the small memory
model since it was the default and I just became accustomed to dealing with it
since I started with C programming in 1990. In the small memory model, the code
and data each have their own 64K memory segments. In order to manipulate large
lists of items, I would need a much larger space of memory than a 64K data
segment that also had to hold a variety of other data structures.
I decided to
use the far side of the heap, or what is known as "far memory". To set
this up, I first included a necessary C header file for allocating far memory:
#include <alloc.h>
Then I
declared 3 far memory pointers like this at the top of the sorting module:
unsigned long int far *s;
unsigned long int far *s1;
unsigned long int far *s2;
I allocated
them like this to handle up to 16,000 list items before the sorting code
commenced operation:
s = ( unsigned long int far * ) farmalloc(65000L);
s1 = ( unsigned long int far * ) farmalloc(65000L);
s2 = ( unsigned long int far * ) farmalloc(65000L);
The reason I
set up 3 far memory data structures is because all of them are needed to
manipulate the data with the new sorting algorithm I created. This gave me the
space to manipulate up to 16,000 list items. I could have allocated for a
larger number of data records, but this was more than enough to do the job at
hand.
Assigning a Numerical Weight to Each Item in the Data File
The processing
starts with applying a mathematical formula to the first four characters of
each list element in the binary fixed record width format (SDF) text file.
Consider the following numerical succession of powers of "10":
10,000,000 1,000,000 100,000 10,000 1,000 100 10 1
Next remove
the following powers of "10" in the above numerical succession:
1,000,000
10,000
100
10
This is what
is left with these powers of "10" in the updated numerical succession:
10,000,000 100,000 1,000 1
The ASCII
codes of each character in a given list element can range from 32 to 126. Each
of these ASCII codes has been "mapped" to numerical values ranging from 0 to
94. The numerical values for each of the first four characters starting from
the beginning in a given list element will each be multiplied by the updated
numerical succession in a left to right fashion.
This is the
math formula I use in the programming to assign numerical weights to each list
element:
(10,000,000 X numerical value of character 1) +
(100,000 X numerical value of character 2) +
(1,000 X numerical value of character 3) +
(1 X numerical value of character 4)
is equal to
the numerical weight for this list element in the data file. Consider the
following list element example:
"SMITHSON"
"S" = Character 1
"M" = Character 2
"I" = Character 3
"T" = Character 4
"H" = Character 5
"S" = Character 6
"O" = Character 7
"N" = Character 8
ASCII code for Character 1: S = 83 which corresponds to numerical value 51 per the algorithm.
ASCII code for Character 2: M = 77 which corresponds to numerical value 45 per the algorithm.
ASCII code for Character 3: I = 73 which corresponds to numerical value 41 per the algorithm.
ASCII code for Character 4: T = 84 which corresponds to numerical value 52 per the algorithm.
Now let’s
plug in the numerical values from this example to the math formula to yield the
numerical weight for the above list element as it is used in my algorithm:
(10,000,000 X 51) + (100,000 X 45) + (1,000 X 41) + (1 X 52) = 514,541,052
This math
formula is something I came up with that I believed would be a good way to
assign a numerical weight to each list element. Here is a partial piece of the
code that performs this task in the program:
if(kh<=32) var1=0;
if(kh==33) var1=1;
if(kh==34) var1=2;
if(kh==35) var1=3;
if(kh==36) var1=4;
if(kh==37) var1=5;
if(kh==38) var1=6;
.
.
.
.
.
if(kh==119) var1=87;
if(kh==120) var1=88;
if(kh==121) var1=89;
if(kh==122) var1=90;
if(kh==123) var1=91;
if(kh==124) var1=92;
if(kh==125) var1=93;
if(kh==126) var1=94;
if(d==0) *(s+aa) = *(s+aa) + (var1 * 10000000);
if(d==1) *(s+aa) = *(s+aa) + (var1 * 100000);
if(d==2) *(s+aa) = *(s+aa) + (var1 * 1000);
if(d==3) *(s+aa) = *(s+aa) + (var1 * 1);
d++;
} while(d<4);
if(aa==0) {
low1 = *(s+aa);
up1 = *(s+aa);
}
if( *(s+aa) < low1 ) low1 = *(s+aa);
if( *(s+aa) > up1 ) up1 = *(s+aa);
aa++;
max14 = max14 + RECORDLENGTH;
} while(aa<g);
ifile.close();
The lowest
and highest numerical weights are now known after we have applied this math
formula to all the list elements in the data file. All numerical weights will
be stored in the far memory data structure "s" in positions that correspond to
their sequential positions in the non-alphabetized list (See Figure 1).

// if the lowest primary "low1" and highest primary "up1" variables are not equal
// (meaning there are records to be alphabetized), then proceed with processing.
if ( low1 != up1 ) {
// initialize high "main" processing loop exit variable "qqq" to stay within the processing loop.
qqq=0;
// initialize low "main" processing loop exit variable "sss" to stay within the processing loop.
sss=0;
// begin "main" processing loop and redefine "top1" and "bott1" pairs of processing regions
// until all elements are sorted.
do {
// assign primary high numerical variable "up1" to secondary low numerical variable "low2".
low2 = up1;
// assign primary low numerical variable "low1" to secondary high numerical variable "up2".
up2 = low1;
// the loop that follows will set boundaries and numerical values for processing regions
// "top1" and "bott1". each of these processing regions can handle up to 150 elements with
// the same numerical weighting as calculated above on the first four characters of the
// list element. i need to mention something here. there is a highly unlikely possibility
// that the algorithm could malfunction if a specific numerical weight that has been assigned
// to a "top1" or "bott1" processing region exceeds 150 elements. this is because it would
// exceed the row depth of the 2 dimensional "grid" conventional data structure which
// is heavily used for sorting and insertion operations in the "top1" and "bott1"
// processing regions. i have used this algorithm in many of my programs and
// have never seen this happen.
// initialize list element loop counting variable "ii" to 0.
ii=0;
// initialize upper processing region "top1" non-persistent processing region.
top1=0;
// initialize lower processing region "bott1" non-persistent processing region.
bott1=0;
// initialize the start of upper processing region "start" non-persistent
// list element marker with "top" persistent element marker.
start=top;
// initialize the start of lower processing region "finish" non-persistent
// list element marker with "bott" persistent element marker.
finish=bott;
// <----- start of do-while loop.
do {
// if the numerically weighted value for the given list element in far memory data structure
// "s" for index "ii" is equal to the lowest primary variable "low1" and the high "main"
// processing loop exit variable "qqq" is set to stay within the "main" processing loop,
if( *(s+ii) == low1 && qqq == 0) {
// then assign the sequential numerical file position value for the given list
// element "ii" to far memory data structure "s1" in the array position denoted by
// "top" persistent element marker.
*(s1+top) = ii;
// next increment the "top" persistent list element marker and "top1" non-persistent list element
// marker by 1 each.
top++;
top1++;
}
// if the numerically weighted value for the given list element in far memory data structure
// "s" for index "ii" is equal to the highest primary variable "up1" and the low "main"
// processing loop exit variable "sss" is set to stay within the processing loop,
if( *(s+ii) == up1 && sss == 0) {
// then assign the sequential numerical file position value for the given list
// element "ii" to far memory data structure "s1" in the array position denoted by
// "bott" persistent element marker.
*(s1+bott) = ii;
// next decrement the "bott" persisten element marker and increment "bott1"
// non-persistent list element marker by 1 each.
bott--;
bott1++;
}
// if the numerically weighted value for the given list element in far memory data
// structure "s" is greater than the lowest primary variable "low1" and is less than
// the lowest secondary variable "low2" then assign numerically weighted value
// for the given list element for index "ii" in far memory data structure "s" to the
// lowest secondary variable "low2".
if( (*(s+ii) > low1) && (*(s+ii) < low2) ) low2 = *(s+ii);
// if the numerically weighted value for the given list element in far memory data
// structure "s" is less than the highest primary variable "up1" and is greater than
// the highest secondary variable "up2" then assign numerically weighted value
// for the given list element for index "ii" in far memory data structure "s" to the
// highest secondary variable "up2".
if( (*(s+ii) < up1) && (*(s+ii) > up2) ) up2 = *(s+ii);
// increment list element counting variable "ii" by 1. loop sequentially through the list until
// the list element counting variable is at the end. this looping routine will set
// the records of the same numerically weighted value to be processed for alphabetizing
// in processing region "top1" and the same for processing region "bott1". the "start"
// non-persistent element marker is the beginning of the "top1" processing region and
// "start+top1" is the end. the "finish" non-persistent element marker is the end
// of the "bott1" processing region and "finish-bott1+1" is the beginning.
// <----- end of do-while loop.
Ii++;
} while(ii<g);
// first, set the integer variable "r" equal to 0. if the secondary low variable "low2" and the
// secondary high variable "up2" are equal and the low "main" processing loop exit variable "sss"
// is set to stay within the "main" processing loop then set the low "main" processing loop exit
r=0;
if(low2 == up2 && sss == 0) {
// variable "sss" to attempt to exit the "main" processing loop. also, set the integer variable "r"
// equal to 1 so the very next logic structure will not be evaluated.
sss=1;
r=1;
}
// if the integer variable "r" is still set to 0 then evaluate the next logic structure which is
// described as follows: if the secondary low variable "low2" and the secondary high
// variable "up2" are equal and the low "main" processing loop exit variable "sss" is set to attempt
// to exit the "main" processing loop then set the high "main" processing loop exit variable "qqq"
// to attempt to exit the "main" processing loop.
if(low2 == up2 && r == 0 && sss == 1) qqq=1;
// if the secondary low numerically weighted variable "low2" is greater than the secondary
// high numerically weighted variable "up2" then set the low "main" processing loop exit variable
// "sss" and the high "main" processing loop exit variable "qqq" to both exit the "main"
// processing loop which will cease the redefinition of successive "top1" and "bott1" processing regions.
if(low2 > up2) {
qqq=1;
sss=1;
}
// next assign secondary low numerically weighted variable "low2" to primary low
// numerically weighted variable "low1". next assign secondary high numerically
// weighted variable "up2" to primary high numerically weighted variable "up1".
low1 = low2;
up1 = up2;
In the above
code segment, the first thing that occurs is to see whether or not the lowest
and highest numerical weights are equal. This compares the lowest primary
variable "low1" to the highest primary variable "up1". If they are equal, the
start of processing will be aborted because all list elements will have the
same numerical weight. This means the first four characters of all list
elements are the same. This would be highly unusual because they would already
be nearly alphabetized to begin with and the probability of ever encountering a
list like this would be remote. In the end, the original data file to be
alphabetized would be left intact and not be reconstructed at the end. If they
are unequal, the lowest primary variable "low1" and the highest primary
variable "up1" would represent two different sets of numerically weighted list
elements and therefore processing would continue with the commencement of the
"main" processing loop.
A Tale of Two Far Memory Processing Regions: "TOP1" and "BOTT1"
The C++
program cycles around a do-while loop which I call the "main" processing loop.
I use two segments of far memory to facilitate the sorting process which I call
the "top1" and "bott1" processing regions. The "top1" and "bott1" processing
regions will be repeatedly redefined with each iteration through the "main"
processing loop. This is the "segmented mechanism" which drives the
sorting process.
Both of these
processing regions actually begin as numerical variables. They later
evolve into processing regions. First, they are both initialized to 0. Then the
"top1" variable is incremented by 1 for each list element in the far memory
data structure "s" that corresponds to the lowest primary variable "low1"
(lowest current numerical weight). Next, the "bott1" variable is incremented by
1 for each list element in the far memory data structure "s" that corresponds
to the highest primary variable "up1" (highest current numerical weight). This
is done in the previous code snippet above. Also, the "main" processing loop
exit variables "qqq" and "sss" cannot be set to exit the "main" processing loop
while both processing regions need to be redefined to process unsorted list
elements. In other words, "qqq" must be set to 0 for "top1" to include the
lowest current numerical weight in its processing region that is being defined.
And "sss" must be set to 0 for "bott1" to include the highest current numerical
weight in its processing region which is also being defined.
One other
thing to notice in the previous code snippet are 2 markers I use for the list
elements, "start" and "finish". "start" is assigned the value in "top",
and "finish" is assigned the value in "bott". "start" is a non-persistent
element marker used to denote the list item count or depth of the "top1"
processing region. "finish" is a non-persistent element marker used to denote
the list item count or depth of the "bott1" processing region. Both "top" and
"bott" are persistent element markers that are incremented along with "top1"
and "bott1". (See Figures 7 and 8 to see a visual representation of the "top1"
and "bott1" processing regions.)

After the
redefinition process is completed, the "top1" processing region will encompass
list elements corresponding to the lowest current numerical weight. The same is
true for the "bott1" processing region, but with a numerical weight that
corresponds to the highest current numerical weight. The algorithm will use
both processing regions to facilitate the actual sorting process, the specifics
of which I will not get into with this article. To view that, you can refer to the
"improved alphabetizing code" hyperlink near the beginning of the
article. After the sorting has been performed, the program will iterate around
the "main" processing loop and proceed to redefine new pairs of "top1" and
"bott1" processing regions. (See Figure 2).

Both
processing regions will approach each other in spatial proximity as they move
toward the center of the far memory data structure "s" from being redefined
with each pass through the "main" processing loop. Each new "top1" processing region
will have a higher numerical weight than its predecessor "top1" region. Each
new "bott1" processing region will have a lower numerical weight than its
predecessor "bott1" region. Please refer to figures 3, 4, 5, and 6 for a visual
illustration of the progression of the algorithm as successive "top1" and
"bott1" processing regions are redefined with each pass through the "main"
processing loop.


Notice what
happens in Figure 6 after the processing in successive "top1" and "bott1"
processing regions reaches the middle of far memory in data structure "s". The
"top1" processing region with the least lowest numerical weight is
adjacent to the "bott1" processing region with the least highest numerical
weight. The processing will cease at this point because there will be no
more list elements left to sort. The "main" processing loop will then be exited
and the new alphabetized array of list element positions stored in far memory
data structure "s1" will be written to a new data file. (See Figures 9 and 10).


Here I want
to talk about ways the "main" processing loop could be exited before the data
is written back to a new sorted data file. As the processing draws to a
close in the middle of the far memory data structure "s", it will not
necessarily end with an even pair of final "top1" and "bott1" processing
regions. It can also near completion with either of the "top1" or "bott1"
processing regions having its "main" processing loop exit variable set to
attempt to exit the "main" processing loop. To be more specific, the "top1"
processing region could have its "main" loop exit variable "qqq" set to 1 which
means there are no more "top1" regions to be redefined while the "bott1"
processing region could have its "main" loop exit variable "sss" set to 0 meaning
there is another "bott1" processing region to be redefined and alphabetized.
The opposite of this can also occur.
An Analogy that May Help Clarify the Logic Flow
Knowing this
narrative may be overwhelming for some readers, I would like to take a page from
American history that may be helpful in creating a better understanding of how
my algorithm works.
During the
latter part of the 19th century, the United States turned its
attention to nation building. Connecting the vast expanse of North America by
way of a coast-to-coast railroad became a national priority. This was the start
of America’s first Transcontinental Railroad.
Two railroad
companies, the Union Pacific and the Central Pacific, spearheaded this
ambitious and daunting task. The Central Pacific began building its railway
eastward from Sacramento, California, while the Union Pacific began
construction work heading westward from Omaha, Nebraska.
Both crews in
the east and west worked relentlessly for seven years. On April 28, 1868 the
Union Pacific’s construction gang of Chinese and Irish workers laid ten miles
of railroad track in a single day as a result of a $10,000 bet. This astounding
feat has never been repeated by a human railroad construction crew since.
Finally, on
May 10, 1869 the construction was completed at Promontory Point in the Utah
territory. The Union Pacific’s No. 119 engine and the Central Pacific’s No. 60
engine (aka the Jupiter) were drawn up face-to-face separated by the
width of a single railroad tie. At the Golden Spike ceremony, three spikes were
driven in to connect the two railways: one gold, one silver and one made of
gold, silver and iron. Travel time between the east and west coasts was reduced
from between four to six months to only six days by railroad!
Now, the
progression of my algorithm is quite similar to the construction of America’s
first Transcontinental Railroad when you take a moment to really think about
it. The solid state electronic memory, or allocated far memory segment, is like
a long stretch of terrain awaiting the arrival of "alphabetizing
construction of list elements" so to speak. The "top1" and "bott1"
processing regions are like "two construction gangs" that commence
sorting work beginning at opposite ends of the far memory segment. They each
work hard to alphabetize list elements of the same numerical weight as
previously described. After they finish the sorting in their respective
processing regions, the sorted results are "laid down as a sort of
alphabetized railroad track length" in each of their "top1" and "bott1"
processing regions, respectively. After the program loops around the "main"
processing loop and new "top1" and "bott1" processing regions have been
defined, the process repeats itself.
As the
algorithm moves along, it begins to resemble two work crews gradually
progressing towards a conclusion in the middle of far memory. In
reality, it happens much more quickly in the world of solid state electronic
memory, but I think you get the idea.
Finally, The "Golden
Spike ceremony" happens in a metaphorical sense when the "top1" and
"bott1" processing regions are adjacent to one another in far memory as I
discussed earlier.
A Potential Problem and a Remedy
Here I would
like to expand on a potential problem with the algorithm and a recommended
solution that should take care of it. The 2 dimensional "grid" conventional
data structure is used extensively to manipulate list elements in the "top1"
and "bott1" processing regions. It is designed to hold up to 150 list elements
of the same numerical weight. You need to be conscious about how much row depth
you give the 2 dimensional "grid" conventional data structure so it and other
conventional data structures taken together don’t breach the 64K data segment
of the small memory model that is used. The problem arises if there are more
than 150 elements in a "top1" or "bott1" processing region. The algorithm
won’t abort or malfunction, but rather it will include only the first 150 list
elements in a processing region and no more. I never really tried to address
this potential snafu because it is highly unlikely to occur in the first place.
There would have to be more than 150 "Smiths" or "Joneses" to trigger it. This
could happen in an application like a voter registration verification list that
could include a lot of the same last names.
A good way to
correct this is to declare a fourth far memory data structure that is the same
size as each of the first three. It would replace and perform the job of the 2
dimensional "grid" conventional data structure, yet it would always be large
enough to hold all the list elements for a particular numerical weight. This is
because it would be allocated to hold as many list elements as are contained in
the entire data file.
Just Say "No" to Redundant, Speed Robbing Code
Many of you
may be wondering by now about the speed of the algorithm. I tested it with a
binary fixed record width format (SDF) text file containing 10,959 part
numbers. On a Gateway Pentium 4 tower CPU using an old 6 GB Quantum Bigfoot hard
drive the processing took a little over 3 seconds. When it was run on a Dell
M5030 laptop with an AMD V160 Processor at 2.4 GHz it took about 1 second.
There are some areas in the repetitive loop processing that could be redesigned
or eliminated that should further increase the processing speed since less work
is required to achieve the same result. After I finished this in 1996 it seemed
to work in a reasonable amount of time so I didn’t go back and try to optimize
it some more. Here I will elaborate with some selected areas in the code that
could be improved to yield more processing speed.
inn -> seekpos(max14, ios::in);
d=0;
do {
kh=ifile.readByte();
khchar[0]=kh;
strupr(khchar);
kh=khchar[0];
if(kh<=32) var1=0;
if(kh==33) var1=1;
if(kh==34) var1=2;
if(kh==35) var1=3;
if(kh==36) var1=4;
if(kh==37) var1=5;
if(kh==38) var1=6;
if(kh==39) var1=7;
This block of
code that tests for ASCII characters 32 through 126 could be replaced with the
C++ function atoi(). It would eliminate much of the repetitive conditional "if"
logic structure comparisons and convert the character to an integer. This new
integer value could then be used in the math formula that computes numerical
weights for each list element in the processing. Here is another place for
adding some speed:
// set do-while loop counter variable “ii” to the non-persistent "finish-bott1+1" list element
// marker denoting the beginning of the "bott1" processing region.
ii=finish - bott1 + 1;
// <----- start of do-while loop.
do {
// retrieve and calculate file stream position offset of the list element for do-while loop
// counter "ii" in the far memory data structure “s1”.
xxx1 = *(s1+ii);
xxx1 = xxx1 * RECORDLENGTH;
// use calculated file stream position offset "xxx1" to retrieve all characters of the list element
// except the first four characters into the "name" conventional data structure. compare this to
// the 2 dimensional "grid" conventional data structure set at lower index counter "lowx". if they
// are equal and high processing loop “1” exit variable "qqqx" is set to stay within processing
// loop "1" then assign the sequential list element position in far memory data structure "s1" for
// do-while loop counter variable “ii” to far memory data structure "s2" for non-persistent index
// marker "topx". next increment non-persistent index marker "topx" by 1.
r=0;
inn -> seekpos(xxx1+4, ios::in);
for(v=0; v<FIELDLENGTH-4; v++) name[v]=ifile.readByte();
strupr(name);
for(v=0; v<FIELDLENGTH-4; v++) {
if(name[v]!=grid[v][lowx]) r=1;
}
if(r == 0 && qqqx == 0) {
*(s2+topx) = *(s1+ii);
topx++;
}
// retrieve and calculate file stream position offset of the list element for do-while loop
// counter "ii" in the far memory data structure “s1”.
xxx1 = *(s1+ii);
xx1 = xxx1 * RECORDLENGTH;
// use calculated file stream position offset "xxx1" to retrieve all characters of the list element
// except the first four characters into the "name" conventional data structure. compare this to
// the 2 dimensional "grid" conventional data structure set at higher index counter "upx". if they
// are equal and low processing loop “1” exit variable "sssx" is set to stay within processing
// loop "1" then assign the sequential list element position in far memory data structure "s1" for
// do-while loop counter variable “ii” to far memory data structure "s2" for non-persistent index
// marker "bottx". next decrement non-persistent index marker "bottx" by 1.
r=0;
inn -> seekpos(xxx1+4, ios::in);
for(v=0; v<FIELDLENGTH-4; v++) name[v]=ifile.readByte();
strupr(name);
for(v=0; v<FIELDLENGTH-4; v++) {
if(name[v]!=grid[v][upx]) r=1;
}
if(r == 0 && sssx == 0) {
*(s2+bottx) = *(s1+ii);
bottx--;
}
// increment do-while loop counter variable “ii” by 1.
// <----- end of do-while loop. do-while loop counter variable “ii”
ii++;
} while(ii<finish+1);
In the "top1"
and "bott1" processing sections of the code, there is a patch of code enclosed
by processing loop "2". There are two places where the "xxx1" file stream
position offset is calculated twice. It is then used to retrieve data into the
"name" conventional data structure for comparison operations in two different
rows in the 2 dimensional "grid" conventional data structure. It only needs to
be calculated once to achieve the same result. In fact, the "name" conventional
data structure only needs to retrieve the data once with each processing loop
"2" iteration instead of twice.
Conclusion
I have used
this sorting algorithm in many C++ applications, typically for sorting lists
that need to be previewed and printed as reports. It has proven itself to be
reliable as well as fast. I have also adapted it for sorting numbers and dates.
I have worked almost entirely in the Cleveland, Ohio USA area since 1990. If
you would like to learn more about my developer
skills then please visit my software developer website.
References
- http://www
(dot) accelerationwatch (dot) com/promontorypoint (dot) html
- http://en
(dot) wikipedia (dot) org/wiki/Promontory,_Utah
- http://www (dot) history (dot) com/topics/transcontinental-railroad
NOTE: This
algorithm and its associated C++ programming code have been submitted to the
United States Copyright Office. The programming code associated with this
article is presented "AS IS" for the limited purpose of discussing a computer
programming technique created by the developer, Douglas B. Miller – Computer
Program Designer. Additionally, Douglas B. Miller – Computer Program Designer
offers no implied warranties or guarantees for its performance on any desktop
or laptop computer. Douglas B. Miller – Computer Program Designer shall not be
held responsible or liable for malfunctions of the programming code associated
with this article due to the user’s defective hardware such as memory modules,
hard drives, networks, power supplies, system boards, etc. In no way shall
Douglas B. Miller – Computer Program Designer, the developer of this algorithm
and programming code, be held responsible or liable for any mishaps or damage
from its usage in general.