It seems like your additional constraints can be met by setting the initial ordering so that participants with the same hair/eye color will meet in the appropriate rounds. Then just follow the same matching process.
That's what crossed my mind as well... but I think that method can't be combined with the 'rolling-table'. Beneath my interpretation of your idea.
The initial setup could be a full match, like so:
1,2,3,4,5,6 = brown hair
7,8,9,10,11,12 = blond hair
As soon as the table rotates it's values, 7 and 6 start matching with incorrect participants. The number of faulty matches will increase with each rotate.
And how do I match participants on eye-color, as I have already sorted them on hair color.
Currently I've assigned a rolling table to each hair color
After the rotate(s), everything looks fine.
Brown = rotating table
Blond = rotating table
This method works for the first couple of rounds. Until the participants have to be re-sorted. Doing that will override the roilling-table logic and participants will meet eachother twice, never or none (empty place)
1,2,3 and 5 have already met during the haircolor match. They will meet eachother again during the eye-color matching rounds.
Even if the eye-color match would work.
Last rounds everybody should meet eachother who have not seen the other yet.
The problem is starting to get interesting. For some participant sets there may be no perfect solution.
A characteristic of this problem is that good solutions to the whole problem will tend to be composed of good solutions to subproblems (e.g. with same-color participants matched during certain rounds). This characteristic suggests two promising approaches: 1. Dynamic Programming and 2. Genetic Algorithms.
Dynamic Programming builds up optimal solutions for small numbers of participants, combining them to construct optimal solutions for greater numbers of participants. Genetic Algorithms take a set of complete solutions, rank them, and combine the best ones to (hopefully) make better ones.
A third approach (which may be best if you can figure out how to implement it) is to take a decent solution, then transform it one step at a time to progressively better solutions. For example, order the participants so that matching colors mostly meet during the appropriate rounds. Then for the particpants that DON'T match during these rounds, swap partners so that they DO match. The challenge here is to make other corrections to compensate for this disruption to the paring system.
Algorithm 9.2.8 Monte Carlo Rabin-Karp Search
This algorithm searches for occurrences of a pattern p in a text t. It prints out a list of indexes such that with high probability t[i..i +m− 1] = p for every index i on the list.
Input Parameters: p, t
Output Parameters: None
m = p.length
n = t.length
q = randomly chosen prime number less than mn2
r = 2m−1 mod q
// computation of initial remainders
f = 0
pfinger = 0for j = 0 to m-1
f = 2 * f + t[j] mod q
pfinger = 2 * pfinger + p[j] mod q
i = 0while (i + m ≤ n)
if (f[i] == pfinger)
prinln(“Match at position” + i)
f[i + 1] = 2 * (f[i]- r * t[i]) + t[i + m] mod q
i = i + 1
I wan't to construct a list of numbers which covers all combinations of those numbers.
Say for example that I want all combinations of the numbers 1 2 3 4, with the lenght of four (or say numbers 1 to 9 but still with the lenght of four, or if I know that '2' must be in the combination i.e. 2xxx, x2xx, xx2x or xxx2), like 1234, 1324 etc, but as a sequential string, e.g. 1234232 etc, where every new number becomes a new combination (in this case that string tests 1234, 2342, 3423, 4232).
How can I construct such an algorithm to finde the shortest possible string covering all combinations? I think it's called Euler path, but not sure. Did some googling.
Anyone who can push me in the right direction? Maybe an implementation as well?
Thanks in advance!
I have a simple question: What is the best algorithm for external sorting.
The external merge sort or the external distribution sort, or would you recommend a completly another sort algorithm?
I have to program an external sort program in C# but before I have to decide which approach to choose.
Files larger than 2 GB have to be sorted as fast as possible and therefore no internal algorithm is capable of handle such large files.
Merge sort is the classic way of sorting data that's too large to fit into memory, but shortcuts are possible.
One suggestion: Just read the keys of your records into memory, each paired with the record number it occurs in, e.g.:
(1, key1), (2, key2), ..., (n, keyn).
Then sort these pairs on the keys. When these pairs are sorted, the record numbers will have the sorted order, e.g. the first record number in the sorted pairs will be the record with the lowest key, etc...
Then assuming you can fit k records into memory, read in the k lowest records, write them to your output file, read the next k lowest records, append them to your output file and so on.
When done, your output file will have the sorted data.
I tried this approach, but it seems to be very slow because of the many hd seek commands.
The sort of the (Key,Line Number)-pairs is rather fast, but the adjacent building of the output file takes quite a while. To get one specific line I compute the absolute address of the line and deduct a seek command.
I also tried the direct approach, holding (Key,Complete line)-pairs is about two times faster, although fewer pairs fit into main memory, but no successional seeking is necessary
The results also surprised me. Further ideas are welcome
A further idea: Get a 64-bit machine, which will allow more than 4 gigs of memory, and do a normal sort.
If this must run on a 32-bit machine, I could provide a program that could read the whole file into memory compressed, sort it in its compressed form, and write out the sorted file, uncompressed. It would take some work so I'd have to charge for it however. Also the amount of compression would depend on characteristics of your data; random numbers wouldn't compress, but real-world data would probably compress to about 10% of the original size.
If your keys fit in memory fine, and sort fine, then that is as fast as its going to get. If the seek to pull data off the hard disk by key is an issue, then you need more RAM.
Think of it this way. You have the sorted keys, so your only challenge is to pull records off disk. Thats going to be slow. I'd start to look at why you have a 2 gig flatfile in the first place, and why that needs to be sorted so quickly.
Did you understand Mr. Balkany's suggestion? If you can fit 1/16 of the data in RAM along with all the indices, then you should be able to process everything (after the sort) in 16 passes through the data file, with no random seeks.
Suppose, for example, that there are 16,000,000 records and you can hold an array recBuff of 1,000,000 records in RAM along with an array finalPos of 16,000,000 integers. First, fill in finalPos such that finalPos(0) says where record #0 in the original file should go; finalPos(1) says where record #1 should go, etc. This can be done in linear time.
Next, read through the entire source file; after reading record #n from the file, look at finalPos(n). If it's less than 1,000,000 then store the record in recBuff(finalPos(n)). Otherwise discard it. Once this is done, recBuff(0..999999) will hold the first million records. Write them to disk.
Now read through the source file again. This time, look for records where finalPos(n) is in the range 1,000,000 to 1,999,999 and store those records in recBuff(finalPos(n)-1000000). Once all records have been read, recBuff will hold the next million records. Write those to disk.
If recBuff and finalPos fit in RAM without swapping, the program should run very fast. Doubling the number of items in recBuff will double the speed, if it does not cause swapping. If it does cause swapping, it will dog the performance.
If there are so many records that the finalPos array itself takes an excessive amount of space, a temp file could be created which interleaves the source data with the finalPos items (since finalPos is always read in order). That would free up more space for recBuff.
First of all I have to ask, if this is the algorithm called "bucket sort" / "radix sort", because you didn't mentioned any comparison operations.
The point I don't get is how one record is classifed to the current recBuf()-borders.
I will demonstrate my lack of understanding with an example.
Given an unsorted array: 5,3,2,9,1,8,2,4,7,2,6. Let's assume that my internal memory can only hold 3 values
First, I read the entire array and my goal is to classify 1,2,2 into the first recBuf(0..2). And that's my problem of understanding, how can I know that "3" belongs to the second recBuf(3..5).
There are three instances of "2" and so the array is not uniformly distributed.
How many records are there, how big are the keys, and how big are the records? Do you have one, two, or more disk drives available for processing?
If the keys are small enough (and there are few enough of them) that they can all fit into memory, you should start by sorting the keys (each one accompanied by an integer giving the location in the original file). Then proceed as I described.
If the number of records so large that e.g. only 10% of the keys will fit into memory, then I would suggest that you come up with some means of partitioning the keys into, say, 65,536 buckets. It doesn't matter whether the distribution is particularly even, provided that no single bucket holds more than 10%, and preferably no more than 2% or so. Make a pass through the file and count how many keys fit into each bucket.
Once that is done, count how many buckets one could add, starting at the bottom, before they totaled 10% of the records. Make a pass through the original file reading into RAM all the records that fit into those buckets. Then sort them in RAM and write them out. Then repeat the procedure, starting the the bucket after the last one that was used in the first pass. Then sort those and write those out.
The exact procedure you should use will vary depending upon what your data looks like and the number of separate disks available. Nonetheless, the key observations are (1) it's often good to sort with records containing just key and a reference to the original record, since the number of such records that can fit in RAM is larger than the number of full records that would fit; (2) it's better to think in terms of reading through a whole file, fetching some data into RAM and ignoring other data, than to think in terms of grabbing lots of little pieces of data scattered through a file; (3) though I haven't touched on this much, for really big jobs, having two or three hard drives will help things a lot.
Looks like it was done using distributed computing.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.
Last Visit: 31-Dec-99 18:00 Last Update: 31-Oct-14 8:27