Help, I'm stuck again.
Your solution works like a charm. I've written a custom class with 2 Arrays and a rotate function. The rotate function shifts the value positions within the Arrays, except for position 1 in Array 1.
I'm walking into problems when I have certain criteria conditions per round(s).
Round 1-3, participants with the same hair-color have to meet. If there's no match assign random participant.
Round 4-6, participants with the same eye-color have to meet. If there's no match assign random participant.
I can fill in round 1-3... it's still a struggle though. For each matching hair-type I've created a 'rolling-table' and on the empty places I assign a random participant (from collection of non-matches). After all the rollingtables have finished, I make a rolling-table of the left-overs (non-match collection) and process them as well.
The problem is that I don't know how to continue. Some particpants could have matching hair-color as well as eye color. How do avoid fall-out because of double matching?
When P1 and P2 have brown hair and grey eyes, the could have met during round 1-3, If they did they should not meet again. But there're other Participants with grey-eyes that will have to meet with P1 and P2....
I'm thinking about this for the last couple of days and can't solve this. Much respect rewarded for solving this.
It seems like your additional constraints can be met by setting the initial ordering so that participants with the same hair/eye color will meet in the appropriate rounds. Then just follow the same matching process.
That's what crossed my mind as well... but I think that method can't be combined with the 'rolling-table'. Beneath my interpretation of your idea.
The initial setup could be a full match, like so:
1 2 3 7 8 9
4 5 6 10 11 12
1,2,3,4,5,6 = brown hair
7,8,9,10,11,12 = blond hair
1 3 7 8 9 12
2 4 5 6 10 11
As soon as the table rotates it's values, 7 and 6 start matching with incorrect participants. The number of faulty matches will increase with each rotate.
And how do I match participants on eye-color, as I have already sorted them on hair color.
Currently I've assigned a rolling table to each hair color
After the rotate(s), everything looks fine.
1 3 6 8 9 12
2 4 5 7 10 11
Brown = rotating table
Blond = rotating table
This method works for the first couple of rounds. Until the participants have to be re-sorted. Doing that will override the roilling-table logic and participants will meet eachother twice, never or none (empty place)
4 6 7 1 2 3
8 10 12 5 9 11
1,2,3 and 5 have already met during the haircolor match. They will meet eachother again during the eye-color matching rounds.
Even if the eye-color match would work.
Last rounds everybody should meet eachother who have not seen the other yet.
The problem is starting to get interesting. For some participant sets there may be no perfect solution.
A characteristic of this problem is that good solutions to the whole problem will tend to be composed of good solutions to subproblems (e.g. with same-color participants matched during certain rounds). This characteristic suggests two promising approaches: 1. Dynamic Programming and 2. Genetic Algorithms.
Dynamic Programming builds up optimal solutions for small numbers of participants, combining them to construct optimal solutions for greater numbers of participants. Genetic Algorithms take a set of complete solutions, rank them, and combine the best ones to (hopefully) make better ones.
A third approach (which may be best if you can figure out how to implement it) is to take a decent solution, then transform it one step at a time to progressively better solutions. For example, order the participants so that matching colors mostly meet during the appropriate rounds. Then for the particpants that DON'T match during these rounds, swap partners so that they DO match. The challenge here is to make other corrections to compensate for this disruption to the paring system.
Algorithm 9.2.8 Monte Carlo Rabin-Karp Search
This algorithm searches for occurrences of a pattern p in a text t. It prints out a list of indexes such that with high probability t[i..i +m− 1] = p for every index i on the list.
Input Parameters: p, t
Output Parameters: None
m = p.length
n = t.length
q = randomly chosen prime number less than mn2
r = 2m−1 mod q
// computation of initial remainders
f = 0
pfinger = 0
for j = 0 to m-1
f = 2 * f + t[j] mod q
pfinger = 2 * pfinger + p[j] mod q
i = 0
while (i + m ≤ n)
if (f[i] == pfinger)
prinln(“Match at position” + i)
f[i + 1] = 2 * (f[i]- r * t[i]) + t[i + m] mod q
i = i + 1
I wan't to construct a list of numbers which covers all combinations of those numbers.
Say for example that I want all combinations of the numbers 1 2 3 4, with the lenght of four (or say numbers 1 to 9 but still with the lenght of four, or if I know that '2' must be in the combination i.e. 2xxx, x2xx, xx2x or xxx2), like 1234, 1324 etc, but as a sequential string, e.g. 1234232 etc, where every new number becomes a new combination (in this case that string tests 1234, 2342, 3423, 4232).
How can I construct such an algorithm to finde the shortest possible string covering all combinations? I think it's called Euler path, but not sure. Did some googling.
Anyone who can push me in the right direction? Maybe an implementation as well?
Thanks in advance!
I have a simple question: What is the best algorithm for external sorting.
The external merge sort or the external distribution sort, or would you recommend a completly another sort algorithm?
I have to program an external sort program in C# but before I have to decide which approach to choose.
Files larger than 2 GB have to be sorted as fast as possible and therefore no internal algorithm is capable of handle such large files.
Merge sort is the classic way of sorting data that's too large to fit into memory, but shortcuts are possible.
One suggestion: Just read the keys of your records into memory, each paired with the record number it occurs in, e.g.:
(1, key1), (2, key2), ..., (n, keyn).
Then sort these pairs on the keys. When these pairs are sorted, the record numbers will have the sorted order, e.g. the first record number in the sorted pairs will be the record with the lowest key, etc...
Then assuming you can fit k records into memory, read in the k lowest records, write them to your output file, read the next k lowest records, append them to your output file and so on.
When done, your output file will have the sorted data.
I tried this approach, but it seems to be very slow because of the many hd seek commands.
The sort of the (Key,Line Number)-pairs is rather fast, but the adjacent building of the output file takes quite a while. To get one specific line I compute the absolute address of the line and deduct a seek command.
I also tried the direct approach, holding (Key,Complete line)-pairs is about two times faster, although fewer pairs fit into main memory, but no successional seeking is necessary
The results also surprised me. Further ideas are welcome
A further idea: Get a 64-bit machine, which will allow more than 4 gigs of memory, and do a normal sort.
If this must run on a 32-bit machine, I could provide a program that could read the whole file into memory compressed, sort it in its compressed form, and write out the sorted file, uncompressed. It would take some work so I'd have to charge for it however. Also the amount of compression would depend on characteristics of your data; random numbers wouldn't compress, but real-world data would probably compress to about 10% of the original size.
If your keys fit in memory fine, and sort fine, then that is as fast as its going to get. If the seek to pull data off the hard disk by key is an issue, then you need more RAM.
Think of it this way. You have the sorted keys, so your only challenge is to pull records off disk. Thats going to be slow. I'd start to look at why you have a 2 gig flatfile in the first place, and why that needs to be sorted so quickly.
Did you understand Mr. Balkany's suggestion? If you can fit 1/16 of the data in RAM along with all the indices, then you should be able to process everything (after the sort) in 16 passes through the data file, with no random seeks.
Suppose, for example, that there are 16,000,000 records and you can hold an array recBuff of 1,000,000 records in RAM along with an array finalPos of 16,000,000 integers. First, fill in finalPos such that finalPos(0) says where record #0 in the original file should go; finalPos(1) says where record #1 should go, etc. This can be done in linear time.
Next, read through the entire source file; after reading record #n from the file, look at finalPos(n). If it's less than 1,000,000 then store the record in recBuff(finalPos(n)). Otherwise discard it. Once this is done, recBuff(0..999999) will hold the first million records. Write them to disk.
Now read through the source file again. This time, look for records where finalPos(n) is in the range 1,000,000 to 1,999,999 and store those records in recBuff(finalPos(n)-1000000). Once all records have been read, recBuff will hold the next million records. Write those to disk.
If recBuff and finalPos fit in RAM without swapping, the program should run very fast. Doubling the number of items in recBuff will double the speed, if it does not cause swapping. If it does cause swapping, it will dog the performance.
If there are so many records that the finalPos array itself takes an excessive amount of space, a temp file could be created which interleaves the source data with the finalPos items (since finalPos is always read in order). That would free up more space for recBuff.
Last Visit: 31-Dec-99 18:00 Last Update: 12-Jun-21 13:35