

It seems like your additional constraints can be met by setting the initial ordering so that participants with the same hair/eye color will meet in the appropriate rounds. Then just follow the same matching process.





That's what crossed my mind as well... but I think that method can't be combined with the 'rolling table'. Below is my interpretation of your idea.
The initial setup could be a full match, like so:
Hair color:
Brown   | Blond
1  2  3 | 7  8  9
4  5  6 | 10 11 12
1,2,3,4,5,6 = brown hair
7,8,9,10,11,12 = blond hair
Rotate()
Hair color:
Brown   | Blond
1  3  7 | 8  9 12
2  4  5 | 6 10 11
As soon as the table rotates its values, 7 and 6 start matching with incorrect participants, and the number of faulty matches increases with each rotation.
And how do I match participants on eye color, when I have already sorted them on hair color?

Currently I've assigned a rolling table to each hair color. After the rotate(s), everything looks fine.
Hair color:
Brown   | Blond
1  3  6 | 8  9 12
2  4  5 | 7 10 11
Brown = rotating table
Blond = rotating table
This method works for the first couple of rounds, until the participants have to be re-sorted. Doing that overrides the rolling-table logic, and participants will meet each other twice, never, or end up without a partner (empty place).
Eye color:
Brown    | Blue
4  6  7  | 1 2 3
8 10 12  | 5 9 11
1, 2, 3 and 5 have already met during the hair-color rounds. They will meet each other again during the eye-color matching rounds, even if the eye-color match itself worked.
In the last rounds, everybody who has not yet met should meet each other.
** it's making me nuts! **
*sigh*
P.S. I REALLY appreciate your input!
modified on Thursday, October 16, 2008 10:04 AM
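For reference, the 'rolling table' for a single group is the classic round-robin "circle method": fix one participant and rotate the rest, which guarantees everyone meets everyone else exactly once. A minimal sketch (in Python purely for illustration; the thread doesn't state a language):

```python
def round_robin(players):
    """Circle method: fix one player, rotate the rest.

    Returns a list of rounds; each round is a list of (a, b) pairs.
    With n players (n even) this yields n-1 rounds in which every
    pair meets exactly once -- the 'rolling table' idea.
    """
    n = len(players)
    assert n % 2 == 0, "add a dummy player for odd counts"
    fixed, rest = players[0], list(players[1:])
    rounds = []
    for _ in range(n - 1):
        line = [fixed] + rest
        # pair the first half of the line with the reversed second half
        rounds.append([(line[i], line[n - 1 - i]) for i in range(n // 2)])
        rest = rest[-1:] + rest[:-1]  # rotate the non-fixed players
    return rounds
```

The difficulty discussed above is that re-sorting participants between rounds breaks the rotation's guarantee, so any color-based reordering has to be reconciled with this schedule.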





The problem is starting to get interesting. For some participant sets there may be no perfect solution.
A characteristic of this problem is that good solutions to the whole problem will tend to be composed of good solutions to subproblems (e.g. with same-color participants matched during certain rounds). This characteristic suggests two promising approaches: 1. Dynamic Programming and 2. Genetic Algorithms.
Dynamic Programming builds up optimal solutions for small numbers of participants, combining them to construct optimal solutions for greater numbers of participants. Genetic Algorithms take a set of complete solutions, rank them, and combine the best ones to (hopefully) make better ones.
A third approach (which may be best if you can figure out how to implement it) is to take a decent solution, then transform it one step at a time into progressively better solutions. For example, order the participants so that matching colors mostly meet during the appropriate rounds. Then for the participants that DON'T match during these rounds, swap partners so that they DO match. The challenge here is to make other corrections to compensate for this disruption to the pairing system.
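The "swap partners" repair step of the third approach could look something like this sketch (Python for illustration; force_pair is a made-up name, and a round is represented as a list of pairs):

```python
def force_pair(round_pairs, a, b):
    """One repair step for a single round: swap partners so a and b meet.

    If a and b are not already paired, a's old partner is matched with
    b's old partner instead. Returns a new list of pairs.
    """
    pairs = [tuple(p) for p in round_pairs]
    ia = next(i for i, p in enumerate(pairs) if a in p)
    ib = next(i for i, p in enumerate(pairs) if b in p)
    if ia == ib:  # already matched, nothing to do
        return pairs
    pa = pairs[ia][1] if pairs[ia][0] == a else pairs[ia][0]
    pb = pairs[ib][1] if pairs[ib][0] == b else pairs[ib][0]
    pairs[ia] = (a, b)
    pairs[ib] = (pa, pb)
    return pairs
```

The real work, as noted, is compensating elsewhere: each swap destroys two previously planned meetings, which then have to be re-created in some later round.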





Where can I find sample code showing an implementation of the Monte Carlo Rabin-Karp search?
Thanks





From this website[^].
Algorithm 9.2.8 Monte Carlo Rabin-Karp Search
This algorithm searches for occurrences of a pattern p in a text t. It prints out a list of indexes such that with high probability t[i..i+m−1] = p for every index i on the list.
Input Parameters: p, t
Output Parameters: None
mc_rabin_karp_search(p, t)
{
    m = p.length
    n = t.length
    q = randomly chosen prime number less than m·n²
    r = 2^(m−1) mod q
    f[0] = 0
    pfinger = 0
    for j = 0 to m − 1
    {
        f[0] = (2 * f[0] + t[j]) mod q
        pfinger = (2 * pfinger + p[j]) mod q
    }
    i = 0
    while (i + m ≤ n)
    {
        if (f[i] == pfinger)
            println("Match at position " + i)
        f[i + 1] = (2 * (f[i] − r * t[i]) + t[i + m]) mod q
        i = i + 1
    }
}
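For anyone wanting to run it, here is a Python translation of the pseudocode above. It returns the match positions instead of printing them, maps characters to integers with ord, and uses a naive trial-division prime picker (fine for a demo, not for production):

```python
import random


def _random_prime(limit):
    """Pick a random prime below limit (naive trial division, demo only)."""
    while True:
        q = random.randrange(3, limit)
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):
            return q


def mc_rabin_karp_search(p, t):
    """Monte Carlo Rabin-Karp: returns indexes i where t[i:i+m] == p
    with high probability (false positives possible, never misses)."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    q = _random_prime(max(m * n * n, 5))
    r = pow(2, m - 1, q)         # 2^(m-1) mod q
    f = 0                        # fingerprint of the current text window
    pfinger = 0                  # fingerprint of the pattern
    for j in range(m):
        f = (2 * f + ord(t[j])) % q
        pfinger = (2 * pfinger + ord(p[j])) % q
    matches = []
    i = 0
    while i + m <= n:
        if f == pfinger:
            matches.append(i)
        if i + m < n:            # roll the fingerprint one position right
            f = (2 * (f - r * ord(t[i])) + ord(t[i + m])) % q
        i += 1
    return matches
```

Because it is Monte Carlo, false positives are possible (with low probability), but true matches are never missed.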





Thanks, but I'm after a working example, not just pseudocode.





Angelinna wrote: but am after a working example not just a pseudocode
plz gimme codez (urgent?)
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
 Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
 Iain Clarke
[My articles]





Angelinna wrote: am after a working example not just a pseudocode
Why can't you take the pseudocode and implement it in whatever language you are programming in?
"The clue train passed his station without stopping."  John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks"  Pete O'Hanlon
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago."  Rob Graham





Because "she" comes in here and expects guys to fall all over her doing her homework for her.
If you don't have the data, you're just another a**hole with an opinion.





Let’s say we have an array of integers
int[] myArray = new int[] {1,2,3,4,5};
So the length of this array is 4 [i.e. n=4] since C# array index starts at 0
Yes, the length will be 5 and not 4, as pointed out in the next post. It's my bad, sorry!
Define integer k such that 0<= k < n [n = length of an array]
For example, If k = 2 then the output should be
{3,4,5,1,2} i.e starting from kth position move all the array elements to the top of an array.
If k = 3, output would be
{4,5,1,2,3}
Here is the challenge.
Yes, this is trivial if we write a loop that starts at 0 and goes up to n, like
for (int i = 0; i < n; i++) { }
We want to optimize this loop so that it does not iterate all the way to n − 1; anything less than n − 1 iterations is a good solution.
[Tip: if you want to reverse this array, like 5,4,3,2,1, you can use a loop like
for (int i = 0; i < n/2; i++) ]
modified on Wednesday, October 1, 2008 5:23 PM
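The rotation can be sketched like this (Python for illustration, since the challenge is language-agnostic). The in-place version reuses the tip's reversal loop three times, so each loop runs at most n/2 iterations:

```python
def rotate_left(arr, k):
    """Return arr rotated left by k, e.g. k=2: [1,2,3,4,5] -> [3,4,5,1,2]."""
    n = len(arr)
    k %= n
    return arr[k:] + arr[:k]


def rotate_left_in_place(arr, k):
    """Same result in place via three reversals: O(n) time, O(1) space."""
    def rev(lo, hi):
        while lo < hi:
            arr[lo], arr[hi] = arr[hi], arr[lo]
            lo += 1
            hi -= 1
    n = len(arr)
    k %= n
    rev(0, k - 1)    # reverse the first k elements
    rev(k, n - 1)    # reverse the remaining n-k elements
    rev(0, n - 1)    # reverse the whole array
```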





abhigad wrote: So the length of this array is 4 [i.e. n=4] since C# array index starts at 0
The length of the array is 5, regardless of whether it is 0-based or 1-based.
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
 Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
 Iain Clarke
[My articles]





Your question is unclear.





So you mean:
int[] src = new int[] {...};
int[] dst = new int[src.Length];
Array.Copy(src, k, dst, 0, src.Length - k);
Array.Copy(src, 0, dst, src.Length - k, k);
Or is this some school assignment where you have to shuffle in place?





Hi!
I want to construct a list of numbers which covers all combinations of those numbers.
Say for example that I want all combinations of the numbers 1 2 3 4, with a length of four (or say numbers 1 to 9 but still with a length of four, or if I know that '2' must be in the combination, i.e. 2xxx, x2xx, xx2x or xxx2), like 1234, 1324 etc., but as a sequential string, e.g. 1234232 etc., where every new number creates a new combination (in this case that string tests 1234, 2342, 3423, 4232).
How can I construct an algorithm to find the shortest possible string covering all combinations? I think it's called an Euler path, but I'm not sure. Did some googling.
Anyone who can push me in the right direction? Maybe an implementation as well?
Thanks in advance!
modified on Wednesday, October 1, 2008 4:41 AM
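What you're describing is known as a de Bruijn sequence, and your Euler-path hunch is right: it corresponds to an Eulerian cycle in the de Bruijn graph. A standard construction (via Lyndon words), sketched in Python:

```python
def de_bruijn(k, n):
    """Shortest cyclic string over alphabet {0..k-1} containing every
    length-n string exactly once (length k**n). Append the first n-1
    symbols to read it as a linear string."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq
```

For digits 1..4 with length four you'd map symbol s to s+1; the cyclic sequence has length 4^4 = 256, plus 3 wrap-around symbols when read as a linear string, versus 4 × 256 = 1024 digits for listing every combination separately.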





I have a simple question: What is the best algorithm for external sorting.
The external merge sort or the external distribution sort, or would you recommend a completely different sorting algorithm?
Background:
I have to program an external sort program in C# but before I have to decide which approach to choose.
Files larger than 2 GB have to be sorted as fast as possible, and therefore no internal (in-memory) algorithm is capable of handling such large files.





Merge sort is the classic way of sorting data that's too large to fit into memory, but shortcuts are possible.
One suggestion: Just read the keys of your records into memory, each paired with the record number it occurs in, e.g.:
(1, key1), (2, key2), ..., (n, keyn).
Then sort these pairs on the keys. When these pairs are sorted, the record numbers will have the sorted order, e.g. the first record number in the sorted pairs will be the record with the lowest key, etc...
Then assuming you can fit k records into memory, read in the k lowest records, write them to your output file, read the next k lowest records, append them to your output file and so on.
When done, your output file will have the sorted data.
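A sketch of this suggestion in Python, with an in-memory list standing in for the record file (in the real program, fetching a record would be a file seek/read, and the append a write to the output file):

```python
def external_sort_by_key(records, key_of, batch_size):
    """Key-sort approach: sort (key, record_number) pairs in memory,
    then fetch the records in batches of batch_size in sorted order."""
    pairs = sorted((key_of(rec), i) for i, rec in enumerate(records))
    output = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        # fetch the next batch_size lowest records and append them
        output.extend(records[i] for _, i in batch)
    return output
```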





I tried this approach, but it seems to be very slow because of the many hard-disk seek commands.
The sort of the (key, line number) pairs is rather fast, but the subsequent building of the output file takes quite a while. To get one specific line I compute the absolute address of the line and issue a seek command.
I also tried the direct approach: holding (key, complete line) pairs is about two times faster, although fewer pairs fit into main memory, because no successive seeking is necessary.
The results surprised me. Further ideas are welcome.





A further idea: Get a 64-bit machine, which will allow more than 4 GB of memory, and do a normal in-memory sort.
If this must run on a 32-bit machine, I could provide a program that could read the whole file into memory compressed, sort it in its compressed form, and write out the sorted file, uncompressed. It would take some work so I'd have to charge for it, however. Also, the amount of compression would depend on characteristics of your data; random numbers wouldn't compress, but real-world data would probably compress to about 10% of the original size.





64-bit: Would be nice, but my clients won't change their complete IT infrastructure.
What type of compression do you recommend?
I tried zip and gzip with the lowest compression rate, but they slowed down the overall process too much.
Could a simple run-length compression be satisfactory?





Run-length compression would help if you have long sequences of the same symbol in your data.
I have a compression technique that gives more compression than currently available commercial products. If you give me an email address I'll send you a link that describes it.





If your keys fit in memory fine, and sort fine, then that is as fast as it's going to get. If the seek to pull data off the hard disk by key is an issue, then you need more RAM.
Think of it this way: you have the sorted keys, so your only challenge is to pull records off disk. That's going to be slow. I'd start to look at why you have a 2 GB flat file in the first place, and why it needs to be sorted so quickly.





Did you understand Mr. Balkany's suggestion? If you can fit 1/16 of the data in RAM along with all the indices, then you should be able to process everything (after the sort) in 16 passes through the data file, with no random seeks.
Suppose, for example, that there are 16,000,000 records and you can hold an array recBuff of 1,000,000 records in RAM along with an array finalPos of 16,000,000 integers. First, fill in finalPos such that finalPos(0) says where record #0 in the original file should go; finalPos(1) says where record #1 should go, etc. This can be done in linear time.
Next, read through the entire source file; after reading record #n from the file, look at finalPos(n). If it's less than 1,000,000 then store the record in recBuff(finalPos(n)). Otherwise discard it. Once this is done, recBuff(0..999999) will hold the first million records. Write them to disk.
Now read through the source file again. This time, look for records where finalPos(n) is in the range 1,000,000 to 1,999,999 and store those records in recBuff(finalPos(n) − 1,000,000). Once all records have been read, recBuff will hold the next million records. Write those to disk.
If recBuff and finalPos fit in RAM without swapping, the program should run very fast. Doubling the number of items in recBuff will double the speed, if it does not cause swapping. If it does cause swapping, it will dog the performance.
If there are so many records that the finalPos array itself takes an excessive amount of space, a temp file could be created which interleaves the source data with the finalPos items (since finalPos is always read in order). That would free up more space for recBuff.
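A toy sketch of this pass-based idea, with Python lists standing in for the files (the point being that each pass is a sequential scan, never a random seek):

```python
def multipass_sort(records, final_pos, buf_size):
    """final_pos[i] gives the sorted position of record i (as computed
    from the sorted keys). Each pass scans the whole input and keeps
    only records whose destination falls in the current buf_size-wide
    window, then 'writes the buffer to disk' (here: extends a list)."""
    n = len(records)
    output = []
    for window_start in range(0, n, buf_size):
        buf = [None] * min(buf_size, n - window_start)
        for i, rec in enumerate(records):  # sequential scan of the input
            dest = final_pos[i]
            if window_start <= dest < window_start + buf_size:
                buf[dest - window_start] = rec
        output.extend(buf)
    return output
```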





First of all I have to ask whether this is the algorithm called "bucket sort" / "radix sort", because you didn't mention any comparison operations.
The point I don't get is how one record is classified into the current recBuff() borders.
I will demonstrate my lack of understanding with an example.
Given an unsorted array: 5,3,2,9,1,8,2,4,7,2,6. Let's assume that my internal memory can only hold 3 values.
First, I read the entire array and my goal is to classify 1,2,2 into the first recBuff(0..2). And that's my problem of understanding: how can I know that "3" belongs to the second recBuff(3..5)?
There are three instances of "2", so the array is not uniformly distributed.





How many records are there, how big are the keys, and how big are the records? Do you have one, two, or more disk drives available for processing?
If the keys are small enough (and there are few enough of them) that they can all fit into memory, you should start by sorting the keys (each one accompanied by an integer giving the location in the original file). Then proceed as I described.
If the number of records is so large that e.g. only 10% of the keys will fit into memory, then I would suggest that you come up with some means of partitioning the keys into, say, 65,536 buckets. It doesn't matter whether the distribution is particularly even, provided that no single bucket holds more than 10%, and preferably no more than 2% or so. Make a pass through the file and count how many keys fit into each bucket.
Once that is done, count how many buckets one could add, starting at the bottom, before they total 10% of the records. Make a pass through the original file, reading into RAM all the records that fit into those buckets. Then sort them in RAM and write them out. Then repeat the procedure, starting with the bucket after the last one that was used in the first pass. Then sort those and write those out.
The exact procedure you should use will vary depending upon what your data looks like and the number of separate disks available. Nonetheless, the key observations are (1) it's often good to sort with records containing just key and a reference to the original record, since the number of such records that can fit in RAM is larger than the number of full records that would fit; (2) it's better to think in terms of reading through a whole file, fetching some data into RAM and ignoring other data, than to think in terms of grabbing lots of little pieces of data scattered through a file; (3) though I haven't touched on this much, for really big jobs, having two or three hard drives will help things a lot.
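The bucket-counting step might be sketched like this (Python for illustration; assumes integer keys and, as stated above, that no single bucket exceeds the limit). Buckets are key ranges, so they preserve sort order:

```python
def plan_bucket_passes(keys, num_buckets, limit):
    """Count keys per range-based bucket in one pass, then group
    consecutive buckets so no group exceeds 'limit' records. Each
    group corresponds to one read-sort-write pass over the file."""
    key_max = max(keys)
    which = lambda k: k * num_buckets // (key_max + 1)  # order-preserving
    counts = [0] * num_buckets
    for k in keys:                    # pass 1: count keys per bucket
        counts[which(k)] += 1
    groups, cur, cur_total = [], [], 0
    for b in range(num_buckets):
        if cur and cur_total + counts[b] > limit:
            groups.append(cur)
            cur, cur_total = [], 0
        cur.append(b)
        cur_total += counts[b]
    if cur:
        groups.append(cur)
    return groups
```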





"Mathematicians in California could be in line for a $100,000 prize (£54,000) for finding a new prime number which has 13 million digits."
BBC article[^].
Looks like it was done using distributed computing.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.




