|
With many places to stop for wine and cheese.
I made some severe changes yesterday, but the worst of the calculations still needs work.
Update:
Target=4294967296; Spokes=(30,8); Bytes=143165632; Elapsed=00:04:48.9588449; Primes=203280221
Target=17179869184; Spokes=(30,8); Bytes=572662336; Elapsed=00:19:56.7513899; Primes=762939111
(That's 2^34, down from thirty-one minutes yesterday.)
I still need to eliminate a div and a mod that get used a gazillion times.
modified on Sunday, November 30, 2008 11:44 AM
|
|
|
|
|
I have my Sieve working, I think. What I am looking for is the 4 or 5 primes
around each boundary of 2^(32*k), to validate the Sieve Blocks that I have
created - primes just less than and just greater than 2^32, 2^64, 2^96, 2^128, 2^160,
2^192, 2^224, 2^256 and 2^288. I would like to know that my algorithm is correct
before I try to create 2048 (or more) Sieve Blocks.
Does anyone have such numbers gathering dust? The first 15 Million Primes do not
even come close to the first boundary, much less the 9th.
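For the first boundary at least, the neighbours of 2^32 can be found directly by trial division - a candidate near 2^32 needs trial divisors only up to about 65536, so each test is cheap. This sketch derives the boundary primes rather than quoting them (the higher boundaries at 2^64 and beyond would need a probabilistic test such as Miller-Rabin instead; all names here are illustrative):

```c
#include <stdint.h>

/* Deterministic trial-division primality test; practical up to ~2^40 or so. */
static int IsPrime(uint64_t n)
{
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;
    for (uint64_t d = 3; d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Find the nearest primes below and above a power-of-two boundary. */
static void BoundaryPrimes(uint64_t boundary, uint64_t *below, uint64_t *above)
{
    *below = boundary - 1;
    *above = boundary + 1;
    while (!IsPrime(*below)) --*below;
    while (!IsPrime(*above)) ++*above;
}
```

Calling `BoundaryPrimes(1ULL << 32, ...)` takes a fraction of a second and gives the pair straddling 2^32 to check a Sieve Block against.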
Prime number counts for each block, with millisecond timings:
Sieve Block 0 = 203,280,220 primes. 4203 I/O 310344 Masking
Sieve Block 1 = 190,335,585 primes. 4390 I/O 512266 Masking
Sieve Block 2 = 186,011,076 primes. 7515 I/O 564594 Masking
Sieve Block 3 = 183,312,229 primes. 7468 I/O 651703 Masking
Sieve Block 4 = 181,354,450 primes. 6941 I/O 619688 Masking
Sieve Block 5 = 179,823,053 primes. 7722 I/O 832343 Masking
Sieve Block 6 = 178,574,138 primes. 5438 I/O 647093 Masking
Sieve Block 7 = 177,515,527 primes. 5765 I/O 871422 Masking
Sieve Block 8 = 176,604,424 primes. 5282 I/O 570749 Masking
Totals: 50024 I/O, 5580202 Masking
5630226 ms = 1:33:50.226
Dave.
|
|
|
|
|
I have just thought of a way to self-check my calculations for determining the
correct starting bit for any Sieve Block other than Sieve Block 0. The first
Sieve Block, Sieve Block 0, is a lead pipe cinch - it IS correct. The problem is
whether the subsequent blocks are correct: whether all bits have been examined
for removal from those blocks, and whether the correct starting bit was
calculated for any prime number that is not in that Sieve Block.

My thought is to double the size of the Sieve Block being processed -
essentially, calculate the Sieve for two blocks as if they were one. No
calculations would be needed for switching from one Sieve Block to the next, any
more than would be needed in passing from the first half of a single Sieve Block
to its second half; the data bits are contiguous.

The procedure: copy the original Sieve Block 0 to a .SAV file. Once the new pair
of Sieve Blocks has been calculated, compare the new Sieve Block 0 to the saved
copy. If they are the same (they WILL be the same for Sieve Block 0), save the
new Sieve Block 0 as a .SIV file and save the second Sieve Block as a .SAV file -
it is a valid Sieve Block 1, since it was masked contiguously with Sieve Block 0.
Now calculate the start of Sieve Block 1 and create a new pair, Sieve Block 1 and
Sieve Block 2, as was done for Sieve Blocks 0 and 1. Again compare Sieve Block 1
with the previously saved .SAV file; if they match, save the new Sieve Block 1 as
a .SIV file and Sieve Block 2 as a .SAV file. Each build produces a correct block
to validate the next block when it is created. Twice the work, but you have
validation. Actually, the masking of the bits itself is quite fast; the reading,
the writing, and the calculations for the starting bit are the time killers.
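A toy-scale sketch of that overlap idea, using a byte-per-number composite map instead of the real bit arrays (BLOCK and all names here are illustrative, not the actual implementation): sieving two blocks as one contiguous range must reproduce the saved previous block in its first half, and its second half then becomes the trusted copy for the next round.

```c
#include <string.h>

#define BLOCK 1000   /* toy block size; the real blocks are 2^31 bits */

/* Mark composites (as 1) in [lo, lo+len); 0 and 1 are left unmarked. */
static void SieveRange(unsigned long lo, unsigned long len, unsigned char *map)
{
    memset(map, 0, len);
    for (unsigned long p = 2; p * p < lo + len; p++)
    {
        int isPrime = 1;                        /* trial-divide the small sieving prime */
        for (unsigned long d = 2; d * d <= p; d++)
            if (p % d == 0) { isPrime = 0; break; }
        if (!isPrime) continue;
        unsigned long start = (lo + p - 1) / p * p;   /* first multiple of p >= lo */
        if (start < p * p) start = p * p;             /* p itself must survive */
        for (unsigned long m = start; m < lo + len; m += p)
            map[m - lo] = 1;
    }
}

/* Sieve blocks i and i+1 as one contiguous pair; validation passes if the
   first half reproduces the previously saved copy of block i. */
static int PairValidates(unsigned long i, const unsigned char *savedBlock)
{
    unsigned char pair[2 * BLOCK];
    SieveRange(i * BLOCK, 2 * BLOCK, pair);
    return memcmp(pair, savedBlock, BLOCK) == 0;
}
```

The memcmp here plays the role of the .SAV-file comparison; a checksum over each half would do the same job with less I/O.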
One question: would it be sufficient to take an MD5 checksum of the second
block, then an MD5 checksum of the next newly created block, and compare the two
to determine validity? I think it wise to save a file of the MD5 checksums
anyway, to be able to validate the correctness of the blocks when they are read
back in. I would save them in a separate file instead of appending them to the
end of the file data. The file data is sector- and cluster-bounded; adding an
extra DWORD to each file would cost extra sectors, whereas a file of MD5
checksums pays that cost only once. I had planned to use two 750 GB external
hard drives to save the files, giving 1500 GB of storage and 6000 Sieve Blocks
(1/4 GB per Sieve Block). This would be all primes below 2^188000.
Does anyone see any problems with this approach?
Dave.
|
|
|
|
|
I did some research yesterday and came up with a banding technique, similar to what Dave describes. I currently scan [0,2^32) for primes in less than 4 minutes, using only a small buffer (less than 1MB). Apart from speed, it has the advantage of being able to scan much larger ranges too. I think I'll turn it into a CP article in a few days/weeks; there are still some areas that need more investigation first.
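For reference, a minimal banded (segmented) sieve along these lines: only a fixed-size band buffer plus the sieving primes up to sqrt(limit) are live at any time, so memory stays small however large the range gets. Every name and size here is an assumption for illustration, not the actual code being described.

```c
#include <stdlib.h>
#include <string.h>

#define BAND 32768   /* numbers per band; the live buffer stays this small */

static unsigned long CountPrimes(unsigned long limit)
{
    /* collect the sieving primes up to sqrt(limit) with a tiny classic sieve */
    unsigned long root = 1;
    while ((root + 1) * (root + 1) <= limit) root++;
    unsigned char *isComposite = calloc(root + 1, 1);
    for (unsigned long p = 2; p <= root; p++)
        if (!isComposite[p])
            for (unsigned long m = p * p; m <= root; m += p)
                isComposite[m] = 1;

    unsigned long count = 0;
    unsigned char band[BAND];
    for (unsigned long lo = 0; lo < limit; lo += BAND)
    {
        unsigned long hi = lo + BAND < limit ? lo + BAND : limit;
        memset(band, 0, sizeof band);
        for (unsigned long p = 2; p <= root; p++)
        {
            if (isComposite[p]) continue;
            unsigned long start = (lo + p - 1) / p * p;   /* first multiple >= lo */
            if (start < p * p) start = p * p;             /* don't mark p itself */
            for (unsigned long m = start; m < hi; m += p)
                band[m - lo] = 1;
        }
        for (unsigned long n = lo < 2 ? 2 : lo; n < hi; n++)
            if (!band[n - lo]) count++;
    }
    free(isComposite);
    return count;
}
```

A real implementation would pack the band into bits and skip even numbers, but the banding structure is the same.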
|
|
|
|
|
I have it working for up to the first six primes, but it appears that using the first three offers the best performance (to 2^32):
C:\>Eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:21:24.1512386; Primes=203280221
C:\>Eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:09:32.2740256; Primes=203280221
C:\>Eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:05:51.1191606; Primes=203280221
C:\>Eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:04:53.7847739; Primes=203280221
C:\>Eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:07:46.0915766; Primes=203280221
C:\>Eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:41:44.5200097; Primes=203280221
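The Spokes=(a,b) pairs above appear to follow the wheel primorials: the first value is the product of the first k primes (the wheel size) and the second is Euler's totient of it (how many residues are coprime to the wheel, i.e. how many spoke arrays actually get allocated). A quick sanity check, with illustrative names:

```c
/* Wheel size (primorial) and spoke count (totient) for the first k primes, k <= 6. */
static void Wheel(int k, unsigned long *size, unsigned long *spokes)
{
    static const unsigned long firstPrimes[] = { 2, 3, 5, 7, 11, 13 };
    *size = 1;
    *spokes = 1;
    for (int i = 0; i < k && i < 6; i++)
    {
        *size *= firstPrimes[i];         /* primorial: 2, 6, 30, 210, 2310, 30030 */
        *spokes *= firstPrimes[i] - 1;   /* phi is multiplicative over distinct primes */
    }
}
```

The totient shrinks much more slowly than the primorial grows, which is consistent with the diminishing returns (and growing bookkeeping) seen past three primes.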
And when I tried it using the first six it ran overnight before I killed it.
I still have more to do on it.
|
|
|
|
|
Hi,
I haven't done the actual spoke stuff yet; I decided to first optimize my "implied two-spoke" strategy, as you called it. It is now below 1 minute for 2^32 (using longs and memory bands, so I can also search much larger ranges).
I am not sure how to interpret your "Bytes=(..,..)" stuff.
Your six-prime spokes running forever seems to indicate something is still not correct in your code.
I do expect spokes to improve performance a lot, so your results are a surprise to me. I plan to store all relevant spokes in a single array.
Greetings,
|
|
|
|
|
Any possibility of writing an article about it and posting to CP? I would be interested!
|
|
|
|
|
Yes, I plan to write at least one article on the subject; I still need to finish some research first, so it will be a few more weeks before anything shows though.
|
|
|
|
|
|
No! It'll be mis-used by students.
|
|
|
|
|
Potential misuse should not stop us now; it did not stop mankind from coming up with good and bad things such as fire, steam engines, cars, roads, arms, computers, Internet, and Windows.
I'll make the article long enough so none of the lazy students will ever read it.
|
|
|
|
|
That might work. It would be better in French, though.
|
|
|
|
|
PIEBALDconsult wrote: It would be better in French
The CP police won't allow it; they are very reluctant to accept articles and messages of which they don't understand most of the words.
modified on Wednesday, December 3, 2008 11:52 PM
|
|
|
|
|
Certain optimizations can be made to a classic or implied-two-spoke sieve that won't work for more spokes.
If it's hard-coded to only count the primes, it'll be quicker too.
For mine, I pass in a delegate of what to do with each prime, so I have a lot more method calls.
C:\>Eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:04:53.1354700; Primes=203280221
Spokes = (Total spokes, Allocated spokes)
Bytes = (Bytes per array, Total bytes allocated for arrays)
Luc Pattyn wrote: I do expect spokes to improve performance a lot
Try it, you'll see.
How would you cross off the 25?
1 7 13 19 25 31
5 11 17 23 29 35
This is an area that still needs work, and I think I may have an idea...
|
|
|
|
|
Hi,
thanks for the clarification.
PIEBALDconsult wrote: I pass in a delegate of what to do with each prime
I don't. I might do that when I am done; right now I do not want to wait for a machine to execute millions of delegates I do not really need. I call a non-virtual method for each prime; it displays a few of them and keeps track of the count.
PIEBALDconsult wrote: Try it, you'll see.
I will.
PIEBALDconsult wrote: How would you cross off the 25?
I am considering only one array containing the primes of all relevant spokes; I currently store 1, 3, 5, 7, 9, 11, 13, 15 etc because the odd one is the only relevant spoke in a two-spoke setup. In the six-spoke scenario that would be 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, etc.
Therefore 25, just like any other non-prime, will be flagged as composite every time it appears as a multiple of one of its factors; in this case that would be only once, and it would also be the first relevant multiple of the prime 5.
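A sketch of that on the six-spoke (mod-6) wheel, using a byte per number for clarity rather than the packed residue array: only primes coprime to the wheel do any sieving, and 25 gets flagged at its first relevant multiple, 5*5. The limit and names are illustrative.

```c
#define LIMIT 120

static unsigned char composite[LIMIT];   /* indexed by the number itself */

static void MarkWheel6(void)
{
    for (unsigned long p = 5; p * p < LIMIT; p += 2)
    {
        if (p % 3 == 0 || composite[p]) continue;   /* skip 9, 15, ... and known composites */
        for (unsigned long m = p * p; m < LIMIT; m += 2 * p)
            if (m % 6 == 1 || m % 6 == 5)           /* only residues that are on the wheel */
                composite[m] = 1;
    }
}
```

After MarkWheel6(), every wheel residue below LIMIT that is unflagged is prime; 25, 35, 49, 55, ... are flagged exactly once each by their smallest prime factor.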
|
|
|
|
|
Update:
I now have a Core 2 Quad and 4GB RAM (still running XP Pro), and after a few more tweaks I have:
C:\>eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:11:24.2121472; Primes=203280221
C:\>eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:05:14.2174818; Primes=203280221
C:\>eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:03:21.4457024; Primes=203280221
C:\>eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:02:59.1627191; Primes=203280221
C:\>eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:05:38.4098073; Primes=203280221
C:\>eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:36:54.0737311; Primes=203280221
Now I have to add threading... I'll be back.
Update to the update:
I tweaked a bit more (inlined some methods) so now I get:
C:\>eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:12:00.3882952; Primes=203280221
C:\>eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:05:24.4898381; Primes=203280221
C:\>eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:03:24.4878007; Primes=203280221
C:\>eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:02:46.4519220; Primes=203280221
C:\>eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:04:22.4536602; Primes=203280221
C:\>eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:24:59.2846800; Primes=203280221
C:\>eratosthenes 4294967296 Six
Target=4294967296; Spokes=(30030,5760); Bytes=(17880,102988800); Elapsed=06:30:34.0253234; Primes=203280221
Notice the first successful run of 2^32 with six primes!
Further update:
C:\>eratosthenes 4294967296 Six
Target=4294967296; Spokes=(30030,5760); Bytes=(17880,102988800); Elapsed=05:53:56.1556709; Primes=203280221
I did try threading, but it was awful, I'll try again later.
modified on Monday, January 12, 2009 10:05 AM
|
|
|
|
|
My requirement is: I will be receiving billions of data records per day. From those I have to filter out some 1000-2000 records, based on record names which are stored in a filter.
My question is: what is the suitable data structure to use, and which search algorithm will meet this high-volume requirement?
Sanchit
|
|
|
|
|
Sanchit,
You need to be more specific. In what format are the input records or is this something which can be defined for this implementation? In what form are the record names in the filter and can this format be defined for this implementation?
Dave.
|
|
|
|
|
The most efficient data structure for lookups is the hash table.
Create a hash table whose size is a prime number roughly three times the number of names you're looking for. Insert the 1000-2000 names you're looking for.
Then for each data record that comes in, look up the name in the hash table. Use a language like C++, which executes faster than C#.
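A minimal sketch of that setup, assuming string record names; the table size, hash function, and probing scheme here are illustrative choices, not the only ones:

```c
#include <string.h>

#define TABLE_SIZE 6151   /* prime, roughly 3x a 2000-name filter */

static const char *table[TABLE_SIZE];

static unsigned long Hash(const char *s)
{
    unsigned long h = 5381;                        /* djb2-style string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Build the filter once; assumes the table never fills up. */
static void Insert(const char *name)
{
    unsigned long i = Hash(name);
    while (table[i]) i = (i + 1) % TABLE_SIZE;     /* linear probing */
    table[i] = name;
}

/* Per incoming record: hash once, then full compare only on a slot hit. */
static int Contains(const char *name)
{
    unsigned long i = Hash(name);
    while (table[i])
    {
        if (strcmp(table[i], name) == 0) return 1;
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;
}
```

With 2000 names in 6151 slots the table is only a third full, so probe chains stay short and the per-record cost is essentially one hash plus one or two comparisons.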
|
|
|
|
|
You may also want to look at a radix tree:
http://en.wikipedia.org/wiki/Radix_tree[^]
It will save the time required to calculate a hash for each of the billions of records at the cost of a longer search per record.
Hash table:
- calc hash of record (slow)
- check table for hash value (quick)
- handle collisions while searching (quick)
- if a matching hash is found in the table, do a full string compare between the record and the hash table's stored record, since the hash algo can produce the same hash for different strings (slow)
Radix tree:
- search tree for record string (slow)
The radix should be faster. The hash function will likely have a similar cost to the radix tree search. It may even be worse as the hash looks at each character in the string whereas the radix search can fail out early.
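For comparison, here is a minimal character trie - an uncompressed radix tree; a real radix tree would merge single-child chains into one node - showing the early fail-out described above. Names are illustrative:

```c
#include <stdlib.h>

typedef struct TrieNode
{
    struct TrieNode *child[256];
    int terminal;                   /* 1 if a stored name ends at this node */
} TrieNode;

static TrieNode *TrieNew(void)
{
    return calloc(1, sizeof(TrieNode));
}

static void TrieInsert(TrieNode *root, const char *s)
{
    for (; *s; s++)
    {
        unsigned char c = (unsigned char)*s;
        if (!root->child[c])
            root->child[c] = TrieNew();
        root = root->child[c];
    }
    root->terminal = 1;
}

static int TrieContains(const TrieNode *root, const char *s)
{
    for (; *s; s++)
    {
        root = root->child[(unsigned char)*s];
        if (!root)
            return 0;               /* fails out as soon as a prefix mismatches */
    }
    return root->terminal;
}
```

The lookup touches at most one node per character and stops at the first character with no matching branch, which is the early-exit advantage over hashing the whole name up front.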
...cmk
The idea that I can be presented with a problem, set out to logically solve it with the tools at hand, and wind up with a program that could not be legally used because someone else followed the same logical steps some years ago and filed for a patent on it is horrifying.
- John Carmack
|
|
|
|
|
The initial post mentioned that the search would be for record names that were stored in a filter. If only a field of the record (the name) needed to be checked, wouldn't that reduce the effort to create a hash - in fact wouldn't a Radix search be faster? That is why I asked for more information about the input and filter format.
Dave.
|
|
|
|
|
Member 4194593 wrote: If only a field of the record (the name) needed to be checked,
I understood that each record would contain a (single) filter field; that's why I suggested a radix tree. A hash would likely be more appropriate if you wanted to match several fields.
Member 4194593 wrote: wouldn't that reduce the effort to create a hash
Yes. Sounds like you're in favour of using a hash table.
Member 4194593 wrote: in fact wouldn't a Radix search be faster?
Yes, based on my op. Sounds like you're in favour of using a radix tree.
Original post: 'based on record names'
I assume this to mean strings, in which case my op stands.
...cmk
The idea that I can be presented with a problem, set out to logically solve it with the tools at hand, and wind up with a program that could not be legally used because someone else followed the same logical steps some years ago and filed for a patent on it is horrifying.
- John Carmack
|
|
|
|
|
Good point. Some other pros and cons: A radix tree would require more memory accesses. But on the other hand, a radix tree would take less memory than a hash table, so would be more likely to fit entirely in a cache.
The hash-value calculation wouldn't necessarily be slow; it would be all integer operations on a key that could be in a single cache line. Also the number of collisions can be reduced in two ways:
1. Randomly-hashed values follow a Poisson distribution, so the expected number of collisions can be calculated for a given hash-table size. The table size can be increased to achieve an acceptable collision rate.
2. Assuming the list of names is fixed, a perfect hash function http://en.wikipedia.org/wiki/Perfect_hash_function[^] can be constructed, which will have no collisions.
|
|
|
|
|
If it's necessary to check billions of records per day, that sounds like enough computer work--regardless of algorithm--that profiling will be necessary to achieve optimal performance. Radix trees can be good if the records to be searched for differ from other records in the characters that are examined first. They can be bad if records often differ only in the characters that are examined much later.
With hash functions, there are often going to be trade-offs involving the time required to perform a hash, the hash table size, and the frequency of hash collisions and false matches. Increasing the complexity of a hash function such that the time required grows by more than half the cost of a direct record comparison won't make much sense unless doing so saves an average of at least one direct record comparison for every two records. Having a hash table so large that every access entails a cache miss may slow things down more than would the occasional hash collision or false match.
One approach that may be helpful would be to compute a somewhat long hash of a record, hash that to produce an index into a hash table, and have the table itself store the 'long' hash in addition to the actual record. One could thus at very little expense test the long hash of the record to be checked against the long hash in the table. This would eliminate a large fraction of the direct record comparisons that would otherwise be needed.
BTW, how fast are modern machines at 32x32 integer multiplication? If the strings are zero-padded as well as zero-terminated and multiplies are cheap, it might be good to do something like:
#define some_big_prime_number 16777619UL   /* example choice: the 32-bit FNV prime */

unsigned long Hash32(const unsigned long *ptr)   /* string, zero-padded to a word boundary */
{
    unsigned long hash = 0, temp;
    do
    {
        temp = *ptr++;                   /* fetch 32 bits of string data */
        hash += hash >> 23;              /* fold high bits back in */
        hash ^= temp;
        hash *= some_big_prime_number;
    } while (temp);                      /* an all-zero word ends the string */
    return hash;
}
That would gobble up 32 bits of string data per loop iteration; I would expect the multiply, even if it takes multiple cycles, could be done concurrently with the next word fetch. Not the world's most sophisticated hash function by any stretch, but the execution time should be pretty minimal. If you want to use the hash function to yield a table index, use the upper bits. Taking the modulus of a prime number (distinct from the one used for the multiply) may be better, but modulus operations are more expensive than shifts.
|
|
|
|
|
Interesting points. Your comment about better performance if the records differ in the characters examined first reminded me of the ID3 algorithm http://en.wikipedia.org/wiki/ID3_algorithm[^]. Basically it builds a decision tree that first examines the attributes of a record that differ the most. This avoids wasting time examining attributes that don't differ much among the records.
Your point about cache misses is good; for this volume of data the best data structure is one that fits in the cache.
|
|
|
|
|