|
When you count primes, do you count them as you go (as I'm doing so far) or wait until the sieve is complete and then count them?
I'm considering changing my code to do all the sieving, then write the raw arrays to a file, and count or print them later.
I keep putting off working on threading.
|
|
|
|
|
Hi,
I do everything as I go, when the sieve is complete all has been done already, the only thing left is
to print a summary line (and close the output file, if any).
Threading is not helping that much, I got the code that fast that the only thing threading buys me
is having a second thread preparing the next buffer, shaving off some 30% of total elapsed time.
I see no way to gain more from threading than that; reason is in the end the problem is cache-bandwidth limited and not CPU-cycle limited.
|
|
|
|
|
Luc Pattyn wrote: shaving off some 30% of total elapsed time
I'd take that. With threading, mine was slower, I except because of all the thread creates.
I expect that I could create one or two threads that perform work as I provide it and otherwise sleep.
Luc Pattyn wrote: cache-bandwidth limited
Which I'm probably having too because I can't perform all the work on one spoke, then go to the next; I have to do a small amount of work on one, go to the next, and continue cycling through all the spokes until they're all done.
...
I just had a thought... I could copy a vertical slice of the table into a separate array and work on it with many fewer (1/64?) iterations through my List<Spoke>! I can also make that part of the process quicker; I just realized how badly I implemented it.
|
|
|
|
|
PIEBALDconsult wrote: I could create one or two threads that perform work as I provide it and otherwise sleep.
Yep, I have the main thread, one backgroundworker to do all the work, and one more thread to help the BGW; it gets created once, and two AutoResetEvents are needed to synchronize with the BGW.
PIEBALDconsult wrote: I can't perform all the work on one spoke, then go to the next; I have to do a small amount of work on one, go to the next,
Yes, that is what I meant long ago when I said I would try to merge the data of the spokes. The spokes theory makes a nice picture, but it does not really show in my code.
|
|
|
|
|
I don't know anything about Wheel Factorization, but I did my playing around with a massive bit map. I only used the odd bits so the least significant bit was 1, then 3, 5, .... I created byte bounded masks for each of the smaller primes, then masked out blocks of bits through the array. I saved the lowest primes on a file, then I scanned the array for the first bit still set, then masked bigger and bigger blocks of bits and masked some more. Save a bit, make a mask, mask bits in the array, then find a new bit.
I also saved the masks in a file. My massive bit map was 2GB representing numbers to 4GB. When I had done that, I recalculated the start for each mask for the next 2GB of bits, and continued.
I finally gave up and just found a source for the first 15 million primes.
Dave.
|
|
|
|
|
Member 4194593 wrote: to 4GB
That's 2^32, yes? How long did it take? I haven't gone beyond that yet because I want to improve the speed of what I have so far.
Member 4194593 wrote: I finally gave up and just found a source for the first 15 million primes.
It's the journey, not the destination.
|
|
|
|
|
It is an endless journey with many scenery routes. Not even sure what the destination is.
|
|
|
|
|
With many places to stop for wine and cheese.
I made some severe changes yesterday, but the worst of the calculations still need work.
Update:
Target=4294967296; Spokes=(30,8); Bytes=143165632; Elapsed=00:04:48.9588449; Primes=203280221
Target=17179869184; Spokes=(30,8); Bytes=572662336; Elapsed=00:19:56.7513899; Primes=762939111
(That's 2^34. This is decreased from thirty-one minutes yesterday.)
I still need to eliminate a div and a mod that get used a gazillion times.
modified on Sunday, November 30, 2008 11:44 AM
|
|
|
|
|
I have my Sieve working, I think. What I am looking for is the 4 or 5 primes
around the boundary of 2^(32*k) to validate the Sieve Blocks that I have
created - primes just less and just greater than 2^32, 2^64, 2^96, 2^128, 2^160,
2^192, 2^224, 2^256 and 2^288. I would like to know that my algorithm is correct
before I try to create 2048 (or more) Sieve Blocks.
Does anyone have such numbers gathering dust? The first 15 Million Primes do not
even come close to the first boundary, much less the 9th.
Prime number counts for each block: Millisecond timing
Sieve Block 0 = 203,280,220 primes. 4203 I/O 310344 Masking
Sieve Block 1 = 190,335,585 primes. 4390 I/O 512266 Masking
Sieve Block 2 = 186,011,076 primes. 7515 I/O 564594 Masking
Sieve Block 3 = 183,312,229 primes. 7468 I/O 651703 Masking
Sieve Block 4 = 181,354,450 primes. 6941 I/O 619688 Masking
Sieve Block 5 = 179,823,053 primes. 7722 I/O 832343 Masking
Sieve Block 6 = 178,574,138 primes. 5438 I/O 647093 Masking
Sieve Block 7 = 177,515,527 primes. 5765 I/O 871422 Masking
Sieve Block 8 = 176,604,424 primes. 5282 I/O 570749 Masking
50024 5580202
5630226 = 1:33:50.226
Dave.
|
|
|
|
|
I have just thought of a way to self check my calculations for determining the
correct bit to use as a start for a Sieve Block that is not Sieve Block 0. The
first Sieve Block, Sieve Block 0, is a lead pipe cinch. It IS correct. The
problem is whether or not the subsequent blocks are correct - whether or not all
bits have been examined to remove from the subsequent blocks, or whether or not
the correct starting bit was calculated for any Prime Number that is not in that
Sieve Block. My thought was to increase the size of the Sieve Block being
processed to twice the size of the original. Essentially, calculate the Sieve
for two blocks as if they were one. There would be no calculations to make for
switching from one Sieve Block to the next, any more than would need to be made
in passing from one half of a single Sieve Block to the second half of the
block, the data bits are contiguous. The original Sieve Block 0 would be copied
to a .SAV file. Once the new pair of Sieve Blocks has been calculated, compare
the new Sieve Block 0 to the saved copy, if they are the same, then continue
(they WILL be the same for Sieve Block 0), then save the new Sieve Block 0 as a
.SIV file, and save the second Sieve Block as a .SAV file (it is a valid Sieve
Block 1 since it was masked contiguously with Sieve Block 0). Now calculate the
start of Sieve Block 1, and create a new pair of Sieve Block 1 and Sieve Block 2
as was done for Sieve Block 0 and Sieve Block 1. Again compare Sieve Block 1
with the prior saved .SAV file, and if they match, save the new Sieve Block 1 as
a .SIV file, and Sieve Block 2 as a .SAV file. Each Sieve Block build builds a
correct block to validate the next block when it is created. Twice the work, but
you have validation. Actually, the actual masking of the bits is quite fast, the
reading and writing and calculations for the starting bit are the time killers.
One question. Would it be sufficient to take an MD5 checksum of the second
block, then an MD5 checksum of the next newly created block and compare the MD5
checksum to determine if valid? I think it wise to save a file of the MD5
checksums anyway to be able to validate the correctness of the blocks when read
back in. I would actually save them in a file instead of attaching them to the
end of the file data. The file data is sector and cluster bounded, to add an
extra DWORD would take extra sectors, whereas a file with MD5 checksums would
only appear once. I had planned to use 2 750 GB external hard drives to save the
files giving 1500 GB storage, and 6000 Sieve Blocks (1/4 GB per Sieve Block).
This would be all primes below 2^188000.
Does anyone see any problems with this approach?
Dave.
|
|
|
|
|
I did some research yesterday and came up with a banding technique, similar to what Dave describes. I currently scan [0,2^32) for primes in less than 4 minutes, using a small buffer only (less than 1MB). Apart from speed it has the advantage of being able to scan much larger ranges too. I think I'll turn it into a CP article in a few days/weeks, there still are some areas that need more investigation first.
|
|
|
|
|
I have it working for up to the first six primes, but it appears that using the first three offers the best performance (to 2^32):
C:\>Eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:21:24.1512386; Primes=203280221
C:\>Eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:09:32.2740256; Primes=203280221
C:\>Eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:05:51.1191606; Primes=203280221
C:\>Eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:04:53.7847739; Primes=203280221
C:\>Eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:07:46.0915766; Primes=203280221
C:\>Eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:41:44.5200097; Primes=203280221
And when I tried it using the first six it ran overnight before I killed it.
I still have more to do on it.
|
|
|
|
|
Hi,
I haven't done the actual spoke stuff yet, I decided to first optimize my "implied two-spoke" strategy as you called it; it is now below 1 minute for 2^32 (using longs and memory bands, so I can also search much larger ranges).
I am not sure how to interpret your "Bytes=(..,..)" stuff.
You having six-primed spokes run forever seems to indicate something is still not correct in your code.
I do expect spokes to improve performance a lot, so your results are a surprise to me. I plan to store all relevant spokes in a single array.
Greetings,
|
|
|
|
|
Any possibility of writing an article about it and posting to CP? I would be interested!
|
|
|
|
|
Yes, I plan to write at least one article on the subject; I still need to finish some research first, so it will be a few more weeks before anything shows though.
|
|
|
|
|
|
No! It'll be mis-used by students.
|
|
|
|
|
Potential misuse should not stop us now; it did not stop mankind from coming up with good and bad things such as fire, steam engines, cars, roads, arms, computers, Internet, and Windows.
I'll make the article long enough so none of the lazy students will ever read it.
|
|
|
|
|
That might work. It would be better in French, though.
|
|
|
|
|
PIEBALDconsult wrote: It would be better in French
The CP police won't allow it, they are very reluctant to articles and messages of which they don't understand most of the words.
modified on Wednesday, December 3, 2008 11:52 PM
|
|
|
|
|
Certain optimizations can be made to a classic or implied-two-spoke sieve that won't work for more spokes.
If it's hard-coded to only count the primes, it'll be quicker too.
For mine, I pass in a delegate of what to do with each prime, so I have a lot more method calls.
C:\>Eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:04:53.1354700; Primes=203280221
^ ^ ^ ^
| | | |
Total spokes --------------- | | |
Allocated spokes -------------- | |
Bytes per array ------------------------------ |
Total bytes allocated for arrays ----------------------
Luc Pattyn wrote: I do expect spokes to improve performance a lot
Try it, you'll see.
How would you cross off the 25?
1 7 13 19 25 31
5 11 17 23 29 35
This is an area that still needs work, and I think I may have an idea...
|
|
|
|
|
Hi,
thanks for the clarification.
PIEBALDconsult wrote: I pass in a delegate of what to do with each prime
I don't. I might do that when I am done, right now I do not want to wait for a machine to execute millions of delegates I do not really need. I call a non-virtual method for each prime; it displays a few of them, and keeps track of count.
PIEBALDconsult wrote: Try it, you'll see.
I will.
PIEBALDconsult wrote: How would you cross off the 25?
I am considering only one array containing the primes of all relevant spokes; I currently store 1, 3, 5, 7, 9, 11, 13, 15 etc because the odd one is the only relevant spoke in a two-spoke setup. In the six-spoke scenario that would be 1, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, etc.
Therefore 25, just like any other non-prime, will be flagged as composite, every time it appears as a multiple of one of its factors; in this case that would be only once, and it would also be the first relevant multiple of prime 5.
|
|
|
|
|
Update:
I now have a Core 2 Quad and 4GB RAM (still running XR Pro), and after a few more tweaks I have:
C:\>eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:11:24.2121472; Primes=203280221
C:\>eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:05:14.2174818; Primes=203280221
C:\>eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:03:21.4457024; Primes=203280221
C:\>eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:02:59.1627191; Primes=203280221
C:\>eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:05:38.4098073; Primes=203280221
C:\>eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:36:54.0737311; Primes=203280221
Now I have to add threading... I'll be back.
Update to the update:
I tweaked a bit more (inlined some methods) so now I get:
C:\>eratosthenes 4294967296 Zero
Target=4294967296; Spokes=(1,1); Bytes=(536870920,536870920); Elapsed=00:12:00.3882952; Primes=203280221
C:\>eratosthenes 4294967296 One
Target=4294967296; Spokes=(2,1); Bytes=(268435464,268435464); Elapsed=00:05:24.4898381; Primes=203280221
C:\>eratosthenes 4294967296 Two
Target=4294967296; Spokes=(6,2); Bytes=(89478488,178956976); Elapsed=00:03:24.4878007; Primes=203280221
C:\>eratosthenes 4294967296 Three
Target=4294967296; Spokes=(30,8); Bytes=(17895704,143165632); Elapsed=00:02:46.4519220; Primes=203280221
C:\>eratosthenes 4294967296 Four
Target=4294967296; Spokes=(210,48); Bytes=(2556536,122713728); Elapsed=00:04:22.4536602; Primes=203280221
C:\>eratosthenes 4294967296 Five
Target=4294967296; Spokes=(2310,480); Bytes=(232416,111559680); Elapsed=00:24:59.2846800; Primes=203280221
C:\>eratosthenes 4294967296 Six
Target=4294967296; Spokes=(30030,5760); Bytes=(17880,102988800); Elapsed=06:30:34.0253234; Primes=203280221
Notice the first successful run of 2^32 with six primes!
Further update:
C:\>eratosthenes 4294967296 Six
Target=4294967296; Spokes=(30030,5760); Bytes=(17880,102988800); Elapsed=05:53:56.1556709; Primes=203280221
I did try threading, but it was awful, I'll try again later.
modified on Monday, January 12, 2009 10:05 AM
|
|
|
|
|
My requirement is : I will be having billions of data records coming in per day.From those I have to filter some 1000-2000 records based on record names which are stored in a filter.
My doubt is what is the suiatble data structure to be used and which search algo will meet this high volume requirement.
Sanchit
|
|
|
|
|
Sanchit,
You need to be more specific. In what format are the input records or is this something which can be defined for this implementation? In what form are the record names in the filter and can this format be defined for this implementation?
Dave.
|
|
|
|
|