
Imagine the following interactive mathematical application:
 the data loaded in the application is split in two different 'types'
 the first type, let's call it the 'interactive' side of the application, contains data that is manipulated (added, deleted, changed, ...) by the user. It's effect on other data can be immediately calculated and we can quite easily determine which parts of the screen needs to be redrawn to make the screen back consistent
 the second type, let's call it the 'constraint' side of the application, contains data that is less frequently manipulated by the user. But if it is manipulated, it could have an impact on any element of the 'interactive' side.
The problem is that there is no onetoone relation between elements in the 'constraints' side and elements in the 'interactive' side. If we want to show the user an alwaysuptodate 'interactive' view, certain complex calculations are needed which can easily take up several minutes.
My current approach is to disable the 'interactive' side from the moment anything is changed on the 'constraint' side. The user can then continue to work on the 'constraint' side, but if he wants to go back to the 'interactive' side of the application, he will have to perform a full recalculation.
Result: a slow, but always consistent application (regarding the 'interactive' side).
The alternative could be to keep the 'interactive' side enabled while the user is changing elements in the 'constraint' side. Often the user can know what the impact is of his changes on the 'interactive' side and simply pressing F5 on the correct cell in the interactive grid can then calculate the effects from the 'constraint' side to that specific cell of the 'interactive' side.
Result: a faster, but not always consistent application, meaning the user could draw conclusions from incorrect values.
Does anyone of you have an application with a similar problem (choosing between slow and consistent or fast and inconsistent)? Any tips on how to have a fast(er) application with still having a quite good data consistency in the 'interactive' side?
Thanks in advance for your tips.
Enjoy life, this is not a rehearsal !!!





Hi,
First of all I would not mind if you were to give a more concrete description of your app...
Second, IMO if the app, due to changes in the constraints, is going to modify the interactive fields, then those fields must be disabled (or even hidden) for as long as they are unstable. It is completely unacceptable to have the user enter data and later find out it gets modified or even deleted automatically after a while.
Finally if a calculation takes minutes, it is a candidate for optimization. No one wants to wait any longer than necessary in front of a screen.





I'm afraid that's almost all the information I can give without going into the gory details.
The problem is that in many cases, if the user changes constraint X it only has an impact on interactive field Y, and in 90% of the cases, the user knows this, but the application doesn't.
The application can only know this after performing a global consistency check, which can take several minutes, due to the use of complex mathematical formulas.
Compare this with pivot tables in Excel. Changing the slightest value in a sheet may invalidate a cell in the pivot table.
Excel doesn't automatically recalculate the whole pivot table because it probably takes too much time, but in practice and in most cases the user could know what the exact effect is, and live without a pivot table that is not necessarily uptodate.
Enjoy life, this is not a rehearsal !!!





You need to identify the dependencies between data elements in the interactive side and elements in the constraint side. Then when a constraint element is changed, you only update the interactiveside data which is dependent on it.
This structure has been studied in Artificial Intelligence for some time. The main references are:
Doyle, J., "A Truth Maintenance System," Artificial Intelligence 12 (1979) 231272.
de Kleer, J., "An AssumptionBased TMS," Artificial Intelligence 28 (1986), 127162.






You're not uniformly sampling the entire space, thats why there "seems" to be a bias to one area in your image  you should not be using an exponential distribution.
the crux of the innerloop of the algorithm is:
loop begin:
random_point = select_uniformly_random_point_from(minx,miny,maxx,maxy);
nearest_point_in_graph = nearest(Graph,random_point);
new_point = nearest_point_in_graph + deltaQ nearest_point_in_graph  random_point
add_to_graph(Graph,nearest_point_in_graph,new_point);
loop end
yours does't look like that at all.
Note:
1. The routine "select_uniformly_random_point_from", when plotting the points its selects should look similar to this image: http://www.codeproject.com/KB/recipes/Wykobi/wykobi_randpntsaabb.png[^] if it doesn't then you have a problem.
2. this statement "new_point = nearest_point_in_graph + deltaQ nearest_point_in_graph  random_point" looks something like this in code:
new_point.x = nearest_point_in_graph.x + deltaQ nearest_point_in_graph.x  random_point.
new_point.y = nearest_point_in_graph.y + deltaQ nearest_point_in_graph.y  random_point.y
3. Don't mix graphical or GUI aspects with computational code, its generally considered bad practice.





Don't cross post. You've posted this same thing in four different forums.
"The clue train passed his station without stopping."  John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks"  Pete O'Hanlon
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago."  Rob Graham





I've been working on improving my Sieve of Eratosthenes routine ever since I first had to write one in Pascal way back in college (well, not constantly), always trying to reach a higher maximum. I just got around to writing one in C#.
I was reading up on Wheel Factorization (at least the type described on Wikipedia) as a way to extend my reach while using the same amount of memory. It turns out that, because I consider only the odd integers, I am already using a simple form of Wheel Factorization. Now I'm interested in getting to the next step, but I'm stumped by the statement:
"
The remaining numbers in the wheel contain mostly prime numbers. Use other methods such as Sieve of Eratosthenes to remove the remaining nonprimes.
"  Wikipedia
When I look at the example given, I don't see how Sieve of Eratosthenes can be successfully applied to each spoke separately. It appears I have to allocate memory for all the spokes I need to check (1 and 5) and sieve them concurrently, which won't extend the range (for a given amount of memory) as much as I was hoping.
Let's say I want to allocate only enough memory for twelve flags.
(1) With a onespoke wheel we get the classic Sieve:
1 2 3 4 5 6 7 8 9 10 11 12
(2) But we know we don't need to check the even numbers, so we can use those twelve flags to represent only the odd numbers and thereby double our reach:
1 3 5 7 9 11 13 15 17 19 21 23
(3) If we use a fullyallocated twospoke wheel we still only reach 12, which gains us nothing:
1 3 5 7 9 11
2 4 6 8 10 12
(4) But we don't need to allocate spoke 2, so we can use that memory to double our reach:
1 3 5 7 9 11 13 15 17 19 21 23
2 4 6 8 10 12 14 16 18 20 22 24 << not allocated
It should be apparent that (4) is equivalent to (2), notice that the sieve can successfully be applied to spoke 1.
(5) When we get to a 6spoke wheel we can have it fullyallocated, which is little better than (3):
1 7
2 8
3 9
4 10
5 11
6 12
(6) Or, preferably, only allocate one spoke at a time as I did in (4):
1 7 13 19 25 31 37 43 49 55 61 67
2 8 14 20 26 32 38 44 50 56 62 68 << not allocated
3 9 15 21 27 33 39 45 51 57 63 69 << not allocated
4 10 16 22 28 34 40 46 52 58 64 70 << not allocated
5 11 17 23 29 35 41 47 53 59 65 71 << to be allocated later
6 12 18 24 30 36 42 48 54 60 66 72 << not allocated
In theory, this will extend the reach by a factor of six; in practice it won't, because 25 and 55 appear in spoke 1 while their factors are in spoke 5.
Unlike (4) we can't successfully apply Sieve of Eratosthenes to each spoke separately.
(7) So we have to allocate both spoke 1 and spoke 5, yielding only a tripling of our reach:
1 7 13 19 25 31
2 8 14 20 26 32 << not allocated
3 9 15 21 27 33 << not allocated
4 10 16 22 28 34 << not allocated
5 11 17 23 29 35
6 12 18 24 30 36 << not allocated
And then we have to be able to apply Sieve of Eratosthenes to both spokes concurrently. Which is where I'm at now.
I see how it can be done, but I suspect that the calculations involved may not be worth the trouble.
Have any of you done this?





Hi PIEBALD,
PIEBALDconsult wrote: Have any of you done this?
no I have not done sieves for a long time. I have done some large prime stuff, where simple sieves are not appropriate.
Wheel sieves become effective only if you use several primes at once, so do not use just 2 and 3 to get 2 spokes out of 6; instead use say 2, 3, 5, 7, 11, 13, 17 and notice you will have a large number of spokes (equal to the product of the primes used, hence more than 1000) of which most don't need to be allocated.
The aim of the wheel stuff is to reduce memory needs, hence extending the possible range for a given
memory footprint; when implemented appropriately, I expect it to be somewhat faster than a regular sieve with the same range, since a number of divisions/remainder operations are avoided, at the expense of some bookkeeping code. Unfortunately the missing spokes correspond mopstly to the most obvious nonprimes, those that are recognized easily since they fail one or more of the very early division tests (with low divisors such as 2, 3, 5...) , so it is essentially the cheapest nonprimes that get skipped, resulting in a modest cycles gain; the memory gain however is significant.
One optimization aspect is about how to store the spokes; they should aim for a good locality of data
(hence reusing data that is already in cache), while not causing cache trashing, so if each spoke were an array, don't give array dimensions that are powers of 2, but rather go for some strange value (a prime is often a good choice for array sizes) in this respect.
Once you got the spoke logic correct and efficient, you can easily experiment and enlarge the collection of basic primes, yielding more spokes, and more cycles gained. You then can research the optimal set of spoke primes as a function of available memory...
Enjoy!





Luc Pattyn wrote: Once you got the spoke logic correct and efficient, you can easily experiment and enlarge the collection of basic primes
Right, that's where I'm at. And it looks like the tactic that seemed to work with six spokes won't work with thirty spokes so it's not a general solution.





PIEBALDconsult wrote: won't work with thirty spokes
So it seems a change of tactics is what you need.





Yes, it's time to use "the little grey cells".





Got it working* last night, with one through thirty spokes.
* Well, mostly working, when I ran it with thirty spokes up to 2^32 I got one more "prime" than I did with the old version, so I have to track that down.
P.S. The new one is correct, the old one missed 65537  SQRT(2^32)+1  so I need to fix the old one.
modified on Wednesday, November 26, 2008 3:29 PM





Hi,
1.
65537 is F4; failing to report it is something Fermat will not take lightly.
2.
The most popular bug in prime programs seems to be a "less than SQRT(n)" test ignoring rounding stuff;
I tend to make sure and do "less/equal 1+SQRT(n)" occasionally wasting some cycles but not generating false primes. But your mistake must have been something different, since you skipped a real prime.
3.
now may be the time to switch to long integers in order to extend the capabilities of your app.
After that Win64 and lots of memory may become relevant.
4.
By entering an article on the subject, you might be awarded a CodeProject Spokesperson Certificate...





I am stepping by two, I get to 65537 and stop the first loop but don't report it as a prime, then I begin the second loop by incrementing again. I'm not terribly happy with either solution I've tried.
I am using long integers, but until I got my wheel working I could only allocate enough memory for up to 2^32. I'll try for more later (overnight). 2^32 is taking about an hour and forty minutes (I think it's IObound).
Luc Pattyn wrote: Spokesperson
Not if they call it that.
Besides I don't want students to cheat by using my code.





PIEBALDconsult wrote: I think it's IObound
looking for primes is IObound?
Intel would better redirect its research efforts then.






OK,
tried a text file, took forever (I killed it after some 10 minutes, when it reached 4GB). It is both huge and slow, due to the string formatting stuff I guess.
tried a binary file, took 6 minutes all together for almost 2GB (storing more than 200 million longs); that seems acceptable (simplified model: 5 minutes calculation, then 1 minute writing, i.e. writing 30MB/sec; most primes are above SQRT and don't need much calculation).





I just finished a little experiment, calculating all primes < 2^32 in a simple Eratosthenes sieve. With long integers where appropriate, 1 bit per candidate (32 candidates in an int), half a gigabyte of working set, and without spokes, without threading, without assembly or SIMD instructions, it took less than 5 minutes (on a simple Intel T8300 at 2.4GHz) to find there are 203280221 primes, which got confirmed by http://www.pscconsulting.ca/fenske/cpjav18s.txt[^]. The only sophistication I applied was avoiding almost all divide/modulo operations.
And indeed listing them to a ListBox would dramatically increase the time (I did not wait for it).
With indices in C# being int (i.e. effectively 31bit), I could theoretically stretch this to 2G of longs, or 128G candidates; obviously I would run out of physical RAM long before that, without spokes. With spokes, not sure yet how far I could go.
[ADDED] One factor strongly in favor of spoked sieves is marking the smallest primes is what takes the longest, since the smallest primes have the largest number of multiples that must get marked. [/ADDED]





Pentium 4, 3GHz, 1GB RAM
Using an array of ulong s, without the IO...
Thirtyspoke : 203280221 primes found in 00:08:25.7221373.
Implied Twospoke: 203280221 primes found in 00:13:15.0680712.
Luc Pattyn wrote: The only sophistication I applied was avoiding almost all divide/modulo operations
I'll have to figure out how to do that. It's all these calculations that I was concerned about.





When you count primes, do you count them as you go (as I'm doing so far) or wait until the sieve is complete and then count them?
I'm considering changing my code to do all the sieving, then write the raw arrays to a file, and count or print them later.
I keep putting off working on threading.





Hi,
I do everything as I go, when the sieve is complete all has been done already, the only thing left is
to print a summary line (and close the output file, if any).
Threading is not helping that much, I got the code that fast that the only thing threading buys me
is having a second thread preparing the next buffer, shaving off some 30% of total elapsed time.
I see no way to gain more from threading than that; reason is in the end the problem is cachebandwidth limited and not CPUcycle limited.





Luc Pattyn wrote: shaving off some 30% of total elapsed time
I'd take that. With threading, mine was slower, I except because of all the thread creates.
I expect that I could create one or two threads that perform work as I provide it and otherwise sleep.
Luc Pattyn wrote: cachebandwidth limited
Which I'm probably having too because I can't perform all the work on one spoke, then go to the next; I have to do a small amount of work on one, go to the next, and continue cycling through all the spokes until they're all done.
...
I just had a thought... I could copy a vertical slice of the table into a separate array and work on it with many fewer (1/64?) iterations through my List<Spoke>! I can also make that part of the process quicker; I just realized how badly I implemented it.





PIEBALDconsult wrote: I could create one or two threads that perform work as I provide it and otherwise sleep.
Yep, I have the main thread, one backgroundworker to do all the work, and one more thread to help the BGW; it gets created once, and two AutoResetEvents are needed to synchronize with the BGW.
PIEBALDconsult wrote: I can't perform all the work on one spoke, then go to the next; I have to do a small amount of work on one, go to the next,
Yes, that is what I meant long ago when I said I would try to merge the data of the spokes. The spokes theory makes a nice picture, but it does not really show in my code.





I don't know anything about Wheel Factorization, but I did my playing around with a massive bit map. I only used the odd bits so the least significant bit was 1, then 3, 5, .... I created byte bounded masks for each of the smaller primes, then masked out blocks of bits through the array. I saved the lowest primes on a file, then I scanned the array for the first bit still set, then masked bigger and bigger blocks of bits and masked some more. Save a bit, make a mask, mask bits in the array, then find a new bit.
I also saved the masks in a file. My massive bit map was 2GB representing numbers to 4GB. When I had done that, I recalculated the start for each mask for the next 2GB of bits, and continued.
I finally gave up and just found a source for the first 15 million primes.
Dave.




