



This makes me sad as well.
Do you mean finger trees?





no finger search algorithm only!!!!!





Excuse me, first posted in the lounge, but Bill suggested I post here:
I have a range of values (voltage) over time (thousands of minutes, one value per minute). I am trying to chart these. Determining the length of my Y axis is quite a problem for me. If I take a minimum and maximum, and use that as the axis height, one or two zero values result in all the others being scrunched up at the top of the chart. If I remove zeroes, it looks much better, and for a chart, they aren't very important, I'll give all real values in a tabular report.
What I would like to do is determine the average height of the band of data points, sort of the space between the moving average of the low points and that of the heigh points. I figure to do that, I would need a median series, so I could determine a smoothed series of points above and below median, and make my Y axis 's' higher and 's' lower than those.
How do people normally do this?





Do you think of "standard deviation"?





Yes, I just can't place it quite right. The standard deviation during morning could be radically different from afternoon, so I would need a curve of standard deviation, not a single figure.





What about calculating average / standard deviation for hourly sets of data?





For my old computer I need an assembler which is able to take assembled code from a library and link it together in the smallest possible combination.
One 'speciality' of the old CDP1802 processor will force me to write the assembler and linker myself. There are two types of branching instructions: long branches and short branches. Long branches use full 16 bit addresses, but will cause timing issues with the graphics chip. This is an ancient hardware bug.
This is the reason why i must use short branches with short 8 bit addresses. The upper 8 bits are just assumed to remain the same as in the instruction's address. This way memory is segmented into 256 byte blocks. It's not a very strict segmentation as the code can run across the boundaries without any consequences, You just can't loop back with a short branch and long branches can't be used.
The linker will have to puzzle together snippets of code and data with this in mind. At the same time I must be sure that memory usage is as low as possible in the end. My old computer has only 4k RAM, and more than 16k is quite unusual.
The only thing I can think of is to make a memory map of each possible combination and take the one which needs the least amount of memory. There are easily hundreds of small code snippets to be linked and blindly testing every combination will be very slow and inefficient.
First thought: Build a tree with only valid options and then find the branch with the lowest byte count. This is alresy better than brute force, but I hope there is still a more elegant algorithm for this.
The language is JavaScript. that of Mordor, which I will not utter here
I hold an A7 computer expert classification, Commodore. I'm well acquainted with Dr. Daystrom's theories and discoveries. The basic design of all our ship's computers are JavaScript.





Assuming that you subdivide the code into N snippets, each terminated by an unconditional jump (e.g. procedures), you could test each possible sequence out of the N! possibilities.
Note that the maximum savings in bytes that you could achieve are the number of jumps that may be converted from 16bit form to 8bit form. If this number is smaller than the length of the smallest code snippet, you would not be able to use any sort of pruning of the search tree, but would be forced to evaluate all N! leaves of the tree, which might take a long time...
Borland's Turbo Assembler (for x86 processors) had an option whereby it attempted to optimize (conditional) jumps:
1. All jumps were written without qualifiers.
2. The assembler would make multiple passes through the code, applying the following algorithm:
a. If a jump target was within +127/128 bytes, output a short (2byte) jump.
b. If an unconditional jump target was outside that range, output a 3byte jump.
b. If a conditional jump target was outside that range, output a 5byte sequence  jump over the following jump (2 bytes) / jump unconditional to the target (3 bytes).
This was applied in a loop until either no more jumps could be optimized or a predetermined number of loops was reached. Typically, only 23 loops were necessary.
In addition to the automatic method given above, I would try to write each procedure so that the jumps are all 8bit forms. Optimizing a procedure by hand is likely to be much easier than attempting global optimization.
I hope that this helps.






Hi all there,
I need to understatnd that How .NET CLR understands different languages at once and make MSIL (universal code or common code) to ASP.NET engine(i mean in web app)
Thanks in advance
Pitchaiyan C





Wrong forum.
Please stop asking such questions here.
You have a misunderstanding about this  the CLR doesn't understand any languages  it's more like the languages understand the CLR, but even that is a misleading description.
Furthermore, most members here on CP (me anyway) are just ordinary developers who do not know (or care) anything about how the deep internals work. You would probbaly need to contact the developers at Microsoft to gain the level of detail you desire.
The things you are asking about are way beyond what is required for daytoday development of commercial and enterprise applications. If you truly want to understand how it all works, you will likely need a doctorate degree.





I can't find anything wrong to understand things that have developed successfully already, i need your help if you already know it then just you can share it that will increase your's knowledge level also Mr PIEBALDconsult and i need to tell you one thing Sir Isaac Newton didn't have a doctorate degree when he found the gravitational force at all, it can mean you that doctorate degree is not necessary at all to become a big man in knowledge like Sir Isaac Newton.
Thanks Mr.PIEBALDconsult.





Isaac Newton was a genius and figured out his findings by himself.






Hi friends, need help!! i am absloute beginner and need advise. below algorithm is an extrat from a text book but when i try to apply and solve the problem on paper i see that this algorithm will fail. as i get a remainder 0 every digit i key in till 8 (i took number 8 as an examole and applied below)...please advise....also i found that applying this algorithm on number 2 would result in 0 as well and if it is 0 then not prime, then how is this algorithm correct!
1 start
2 read the number num
3 [initialize]
i <2, flag <1
4 repeatnsteps 4 through 6 unitl i
modified 6Sep14 8:03am.





Did you type it over correctly? Steps 5 and 6 are obviously missing, and "until i" is not a condition.





4th step  repeat steps 4 through 6 unitl i <num or flag =0 5) rem <num mod i 6) if rem=0 then flag< 0 else i<i+1 7) if flag =0 then print number is not prime else prit number is prime 8) stop in this step if i use number 2 as an example it would result in 0 which will result it number being nonprime.
 modified 6Sep14 10:12am.





Ok, now it makes more sense. The problem here is that 2 is a special case, and they did not handle it. Simply add a step: if the number is two, it is prime.
So in short, yes, the book is wrong and you are right.
edit: ok now it makes less sense, why are most of the steps gone again? What I wrote above should still apply though.





Thank you , i was editing so the steps can be read properly....atleast i know that i was not wrong. i appricate you help and gives me confidence that peoople online can help me with my problems while i am learning.





also, i found that with this approach if i have number 99 the remainder will show as 1 and which would say it is a prime number, but again 99 is not a prime number.





What step is that at? Obviously you'd get a remainder of 1 for 2, but then in the next step you'd find that the remainder with 3 is 0, and therefore it's not a prime.





won't the remainder be 0 for 2, when you divided 2 by 2?





Sure, but you're not testing 2, you're testing 99, right? The remainder of 99 / 2 is 1, the remainder of 99 / 3 is 0





so assuming the number 2 in these steps in not properly defined and so if we move in the sequence and found another number to be divisible and remainder to be 0 (we should then in this case consider it to be correct) and therefore 99 would not show up as prime in when the algorithm runs, is that correct?





i found the soltion: The flag value can either be 1 or 0 in the course of this program. Flag is just a variable. This is tested in step 7 to determine if the number is prime or not prime. flag value is initially set to 1(during initialization). It may or may not change at step 6 depending on whether rem=0. The value of rem is 0 when the result of the calculation at step 5 is 0 (happens when the division produces no reminder). as you know by doing num mod i we divide num by the current value of i and check the reminder. If the reminder is 0 that means num is divisible by i. So num cannot be a prime. Whenever we find that reminder is 0 we set value of flag=0 which means the number is not prime.





the algorithm in plain english is
(0) initially set i=2 and flag=1
(1) take a number (num) which we want to test for being prime.
(2) Then we start dividing the number num successively by i= 2, 3, 4.....upto (num1) and check for the reminder. At each iteration we increase i by 1.
(3) At any stage, if we find that num is divisible by i then (num is not prime) set flag=0
(4) If we reach i= (num1) and still reminder is never 0 at any stage, then flag value remains unchanged at 1 and the number is prime
We test the flag value at the end of the program to decide whether num is prime or not.






As to the missing code, you need to use the Encode button so your angle brackets don't get interpreted as XML/HTML.
1 start
2 read the number num
3 [initialize]
i <2, flag <1
4 repeat n steps 4 through 6 unitl i <num or flag =0
5 rem <num mod i
6 if rem=0 then
flag< 0
else
i<i+1
7 if flag =0 then
print number is not prime
else
print number is prime
8 stop
See also: http://www.codeproject.com/Forums/1643/Java.aspx?fid=1643&tid=4943621[^]





I've prototyped a way to do pattern discovery using SQL, but I still have a poor understanding of where this method fits in the data mining vernacular. Being setbased, I'm not building a tree, although a functional tree does arise in the result set. The steps: 1) Build a "look up" table, by doing a cross join, yielding a combinatorial "dictionary" (a rainbow table) of ngram "words." 2) Get COUNT(*) > n , using SQL GROUP BY, matched against a large table of items 3) Further look for equivalent longer selfmatches within the result set. My seed table is 177 items, allowed to crossjoin itself 3x, for a final table 2.8 million 3gram words (takes about 35 seconds to build this table in Postgres). The actual itemset table is 10 million rows in series (serially numbered), although the actual number of itemsets might be considered smaller. I've recorded 35** seconds on the join between the two original tables, yielding all the simple repeating 3grams meeting group by's count(*) > x (that's the dictionary joined to the itemsets. That's the Q&D discovery step, and then subsequent steps simply apply a selfjoin for longerchained repeating series. These have been pretty quick, in the 50 millisecond range. My questions are: 1) What's the best way to describe this algorithm? Frequent pattern? Motif? 2) It's a simple enough method, but is it fast enough for general use? I.e. other data mining apps where performance requirements are different from my own? 3) I've wondered if SQL could be convinced to patternmatch like an LCS dynamic programming algorithm, by matching across gaps in the sequence, with maybe a lookup table of allowable variances & distance between matching values?
**Right now I'm seeing 50 seconds after the buffers load, but I reinstalled Postgres & my postgres.conf file apparently is all defaults now (the postgres process back to only 16 mb getting buffered, so it's suffering more I/O to the SSD drive). Thanks in advance,  Lee
 modified 5Sep14 0:17am.





Member 11060173 wrote: seed table is 177 items, allowed to crossjoin itself 3x, for a final table 2.8 million 3gram words
...
actual itemset table is 10 million rows Ehm, 2.8 million is about half of 177*177*177  that's quite a lot, but does still fit into memory (RAM). With 10 million rows, your trigram table will have some 10^20 rows, and that's beyond the memory of any machine nowadays, even beyond the capacity of any hard disk.
It won't work (already that factor of 10^14 applied to the present 35 seconds should tell you that).





Oh, I forgot to mention that the trigram dictionary is trimmed by abs(val1+val2+val3) <= 88 (it's a vector dataset of small int). But the 5.5m trigram dictionary might not slow things much given the use of covering indices (the access is all via btree indices, obviating the need for as much memory).
I looked into using a 4gram dictionary but it presented a very large table, much larger than the 3gram dictionary, and worse it made for more overlapping duplicates in building the equivalent of an FPtree (at least 1 extra overlap, whereas w/ the 3gram matches I'm always overlapping by n+2). Also I sense a trigrambased tree innately reflects the smallest useful vector of fromandto applicable.
One problem might be that in highfrequency datasets I could see an explosion of 3gram noise that doesn't always support better (longer) matches, bloating the output. I understand that in FPTree algo's there's a minimum support criterion that perhaps works around this. There may be a way to ameliorate this in SQL, such as checking for matching adjacencies in a manner that'll optimize via a correlated subquery, using ANY or [NOT] IN.
I haven't had enough time to fully experiment with various datasets, I've been going through a application language selection process** & am contemplating looking into PostGIS' geometric data & index features (RTree indices) as a way to get better, longer string matches, even perhaps supporting approximate or intermittent matches akin to the ability of LCS/cosine match algorithms (but on larger sets with more expressive syntax).
I'm prepared to start coding in C or Julia, but I'll avoid it if Postgres proves "fast enough." That's b/c as new data are imported to the DBMS I'll want to rerun pattern discoveries in the background against the main data store. My current 10 million rows are an exorbitant sample, outscaling anything I expect to encounter in the actual data (MIDI note vectors).
[Edit:]
Just found this:
https://www.academia.edu/5184801/SQL_Based_Frequent_Pattern_Mining_with_FPgrowth[^]
It's a very old paper (circa 2001?), but I'll probably be following their methodology. Maybe they gave up b/c DB/2 was too slow vs. algorithmic FPGrowth in C++.
Also, from a 2006 paper: http://webdocs.cs.ualberta.ca/~zaiane/postscript/adma05.pdf
"...In this work we presented COFIClosed, an algorithm for mining frequent
closed patterns. This novel algorithm is based on existing data structures FPtree
and COFItree. Our contribution is a new way to mine those existing structures
using a novel traversal approach. Using this algorithm, we mine extremely large
datasets, our performance studies showed that the COFIClosed was able to
mine efficiently 100 million transactions in less than 3500 seconds on a small
desktop while other known approaches failed..."
I know that's old hat by now (my 2008era Thinkpad T400 Celeron laptop w/ its 4GB RAM & SSD drive vs. his "small desktop"), but I'm in the ballpark.
**( As for fast application langs, esp. for other algorithmic jobs not best modeled in a SQL DBMS, I think I just found a winner: http://www.julialang.org[^] & http://juliawebstack.org/[^] )
modified 5Sep14 10:57am.





I've been playing around with binary search algorithms and have created a novel new variant that appears to be significantly faster than any traditional implementations I've seen.
int boundless_binary_search(int *array, int array_size, int key)
{
register int mid, i;
mid = i = array_size  1;
while (mid > 7)
{
mid = mid / 2;
if (key < array[i  mid]) i = mid;
}
while (i && key < array[i]) i;
if (key == array[i])
{
return i;
}
return 1;
}
I'm wondering if this is a novel binary search implementation, or has this approach been used by someone before?
Some variants and benchmark graphs:
https://sites.google.com/site/binarysearchcube/binarysearch[^]





Perhaps I'm not thinking about it right (I have a cold so I'm not particularly sharp right now), but it seems to me that if the item is in the second half of the array, it will essentially devolve into a linear search.





It switches to a linear search when roughly 8 elements are left. The i != 0 check is there in case the key is smaller than the value at index 0.





Yes ok, I guess the cold got to me. I was thinking something weird based around the assumption that mid started in the middle, which it obviously doesn't.
So it works, that's good. It seems closely related to the variant which keeps a "midpoint" and a "span" (here it's the next span (mid ), and the midpoint plus that next span (i )). Same pros and cons too (needs fixup at the end, but inner loop is simple), the "midpoint/span" variant is usually seen (when seen at all) in its "more complicated math in the inner loop"form which doesn't need fixup, but then what's the point.





Using a midpoint and a span is slower because it requires 2 assignments per loop opposed to 1.5 (on average) in my implementation.
I assume it's the fixup and assignment issue that left academics stumped for the past 60 years. There are also caching issues for larger arrays, for which I've created a variant that is mindful in that regard.





Gregorius van den Hoven wrote: requires 2 assignments per loop opposed to 1.5 (on average) in my implementation. Well you realize it'll be a conditional move, right? But only one, whereas regular binary search would have two (or an unpredictable branch, yuck).
So you'd get something like (from GCC output)
.L4:
sar edx
mov ecx, eax
sub ecx, edx
cmp [esi+ecx*4], ebx
cmovg eax, ecx
cmp edx, 7
jg .L4
In a quick test, GCC didn't feel like using cmov s for plain old "left and right bounds" binary search. You can easily measure a huge difference due to that sort of thing, and I'm not sure that's 100% fair, after all you could implement ye olde binary search with cmov s.





I'm not well versed in that area.
Looks like the compiler is doing something differently when you free up a registry space, but it's still slower.
int tailing_binary_search(int *array, int array_size, int key)
{
register int bot, mid, i;
if (key < array[0])
{
return 1;
}
bot = 0;
i = array_size  1;
mid = i / 2;
while (bot + 7 < i)
{
if (key < array[mid])
{
i = mid  1;
mid = (i  bot) / 2;
}
else
{
bot = mid;
mid += (i  bot) / 2;
}
}
while (key < array[i]) i;
if (key == array[i])
{
return i;
}
return 1;
}





Yea GCC at least makes some branches there in the loop, and it's using more complicated divisions by 2 (that work when negative).





I want to calculate the surface of an area on a sphere including projection effect.
For this I have a scanned image (which results in a 2D disc). On this image I can highlight an area and I have the index of each pixel of that area relative to the center of that disc (x, y coordinates). I also have the radius (in pixels and meters)
How can I calculate the correction needed for each pixel in order to take in account the projection effect (larger at the edges of the disc)?
Thanks.
[SOLUTION]
It's a pretty long solution, so I'll try to be as brief as possible.
1. You need to scale down the sphere to a unit sphere (radius=1)
2. With that you know that x²+y²+z² = 1 and thus z = sqrt(1x²y²) and the surface (of a full sphere) is 4*pi*r² = 4*pi, but you only have half the sphere so S=2*pi
3. What you need is the cosine of the angle of the pixel with the Zaxis. This is the z calculated in step 2 (the sqrt) divided by the radius which you reduced to one.
4. In the end you get this formula for the area per pixel: (1/r² * 1/(2*pi*sqrt(1x²y²)) ). If you loop through the pixels of the complete area you just need to to sum up all these pixels.
5. This results in a fraction of the surface compared to the sphere surface which you can then convert to any unit you like.
On first sight this "looks" correct
[/SOLUTION]
modified 8Aug14 7:50am.





I've been studying lock free algorithms, specifically the circular lock free ring buffer. I fully understand what they're doing, but am having trouble applying that to some of my code. It seems that the lock free ring buffer is meant for situations where the queue is neither starved nor saturated for any "significant" (in CPU terms) period of time.
For example, I recently had some code with multiple producers and a single consumer. When the server was running at a normal load, the lock free ring buffer would have been an excellent solution. However, when the server load was low it seems the consumer would just be spinning, using quite a bit of CPU doing nothing. Contrawise, when the server load was high, the producers would be spinning (in the current code, if this happens, the producers have a wait of a certain period at which point, they discard their data and grab the next set [losing data periodically was permissible since the next set would allow a full reconstruction, albeit with stutter]), but how long do they spin? Can't use just a count since time is important in this situation, but grabbing clock is expensive and simply looping the thread consumes a lot of CPU, depriving other producers of their quanta.
Am I missing something?





I posted a lockfree stack a few days ago. There would be no such issues of spinning/locking up the cpu. The only "problem" with it compared to the circular buffer you're using now is that the stack is FILO instead of FIFO.
A Fundamental LockFree Building Block  The LockFree Stack[^]
Hope it helps, lemme know if you have any questions.





Michael Gazonda wrote: There would be no such issues of spinning/locking up the cpu
This is the part I struggle with. If the stack is full, how does the CPU not spin? Likewise, if the stack is empty, pop returns false. Then what?





With a stack, there is no "full". It's like a singlelinked list. Each item points to the next, and carries with it the space required for doing so.
If the stack is empty, and returns false, then that's up to you to handle in whatever way is appropriate.
For myself, I wrote this to handle memory allocation (where I would manually allocate on false), or as a messaging system where false just meant there was no work to do and I would wait on a condition_variable.





Why not use one of the many open source message queues out there? This is exactly what they do. I'm assuming, of course, that you are trying to implement a circular buffer to solve a problem and not the other way around . * a message queue solves the producer / consumer problem for you * a message queue solves the distributed workload problem for you * a message queue can be configured to use either memory / database / files / etc. as a backing store, so you never lose a work request or response * clustering / scaling / fail over, etc. is all built in for you I used Apache ActiveMQ on a project with an arbitrary number of producers and about 400  500 consumers (VMWares) and it worked great. Producers would submit work requests and the 400  500 consumers would just cycle through the various queues looking for work they knew how to do. The cool thing about ActiveMQ is that it supports push notifications so you don't use any CPU time during idle time.





Because I'm trying to understand lock free algorithms.
(I former colleague evaluated ActiveMQ and found it didn't scale well. Neither did RabbitMQ. For the previous app I worked on, both would have been completely inadequate.)






SledgeHammer01 wrote: One complaint I do have about it is that it doesn't handle a heavy onslaught of large messages very well
Yup.
SledgeHammer01 wrote: On the other hand, if you have to deliver a system that's going to scale to 1000 clients and millions of messages / day in a few weeks, well...
What, again, was your definition of scalable?
SledgeHammer01 wrote: And nothing against you or your ability, but I'd be surprised if you are able to create a production system that can top one of the widely used message queues out there. At least not in a reasonable amount of time. I don't think I could either. That's all those guys do.
Actually, I have, though the use was very narrow and specialized. Again, our testing found that all the commercial solutions had performance bottlenecks or just plain fell apart when trying to use them with massive amounts of data.
BTW, in one app, switching from ZLib to LZ4 was a significant performance gain that overwhelmed any decrease due to slightly larger message sizes.



