Algorithms

27-Aug-14 10:24

I've been playing around with binary search algorithms and have created a novel new variant that appears to be significantly faster than any traditional implementations I've seen.

C++

int boundless_binary_search(int *array, int array_size, int key)
{
        register int mid, i;

        mid = i = array_size - 1;

        while (mid > 7)
        {
                mid = mid / 2;

                if (key < array[i - mid]) i -= mid;
        }
        while (i && key < array[i]) --i;

        if (key == array[i])
        {
                return i;
        }
        return -1;
}

I'm wondering if this is a novel binary search implementation, or has this approach been used by someone before?

Some variants and benchmark graphs:

https://sites.google.com/site/binarysearchcube/binary-search[^]

harold aptroot4-Sep-14 21:28

4-Sep-14 21:28

Perhaps I'm not thinking about it right (I have a cold so I'm not particularly sharp right now), but it seems to me that if the item is in the second half of the array, it will essentially devolve into a linear search.

Igor van den Hoven6-Sep-14 10:04

6-Sep-14 10:04

It switches to a linear search when roughly 8 elements are left. The i != 0 check is there in case the key is smaller than the value at index 0.

harold aptroot6-Sep-14 11:10

6-Sep-14 11:10

Yes ok, I guess the cold got to me. I was thinking something weird based around the assumption that mid started in the middle, which it obviously doesn't.
So it works, that's good. It seems closely related to the variant which keeps a "midpoint" and a "span" (here it's the next span (mid), and the midpoint plus that next span (i)). Same pros and cons too (needs fixup at the end, but inner loop is simple), the "midpoint/span" variant is usually seen (when seen at all) in its "more complicated math in the inner loop"-form which doesn't need fixup, but then what's the point.

Igor van den Hoven6-Sep-14 12:32

6-Sep-14 12:32

Using a midpoint and a span is slower because it requires 2 assignments per loop opposed to 1.5 (on average) in my implementation.

I assume it's the fixup and assignment issue that left academics stumped for the past 60 years. There are also caching issues for larger arrays, for which I've created a variant that is mindful in that regard.

harold aptroot6-Sep-14 12:52

6-Sep-14 12:52

Gregorius van den Hoven wrote:
requires 2 assignments per loop opposed to 1.5 (on average) in my implementation.

Well you realize it'll be a conditional move, right? But only one, whereas regular binary search would have two (or an unpredictable branch, yuck).
So you'd get something like (from GCC output)

ASM

.L4:
	sar	edx
	mov	ecx, eax
	sub	ecx, edx
	cmp	[esi+ecx*4], ebx
	cmovg	eax, ecx
	cmp	edx, 7
	jg	.L4

In a quick test, GCC didn't feel like using cmovs for plain old "left and right bounds" binary search. You can easily measure a huge difference due to that sort of thing, and I'm not sure that's 100% fair, after all you could implement ye olde binary search with cmovs.

Igor van den Hoven6-Sep-14 15:07

6-Sep-14 15:07

I'm not well versed in that area.

Looks like the compiler is doing something differently when you free up a registry space, but it's still slower.

C++

int tailing_binary_search(int *array, int array_size, int key)
{
        register int bot, mid, i;

        if (key < array[0])
        {
                return -1;
        }

        bot = 0;
        i = array_size - 1;
        mid = i / 2;

        while (bot + 7 < i)
        {
                if (key < array[mid])
                {
                        i = mid - 1;
                        mid -= (i - bot) / 2;
                }
                else
                {
                        bot = mid;
                        mid += (i - bot) / 2;
                }
        }
        while (key < array[i]) --i;

        if (key == array[i])
        {
                return i;
        }
        return -1;
}

harold aptroot6-Sep-14 20:25

6-Sep-14 20:25

Yea GCC at least makes some branches there in the loop, and it's using more complicated divisions by 2 (that work when negative).

Surface calculation (sphere) - projection effect. [SOLVED ?]

Alan Balkany5-Nov-15 0:21

Alan Balkany

5-Nov-15 0:21

This approach has been done before, e.g. the implementation of qsort in the C runtime library transitions from quicksort to a (linear) insertion sort when the partition size gets below a certain threshold.

V.3-Aug-14 22:31

3-Aug-14 22:31

I want to calculate the surface of an area on a sphere including projection effect.
For this I have a scanned image (which results in a 2D disc). On this image I can highlight an area and I have the index of each pixel of that area relative to the center of that disc (x, y coordinates). I also have the radius (in pixels and meters)

How can I calculate the correction needed for each pixel in order to take in account the projection effect (larger at the edges of the disc)?

Thanks.

[SOLUTION]
It's a pretty long solution, so I'll try to be as brief as possible.
1. You need to scale down the sphere to a unit sphere (radius=1)
2. With that you know that x²+y²+z² = 1 and thus z = sqrt(1-x²-y²) and the surface (of a full sphere) is 4*pi*r² = 4*pi, but you only have half the sphere so S=2*pi
3. What you need is the cosine of the angle of the pixel with the Z-axis. This is the z calculated in step 2 (the sqrt) divided by the radius which you reduced to one.
4. In the end you get this formula for the area per pixel: (1/r² * 1/(2*pi*sqrt(1-x²-y²)) ). If you loop through the pixels of the complete area you just need to to sum up all these pixels.
5. This results in a fraction of the surface compared to the sphere surface which you can then convert to any unit you like.
On first sight this "looks" correct Smile | :)

[/SOLUTION]

V.

(MQOTD rules and previous solutions)

modified 8-Aug-14 7:50am.

Re: Surface calculation (sphere) - projection effect. [SOLVED ?]

Supreme Master27-Jul-15 9:24

Supreme Master

27-Jul-15 9:24

open-source GIS libraries have solved this kind of problems in much more general form, iff u get the time to decipher them Sniff | :^)

Lock free algorithms

Joe Woodbury1-Jul-14 8:15

1-Jul-14 8:15

I've been studying lock free algorithms, specifically the circular lock free ring buffer. I fully understand what they're doing, but am having trouble applying that to some of my code. It seems that the lock free ring buffer is meant for situations where the queue is neither starved nor saturated for any "significant" (in CPU terms) period of time.

For example, I recently had some code with multiple producers and a single consumer. When the server was running at a normal load, the lock free ring buffer would have been an excellent solution. However, when the server load was low it seems the consumer would just be spinning, using quite a bit of CPU doing nothing. Contra-wise, when the server load was high, the producers would be spinning (in the current code, if this happens, the producers have a wait of a certain period at which point, they discard their data and grab the next set [losing data periodically was permissible since the next set would allow a full reconstruction, albeit with stutter]), but how long do they spin? Can't use just a count since time is important in this situation, but grabbing clock is expensive and simply looping the thread consumes a lot of CPU, depriving other producers of their quanta.

Am I missing something?

Michael Gazonda31-Jul-14 18:51

Michael Gazonda

31-Jul-14 18:51

I posted a lock-free stack a few days ago. There would be no such issues of spinning/locking up the cpu. The only "problem" with it compared to the circular buffer you're using now is that the stack is FILO instead of FIFO.

A Fundamental Lock-Free Building Block - The Lock-Free Stack[^]

Hope it helps, lemme know if you have any questions.

Joe Woodbury31-Jul-14 19:21

31-Jul-14 19:21

Michael Gazonda wrote:
There would be no such issues of spinning/locking up the cpu

This is the part I struggle with. If the stack is full, how does the CPU not spin? Likewise, if the stack is empty, pop returns false. Then what?

Michael Gazonda31-Jul-14 19:25

Michael Gazonda

31-Jul-14 19:25

With a stack, there is no "full". It's like a single-linked list. Each item points to the next, and carries with it the space required for doing so.

If the stack is empty, and returns false, then that's up to you to handle in whatever way is appropriate.

For myself, I wrote this to handle memory allocation (where I would manually allocate on false), or as a messaging system where false just meant there was no work to do and I would wait on a condition_variable.

SledgeHammer016-Sep-14 8:58

SledgeHammer01

6-Sep-14 8:58

Why not use one of the many open source message queues out there? This is exactly what they do. I'm assuming, of course, that you are trying to implement a circular buffer to solve a problem and not the other way around Smile | :)

.

* a message queue solves the producer / consumer problem for you
* a message queue solves the distributed workload problem for you
* a message queue can be configured to use either memory / database / files / etc. as a backing store, so you never lose a work request or response
* clustering / scaling / fail over, etc. is all built in for you

I used Apache ActiveMQ on a project with an arbitrary number of producers and about 400 - 500 consumers (VMWares) and it worked great. Producers would submit work requests and the 400 - 500 consumers would just cycle through the various queues looking for work they knew how to do.

The cool thing about ActiveMQ is that it supports push notifications so you don't use any CPU time during idle time.

Joe Woodbury6-Sep-14 9:50

6-Sep-14 9:50

Because I'm trying to understand lock free algorithms.

(I former colleague evaluated ActiveMQ and found it didn't scale well. Neither did RabbitMQ. For the previous app I worked on, both would have been completely inadequate.)

SledgeHammer016-Sep-14 11:35

SledgeHammer01

6-Sep-14 11:35

Yeah, that's why I was asking if you were trying to reinvent the wheel Smile | :)

.

Just as an aside with ActiveMQ, it actually scales EXTREMELY well, you just need to configure it exactly right and use it exactly right. They have a TON of configuration options which are pretty poorly documented IMO. Just mis-setting one of them can kill your performance.

One complaint I do have about it is that it doesn't handle a heavy onslaught of large messages very well. My original implementation tried to send 1MB to 2MB messages and it would crash often (probably a configuration issue as we wanted to slam messages through and didn't care if they got lost). I then started compressing the messages with LZ4 and they went down to about 400k to 500k and the system got pretty damn stable for only having one server and 400 to 500 clients and performance was much better. They are actually tweaking the system right now to send really small messages that point to file shares to pick up the work load.

And nothing against you or your ability, but I'd be surprised if you are able to create a production system that can top one of the widely used message queues out there. At least not in a reasonable amount of time. I don't think I could either. That's all those guys do.

If you are doing it as a learning process, well, that's a different story Smile | :)

.

On the other hand, if you have to deliver a system that's going to scale to 1000 clients and millions of messages / day in a few weeks, well... Smile | :)

Joe Woodbury6-Sep-14 11:49

6-Sep-14 11:49

SledgeHammer01 wrote:
One complaint I do have about it is that it doesn't handle a heavy onslaught of large messages very well

Yup.

SledgeHammer01 wrote:
On the other hand, if you have to deliver a system that's going to scale to 1000 clients and millions of messages / day in a few weeks, well...

What, again, was your definition of scalable? Smile | :)

SledgeHammer01 wrote:
And nothing against you or your ability, but I'd be surprised if you are able to create a production system that can top one of the widely used message queues out there. At least not in a reasonable amount of time. I don't think I could either. That's all those guys do.

Actually, I have, though the use was very narrow and specialized. Again, our testing found that all the commercial solutions had performance bottlenecks or just plain fell apart when trying to use them with massive amounts of data.

BTW, in one app, switching from ZLib to LZ4 was a significant performance gain that overwhelmed any decrease due to slightly larger message sizes.