|
I have been using a very fast, proprietary Java system for many years, well before .NET came to be. The average Java implementation being slow is due to the fact that anyone can create a JVM (just as anyone can write a compiler); it takes a professional, performance-oriented approach to create a good one. "It works, let's ship it" isn't good enough. Not here, not anywhere performance matters.
Luc Pattyn
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
No: nothing can even approach plain C performance (well, assembly can be better).
I'm talking about well-written Java applications vs well-written C ones.
Believe me, there's a reason why the speed of light is c.
On the other hand, Java has many, many good features, but, you know, it is the "compile once, slow down everywhere" language (well, everywhere but the proprietary implementation you experienced...)
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
-- Iain Clarke
[My articles]
modified on Friday, September 25, 2009 3:45 PM
|
|
|
|
|
Unmanaged code kills managed code for this type of thing.
We do all of our stuff in C++, with a bit of assembly for the MMX/SSE optimizations. Sometimes we use the MMX/SSE intrinsics (which are essentially macros), but that's just being lazy; hand-coded assembly can beat the intrinsics.
Nothing will beat an unmanaged pointer zipping across the image data for sheer speed.
Of course, a decent algorithm can do wonders, too.
|
|
|
|
|
Why isn't anyone talking about MATLAB?
|
|
|
|
|
Hello good people. I am not sure if this should go here. I am a student who is interested in developing a search engine that indexes pages from my country. I have been doing research on which algorithm to use for some time now, and I have found HITS and PageRank to be the best out there. I have chosen to go with PageRank since it is more stable than the HITS algorithm (so I read).
I have found countless articles and university research papers about PageRank, but my problem is that I do not understand most of the mathematical symbols that form the algorithm in these papers. Currently, I cannot understand how the Google matrix (the irreducible, stochastic matrix) is calculated, and I do not seem to understand the algorithm used.
I did my reading from the articles below:
PDF 1
PDF 2
Please, I need you to help me go through it. I need a basic explanation (examples would be nice) with fewer mathematical symbols.
Thanks in advance.
|
|
|
|
|
kentipsy wrote: Please, I need you to help me go through it. I need a basic explanation (examples would be nice) with fewer mathematical symbols.
I don't think you will get a tutorial on statistics and matrix arithmetic here. I suspect that you will need to study the mathematics quite closely if you want any possibility of implementing this sort of product. However, if you want a ready-made solution, you could always try one of these[^].
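That said, a tiny example may take some of the mystery out of the symbols. Here is a sketch of PageRank's power iteration on a made-up three-page graph; the graph, the 0.85 damping factor, and all names are illustrative assumptions, not taken from the linked papers:

```c
#define N 3           /* pages in a made-up toy web graph */
#define DAMPING 0.85  /* the damping factor d from the PageRank papers */
#define ITERS 50

/* Column-stochastic link matrix for the toy graph:
   page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
   G[i][j] = probability of following a link from page j to page i. */
static const double G[N][N] = {
    {0.0, 0.0, 1.0},  /* page 0 is reached only from page 2           */
    {0.5, 0.0, 0.0},  /* page 1 gets one of page 0's two links        */
    {0.5, 1.0, 0.0},  /* page 2 is linked from both page 0 and page 1 */
};

/* Power iteration on the Google matrix: r_new = d*G*r + (1-d)/N.
   The (1-d)/N "random jump" term is what makes the matrix irreducible
   and stochastic, so the iteration converges. */
void pagerank(double r[N]) {
    for (int i = 0; i < N; i++)
        r[i] = 1.0 / N;                 /* start with a uniform rank */
    for (int it = 0; it < ITERS; it++) {
        double next[N];
        for (int i = 0; i < N; i++) {
            double sum = 0.0;
            for (int j = 0; j < N; j++)
                sum += G[i][j] * r[j];
            next[i] = DAMPING * sum + (1.0 - DAMPING) / N;
        }
        for (int i = 0; i < N; i++)
            r[i] = next[i];
    }
}
```

After 50 iterations the ranks settle to roughly 0.39, 0.21, 0.40 for pages 0, 1, 2: page 2 ranks highest because two pages link to it.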
|
|
|
|
|
Hi, I have a problem.
It's like this:
This is a multi-threaded program: one master thread processes the messages generated by child threads (a typical producer-consumer model). I create a large shared buffer; all the children put their messages into this buffer if they can get the buffer lock. When a child has put its message into the buffer completely, it notifies the master that there is a message waiting to be processed, so the master acquires the buffer lock, holds it, and begins to deal with the messages.
The problem is that when a lot of child threads start up, there are a lot of messages waiting to be processed, so the master thread holds the buffer lock for a while, which is not acceptable, as no child thread can put a message into it during that time. This is apparently what causes the program's poor performance.
So what I am trying to do is separate the large shared buffer into an array of small buffers, so that children can put their messages into any available buffer and the master can process the messages in each small buffer one by one.
But I don't know whether there is a good, mature algorithm or technique for managing such a buffer array.
Any good idea is welcome! Thanks.
BTW, I'm not sure whether I have made the problem clear. (English is not my mother tongue; forgive me if it is not clear enough.) If you are interested in this problem, please feel free to contact me and ask for more details by email or
MSN: donniehan@live.cn
Thanks again!
|
|
|
|
|
Hi,
if you have multiple producers and a single consumer, then the following seems logical:
1.
The normal way of communication would be through a single queue; all producers enqueue messages, and the consumer dequeues them. Other means of communication could be used if there is a need for them, e.g. when message priorities have to be established.
2.
The queue (or other communication means) should be locked for as short a time as possible; the consumer only needs to lock the queue while dequeueing, and the processing of a dequeued message should happen with the queue unlocked (otherwise the queue would be locked all the time).
3.
On average the consumer should be faster than each of the producers, as it has to handle all the messages, whereas each producer only needs to handle its share of them. This can be implemented by giving the consumer a higher thread priority (tricky on Windows), or by limiting the size of the queue (or other communication means), so that when the queue gets full, the producers get blocked.
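The three points above can be sketched in C with a bounded queue (a sketch only, assuming POSIX threads; the queue size and all names are illustrative). The lock is held just for the enqueue/dequeue itself, and a full queue blocks the producers as point 3 describes:

```c
#include <pthread.h>
#include <stddef.h>

#define QSIZE 64  /* bounding the queue blocks producers when the consumer lags */

typedef struct {
    void *items[QSIZE];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} queue_t;

void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side: blocks only while the queue is full. */
void enqueue(queue_t *q, void *msg) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QSIZE)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = msg;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: the lock is held only for the dequeue itself;
   the caller processes the message with the queue unlocked. */
void *dequeue(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *msg = q->items[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return msg;
}
```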
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
Thanks for your reply and the idea.
But it doesn't seem to solve my problem. Firstly, I can't raise the master thread's priority; the child threads are actually more important to me, because each thread serves a client during its life cycle. So, as you said, when the master holds the queue lock, it keeps the buffer locked for a while, and that makes the client threads wait for a long time, which is exactly my problem.
Anyway, thanks for your answer! I will vote for you when I know how to vote!
|
|
|
|
|
donniehan wrote: the child threads are more important for me
that may well be, but then you MUST solve the locking problem.
donniehan wrote: Because each thread serves a client
that doesn't tell me much. If it were, say, serial ports where new data comes in all the time, risking data loss, then yes.
Whatever the case, if your consumer can't cope, the overall system will fail in the end, unless the lifespan of the app is limited, so you can just buffer sufficient data and have the consumer catch up after all the producers are done.
Could you show some code? I've got a hunch something is not the way it should be.
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
My code is based on the PostgreSQL source. Do you know the PostgreSQL source well?
The child thread serves the client;
that means it has a lot of other work to do and has to respond to the client. It is not acceptable for a child thread to halt and give no response for a long time. What I am trying to do is allocate several small shared buffers for all the children to use. When a child needs a buffer, it finds an available one in the buffer array, puts its data into it, releases it quickly, and gets on with its other jobs. The master thread processes all the messages in the buffer array one by one. It takes a very short time for each message to be handled by the master, and the buffers are large enough for their use. That's the situation.
|
|
|
|
|
Hi,
no, I have no experience with PostgreSQL, but I don't think that matters much.
donniehan wrote: It takes very short time for each msg to be handled by master.
That is very good. It will be key to your system working well.
My simple design would consist of:
- a single Message class that will be instantiated a number of times; the instances will be recycled without loading the garbage collector.
- an "emptyCollection" holding a number of empty message objects; think of it as a queue, although other kinds of collections could work too.
- an "emptyLock" object used to guard operations on emptyCollection;
- a "filledCollection" holding a number of filled message objects; again think of it as a queue.
- a "filledLock" object used to guard operations on filledCollection.
- emptyCollection+emptyLock and filledCollection+filledLock could be two instances of a single class MySafeCollection.
The initialization code would create a sufficient number of message instances and put them in the emptyCollection; the filledCollection is still empty.
Then the system really starts and performs these steps:
1. a producer obtains a message object quickly (getting one from emptyCollection; this must be protected by locking on emptyLock, which will never take long);
2. the producer fills the message object;
3. the producer puts its filled message into the filledCollection, using the filledLock, which again will not take long;
4. the consumer obtains a filled message object from the filledCollection (again briefly using the filledLock);
5. the consumer processes the message; as this is a short operation and should never become a bottleneck, I would consider doing this on a thread with slightly higher priority;
6. the consumer returns the message to the emptyCollection, using the emptyLock.
Possible refinements:
A. You still need a signaling mechanism: each time a producer adds something to the filledCollection, it should send a signal to the single consumer, so it goes and fetches one (or more) messages from the filledCollection. This avoids the need for a polling loop in which the consumer would waste CPU cycles when no messages got filled.
B. A producer needing a message object and not finding one in the "emptyCollection" could create a new one; that would make the initialization unnecessary, however when things go terribly wrong, you may end up with a huge number of messages in the filledCollection.
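A minimal C sketch of the empty/filled recycling scheme (illustrative only: it uses simple LIFO free lists rather than queues, omits the signaling mechanism of refinement A, and all names are made up):

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 16

typedef struct message {
    struct message *next;
    int payload;              /* stand-in for the real message body */
} message_t;

/* One lock-guarded list; used once for "empty" and once for "filled". */
typedef struct {
    message_t *head;
    pthread_mutex_t lock;
} safe_list_t;

static message_t pool[POOL_SIZE];
static safe_list_t empty_list, filled_list;

static void list_push(safe_list_t *l, message_t *m) {
    pthread_mutex_lock(&l->lock);
    m->next = l->head;
    l->head = m;
    pthread_mutex_unlock(&l->lock);
}

static message_t *list_pop(safe_list_t *l) {
    pthread_mutex_lock(&l->lock);
    message_t *m = l->head;
    if (m) l->head = m->next;
    pthread_mutex_unlock(&l->lock);
    return m;                 /* NULL when the list is empty */
}

/* Initialization: all messages start on the empty list. */
void pool_init(void) {
    pthread_mutex_init(&empty_list.lock, NULL);
    pthread_mutex_init(&filled_list.lock, NULL);
    empty_list.head = filled_list.head = NULL;
    for (int i = 0; i < POOL_SIZE; i++)
        list_push(&empty_list, &pool[i]);
}

/* Producer: grab an empty message, fill it, hand it to the consumer. */
int produce(int payload) {
    message_t *m = list_pop(&empty_list);
    if (!m) return 0;         /* pool exhausted; caller must retry or wait */
    m->payload = payload;     /* step 2: fill with no lock held */
    list_push(&filled_list, m);
    return 1;
}

/* Consumer: take a filled message, process it, recycle it. */
int consume(int *out) {
    message_t *m = list_pop(&filled_list);
    if (!m) return 0;
    *out = m->payload;        /* step 5: process with no lock held */
    list_push(&empty_list, m);
    return 1;
}
```

A producer that gets nothing back from the empty list would wait or retry, which is exactly the back-pressure described in point 3 earlier.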
I have implemented the above scheme numerous times inside embedded systems (i.e.using a proprietary operating system, not .NET or Windows). I know it works very well and is very performant.
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
Hi,
thanks for your great idea.
But I still have a few problems:
1st, I can only use pure C for this (well, that won't be the real problem; I can use structs instead).
2nd, the size of each message is not fixed: a message may be very small (16 B) or very big (>10 KB). If I apply your idea to my system, it will fragment the shared memory into tiny pieces, so I must implement a function to recycle those pieces, and that also costs time. When the system is very busy and the producers create many messages, the system must call the recycle function frequently to avoid fragmentation, and that costs a lot of time. ----------> That's the big problem for me!
The shared memory is a block of contiguous memory allocated at system startup. The maximum size of this shared memory is fixed; allocating more shared memory after system startup is not allowed.
It is a quite annoying problem, isn't it?
|
|
|
|
|
Hi,
1.
language does not really matter; I've done what I suggested in C many times.
2.
when the message size isn't fixed, I find it easiest to use a few different fixed sizes; i.e. I would define appropriate values size1 < size2 < size3 (with size3 large enough to hold any message), preallocate a number of buffers for each size, then use as many emptyCollections as there are sizes, and only one filledCollection.
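The size-class selection in point 2 can be sketched as a small helper (the three sizes here are made-up examples, not recommendations):

```c
#include <stddef.h>

/* Three fixed buffer sizes; SIZE3 must hold the largest possible message. */
enum { SIZE1 = 64, SIZE2 = 1024, SIZE3 = 16384 };

/* Map a message length to the index of the emptyCollection to draw from. */
int bucket_for(size_t len) {
    if (len <= SIZE1) return 0;
    if (len <= SIZE2) return 1;
    if (len <= SIZE3) return 2;
    return -1;   /* oversized: reject, or split at a higher level */
}
```

Since every buffer in a bucket has the same size, recycling one never fragments the shared memory.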
3.
Not sure why you would need contiguous memory, unless some DMA-based peripheral is involved.
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
modified on Monday, September 21, 2009 8:06 AM
|
|
|
|
|
hehe~~~
Thanks again for your help!
I have almost solved this problem.
|
|
|
|
|
I fixed an HTML problem in my previous post.
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
Can anyone guide me on how to write the circular Hough transform?
|
|
|
|
|
This is used to recognize a circle in an image with incomplete data and/or noise. Every white pixel (assuming a black background) is evidence for a circle that goes through it. You go through all the white pixels and accumulate evidence for all the possible circles. The circle with the most evidence is the result of the Hough transform.
When you find a white pixel, it could be on many circles of radius r. In fact, every point that's distance r away from the white pixel COULD be the center of the circle. In other words, a white pixel is evidence for the center of the circle at all points r away from the pixel, i.e. a new circle with center at the white pixel and radius r.
So you start with an array with all cells set to zero. For every white pixel, increment all array cells at distance r from the white pixel. When you're done, the cell with the highest score is probably the center of the circle.
|
|
|
|
|
Great explanation. What would you suggest for finding circles of arbitrary size though, so radius r is unknown?
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
In that case, you have a three-dimensional space for the circle parameters: center X, center Y, and r.
So instead of a two-dimensional array, you have a three-dimensional array where the extra dimension is for the radius.
When you hit a white pixel, you increment the cells in a circle at EACH level in the r dimension (with the radius given by the r level). As in the two-dimensional case, the cell with the highest score gives the parameters for the circle with the most evidence. But here the r level also gives you the radius.
|
|
|
|
|
I understand that; however, you are now using a lot of memory (the array has one more dimension than the original image) and incrementing a huge number of counters. It seems like it will take forever; I would hope there is something with better performance...
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
This is a bit of a hack, but it worked fine for me and was quicker than a Hough transform.
I was supposed to detect black-framed circles in a microscopy image.
What I did was generate a circle template, starting at 4 pixels in diameter and increasing to the width of the image. At each step I correlated my circle template with the image, and the highest-scoring correlation output was stored. So in the end you know where the circles are and what size they are.
I did this in Matlab, where it was actually easy to do; it was quicker than the Hough transform detection and normally more accurate as well.
Cheers
You have the thought that modern physics just relies on assumptions, which somehow depend on the smile of a cat that isn't there. (Albert Einstein)
|
|
|
|
|
These days memory is cheap, and it's possible to draw a circle quickly using only fast integer operations: http://en.wikipedia.org/wiki/Midpoint_circle_algorithm[^].
If you have bounds on the circle's radius, there are a couple of optimizations:
1. It limits the size of the 3D array in the r dimension, and
2. You can preprocess the image to remove convex blobs with a radius smaller than the minimum radius. These blobs are just noise, and removing them eliminates lots of white pixels, which reduces processing time.
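The midpoint circle algorithm behind that link can be sketched with integer arithmetic only (illustrative grid size and names; a real Hough implementation would increment accumulator cells instead of marking a grid):

```c
#define GRID 32   /* enough for the radii used here */

/* Midpoint (Bresenham) circle: mark all pixels of a circle of radius r
   centered at (cx, cy), using only integer adds and compares. */
void midpoint_circle(int cx, int cy, int r, unsigned char grid[GRID][GRID]) {
    int x = r, y = 0;
    int err = 1 - r;                 /* decision variable */
    while (x >= y) {
        /* mirror the first octant into all eight */
        grid[cy + y][cx + x] = 1; grid[cy + y][cx - x] = 1;
        grid[cy - y][cx + x] = 1; grid[cy - y][cx - x] = 1;
        grid[cy + x][cx + y] = 1; grid[cy + x][cx - y] = 1;
        grid[cy - x][cx + y] = 1; grid[cy - x][cx - y] = 1;
        y++;
        if (err < 0) err += 2 * y + 1;
        else { x--; err += 2 * (y - x) + 1; }
    }
}
```

With this as the inner loop, each white pixel's vote circle costs a few dozen integer operations instead of 360 sin/cos evaluations.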
|
|
|
|
|
Yep. Actually we did the circle algorithm once in hardware, eons ago, using simple TTL logic. A priority encoder (a 74148, IIRC) works wonders here (today I would use FPGAs, not TTL).
And another reply hinted at radius estimation. So things are starting to look better and better.
Luc Pattyn
Have a look at my entry for the lean-and-mean competition; please provide comments, feedback, discussion, and don’t forget to vote for it! Thank you.
Local announcement (Antwerp region): Lange Wapper? Neen!
|
|
|
|
|
Luc,
That was also my immediate question. What might be done is:
- Scan the image for the white (or black) pixels and set the array values to 1 at each pixel position.
- Look for adjacent pixels in that array (horizontally, vertically, or diagonally), either immediately adjacent or separated by 1 pixel (or two or three or ... if a very sparse image).
- Once such a pair is found, walk the adjacent pixels at both ends to extend the string, then create the equation of a line through the endpoints (a chord) and, more importantly, the equation of the perpendicular bisector of the chord, saving this bisector's equation (y = mx + b, slope-intercept form).
- Watch out for a joined line (a complete circular arc; on a clear disk you can seek forever): if found, take the lowest (or highest) 1/4 of the pixels as one arc and the leftmost (or rightmost) 1/4 of the pixels as the other arc (saving these two arcs in a special array of pairs).
- Erase the found points in the array, and look for other arcs (connected or not).
- Now for the fun: calculate the intersections of the bisectors, starting with the circular pairs (giving the center of the circle, and the distance from the center to all the ends, averaged, as the radius), then move on to the disjointed arcs.
- For the disjointed arcs, calculate a possible center for all combinations and choose the closest groupings as a center; eliminate these arcs and repeat (there may be more than one circle in the image).
Caution: since this is an image, a circular disk could appear as an ellipse rather than a circle.
Think this might work?
Dave.
Edit: all of the other answers came in while I was writing this up!
modified on Wednesday, September 16, 2009 11:29 AM
|
|
|
|
|