Processing/calculating data in real time

Question

Hi,

I'm trying to process live stock market data and insert and update a database with the results. I'm using the producer-consumer queue design pattern, which I have threaded.

Some of the calculations are VERY intensive and are degrading the performance of the database. I can't seem to figure out how to go about processing the data and inserting/updating the database with the results.

Can someone please give me advice on how to go about setting this up properly?

Thanks,
-Donald

Posted 28-Apr-11 10:01am

d.allen101

Comments

Albin Abel 28-Apr-11 16:03pm

Good question. My 5

Nish Nishant 28-Apr-11 16:07pm

My 5 too.

AspDotNetDev 28-Apr-11 16:10pm

Take it step by step. Give us a specific example of something that is too slow. In general, make sure you have the right indexes and use the query plan to figure out problem areas.

Monjurul Habib 28-Apr-11 18:49pm

my 5.

3 solutions

Solution 3

Since I have done similar things at university, I think I know where your problem is.
For instance, I did some testing (C#) on just a few hundred thousand data records on a SQL developer machine. The performance was damn slow compared with a Perl solution using in-memory and simple file-based storage.

I remember one weekend when my multithreaded app was blocking the whole multicore system and the university backbone. This Perl program, which I wrote some time ago, fetched stock data from servers around the world, comparing terabytes of data again and again, extracting, filtering, completing and extrapolating data, and even processing some images for visualization. One thing I can tell you is that a well-designed program with no database at all, run by a well-chosen script interpreter like Perl (which is known for its fast parsing capability), can easily outperform a precompiled, high-level managed code application. It's about choosing the right tools for a certain task.

From my current point of view, for this kind of application (high data, high access, complex operations - I call it hidaco - and in my case image processing), the standard approach of database programming is a NO-GO! Personally, I think database performance is greatly overestimated. Although financial matters are most often handled with transaction models for the sake of reliability, this is a fatal choice when it comes to performance. My approach was to reduce database activity to the minimum (which meant zero; I wrote my own storage). For you, that means doing some caching and maybe creating your own kind of database or, better, considering an in-memory database (see Google).

Recursive computations such as neural networks and AI (like AForge or OpenCV) are much more intensive than (well-defined and deterministic) financial math, so computation is (IMHO) not the bottleneck, nor is managed code. Any SQL may become a bottleneck very easily. Try at least two in-memory databases (see the in-memory database, or IMDB, entry on Wikipedia for a list). If your performance increases, you should redesign your SQL statements to get the most out of them. I bet it will increase tremendously, but if it does not, go the C++ way (use an external financial math library with a C# wrapper) for performance testing.
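
As a rough illustration of the in-memory idea above, here is a minimal Python sketch (not the poster's Perl or C# code) that uses SQLite's `:memory:` mode from the standard library. The table name `ticks`, its columns, and the helpers `store_ticks` and `average_price` are hypothetical. Writing the rows as one batch in one transaction is an added assumption, not something the answer prescribes.

```python
import sqlite3


def store_ticks(conn, ticks):
    """Insert many (symbol, price) rows inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO ticks (symbol, price) VALUES (?, ?)", ticks
        )


def average_price(conn, symbol):
    """Return the average stored price for one symbol."""
    row = conn.execute(
        "SELECT AVG(price) FROM ticks WHERE symbol = ?", (symbol,)
    ).fetchone()
    return row[0]


# The whole database lives in RAM and disappears when conn is closed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")  # hypothetical schema
store_ticks(conn, [("ABC", 10.0), ("ABC", 12.0), ("XYZ", 5.0)])
```

Whether this helps depends on where the time actually goes, so it is worth measuring the same workload against the existing database before redesigning anything.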

Another approach would be to expand your SQL Server/database capacity. There is a YouTube video out there about YouTube’s scaling problems during different periods of growth. Just a hint, but it takes me to the last point ;-)

One last word on common pitfalls. I assume that processing live stock data means fetching data over some kind of network!? Please be aware of any limits on connection handling, starting with the maximum number of simultaneous connections/sockets/ports, bandwidth issues, packet/session timeouts and misconfiguration (even on the physical side -> network), and whatever else may come.

And last but not least, let us know.

Posted 16-Jan-12 14:11pm

Oliver Bleckmann

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Nish Nishant · Accepted Answer · 2011-04-28T10:06:00

Solution 1

This is a very general idea I am throwing in. If, and it's an important "if", part of the slowness is due to the managed code, you may want to move some of the more intensive calculations into a fast library written in C++. You could call into it via COM or C++/CLI (among other options).

Posted 28-Apr-11 10:06am

Nish Nishant

Comments

Albin Abel 28-Apr-11 16:16pm

My 5, good alternative

Nish Nishant 28-Apr-11 16:23pm

Thanks (comment threading is all messed up)

Nish Nishant 28-Apr-11 16:23pm

Thank you, Albin.

Sergey Alexandrovich Kryukov 28-Apr-11 17:01pm

Makes sense, a 5.
What do you think about my idea? Something tells me it can be more effective. It depends on those calculation and the rest of architecture and business logic though.
Please see my answer.
--SA

Nish Nishant 28-Apr-11 17:03pm

Already saw it, voted 5 too. Up to the OP to think of these approaches though.

Monjurul Habib 28-Apr-11 18:48pm

my 5.

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-04-28T10:59:00

Solution 2

I can see that your heavy calculation part could compromise the total throughput of the system, but I don't see why it has to degrade the performance of the database. What is the bottleneck: the calculations themselves or the additional transactions for intermediate results? If the transactions are the bottleneck, you need to cache the data. I cannot believe you can do correct calculations on an ever-changing database anyway.
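
To make the caching suggestion concrete, here is a minimal Python sketch under stated assumptions: intermediate results are kept in memory, and only the latest value per key is written out, as one batch. The class name `WriteBehindCache` and the `write_batch` callback are hypothetical, and a plain list stands in for the real database write.

```python
class WriteBehindCache:
    """Keeps the latest result per key in memory; writes only on flush()."""

    def __init__(self, write_batch):
        # write_batch: callable that receives a list of (key, value) pairs
        self._write_batch = write_batch
        self._dirty = {}

    def put(self, key, value):
        # Overwrites any earlier intermediate value for the same key.
        self._dirty[key] = value

    def flush(self):
        """Write all pending values in one batch; return how many were written."""
        if not self._dirty:
            return 0
        batch = sorted(self._dirty.items())
        self._write_batch(batch)
        self._dirty.clear()
        return len(batch)


written = []  # stand-in for a real database write
cache = WriteBehindCache(written.append)
for price in (10.0, 10.5, 11.0):
    cache.put("ABC", price)  # three intermediate updates, no database traffic
cache.put("XYZ", 5.0)
flushed = cache.flush()  # one batched write containing two rows
```

The trade-off is that values not yet flushed are lost if the process dies, so how often to flush is a reliability decision as much as a performance one.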

If you are already developing the consumer/producer queue approach, you can more or less easily move a big part of the processing onto another machine. I would suggest you dedicate a separate tier just to your calculation part. It can run on a separate machine and increase parallelism.
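
The tier separation described above might be sketched as follows, in Python and with threads standing in for separate machines. The names `calc_tier`, `db_tier`, `run_pipeline`, and the `STOP` sentinel are hypothetical, and `sink.append` is a placeholder for the real database write. The point is only that each tier talks to its neighbour through a bounded queue, so the transport could later be swapped for a network one without changing the tiers' logic.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def calc_tier(inbox, outbox, calculate):
    """Consume raw items, run the heavy calculation, forward the results."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            return
        outbox.put(calculate(item))


def db_tier(inbox, sink):
    """Consume finished results; sink.append stands in for a database write."""
    while True:
        result = inbox.get()
        if result is STOP:
            return
        sink.append(result)


def run_pipeline(items, calculate):
    raw = queue.Queue(maxsize=100)   # producer -> calculation tier
    done = queue.Queue(maxsize=100)  # calculation tier -> database tier
    sink = []
    workers = [
        threading.Thread(target=calc_tier, args=(raw, done, calculate)),
        threading.Thread(target=db_tier, args=(done, sink)),
    ]
    for w in workers:
        w.start()
    for item in items:
        raw.put(item)  # blocks when the calculation tier falls behind
    raw.put(STOP)
    for w in workers:
        w.join()
    return sink
```

For example, `run_pipeline([1, 2, 3], lambda x: x * x)` pushes three items through both tiers. In CPython, threads alone generally do not spread CPU-bound work across cores, so this sketch shows the structure rather than a speed-up.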

--SA

Posted 28-Apr-11 10:59am

Sergey Alexandrovich Kryukov

Updated 25-Apr-12 4:39am

v2

Comments

Nish Nishant 28-Apr-11 17:00pm

Voted 5!

Sergey Alexandrovich Kryukov 28-Apr-11 17:01pm

Thank you, Nishant.
How could you be so fast?
--SA

Nish Nishant 28-Apr-11 17:02pm

:-)

yesotaso 28-Apr-11 17:17pm

Voted 5. I was thinking the same: "Intense calculation <-?-> Degrade database performance" :) Anyway, observing a producer filling a bottomless buffer or a consumer eating endless data may show where the performance problem lies.

Sergey Alexandrovich Kryukov 28-Apr-11 20:57pm

Thank you very much. Agree with you.
Actually, observing/profiling how much CPU is used by each tier is not enough. A workflow can be badly unbalanced, which defeats parallelism. I guess you're describing a case like that.
--SA
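
One way to act on the two comments above is to sample the depth of the bounded queue between tiers and see whether it sits mostly full or mostly empty. The Python sketch below is only an illustration. The function name `diagnose`, the 80% "full" threshold, and the 50% cut-offs are arbitrary assumptions, and the samples would come from something like periodic `queue.Queue.qsize()` readings.

```python
def diagnose(depth_samples, capacity, threshold=0.8):
    """Classify a bounded queue from periodic depth samples.

    A queue that is mostly full suggests the consumer cannot keep up.
    A queue that is mostly empty suggests the consumer is starved,
    i.e. the producer side is the slow one.
    """
    if not depth_samples:
        return "no data"
    n = len(depth_samples)
    full = sum(1 for d in depth_samples if d >= capacity * threshold)
    empty = sum(1 for d in depth_samples if d == 0)
    if full / n >= 0.5:
        return "consumer is the bottleneck"
    if empty / n >= 0.5:
        return "producer is the bottleneck"
    return "balanced"
```

For instance, `diagnose([90, 95, 100, 85], capacity=100)` reports the consumer as the bottleneck, because every sample is at or above 80% of capacity.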

yesotaso 30-Apr-11 11:39am

Indeed I am. For instance, you have some horses, carriages and a loading dock. To solve low performance you need to know how good your horses are, how balanced your carriages are, and how good your crane operators are. If your operator is drinking at work, or a horse is running itself to death with an empty carriage, or a huge carriage is killing the horses, you have a problem...

Sergey Alexandrovich Kryukov 1-May-11 1:31am

Absolutely right. I can clearly see what you are talking about, especially after this morning when I worked with real horses a bit, no problems though... :-)
--SA

d.allen101 28-Apr-11 17:26pm

Nishant, I'm actually using your blocking queue class. I have 2 tiers that are using the blocking queue class: logic and database. The problem (bottleneck) is in the logic tier, which is running on its own thread. The processing in this thread causes my CPU usage to go above 90%.

Sergey Alexandrovich Kryukov 28-Apr-11 20:54pm

Donald, are you talking to Nishant or to me? What you say confirms my idea. A blocking queue is a very good way of synchronizing with the data flow, something which I hope you use, but it works between threads of the same process. You can do the same between processes, on the same machine or a different one (so, in a scalable way), using sockets or remoting/WCF.
How many CPUs/cores are you using? You can improve that, too. Are you close to the memory limit? In that case a lot of time could be burned on memory swapping...
--SA

Monjurul Habib 28-Apr-11 18:47pm

my 5.

Sergey Alexandrovich Kryukov 28-Apr-11 20:48pm

Thank you, Monjurul.
--SA

Reza Ahmadi 25-Apr-12 9:09am

my 5!

Sergey Alexandrovich Kryukov 25-Apr-12 10:40am

Thank you, Reza.
--SA