Tags: C#, socket, TCP/IP
Hello there,
 
I have a multithreaded TCP/IP server which must send 100 messages per second to its clients.
I am testing it with 500 clients, and each message is around 2 KB.
 
I developed my server as a multithreaded application using the SmartThreadPool library.
 
The multithreading is working fine. I am using the Socket.BeginSend method to send messages to clients.
 
While testing I ran into a delay problem:
100 messages per second to 500 clients means 50,000 messages per second, but sending them all takes 18 seconds.
 
I tested whether the problem was in the execution of my threads, but I found that when I comment out the BeginSend line, the execution needs no more than 2 seconds to reach the sending method.
 
Is there a way to speed up the execution of BeginSend?
Is this behavior expected?
Posted 4-Nov-10 4:20am by hazem12
Edited 8-Nov-10 22:49pm by Dalek Dave
Comments
Dalek Dave at 9-Nov-10 3:49am
Edited for grammar, spelling and syntax.
Eddy Vluggen at 9-Nov-10 6:07am
How are you assigning this BeginSend to the SmartThreadPool? It sounds like you're hitting the thread pool's maximum number of threads, so the application has to wait until a free thread becomes available. Also, starting a thread costs time, and that start-up cost cannot be sped up.
hazem12 at 9-Nov-10 7:49am
Thanks for your comment.
Originally I queued each message to a single client as one work item on the thread pool, and I created an IWorkItemsGroup per client to make sure all messages to a client are sent in order. But with about 100 clients, at a rate of less than 100 messages per second, CPU usage stayed above 95%. After some research I found that sending to one client is such a quick job that it may cost less than the overhead of queuing the work item to the thread pool itself.
So I modified the code so that each work item sends to a list of clients, with the number of lists set in the program configuration: with 100 clients and a configured value of 5, each list gets 20 clients. But with 500 clients and a configured value of 32 (about 16 clients per list), I hit the same problem and CPU usage stayed above 95%. If I raise the configured value to 500, the delay I described remains, but CPU usage drops to 60%, even with 1000 clients; the delay problem is still there. I also set the thread pool maximum to a high number (about 500 threads), yet the whole application never exceeds 170 threads according to Task Manager.
Solution 1

This is very simplified - I ignore packet overhead, ACKs, response turnaround, switch backplane capacity, and Unicode.
 
2 KB = 2048 * 8 bits.
100 messages per second = 100 * 2048 * 8.
500 clients = 500 * 100 * 2048 * 8
= 819,200,000 bits per second.
 
Are you using a gigabit network? Because if you aren't, there is your bottleneck, right there.
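The arithmetic above is easy to check (a quick sketch; message size, rate, and client count are the figures from the question):

```python
MESSAGE_BYTES = 2 * 1024   # 2 KB per message
MESSAGES_PER_SEC = 100     # per client
CLIENTS = 500

bits_per_sec = CLIENTS * MESSAGES_PER_SEC * MESSAGE_BYTES * 8
print(bits_per_sec)        # 819200000 bits/s, i.e. ~0.82 Gbit/s of payload alone
```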
Solution 3

I believe that even if you have enough bandwidth to cope with the amount of data you want to send, there's too much overhead and latency in sending that many packets. When sending over TCP, each packet incurs a small delay as it is put onto the actual network (regardless of the time it takes to travel to the other computer), and then the sender needs an acknowledgement that the client has received the packet before continuing; if no ACK is received, the packet is sent again.
 
Also, with payloads of 2 KB, your packets may be fragmented, as they don't actually fit within one frame of data, which will obviously take a little longer to send. And if you have a small maximum packet size, the messages will be split into separate TCP packets, which incurs quite a lot more overhead.
 
Page about MTU[^]
 
The problem you're having may be more to do with the limitations of the network than with the performance of the code. What you might try is sending the same number of much smaller messages and seeing how it copes.
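The fragmentation point can be made concrete (a rough sketch; the 1500-byte Ethernet MTU and 40 bytes of IPv4 + TCP headers are typical defaults, not values taken from this thread):

```python
import math

MTU = 1500                          # typical Ethernet MTU, bytes
HEADERS = 40                        # IPv4 (20) + TCP (20), no options
PAYLOAD_PER_FRAME = MTU - HEADERS   # 1460 bytes of message data per frame

message_bytes = 2 * 1024            # 2 KB message from the question
frames = math.ceil(message_bytes / PAYLOAD_PER_FRAME)
print(frames)                       # 2 frames per message
```

So every 2 KB message costs at least two frames on the wire, doubling the per-packet overhead relative to a message that fits in one frame.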
Solution 2

Yes, I am using a gigabit network. I forgot to mention that when I comment out the sending statement, CPU usage also drops to half of what it is with the statement uncommented.
Solution 4

Hello, I tried sending smaller messages. There is no big change - the time dropped to 2 minutes, and this is not good.
What can I do to reduce the latency?
Solution 5

Hello again,
I still can't find an answer to my question here.
I need to know, please: have I reached the TCP/IP limit in my case, or can TCP/IP handle more than that?
How can I solve my problem?
Comments
Dave Kreskowiak at 22-Aug-13 23:51pm
Dude. Go back and reread these answers. If you're using a Gigabit Ethernet NIC, YOU'RE EXCEEDING THE LIMITS OF YOUR NETWORK HARDWARE!
 
There are hardware solutions that MAY help you do this, though, depending on your network infrastructure, the infrastructure itself may also be hindering you.
 
First, NIC teaming, where multiple NICs in your server (you ARE running this on Windows Server, correct?) are set up in a team configuration to increase available bandwidth. The switch your server is connected to must also support teaming!
 
Second, get a faster NIC. Just based on the simple math of the number of messages you're sending you are exceeding the limits of Gigabit Ethernet. If you want to send that kind of data you'll need a faster NIC. The problem is that there are not too many 10Gb NICs out there and your switch hardware will also have to be upgraded to handle this speed.
 
Server cluster. Redesign your app to run on a cluster of servers, spreading the traffic load out across multiple servers. What will limit you here will be your network infrastructure switching capacity.
Solution 6

Another back-of-the-envelope estimate:
 
You probably understand that context switches are typically more than two orders of magnitude slower than a function (or method) invocation. Perhaps your efforts with the thread pool and client grouping are reducing the number of context switches, but I think they won't be enough.
 
As a single data point: on my Dell, with a dual-core Athlon II, running Ubuntu 12.04, and reporting its best BogoMIPS as 5210.77 for each processor:
 
--> a single pthread mutex/semaphore-enforced context switch benchmarks at 14 µs.
 
i.e. 50,000 * 14 µs adds up to 700 ms, or 0.7 seconds - 70% of a single core's time - and none of your code has yet run, none of your data has yet transferred.
 
While multicore can speed up some of this, you should consider rethinking how to use threads.
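The back-of-the-envelope figure above is easy to reproduce (the 14 µs switch cost is the benchmark number quoted above; it will vary by machine):

```python
SWITCH_US = 14               # measured context-switch cost, microseconds
MESSAGES_PER_SEC = 50_000    # 500 clients * 100 messages/s

overhead_us = MESSAGES_PER_SEC * SWITCH_US
overhead_s = overhead_us / 1_000_000
print(overhead_s)            # 0.7 s of pure switching per second of traffic
```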
 
---
I suggest you temporarily scale back to a few clients (like 1 or 2), then measure the durations of the various sections of your server code. Find the bottlenecks; don't guess.
 
For instance, wouldn't commenting out BeginSend() eliminate all data transmits? It would be better to measure the duration of a functioning BeginSend() than to guess at what it means to eliminate it and all the events it triggers.
 
I use the following for measurement; perhaps this might help:

    #include <stdint.h>
    #include <time.h>

    uint64_t getSystemMicrosecond(void)   /* trivially built on Linux clock_gettime() */
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
    }

    uint64_t start_us    = getSystemMicrosecond();
    /* ... code to measure goes here ... */
    uint64_t duration_us = getSystemMicrosecond() - start_us;

Solution 7

A few more thoughts (concurring with Dave K. Solution 5 comments).
 
Some good rules of thumb for real-world throughput are:
 
10baseT   =  1 megabyte/sec  (8 Mbps)
100baseT  = 10 megabytes/sec (80 Mbps)
1000baseT = 30 megabytes/sec (240 Mbps)
 
( see "http://www.codinghorror.com/blog/2005/07/gigabit-ethernet-and-back-of-the-envelope-calculations.html" )
 

Also, what might drive you to review your packet-size choices is to consider SAR (segmentation and reassembly) and MTU (maximum transmission unit). Understanding a little more about these two key Ethernet ideas might help.
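Combining these rules of thumb with the numbers from the question (a sketch; the 30 MB/s figure is the real-world 1000baseT estimate above, not a measured value):

```python
MESSAGE_BYTES = 2 * 1024
CLIENTS = 500
MESSAGES_PER_SEC = 100

required_mb_per_s = CLIENTS * MESSAGES_PER_SEC * MESSAGE_BYTES / 1_000_000
REALISTIC_GIGABIT_MB_PER_S = 30   # rule-of-thumb real-world 1000baseT

oversubscription = required_mb_per_s / REALISTIC_GIGABIT_MB_PER_S
print(round(required_mb_per_s, 1), round(oversubscription, 1))
```

That is, the payload alone demands roughly 102 MB/s, about 3.4x what a well-behaved gigabit link realistically delivers, which points in the same direction as the delays reported.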

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
