Requirements

This article assumes the reader is familiar with C++, the Winsock 2.0 API, MFC, and multithreading.

Windows NT/2000 or later: Supported (requires Windows NT 3.5 or later)
Windows 95/98/Me: Unsupported

Motivation

This article attempts to deal with the thorny issue of using Completion Ports with Windows Sockets. It also addresses some concerns raised by readers of the previous article. Portions of the code have been reengineered, so it's worth downloading again if you haven't already done so.

I have recently been working on a project that required me to develop a high-performance TCP/IP server, similar to a Web server, where a large number of clients can connect and exchange data.

The initial design of my server used one thread per TCP/IP client. I initially thought this was a good solution until I read an article on high-load servers, which suggested that the server could get into a state of "thread thrashing" as the threads wake to service their client connections, and that the operating system could run out of system resources. Another problem was that I was using one Winsock event per client, and a single wait call (WSAWaitForMultipleEvents) is limited to 64 event handles. Whoops. The solution was to develop a server with I/O Completion Ports.

During my research into I/O Completion Ports, I found very few articles and code samples covering real-world applications, especially ones demonstrating writing data back to a client. This prompted me to write this article.

Design

Instead of creating one thread per client (1000 clients would mean 1000 threads), we create a pool of worker threads to service our I/O events. I will discuss the worker threads in more detail later in the article.

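To begin using completion ports we need to create a Completion Port and give it a concurrency value: the maximum number of threads the system allows to run concurrently while processing completions for the port. This value only limits concurrency; the port does not create any threads itself, so it is not to be confused with the number of Worker Threads. See the function prototype below.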

HANDLE CreateIoCompletionPort(
    HANDLE FileHandle,               // handle to file
    HANDLE ExistingCompletionPort,   // handle to I/O completion port
    ULONG_PTR CompletionKey,         // completion key
    DWORD NumberOfConcurrentThreads  // number of threads to execute concurrently
);

Specifying zero for NumberOfConcurrentThreads allows as many threads to run concurrently as there are CPUs on the system. You can change this value to experiment with performance, but for the purposes of this article and code we will use the default value of zero.

Once the Completion Port has been created, the next step is to associate every accepted socket with it. The call that does this is, once again, CreateIoCompletionPort. This is somewhat confusing, so it's probably better to wrap it in a function such as AssociateSocketWithCompletionPort. Here's what AssociateSocketWithCompletionPort looks like:

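BOOL CClientListener::AssociateSocketWithCompletionPort(SOCKET socket,
                                                        HANDLE hCompletionPort,
                                                        ULONG_PTR dwCompletionKey)
{
    // The key is declared as ULONG_PTR (not DWORD) so that a pointer stored in it
    // is not truncated on 64-bit builds.
    HANDLE h = CreateIoCompletionPort((HANDLE) socket, hCompletionPort, dwCompletionKey, 0);

    // On success the call returns the handle of the existing completion port.
    return h == hCompletionPort;
}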

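You'll notice that AssociateSocketWithCompletionPort requires a completion key. A completion key is a pointer-sized value of your choosing that the system hands back with every completion on that socket. Here it identifies a per-client context that holds an OVERLAPPED structure plus any other data you want to associate with the completion port and socket. Examine the structure below: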

struct ClientContext
{
    OVERLAPPED   m_Overlapped;
    LastClientIO m_LastClientIo;
    SOCKET       m_Socket;

    // Store buffers
    CBuffer m_ReadBuffer;
    CBuffer m_WriteBuffer;

    // Input elements for Winsock
    WSABUF m_wsaInBuffer;
    BYTE   m_byInBuffer[8192];

    // Output elements for Winsock
    WSABUF m_wsaOutBuffer;
    HANDLE m_hWriteComplete;

    // Message counts... purely for example purposes
    LONG m_nMsgIn;
    LONG m_nMsgOut;
};

The reason a ClientContext is associated with a socket and the completion port is so that we can keep track of the socket when the I/O is dequeued in the worker threads.
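To make that round trip concrete, here is a minimal, portable sketch of how a context pointer can travel through a completion key and be recovered in a worker thread. It is an illustration only, not the project's code: ClientContextSketch, CompletionKey, ToCompletionKey and FromCompletionKey are hypothetical names, and std::uintptr_t stands in for the ULONG_PTR type used in the CreateIoCompletionPort prototype.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the article's ClientContext (only two fields kept).
struct ClientContextSketch {
    int  socketId;  // placeholder for SOCKET m_Socket
    long msgIn;     // mirrors m_nMsgIn
};

// Stand-in for ULONG_PTR: an unsigned integer wide enough to hold a pointer.
using CompletionKey = std::uintptr_t;

// What is passed as the completion key when the socket is associated.
CompletionKey ToCompletionKey(ClientContextSketch* ctx)
{
    return reinterpret_cast<CompletionKey>(ctx);
}

// What a worker thread does with the key it gets back on a completion.
ClientContextSketch* FromCompletionKey(CompletionKey key)
{
    assert(key != 0);  // a zero key would mean no context was attached
    return reinterpret_cast<ClientContextSketch*>(key);
}
```

Because the key must hold a whole pointer, a 32-bit type such as DWORD is only wide enough on 32-bit builds.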

Now that the socket has been attached to (associated with) the Completion Port, we can discuss the Worker Threads in detail.

We create the worker threads when the completion port is created. The worker thread handles are closed immediately after creation, as they are not needed.

The worker threads now wait on GetQueuedCompletionStatus. When an I/O request has been serviced, its completion is queued to the Completion Port; the last worker thread to call GetQueuedCompletionStatus is woken, and the I/O can be processed. See GetQueuedCompletionStatus below. Notice that it returns a completion key, with which we can keep track of our associated socket.

BOOL GetQueuedCompletionStatus(
    HANDLE CompletionPort,       // handle to completion port
    LPDWORD lpNumberOfBytes,     // bytes transferred
    PULONG_PTR lpCompletionKey,  // file completion key
    LPOVERLAPPED *lpOverlapped,  // receives the OVERLAPPED of the completed I/O
    DWORD dwMilliseconds         // optional timeout value
);

A rule of thumb for the number of worker threads is 2 * the number of CPUs on the system. This is a heuristic value and is explained in detail by Jeffrey Richter in "Programming Server-Side Applications for Microsoft Windows 2000". I've included a dynamic thread pooling algorithm in the source code sample (it is not used by the example), and you can experiment with the following values (remember to adjust NumberOfConcurrentThreads accordingly).

m_nThreadPoolMin  // The minimum number of threads in the pool
m_nThreadPoolMax  // The maximum number of threads allowed in the pool
m_nCPULoThreshold // The CPU threshold at which unused threads can be removed from the worker thread pool
m_nCPUHiThreshold // The CPU threshold at which a thread can be added to the worker thread pool
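As a rough illustration of how these four values could interact, the sketch below sizes the pool with the 2 * CPU rule of thumb and then decides whether to grow or shrink it. This is one plausible reading of the descriptions above, not the algorithm shipped in the download: ThreadPoolLimits, InitialWorkerCount and PoolAdjustment are hypothetical names, and the thresholds are assumed to be CPU usage percentages.

```cpp
#include <cassert>

// Hypothetical grouping of the four tunables listed above.
struct ThreadPoolLimits {
    int threadPoolMin;   // m_nThreadPoolMin
    int threadPoolMax;   // m_nThreadPoolMax
    int cpuLoThreshold;  // m_nCPULoThreshold, assumed to be a percentage
    int cpuHiThreshold;  // m_nCPUHiThreshold, assumed to be a percentage
};

// Rule of thumb from the text: two worker threads per CPU.
int InitialWorkerCount(int cpuCount)
{
    assert(cpuCount > 0);
    return 2 * cpuCount;
}

// Returns +1 to add a worker, -1 to remove one, 0 to leave the pool alone.
int PoolAdjustment(const ThreadPoolLimits& limits, int currentThreads, int cpuPercent)
{
    // CPU usage above the high threshold: add a thread if the pool may still grow.
    if (cpuPercent > limits.cpuHiThreshold && currentThreads < limits.threadPoolMax)
        return +1;

    // CPU usage below the low threshold: drop a thread, but never below the minimum.
    if (cpuPercent < limits.cpuLoThreshold && currentThreads > limits.threadPoolMin)
        return -1;

    return 0;
}
```

With limits of {2, 8, 10, 75}, a pool of 4 threads at 90% CPU would grow by one, while the same pool at 5% CPU would shrink by one.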

Now that we have the process in place, it's time to show the Completion Port architecture in diagram form:

The worker threads must issue an I/O request, either a WSARecv or a WSASend; they then wait on GetQueuedCompletionStatus for the I/O to complete. Once the I/O has completed, GetQueuedCompletionStatus returns and the data can be processed.

So on a dual-processor box we could quite comfortably handle 2000+ clients (depending on data throughput, workload, etc.) with only 4 threads.

In my IOCP_Server example I have a class CListener, which accepts TCP/IP clients and associates them with a Completion Port. CListener also holds a list of ClientContexts (for stats/referencing).

I have created my own data protocol for incoming/outgoing data packets: a 4-byte (integer) header containing the size of the packet, followed by the actual packet, e.g. 0500HELLO. This protocol is used to exchange data to and from the client.
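The framing can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions rather than the project's CBuffer code: EncodePacket and TryExtractPacket are hypothetical helpers, the header is assumed to hold the payload length only, and it is written little-endian (the byte order a raw int has on x86).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: prefix the payload with a 4-byte little-endian length.
std::vector<std::uint8_t> EncodePacket(const std::string& payload)
{
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in the 4-byte header
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());

    std::vector<std::uint8_t> out;
    out.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((size >> (8 * i)) & 0xFF));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Hypothetical helper: if `buffer` starts with one complete packet, move its
// payload into `payload`, remove the packet from `buffer` and return true.
// Otherwise leave `buffer` untouched and return false (more data is needed).
bool TryExtractPacket(std::vector<std::uint8_t>& buffer, std::string& payload)
{
    const std::size_t kHeaderSize = 4;
    if (buffer.size() < kHeaderSize)
        return false;  // header not complete yet

    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i)
        size |= static_cast<std::uint32_t>(buffer[i]) << (8 * i);

    // Compare against the bytes that follow the header, not the whole buffer.
    if (buffer.size() - kHeaderSize < size)
        return false;  // payload not complete yet

    const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(kHeaderSize);
    const std::ptrdiff_t last  = first + static_cast<std::ptrdiff_t>(size);
    payload.assign(buffer.begin() + first, buffer.begin() + last);
    buffer.erase(buffer.begin(), buffer.begin() + last);
    return true;
}
```

A receiver would append each chunk of received bytes to its read buffer and call TryExtractPacket in a loop until it returns false, since one read may carry a partial packet or several packets at once.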

The Project

Included in the project for completeness are a CBuffer class to hold incoming and outgoing data and a CCpuUsage class for thread pool allocation/deallocation.

Our code includes a map to route the requests to handler functions, see below:

// Here we use the natural (well...) way to neatly handle each queued status

BEGIN_IO_MSG_MAP()
IO_MESSAGE_HANDLER(ClientIoInitializing, OnClientInitializing)
IO_MESSAGE_HANDLER(ClientIoRead, OnClientReading)
IO_MESSAGE_HANDLER(ClientIoWrite, OnClientWriting)
END_IO_MSG_MAP()

bool OnClientInitializing (ClientContext* pContext, DWORD dwSize = 0);
bool OnClientReading (ClientContext* pContext, DWORD dwSize = 0);
bool OnClientWriting (ClientContext* pContext, DWORD dwSize = 0);
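The macro definitions are not shown here, so the following is only a guess at the kind of routing such a map performs: a switch on the I/O type that forwards to the matching handler. IoDispatcherSketch, ClientContextSketch, ProcessIoMessage and the handler bodies are all hypothetical; only the enumerator and handler names are taken from the map above.

```cpp
#include <cassert>

// I/O types named in the message map above.
enum ClientIoType { ClientIoInitializing, ClientIoRead, ClientIoWrite };

// Hypothetical, trimmed-down client context used only by this sketch.
struct ClientContextSketch {
    bool initialized = false;
    long msgIn = 0;
    long msgOut = 0;
};

class IoDispatcherSketch {
public:
    // Roughly what a BEGIN_IO_MSG_MAP/END_IO_MSG_MAP pair might expand to.
    bool ProcessIoMessage(ClientIoType type, ClientContextSketch* ctx, unsigned long size)
    {
        assert(ctx != nullptr);
        switch (type) {
        case ClientIoInitializing: return OnClientInitializing(ctx, size);
        case ClientIoRead:         return OnClientReading(ctx, size);
        case ClientIoWrite:        return OnClientWriting(ctx, size);
        }
        return false;  // unknown I/O type
    }

private:
    // Placeholder handlers: they only record that they were called.
    bool OnClientInitializing(ClientContextSketch* ctx, unsigned long /*size*/)
    {
        ctx->initialized = true;
        return true;
    }
    bool OnClientReading(ClientContextSketch* ctx, unsigned long /*size*/)
    {
        ++ctx->msgIn;
        return true;
    }
    bool OnClientWriting(ClientContextSketch* ctx, unsigned long /*size*/)
    {
        ++ctx->msgOut;
        return true;
    }
};
```

A worker thread would call ProcessIoMessage once per dequeued completion, passing the I/O type recorded when the request was issued.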

Example Project

Well, the best thing to do is fire up the examples and play with them. There are plenty of comments throughout the code.

For example set the client up so it sends 99999 "Test Item " messages, it takes around 3 seconds and the CPU usage hardly flinches. Wow.

The example MFC project contains the Server code which displays clients accepting/connecting and any incoming data read from the IO Port. It also allows data to be sent to a specified connected client.

Also included is an MFC client, which sends and receives data and has a flood option for sending the same string repeatedly.

This should be a good jumpstart for anybody wanting to create a High performance Client/Server application for Windows NT/2000.

The server listens on port 999. Please change it in the client and server programs if this conflicts with your system.

If you have any corrections, enhancements or suggestions, please don't hesitate to contact me.

Credits

Firstly, I'd like to thank Ulf Hedlund for taking the time to fix some of the subtle problems with the code, and I'd also like to thank the many other readers who have sent in comments and suggestions.

General: A BUG! "pContext->m_hWriteComplete" not initialized.
baddons
3:16 23 Feb '10  
hello Norm .net:

Thanks for sharing! But I found a bug in your code.
The bug is here:

"bool CIOCPServer::OnClientWriting(ClientContext* pContext, DWORD dwIoSize)
{ .......
SetEvent(pContext->m_hWriteComplete);
.......
}"
The problem is:
"pContext->m_hWriteComplete" has not been initialized with "CreateEvent( )".

So when you run the demo, check the "Echo Mode" button in "IOCP_Server.exe" and the "Flood" button in "TestClient.exe",
an error "WSAEnumNetworkEvents error 10038" appears after a while.

Finally, I hope you have time to update the code.

baddons from China
2010.02.23
General: Where is the new version? Memory leak!
Eric_Zhou
0:19 29 Jul '09  
OVERLAPPEDPLUS* pOverlap = new OVERLAPPEDPLUS(NC_IO_READ);//memory leak

thx

Eric
Question: plz give me information ...
priya_chauhan09
0:57 12 Mar '08  
I am developing a chat server in VC++ 6.0, which will be accessed by multiple clients simultaneously. The problem is that I have to keep a port always in the listening state while clients are requesting connections.

I am not able to connect multiple clients to the single port number maintained at the server. What should I do? Could anyone help me with this issue?

I have used non-blocking asynchronous sockets via WSAAsyncSelect(). I don't want to use threads, so please help me solve this without them.



priyanka

priyanka chauhan
gujarat india

Answer: Re: plz give me information ...
mirtu
20:29 12 Mar '08
Use getpeername(); it will help you.
General: New Release?
Losyz
7:27 9 Nov '07
Where is the new release? Thanks.
Question: Number of supported clients
p1000
13:28 31 Oct '07  
Your project looks really great! I am trying to build a high performance server, not in terms of how much data is sent from the server to a small number of clients. Rather, I want to send small amounts of data to a very large number of connected clients.

Do you think that the ideas from this project can be adapted for that kind of application? How many concurrent connections do you think a server could maintain on a normal home PC?

Thanks!
P1000


Answer: Re: Number of supported clients
norm .net
22:40 1 Nov '07  
p1000 wrote:
I want to send small amounts of data to a very large number of connected clients.

That should be no problem, in fact the code was an early prototype for a Flight information system, this was pushing data out to a large number of clients.

Question: how to change data protocol?
jone_lion
22:02 11 Sep '07  
sizeof(int) = 4

how to change data protocol ?

THS

General: how to make it send/get faster?
jone_lion
17:47 11 Sep '07
I found that the demo sends and receives data too slowly. How can I make it faster?

THS

qwe

General: it does not even compile
rompelstilchen
3:10 10 Sep '07  
lol, what a s@#t, it seems to be missing CBuffer things
anyways ...
General: The number of bytes returned by GetQueuedCompletionStatus is OK but nothing in my buffer
Vincent Thomas
4:06 7 May '07  
Hi,

I have a problem with the WSARecv call. At some point after some read/write operations, GetQueuedCompletionStatus wakes one of my worker threads when I send data to a socket attached to the completion port. It indicates that a certain number of bytes were transferred (and it is the right number, i.e. the same amount I sent to this completion port) BUT the buffer I gave when I called WSARecv is empty and any further call to WSARecv is useless. I keep getting ERROR_IO_PENDING just after a call, and if I resend data on this socket my worker thread is woken and, in theory, the number of bytes received is OK, but there is nothing in my buffer.

Have you ever encountered this behaviour ?

Thanks in advance

Vincent
General: Re: The number of bytes returned by GetQueuedCompletionStatus is OK but nothing in my buffer
Vincent Thomas
23:53 9 May '07
Never mind, I made a stupid mistake that prevented my server from sending data to the client.

IOCP works like a charm now

Vincent
General: GetQueuedCompletionStatus returning false
soongteck
19:10 5 Feb '07
Can anyone explain why GetQueuedCompletionStatus keeps returning false when I increase the value of HUERISTIC_VALUE above 1900? Is there a limit to the number of threads permitted?
General: Re: GetQueuedCompletionStatus returning false
binjuny
17:16 6 Nov '07
The value of HUERISTIC_VALUE should be equal to 2.

I'm God!

Question: Client crashes [modified]
DumbMonkey
6:42 17 Dec '06  
Hi. This example is exactly the sort of thing I need for my app, so firstly many thanks.
I have a question though: what if one of the clients crashes out (non-gracefully)? The server thinks it is still connected. When the client restarts and reconnects from the same IP, it is added as another client. This means that there is now one old connection (which isn't actually connected) and a new, correct one. Is there any way around this? Perhaps removing the old connection when the client tries to reconnect?
Many thanks.


-- modified at 15:09 Sunday 17th December, 2006
Answer: Re: Client crashes
charfeddine_ahmed
3:27 22 Jan '07
Use timers, my friend:
the server expects every client to send it a regular message at a fixed interval; if a client fails to do so, the server closes the connection.
Question: Another help request regarding data stream corruption
zubair_ahmed
22:13 20 Oct '06
I am still struggling with the data stream corruption issue. I have introduced a critical section in every client context, so that a thread can send or receive only while it holds that critical section, but I still haven't got it working.

Thanks in Advance


Z.A

Answer: Re: Another help request regarding data stream corruption
zubair_ahmed
19:59 31 Oct '06  
See Below Topic: Streaming Fails Reason.

Z.A

Question: Streaming Fails Reason
zubair_ahmed
6:53 18 Oct '06  
Thanks to the information provided by John M. Drescher, I have come to know that my data stream is getting corrupted by simultaneous reads/writes on a single socket, which can be attributed to a lack of thread synchronization.

Can someone please guide me on how to solve this problem (data stream corruption)? Currently I have modified the server to have just one pending read per client, which solves the out-of-order packets problem.

Thanx in Advance

Z.A

Answer: Re: Streaming Fails Reason [modified]
zubair_ahmed
19:42 31 Oct '06
This solution has been tested for the case where you have just one pending receive, or where your receive operations complete in the order issued.

In OnClientReading find and replace the following line.
if (nSize && pContext->m_ReadBuffer.GetBufferLen() >= nSize)

with this one.

if (nSize && (pContext->m_ReadBuffer.GetBufferLen()-sizeof(int) >= nSize))

This reads the buffer at the correct boundaries and stops data corruption when messages are being transferred very frequently in variable-size chunks.




-- modified at 1:00 Wednesday 1st November, 2006

Z.A

Question: Where is the new release?
onirps
1:29 5 Oct '06  
Where can I download the new-release to fix the bugs???

Regards
General: Problems with stress testing this server
Daniel92009
8:24 21 Jul '06  
Hi,

I downloaded and stress-tested this server.

I immediately ran into trouble with WSAENOBUFS errors. If one simply ignores the errors the server still sort of works, but it leaks the OVERLAPPEDPLUS structures associated with the failed WSARecv function call. I tried immediately freeing that structure... this resulted in the server crashing on completion. I did a kluge to put pointers in a large FIFO and free them when the FIFO was 90% full (and freeing the rest at exit). That stopped the memory leak and the crash.

I compared it to the Microsoft SDK example. The biggest difference seems to be that this server allocates an OVERLAPPEDPLUS structure for each operation. The Microsoft SDK sample uses one OVERLAPPEDPLUS structure (they use a different name for the structure) for each client and they re-use it for each operation for a client (which means they don't read and write at the same time for the same client). The Microsoft sample executes my stress test (send/echo/verify/send...) 3 times as fast as this server and it does not have the problem with the WSAENOBUFS error.

My guess is that the problem with this server that causes the WSAENOBUFS error is allocating the OVERLAPPEDPLUS object for each operation. Apparently in a situation with lots of packets coming and going this can result in a lot of OVERLAPPEDPLUS objects floating around and produce this error. Also, memory allocations are expensive. I'm guessing that the new and delete operations cause the 3X difference in performance in my stress test, but that could also be a result of re-doing operations that fail as a result of the WSAENOBUFS error.

-Daniel Hale


Answer: Re: Problems with stress testing this server
patricklavoie
2:58 1 Dec '06  
Good morning,

The problem is not that the server allocates an OVERLAPPEDPLUS structure for each operation. The problem is that it does a WSARecv after *each* operation. After the IOInitialize, it does a read (that makes sense). After an IORead it also issues another read (which also makes sense), but after a write operation it also does a read, and that's where the problem is: it should not issue a new read because there is already one read waiting in the queue. So after a short period of time, you end up having way too many reads waiting in the queue.

-Pat
Answer: Re: Problems with stress testing this server
Daniel92009
11:55 3 Dec '06  
Pat,

Thanks for contributing some more insight to this problem!

I wonder if it would be possible for you to post a modified version of this code which does not exhibit the WSAENOBUFS error.

It will be interesting to see how the code performs (compared to the SDK code) once this problem is fixed. Perhaps the slower performance is almost all due to the coding error and not due to repeated allocations of the OVERLAPPEDPLUS structure.

-Daniel


Last Updated 23 Sep 2001 | Copyright © CodeProject, 1999-2010