IOCPNet - Ultimate IOCP

17 Jan 2006
Easy to use, high performance, large data transfer by using IO Completion Port.

Introduction

There are many articles on IOCP (Input/Output Completion Port), but they are not easy to understand: the technique itself has some arcane aspects, and the available documentation lacks thorough explanations and code samples. So I decided to write a high-performance IOCP sample (OIOCPNet) and a document that explains how IOCP operates and the key issues around it.

Objectives

I focused on:

  1. More than 65,000 concurrent connections (close to the maximum IPv4 port number, 65,535, the largest value of an unsigned short).
  2. The ability to transfer data larger than a thousand bytes through the network.
  3. An easy interface for users of the OIOCPNet class.

Key ideas to achieve the objectives

IOCP

Yes, the first thing is IOCP. Why should we use it? With the well-known select function (with FD_SET, FD_ZERO, ...), we have no choice but to loop over the sockets to detect socket events, that is, to find out which sockets have received or sent data. And when we develop a game server or a chat server, the socket serves as the ID of the user action, so to find the user data on the server we need a search loop or a hash table keyed by the socket number. These loops seriously slow the server down once the number of users exceeds tens of thousands. With IOCP we don't need them, because IOCP detects socket events at the kernel level and provides a mechanism to associate a socket directly with a user data pointer (the completion key supplied when the socket is attached to the completion port). In short, with IOCP we can avoid the loops and reach the user data on the server side faster.

AcceptEx

With accept (or WSAAccept), we get the WSAENOBUFS (10055) error when the number of (almost) concurrent connections exceeds about 30,000 (the exact figure depends on system resources). The reason for the error is that the system can't prepare the resources for a socket structure as fast as connections arrive. So we need a way to create socket resources before we use them, and AcceptEx is the answer. The main advantage of AcceptEx is just this: preparing sockets before use. Its other features are fiddly and hard to understand (see the MSDN Library).

Static memory

Using static (that is, pre-allocated) memory in server-side applications is natural and crucial. Whenever we receive or send packets, we should use pre-allocated memory. In OIOCPNet, I use my own class (OPreAllocator) to obtain the pre-allocated memory area.
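
As a rough illustration of pre-allocation, the sketch below reserves one buffer up front and hands out fixed-size blocks from a free list. FixedBlockPool and its methods are hypothetical names chosen for this example; this is not the OPreAllocator implementation from the download, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size block pool: all memory is reserved once, up front.
// Not thread-safe; a real server would guard Allocate/Free with a lock.
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(blockSize), storage_(blockSize * blockCount) {
        freeList_.reserve(blockCount);
        // Push in reverse so blocks are handed out in address order.
        for (std::size_t i = blockCount; i > 0; --i)
            freeList_.push_back(storage_.data() + (i - 1) * blockSize_);
    }

    // Returns a block, or nullptr when the pool is exhausted.
    unsigned char* Allocate() {
        if (freeList_.empty()) return nullptr;
        unsigned char* block = freeList_.back();
        freeList_.pop_back();
        return block;
    }

    // Returns a block to the pool; the memory itself is never released.
    void Free(unsigned char* block) {
        assert(block != nullptr);
        freeList_.push_back(block);
    }

    std::size_t FreeCount() const { return freeList_.size(); }

private:
    std::size_t blockSize_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> freeList_;
};
```

Because Allocate and Free only move pointers between the caller and the free list, neither touches the heap while the server is handling traffic.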

Sliced data chunk

Have you ever had to send a large data packet (more than a thousand bytes) with a single function call (such as WriteFile, WSASend or send), only to find that the receiver never got it? If so, you may have run into a limit of the network hardware (routers, hubs and so on) and its buffers: the MTU (Maximum Transmission Unit). A conservative figure for IPv4 is 576 bytes, the datagram size every IPv4 host must be able to accept, so it is better to slice a large packet into many smaller packets below that size. In OIOCPNet, I have defined the unit data block size as BUFFER_UNIT_SIZE (512 bytes). If you need a bigger one, you can change it.
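
The slicing idea can be sketched in a few lines. SliceIntoChunks and kBufferUnitSize are hypothetical names for this example (the constant mirrors the article's BUFFER_UNIT_SIZE value of 512); the real OIOCPNet code works on its pre-allocated write blocks rather than on std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same value the article uses for BUFFER_UNIT_SIZE.
constexpr std::size_t kBufferUnitSize = 512;

// Splits `data` into consecutive chunks of at most `unitSize` bytes.
// The last chunk holds whatever remains and may be shorter.
std::vector<std::vector<unsigned char>> SliceIntoChunks(
    const std::vector<unsigned char>& data,
    std::size_t unitSize = kBufferUnitSize) {
    assert(unitSize > 0);
    std::vector<std::vector<unsigned char>> chunks;
    for (std::size_t offset = 0; offset < data.size(); offset += unitSize) {
        const std::size_t len = std::min(unitSize, data.size() - offset);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    }
    return chunks;
}
```

A 1,300-byte payload, for instance, comes out as two 512-byte chunks followed by one 276-byte chunk.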

Don't spawn many threads

If your server logic performs some kind of IO operation, it may be better to spawn more threads, because threading pays off only when there is IO to wait on. But don't forget: the more threads there are, the more effort the CPU spends on thread scheduling. If more than 10,000 threads are running, the operating system and its processes can't keep up their normal running state, because the CPU puts all its capacity into deciding which thread runs next (scheduling and context switching). For reference, OIOCPNet uses two threads per CPU (a value found by experiment) and doesn't spawn any more.

OIOCPNet - the Key

OIOCPNet is the class applied with the above ideas. The operation steps of OIOCPNet are the following:

  1. OIOCPNet prepares its resources, such as the pre-allocated memory area, the completion port and other handles.
  2. OIOCPNet makes a listening socket.
  3. OIOCPNet pre-generates sockets and its own buffered sockets, and then puts them into an acceptable state by using AcceptEx. (The target is 65,000 sockets, but I defined it as 30,000 in IOCPNet.h for operating systems other than Windows 2003; change MAX_ACCEPTABLE_SOCKET_NUM depending on your needs.)
  4. When a user tries to connect to the server, OIOCPNet accepts the connection.
  5. When a socket reads data packets, OIOCPNet puts them into its pre-allocated reading slots and then raises an event for the server logic to consume.
  6. When the server logic writes data packets, OIOCPNet puts them into its pre-allocated writing blocks and then calls PostQueuedCompletionStatus so that a worker thread sends the data packets.
  7. When a user closes the connection, OIOCPNet closes the socket but doesn't release the memory of the buffered socket; it just re-assigns it.

The following picture shows the entire mechanism of OIOCPNet. It is very simple:

[Image 1: the OIOCPNet mechanism]

Key points when writing the code

LPOVERLAPPED parameter

GetQueuedCompletionStatus and PostQueuedCompletionStatus have no parameter that describes the IO operation that completed. Beyond their default parameters, OIOCPNet needs to classify the type of IO operation and to carry a little additional information. So I use the LPOVERLAPPED parameter of GetQueuedCompletionStatus and PostQueuedCompletionStatus as a custom parameter, much like the thread parameter (LPVOID lpParameter, the fourth parameter) of CreateThread. OVERLAPPEDExt extends the OVERLAPPED structure with this extra information. See the definition below:

struct OVERLAPPEDExt
{
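  OVERLAPPED OL;                       // must stay the first member so an LPOVERLAPPED can be cast back
  int IOType;                          // kind of IO operation, e.g. IO_TYPE_WRITE
  OBufferedSocket *pBuffSock;          // the buffered socket the operation belongs to
  OTemporaryWriteData *pTempWriteData; // temporary write data to free when a send completes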
}; // OVERLAPPEDExt
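
The cast from LPOVERLAPPED back to the extended structure only works because the OVERLAPPED member comes first. The sketch below shows that layout rule with a stand-in type so it builds on any platform; FakeOverlapped, OverlappedExt, IoType and IoTypeFromOverlapped are hypothetical names for this example, not part of OIOCPNet or the Win32 API.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Win32 OVERLAPPED structure so the sketch builds anywhere.
struct FakeOverlapped {
    unsigned long internal = 0;
    unsigned long offset = 0;
};

enum IoType { kIoRead = 1, kIoWrite = 2 };  // hypothetical values

// Same layout idea as OVERLAPPEDExt: the base structure comes first.
struct OverlappedExt {
    FakeOverlapped ol;
    int ioType = 0;
    void* context = nullptr;
};

// The cast below is only valid when `ol` sits at offset 0.
static_assert(offsetof(OverlappedExt, ol) == 0, "ol must be the first member");

// What a worker thread does with the pointer the completion port hands back:
// recover the extended structure and read the extra fields.
int IoTypeFromOverlapped(FakeOverlapped* completed) {
    assert(completed != nullptr);
    return reinterpret_cast<OverlappedExt*>(completed)->ioType;
}
```

If another member were placed before the base structure, the static_assert would fail at compile time instead of the cast silently reading the wrong memory.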

Lifetime of a variable used by an asynchronous function

In OIOCPNet, WSASend and WSARecv operate asynchronously, so take care with the lifetime of the variables passed to them.

// pTempWriteData will be freed when send IO ends.
pTempWriteData = (OTemporaryWriteData *)
m_SMMTempWriteData.Allocate(sizeof (OTemporaryWriteData));

...

// The size of pData (the second parameter of
// GetBlockNeedsExternalLock) must not exceed
// BUFFER_UNIT_SIZE.
m_pWriteBlock->GetBlockNeedsExternalLock
  (&pBuffSockToWrite, pTempWriteData->Data, 
  &ReadSizeToWrite, &DoesItHaveMoreSequence);

...

try
{
  ResSend = WSASend(pTempWriteData->Socket, 
    &pTempWriteData->DataBuf, 1, 
    &WrittenSizeUseless, Flag, 
    (LPOVERLAPPED)&pTempWriteData->OLExt, 0);
}

In the snippet above, pTempWriteData is allocated for use by WSASend. WSASend returns immediately, but pTempWriteData must stay alive until the actual send operation at the OS level has finished. Once the send has completed, release pTempWriteData like this:

if (0 != pOVL)
{
  if ((IO_TYPE_WRITE_LAST == 
    ((OVERLAPPEDExt *)pOVL)->IOType 
    || IO_TYPE_WRITE == 
    ((OVERLAPPEDExt *)pOVL)->IOType))
  {
    if (0 != ((OVERLAPPEDExt *)pOVL)->pTempWriteData)
    {
      m_SMMTempWriteData.Free(
        ((OVERLAPPEDExt *)pOVL)->pTempWriteData);
    }    
    continue;
  }
}

The uniqueness of socket

A SOCKET number is unique at any given moment. But the OS assigns socket numbers arbitrarily, so the number of the most recently closed socket can be re-assigned to the very next connection. So the following could happen:

  1. A socket is assigned the socket number 3947 (as an example) for a new connection.
  2. The server logic reads data packets using the socket.
  3. The socket is closed suddenly because the user disconnects, while the server logic doesn't know about it.
  4. A different socket is assigned the same socket number 3947 (the resurrection of that socket number).
  5. The server logic writes data packets to the socket and meets no problem in doing so, but the data packets might be sent to a different user as a result.

To prevent this troublesome situation, OIOCPNet manages its own socket number SocketUnique, a member of OBufferedSocket.
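
One way to picture the check is a registry that pairs each OS socket number with an ID from a counter that only moves forward. ConnectionRegistry and its methods are hypothetical names for this sketch; OIOCPNet keeps SocketUnique inside OBufferedSocket rather than in a map, and counter wraparound is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical registry: OS socket numbers may repeat, unique IDs do not
// (until the 32-bit counter wraps, which this sketch ignores).
class ConnectionRegistry {
public:
    // Called on accept; returns the unique ID for this connection.
    std::uint32_t OnConnected(int osSocket) {
        assert(current_.find(osSocket) == current_.end());
        const std::uint32_t id = nextUnique_++;
        current_[osSocket] = id;
        return id;
    }

    // Called when the connection closes.
    void OnClosed(int osSocket) { current_.erase(osSocket); }

    // A write is allowed only if the socket number still belongs
    // to the same connection the caller read from.
    bool CanWrite(int osSocket, std::uint32_t socketUnique) const {
        const auto it = current_.find(osSocket);
        return it != current_.end() && it->second == socketUnique;
    }

private:
    std::uint32_t nextUnique_ = 1;
    std::unordered_map<int, std::uint32_t> current_;
};
```

In the scenario above, the server logic still holds the ID issued for the first connection on 3947, so its write is rejected after the number has been re-assigned.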

How to use OIOCPNet

Usage

The usage of OIOCPNet is simple. See the following code snippet:

int _tmain(int argc, _TCHAR* argv[])
{
  ...

  WSAStartup(MAKEWORD(2,2), &WSAData);

  pIOCPNet = new OIOCPNet(&EL);
  pIOCPNet->Start(TEST_IP, TEST_PORT);
    
  hThread = CreateThread(0, 0, LogicThread, 
    pIOCPNet, 0, 0);

  ...
  
  InterlockedExchange((long *)&g_dRunning, 0);
  WaitForSingleObject(hThread, INFINITE);

  ...

  pIOCPNet->Stop();
  delete pIOCPNet;

  WSACleanup();

  return 0;
} // _tmain()

DWORD WINAPI LogicThread(void *pParam)
{
  ...
  
  while (1 == InterlockedExchange((long *)&g_dRunning, 
    g_dRunning))
  {
    iRes = pIOCPNet->GetSocketEventData(WAIT_TIMEOUT_TEST,
      &EventType, &SocketUnique, &pReadData, 
      &ReadSize, &pBuffSock, &pSlot, &pCustData);
    if ...
    else if (RET_SOCKET_CLOSED == iRes)
    {
      // release pCustData.
      continue;
    }

    // Process main logic.
    MainLogic(pIOCPNet, SocketUnique, pBuffSock, 
      pReadData, ReadSize);
        
    pIOCPNet->ReleaseSocketEvent(pSlot);
  }

  return 0;
} // LogicThread()

void MainLogic(OIOCPNet *pIOCPNet, DWORD SocketUnique,
  OBufferedSocket *pBuffSock, BYTE *pReadData, DWORD ReadSize)
{
  pIOCPNet->WriteData(SocketUnique, pBuffSock, 
    pReadData, ReadSize); // echo.
} // MainLogic()

We set the IP address and port number with Start, which also prepares the necessary resources. In the logic thread, we get data packets with GetSocketEventData and send data packets with WriteData. After using the data, call ReleaseSocketEvent to release pSlot, which holds the pointer (pReadData) to the data packet. Finally, when the main logic ends, call Stop so that OIOCPNet releases its resources. That's all.

Take care of read and write at client side

OIOCPNet slices a large data packet into smaller packets and adds 4 bytes of packet length information to the original data packet. The slicing and reassembly are hidden behind GetSocketEventData and WriteData of OIOCPNet, so on the server side we need not care about them. On the client side, however, you should use TCPWrite and TCPRead (see TCPFunc.h and TCPFunc.cpp in the NetTestClient project) to communicate with OIOCPNet.
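
To show why the client needs matching read and write helpers, here is a minimal sketch of length-prefixed framing. It assumes a 4-byte little-endian length that covers the whole payload; the actual byte order and layout used by OIOCPNet are defined in TCPFunc.cpp and may differ. EncodeFrame and FrameReader are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Prepends a 4-byte length (little-endian here, an assumption) to the payload.
Bytes EncodeFrame(const Bytes& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    Bytes frame;
    frame.reserve(4 + payload.size());
    for (int shift = 0; shift < 32; shift += 8)
        frame.push_back(static_cast<unsigned char>((len >> shift) & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Collects received bytes and yields a payload once a whole frame has arrived.
class FrameReader {
public:
    // Appends bytes exactly as they came off the socket, in any chunk size.
    void Feed(const unsigned char* data, std::size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Returns true and fills `out` when a complete frame is buffered.
    bool Next(Bytes& out) {
        if (buffer_.size() < 4) return false;
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i)
            len |= static_cast<std::uint32_t>(buffer_[i]) << (8 * i);
        if (buffer_.size() - 4 < len) return false;
        out.assign(buffer_.begin() + 4, buffer_.begin() + 4 + len);
        buffer_.erase(buffer_.begin(), buffer_.begin() + 4 + len);
        return true;
    }

private:
    Bytes buffer_;
};
```

The reader keeps returning false until the last slice has arrived, which is the behavior a plain recv loop without the length prefix cannot provide.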

Test

My report

I compiled OIOCPNet in a .NET 1.1 environment (it also builds with VC++ 6.0 if you comment out #include "stdafx.h"). I ran the server (IOCPNetTest) on Windows 2003 Enterprise Edition and the test clients (NetTestClient) on several machines. The specification and performance results:

  • Test Server - OS: Windows 2003 Enterprise Edition
  • Test Server - CPU: Intel 2.8GHz (x 2)
  • Test Server - RAM: 2GB
  • Test Client: Windows XP (3~5 machines used, changing thread number)
  • Result: about 15% ~ 20% CPU Usage (when established TCP connection number is 65,000)

Other tips

When a client can't open more than a few thousand (about 2,000 to 5,000) connections to the server, check the registry as follows:

  1. Run regedit
  2. Open 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'
  3. Add 'MaxUserPort' as DWORD value and set the value (maximum value is 65534 in decimal number).

If you need to increase the number of threads in your test client to more than 2,0xx, adjust the thread stack size of the client application using the linker option '/STACK:BYTE' or the dwStackSize parameter of CreateThread.

Before you run the test server and test client, set TEST_IP and TEST_SERVER_IP to the IP address of your server. To see the number of connections, use Performance Monitor or 'netstat -s' at the command prompt.

History

  • August, 2005
    • IOCPNet first version.
    • Fixed a bug during the ending process.
    • Added a new demo and source using the Windows thread pool (because there have been some requests for a sample that uses BindIoCompletionCallback).

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Chief Technology Officer
Korea (Republic of)
Hi everyone.
I am posting things at http://sleepyrea.blogspot.com/

Comments and Discussions

 
Question: updated??? please give a update comments?? (xiudo, 7-Nov-05 15:26)
Answer: and can you give a function to send data to all the online client??? (xiudo, 7-Nov-05 15:33)
Question: iRes = TCPRead(Sockets[Index], 0, 0, READ_TIMEOUT_TEST, &ErrorCode);??? (xiudo, 6-Nov-05 3:08)
Answer: Re: iRes = TCPRead(Sockets[Index], 0, 0, READ_TIMEOUT_TEST, &ErrorCode);??? (xiudo, 7-Nov-05 4:54)
Question: Fixup the OIOCPNet::m_ActiveConnectionNum will less than zero???? (xiudo, 5-Nov-05 5:47)
General: great job,need to make progress (tiger999, 2-Nov-05 21:10)
General: Re: great job,need to make progress (xiudo, 3-Nov-05 22:04)

General: You probably don't want to do it like this... (Len Holgate, 29-Oct-05 4:25)

I have a few issues with this article. I'm sorry that this comment appears a little harsh but there is just too much that is just plain wrong in the article to allow it to go unchallenged.

1) AcceptEx - As far as my tests have shown, there's no difference in the resource limits applied to WSAAccept or AcceptEx. With WSAAccept the failure occurs when you call the function because it's a synchronous call, with AcceptEx the failure is reported via the IOCP that is assigned to the socket that you're calling AcceptEx on. Both face the same non-paged pool limits. It might have been useful to present both styles of server so that you could prove your belief that AcceptEx doesn't suffer these problems. I have run tests with my IOCP framework's Accept and AcceptEx based servers and both can achieve 64,000+ concurrent connections. Also, the failure isn't because the system can't keep up, it's because there's a finite limit of "non-paged memory" and each socket uses some (as do drivers, etc). When you use it all up you can't create any more sockets or perform most overlapped operations on them. See Network Programming for Microsoft Windows for more details of the limits that you might face when writing servers that need to service lots of connections.

3) Static memory - I think you need to define your terms here. What do you think the term "static memory" means? Preallocating the resources used by 65000 connections isn't really a good idea. First there's the start up time of the server to take into consideration. Second there's the effect that such a program has on other programs trying to run on the same machine. Third the server 'wastes' resources if it isn't always 100% loaded (ie when there's only 10000 connections we're wasting 55000 socket resources...) Fourth non-paged pool is a fixed sized per machine resource; allocating lots of it because you might need it later is a bad idea. Fifth your server can only run on a machine with enough memory to handle the preconfigured maximum number of connections... Personally I'd do it differently. I'd possibly preallocate some resources and then cache resources as they're released so that they can be reused. You can then tune the caching to take into account your expected usage.

4) Sliced data chunk - This is just wrong when you're talking about TCP connections (which are what the sample code creates). You may run into the problem stated when using large packets for UDP sockets but you're not. You can usually tune the UDP packet size if you need to.

5) Don't spawn many threads - This is the KEY point about using IOCP. It restricts the number of threads you need and it restricts the context switching that occurs. The whole point of overlapped IO is that you don't NEED to spawn more threads for your socket IO or for any IO that can use IOCP. For non IOCP enabled IO (such as database connections) it's probably best to use a thread pool rather than spawning a thread for each operation...

6) The uniqueness of socket - What? a SOCKET is an OPAQUE DATA TYPE. It's like a handle. You, as an application programmer need never know or worry about the value. If a socket closes "suddenly" you're told about it and the socket isn't actually closed until YOU close it. If the remote side of the connection closes their side then you'll likely be told about it (assuming you have a read pending, or you try to write to it). The socket that YOU have is valid until YOU close it. The OS isn't going to reuse the socket until you have closed it. Once you close it you shouldn't reuse the SOCKET because it's not yours anymore...

I think that's enough to be going on with for now. I have some comments on the code too but I always have comments on the code...

I would strongly suggest that anyone planning to base a server on this code read some of the other IOCP articles on here first. If dealing with lots (10000+) of concurrent connections is important to you then I suggest you read at least the scalability chapter of Network Programming for Microsoft Windows.

Discuss...

Len Holgate
www.jetbyte.com
The right code, right now.

General: Re: You probably don't want to do it like this... (sleepyrea, 30-Oct-05 3:36)
General: Re: You probably don't want to do it like this... (Len Holgate, 30-Oct-05 7:16)
General: Re: You probably don't want to do it like this... (tiger999, 31-Oct-05 18:50)
General: Re: You probably don't want to do it like this... (sleepyrea, 31-Oct-05 18:54)
General: Re: You probably don't want to do it like this... (Len Holgate, 31-Oct-05 20:47)
General: Re: You probably don't want to do it like this... (tiger999, 1-Nov-05 5:55)
General: Re: You probably don't want to do it like this... (Len Holgate, 1-Nov-05 6:18)
General: Re: You probably don't want to do it like this... (tiger999, 1-Nov-05 15:05)
General: Re: You probably don't want to do it like this... (Len Holgate, 1-Nov-05 21:26)
General: Re: You probably don't want to do it like this... (tiger999, 2-Nov-05 13:59)
General: Re: You probably don't want to do it like this... (Len Holgate, 2-Nov-05 21:17)
General: Re: You probably don't want to do it like this... (Len Holgate, 3-Nov-05 11:30)
General: Re: You probably don't want to do it like this... (tiger999, 3-Nov-05 14:11)
General: Re: You probably don't want to do it like this... (tiger999, 3-Nov-05 14:15)
General: Re: You probably don't want to do it like this... (Len Holgate, 3-Nov-05 21:34)
General: Re: You probably don't want to do it like this... (Len Holgate, 3-Nov-05 21:26)
General: Re: You probably don't want to do it like this... (tiger999, 4-Nov-05 14:45)
