The following source was built using Visual Studio 6.0 SP5 and Visual Studio .NET. You need to have
a version of the Microsoft Platform SDK installed.
Note that the debug builds of the code waste a lot of CPU cycles due to the debug trace output.
It's only worth profiling the release builds.
Writing a high performance server that runs on Windows NT and uses sockets to communicate with the
outside world isn't that hard once you dig through the API references. What's more, most of the code is
common between all of the servers that you're likely to want to write. It should be possible to wrap
all of the common code up in some easy to reuse classes. However, when I went looking for some classes
to use to write my first socket server all of the examples and articles that I found required the user
to pretty much start from scratch or utilise "cut and paste reuse" when they wanted to use the code
in their own servers. Also the more complicated examples, ones that used IO completion ports for example,
tended to stop short of demonstrating real world usage. After all, anyone can write an echo server...
The aim of this article is to explain the set of reusable classes that I designed for writing socket
servers and show how they can be used with servers which do more than simply echo every byte they
receive. Note that I'm not going to bother explaining the hows and whys of IO completion ports, etc.;
there are plenty of references available.
What does a socket server need to do?
A socket server needs to be able to listen on a specific port, accept connections and read and write
data from the socket. A high performance and scalable socket server should use asynchronous socket IO
and IO completion ports. Since we're using IO completion ports we need to maintain a pool of threads to
service the IO completion packets. If we were to confine ourselves to running on Win2k and above we could
use the QueueUserWorkItem API to deal with our threading requirements, but to enable us to run on the
widest selection of operating systems we have to do the work ourselves.
Before we can start accepting connections we need to have a socket to listen on. Since there are many
different ways to set such a socket up, we'll allow the user's derived class to create this socket by
providing a virtual function as follows:
virtual SOCKET CreateListeningSocket(
unsigned long address,
unsigned short port);
The server class provides a default implementation that's adequate in most circumstances. It looks
something like this:
SOCKET CSocketServer::CreateListeningSocket(
   unsigned long address,
   unsigned short port)
{
   SOCKET s = ::WSASocket(AF_INET, SOCK_STREAM, IPPROTO_IP, NULL, 0, WSA_FLAG_OVERLAPPED);
   if (s == INVALID_SOCKET)
      throw CWin32Exception(_T("CSocketServer::CreateListeningSocket()"), ::WSAGetLastError());
   CSocket listeningSocket(s);
   CSocket::InternetAddress localAddress(address, port);
   listeningSocket.Bind(localAddress);
   listeningSocket.Listen(5);
   return listeningSocket.Detach();
}
Note that we use a helper class,
CSocket, to handle setting up our listening socket. This class
acts as a "smart pointer" for sockets, automatically closing the socket to release resources when it
goes out of scope and also wraps the standard socket API calls with member functions that throw
exceptions on failure.
Now that we have a socket to listen on we can expect to start receiving connections. We'll use the
WSAAccept() function to accept our connections as this is easier to use than the higher-performance
AcceptEx(); we'll compare the performance characteristics with
AcceptEx() in a later article.
When a connection occurs we create a
Socket object to wrap the new connection.
We associate this object with our IO completion port so that IO completion packets will be generated for
our asynchronous IO. We then let the derived class know that a connection has occurred by calling the
OnConnectionEstablished() virtual function. The derived class can then do whatever it wants with the
connection, but the most common thing would be to issue a read request on the socket after perhaps
writing a welcome message to the client.
const std::string welcomeMessage("+OK POP3 server ready\r\n");
Since all of our IO operations are operating asynchronously they return immediately to the calling code.
The actual implementation of these operations is made slightly more complex by the fact that any outstanding
IO requests are terminated when the thread that issued those requests exits. Since we wish to ensure that
our IO requests are not terminated inappropriately we marshal these calls into our socket server's IO
thread pool rather than issuing them from the calling thread. This is done by posting an IO completion
packet to the socket server's IO Completion Port. The server's worker threads know how to handle
four kinds of operation: read requests, read completions, write requests and write completions. The request
operations are generated by calls to
PostQueuedCompletionStatus and the completions are generated
when calls to
WSARecv or WSASend complete asynchronously.
To be able to read and write data we need somewhere to put it, so we need some kind of memory buffer.
To reduce memory allocations we pool these buffers; rather than deleting them once they're done
with, we maintain them in a list for reuse. Our data buffers are managed by an allocator which
is configured by passing arguments to the constructor of our socket server. This allows the user to set
the size of the IO buffers used as well as being able to control how many buffers are retained in the
list for reuse. The
CIOBuffer class serves as our data buffer and follows the standard IO
Completion Port pattern of being an extended "overlapped" structure.
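A pooling allocator of this shape might look like the following sketch. The class and method names here are illustrative, not the internals of the article's CIOBuffer allocator; the point is that Release() retains buffers on a free list, up to a configurable limit, instead of deleting them.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Illustrative buffer allocator: hands out fixed-size buffers and keeps up to
// maxFreeBuffers of them on a free list for reuse instead of deleting them.
class BufferAllocator
{
public:
    BufferAllocator(size_t bufferSize, size_t maxFreeBuffers)
        : m_bufferSize(bufferSize), m_maxFreeBuffers(maxFreeBuffers) {}

    ~BufferAllocator()
    {
        for (auto *p : m_freeList) delete p;
    }

    std::vector<char> *Allocate()
    {
        if (!m_freeList.empty())
        {
            std::vector<char> *p = m_freeList.front();   // recycle a buffer
            m_freeList.pop_front();
            return p;
        }
        return new std::vector<char>(m_bufferSize);      // pool empty: allocate
    }

    void Release(std::vector<char> *p)
    {
        if (m_freeList.size() < m_maxFreeBuffers)
        {
            m_freeList.push_back(p);   // retain for reuse
        }
        else
        {
            delete p;                  // pool is full
        }
    }

    size_t FreeCount() const { return m_freeList.size(); }

private:
    const size_t m_bufferSize;
    const size_t m_maxFreeBuffers;
    std::list<std::vector<char>*> m_freeList;
};
```

The two constructor arguments correspond to the two things the article says the user can configure: the IO buffer size and how many buffers are retained for reuse.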
As all good references on IO Completion Ports tell you, calling
GetQueuedCompletionStatus() blocks
your thread until a completion packet is available and, when it is, returns you a completion key,
the number of bytes transferred and an "overlapped" structure. The completion key represents 'per device'
data and the overlapped structure represents 'per call' data. In our server we use the completion key
to pass our
Socket class around and the overlapped structure to pass our data buffer. Both our
Socket class and our data buffer class allow the user to associate 'user data' with them. This
is in the form of a single unsigned long value (which could always be used to store a pointer to a larger structure).
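The "extended overlapped structure" pattern can be sketched portably. A stand-in OVERLAPPED is defined below so the example is self-contained and compilable outside Windows; the structure and method names are illustrative, but the trick is the real one: pass a pointer to the embedded overlapped member through the IO call, then recover the containing buffer object from the pointer that the completion port hands back.

```cpp
#include <cstddef>

// Stand-in for the Win32 OVERLAPPED structure, so this sketch is portable.
struct FakeOverlapped
{
    void *internal1;
    void *internal2;
};

// An "extended overlapped" buffer: the overlapped struct is embedded at a
// known offset, so a pointer to it can be mapped back to the whole object.
struct IOBufferSketch
{
    FakeOverlapped m_overlapped;   // passed to the (simulated) IO call
    int m_operation;               // e.g. read request, write completion, ...
    char m_data[1024];             // the actual IO buffer

    FakeOverlapped *AsOverlapped() { return &m_overlapped; }

    // Recover the buffer from the overlapped pointer delivered with the
    // completion packet (the equivalent of CIOBuffer::FromOverlapped()).
    static IOBufferSketch *FromOverlapped(FakeOverlapped *pOverlapped)
    {
        return reinterpret_cast<IOBufferSketch *>(
            reinterpret_cast<char *>(pOverlapped) -
            offsetof(IOBufferSketch, m_overlapped));
    }
};
```

This is why the worker thread can turn the raw OVERLAPPED pointer it gets from GetQueuedCompletionStatus() back into a full data buffer with no lookup table.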
The socket server's worker threads loop continuously, blocking on their completion port until work
is available and then extracting the
CIOBuffer from the completion
data and processing the IO request. The loop looks something like this:
while (true)
{
   DWORD dwIoSize = 0;
   Socket *pSocket = 0;
   OVERLAPPED *pOverlapped = 0;

   m_iocp.GetStatus((PDWORD_PTR)&pSocket, &dwIoSize, &pOverlapped);

   CIOBuffer *pBuffer = CIOBuffer::FromOverlapped(pOverlapped);

   switch (pBuffer->GetOperation())
   {
      case IO_Read_Request :    Read(pSocket, pBuffer);           break;
      case IO_Read_Completed :  ReadCompleted(pSocket, pBuffer);  break;
      case IO_Write_Request :   Write(pSocket, pBuffer);          break;
      case IO_Write_Completed : WriteCompleted(pSocket, pBuffer); break;
   }
}
Read and write requests cause a read or write to be performed on the socket. Note that the actual
read/write is being performed by our IO threads so that the IO requests cannot be terminated early by the
issuing thread exiting. The
ReadCompleted() and WriteCompleted() methods are called
when the read or write actually completes. The worker thread marshals these calls into the socket
server class and the socket server provides two virtual functions to allow the caller's derived class
to handle the situations. Most of the time the user will not be interested in the write completion,
but the derived class is the only place that read completion can be handled.
virtual void ReadCompleted(
   Socket *pSocket,
   CIOBuffer *pBuffer) = 0;

virtual void WriteCompleted(
   Socket *pSocket,
   CIOBuffer *pBuffer);
Our client can provide their own worker thread if they wish; it should derive from the socket server's
worker thread. If the client decides to do this then we need to have a way for the server to be configured
to use this derived worker thread rather than the default one. Whenever the server creates a worker thread
(and this only occurs when the server first starts as the threads run for the life time of the server) it
calls the following virtual function:
virtual WorkerThread *CreateWorkerThread();
If we want to provide our own implementation of the worker thread then we should override this function
and create our thread object and return it to the caller.
The send and receive sides of a socket can be closed independently. When a client closes the send side of
its connection to a server any reads pending on the socket on the server will return with a value of 0. The
derived class can opt to receive a notification when the client closes the send side of its connection by
overriding the following virtual function.
virtual void OnConnectionClientClose(
   Socket *pSocket);
The server will only receive this notification once even if it had multiple reads outstanding on the socket
when the client closed it. The server will not receive 0 length read completions. The closure of the client's
send side of the socket, the server's receive side, does not prevent the server from sending more data to the
client; it simply means that the client has no more data to send to the server.
The server can shut down the connection using the
Shutdown() method on the
Socket class. Like the underlying
WinSock2 function, Shutdown() takes a value that indicates which parts
of the connection should be shut down. If the server has finished sending data then it can call
Shutdown() with SD_SEND to terminate the send side of the connection, with
SD_RECEIVE to terminate the receive side, or with
SD_BOTH to terminate both sides.
Shutting down the send side of the server's connection will only actually occur once all outstanding writes have completed. This allows the server developer to write code such as that shown below without having to worry about there being a race condition between the write being processed by the IO thread and the shutdown occurring.
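The deferral can be sketched as a pending-write count plus a "shutdown requested" flag. The names below are illustrative (the real bookkeeping lives inside the framework's Socket class); DoShutdown() stands in for the eventual call to ::shutdown() with SD_SEND.

```cpp
#include <cstddef>

// Sketch of deferred shutdown: the send side is only actually shut down once
// all outstanding writes have completed, avoiding a race between a queued
// write and the shutdown. Illustrative names throughout.
class ConnectionSketch
{
public:
    void WriteIssued() { ++m_pendingWrites; }

    void WriteCompleted()
    {
        if (--m_pendingWrites == 0 && m_shutdownRequested)
        {
            DoShutdown();   // the last write has gone; shut down now
        }
    }

    // Called when the server has finished sending. If writes are still in
    // flight we just record the request and defer the real shutdown.
    void Shutdown()
    {
        if (m_pendingWrites == 0)
        {
            DoShutdown();
        }
        else
        {
            m_shutdownRequested = true;
        }
    }

    bool IsShutdown() const { return m_isShutdown; }

private:
    void DoShutdown() { m_isShutdown = true; }   // would call ::shutdown(s, SD_SEND)

    size_t m_pendingWrites = 0;
    bool m_shutdownRequested = false;
    bool m_isShutdown = false;
};
```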
The socket is closed when there are no outstanding reads or writes on the socket and no references to the socket
are held. At this point the derived class is notified by a call to the
OnConnectionClosing() virtual function.
virtual bool OnConnectionClosing(
   Socket *pSocket);
The server's default implementation of this simply returns false, which means that the derived class doesn't want to
be responsible for closing the socket. The socket server class then closes the socket after first turning off the
linger option. This causes an abortive shutdown and sent data may be lost if it hasn't all been sent when the close
occurs. The derived class may elect to handle socket closure itself and, if so, should override
OnConnectionClosing() and return true to indicate that it has handled the closure. Before returning
the socket should either have been explicitly closed by the derived class or
AddRef() should be
called and a reference held so that the socket can be closed later. The derived class only gets one chance to intervene in the closure of the socket; if it fails to close the socket then the socket will be abortively closed, without further notification, when the final reference to the socket is released.
OnConnectionClosing() is only called if the connection isn't explicitly closed by
the server. If the server wishes to explicitly close the connection then it can do so by simply calling the
AbortiveClose() method of the Socket class.
A simple server
We now have a framework for creating servers. The user simply needs to provide a class that is derived from
CSocketServer and handles the
situations outlined above. The class could look something like this:
class CMySocketServer : public CSocketServer
{
   public :
      CMySocketServer(
         unsigned long addressToListenOn,
         unsigned short portToListenOn);

   private :
      virtual void OnConnectionEstablished(Socket *pSocket, CIOBuffer *pAddress);
      virtual bool OnConnectionClosing(Socket *pSocket);
      virtual void ReadCompleted(Socket *pSocket, CIOBuffer *pBuffer);
};
Implementations of OnConnectionEstablished() and OnConnectionClosing()
have already been presented above, which leaves us with the implementation of our socket server's
ReadCompleted() method. This is where the server handles incoming data and, in the case
of a simple echo server ;) it could be as simple as this:
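The echo pattern is simply "write back whatever was read, then issue the next read". A self-contained sketch of that idea follows, with stub classes standing in for the framework's Windows-specific Socket and CIOBuffer; the Write()/Read() signatures are assumptions based on the earlier snippets, not the article's actual code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stubs standing in for the framework's Socket and CIOBuffer classes so the
// echo pattern can be shown (and exercised) without WinSock.
struct StubBuffer
{
    std::string m_data;
    const char *GetBuffer() const { return m_data.data(); }
    size_t GetUsed() const { return m_data.size(); }
};

struct StubSocket
{
    std::vector<std::string> m_written;   // records Write() calls
    size_t m_readsIssued = 0;             // records Read() calls

    void Write(const char *pData, size_t length)
    {
        m_written.push_back(std::string(pData, length));
    }

    void Read() { ++m_readsIssued; }
};

// The echo server's ReadCompleted(): write the received bytes straight back
// to the client, then issue another read to keep the data flowing.
void EchoReadCompleted(StubSocket *pSocket, StubBuffer *pBuffer)
{
    pSocket->Write(pBuffer->GetBuffer(), pBuffer->GetUsed());
    pSocket->Read();
}
```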
YAES - Yet another echo server
A complete echo server is available for download as
JBSocketServer1.zip. The server simply echoes the incoming byte stream back to the client. In addition to
implementing the methods discussed above the socket server derived class also implements several
'notification' methods that the server calls to inform the derived class of various internal goings on.
The echo server simply outputs a message to the screen (and log file) when these notifications occur
but the idea behind them is that the derived class can use them to report on internal server state
via performance counters or suchlike.
You can test the echo server by using telnet. Simply telnet to localhost on port 5001 (the port that
the sample uses by default) and type stuff and watch it get typed back at you. The server runs until a
named event is set and then shuts down. The very simple Server Shutdown program, available
here, provides an
off switch for the server.
A slightly more real world example
Servers that do nothing but echo a byte stream are rare, except as poor examples. Normally a server
will be expecting a message of some kind, the exact format of the message is protocol specific but two
common formats are a binary message with some form of message length indicator in a header and an ASCII
text message with a predefined set of 'commands' and a fixed command terminator, often "\r\n". As soon
as you start to work with real data you are exposed to a real-world problem that is simply not an
issue for echo servers. Real servers need to be able to break the input byte stream provided by the
TCP/IP socket interface into distinct commands. The results of issuing a single read on a socket
could be any number of bytes up to the size of the buffer that you supplied. You may get a single,
distinct message, or you may get only half of a message, or three messages; you just can't tell. Too often
inexperienced socket developers assume that they'll always get a complete, distinct message, and often
their testing methods ensure that this is the case during development.
Chunking the byte stream
One of the simplest protocols that a server could implement is a packet based protocol where the
first X bytes are a header and the header contains details of the length of the complete
packet. The server can read the header, work out how much more data is required and keep reading until
it has a complete packet. At this point it can pass the packet to the business logic that knows how to
process it. The code to handle this kind of situation might look something like this:
// Called from ReadCompleted():
//    pBuffer = ProcessDataStream(pSocket, pBuffer);

CIOBuffer *CSocketServer::ProcessDataStream(
   Socket *pSocket,
   CIOBuffer *pBuffer)
{
   bool done;
   do
   {
      done = true;
      const size_t used = pBuffer->GetUsed();
      if (used >= GetMinimumMessageSize())
      {
         const size_t messageSize = GetMessageSize(pBuffer);
         if (used == messageSize)
         {
            ProcessCommand(pSocket, pBuffer);
            pBuffer = 0;
            done = true;
         }
         else if (used > messageSize)
         {
            CIOBuffer *pMessage = pBuffer->SplitBuffer(messageSize);
            ProcessCommand(pSocket, pMessage);
            pMessage->Release();
            done = false;
         }
         else if (messageSize > pBuffer->GetSize())
         {
            Output(_T("Error: Buffer too small\nExpecting: ") + ToString(messageSize) +
               _T(" Got: ") + ToString(pBuffer->GetUsed()) + _T("\nBuffer size = ") +
               ToString(pBuffer->GetSize()) + _T("\nData = \n") +
               DumpData(pBuffer->GetBuffer(), pBuffer->GetUsed(), 40));
            pSocket->Shutdown();
            pBuffer->Empty();
            done = true;
         }
      }
   }
   while (!done);
   return pBuffer;
}
The key points of the code above are that we need to know if we have at least enough data to start
looking at the header, if we do then we can work out the size of the message somehow. Once we know that
we have the minimum amount of data required we can work out if we have all the data that makes up this
message. If we do, great, we process it. If the buffer only contains our message then we simply process
the message and since processing simply involves us posting a write request for the data buffer we return
0 so that the next read uses a new buffer. If we have a complete message and some extra data then we
split the buffer into two, a new one with our complete message in it and the old one which has the extra
data copied to the front of the buffer. We then pass our complete message to the business logic to handle
and loop to handle the data that we had left over. If we don't have enough data we return the buffer, and the
Read() that we issue in
ReadCompleted() reads more data into the same buffer,
starting at the point that we're at now.
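Outside the framework, the same chunking loop can be exercised on an in-memory byte stream. The sketch below (illustrative names, not the article's code) frames packets that use a 1-byte total-length header, the protocol of the packet echo server: complete packets are split off the front of the stream and any partial packet is left behind for the next read to finish.

```cpp
#include <cstddef>
#include <vector>

// Extract complete length-prefixed packets from an accumulating byte stream.
// The first byte of each packet is its total length, including the header.
// Complete packets are appended to 'packets'; any partial packet stays in
// 'stream' for the next read to complete, mirroring ProcessDataStream().
inline void ExtractPackets(
    std::vector<unsigned char> &stream,
    std::vector<std::vector<unsigned char>> &packets)
{
    bool done;
    do
    {
        done = true;
        const size_t used = stream.size();
        if (used >= 1)                       // minimum message: the header byte
        {
            const size_t messageSize = stream[0];
            if (messageSize < 1)
            {
                stream.clear();              // invalid header: discard garbage
            }
            else if (used >= messageSize)
            {
                // Split a complete message off the front of the stream.
                packets.push_back(std::vector<unsigned char>(
                    stream.begin(), stream.begin() + messageSize));
                stream.erase(stream.begin(), stream.begin() + messageSize);
                done = stream.empty();       // loop if leftover data remains
            }
            // else: only part of a message so far; wait for more data.
        }
    }
    while (!done);
}
```

Note how a read that delivers half a packet leaves the stream untouched, and a read that delivers one-and-a-half packets yields one packet plus a remainder, exactly the cases the prose above describes.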
Since we're a simple server we have a fairly important limitation, all our messages must fit into
the IO buffer size that our server is using. Often this is a practical limitation, maximum message
sizes can be known in advance and by setting our IO buffer size to be at least our maximum message size
we avoid having to copy data around. If this isn't a viable limitation for your server then you'll need
to have an alternative strategy here, copying data out of IO buffers and into something big enough to
hold your whole message, or, processing the message in pieces...
In our simple server, if the message is too big then we simply shut down the socket connection, throw
away the garbage data, and wait for the client to go away...
So how do we implement GetMinimumMessageSize() and GetMessageSize()?
Obviously it's protocol dependent, but for our packet echo server we do it like this:
size_t CMySocketServer::GetMinimumMessageSize() const
{
   return 1;
}

size_t CMySocketServer::GetMessageSize(CIOBuffer *pBuffer) const
{
   const size_t messageSize = *pBuffer->GetBuffer();
   return messageSize;
}
Reference counted buffers?
You may have noticed that in the case where we had a message and some extra data we called
SplitBuffer() to break the complete message out into its own buffer, and then, once we'd
dealt with it, we called
Release(). This is a little of the implementation of the
socket server's buffer allocator poking through. The buffers are reference counted. The only time we
need to worry about this is if we create a new buffer using
SplitBuffer(), or if we decide to call
AddRef() on the buffer because we wish to pass it off to another thread for
processing. We'll cover this in more detail in the next article,
but the gist of it is that every time we post a read or a write the buffer's reference count goes up
and every time a read or write completes the count goes down, when there are no outstanding references
the buffer goes back into the pool for reuse.
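A minimal sketch of the scheme follows. RefBuffer and BufferPool are illustrative stand-ins for CIOBuffer and its allocator, not the article's code: AddRef()/Release() manage the count, and when it reaches zero the buffer is returned to the pool's free list rather than deleted.

```cpp
#include <cstddef>
#include <vector>

class BufferPool;

// Illustrative reference-counted pooled buffer.
class RefBuffer
{
public:
    explicit RefBuffer(BufferPool &pool) : m_pool(pool), m_refCount(0) {}

    void AddRef() { ++m_refCount; }
    void Release();                  // defined after BufferPool, below

private:
    friend class BufferPool;
    BufferPool &m_pool;
    size_t m_refCount;
};

class BufferPool
{
public:
    ~BufferPool() { for (RefBuffer *p : m_freeList) delete p; }

    RefBuffer *Allocate()
    {
        RefBuffer *p;
        if (!m_freeList.empty())
        {
            p = m_freeList.back();   // reuse a pooled buffer
            m_freeList.pop_back();
        }
        else
        {
            p = new RefBuffer(*this);
        }
        p->m_refCount = 1;           // the caller holds the initial reference
        return p;
    }

    size_t FreeCount() const { return m_freeList.size(); }

private:
    friend class RefBuffer;
    std::vector<RefBuffer *> m_freeList;
};

inline void RefBuffer::Release()
{
    if (--m_refCount == 0)
    {
        m_pool.m_freeList.push_back(this);   // back to the pool, not deleted
    }
}
```

In the real server the count is bumped when a read or write is posted and dropped when it completes, so a buffer passed to another thread stays alive as long as anyone still references it.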
A packet echo server
A packet based echo server is available for download as
JBSocketServer2.zip. The server expects to receive packets of up to 256 bytes which have a 1 byte header.
The header byte contains the total length of the packet (including the header). The server reads
complete packets and echoes them back to the client. You can test the echo server by using telnet, if
you're feeling clever ;) Simply telnet to localhost on port 5001 (the port that the sample uses by
default) and type stuff and watch it get typed back at you. (Hint: Ctrl-B is 2, which is the smallest
packet that contains data.)
A real internet RFC protocol
Some of the common internet protocols, such as RFC 1939 (POP3), use a CRLF-terminated ASCII text
stream command structure. An example of how such a server might be implemented using the
classes presented here can be found in a later article in this series.
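For such a protocol the GetMessageSize() logic scans the accumulated data for the terminator instead of reading a length header. A hedged sketch (illustrative, not the article's code):

```cpp
#include <cstddef>

// For a CRLF-terminated protocol the "message size" can't be read from a
// header; instead we scan the accumulated data for the terminator. Returns
// the length of the first complete command including its "\r\n", or 0 if no
// complete command has arrived yet (i.e. keep reading into the buffer).
inline size_t GetCrlfMessageSize(const char *pData, size_t used)
{
    for (size_t i = 1; i < used; ++i)
    {
        if (pData[i - 1] == '\r' && pData[i] == '\n')
        {
            return i + 1;     // length including the terminator
        }
    }
    return 0;                 // incomplete command so far
}
```

Returning 0 plays the same role as the "not enough data yet" branch of ProcessDataStream(): the buffer is handed back so the next read appends to it.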
The classes presented here provide an easy way to develop scalable socket servers using IO completion ports
and thread pooling in such a way that the user of the classes need not concern themselves with these
low level issues. To create your own server simply derive from
CSocketServer and handle connection
establishment and the byte stream chunking and business logic. You can also opt to handle any of the
notifications that you require. Running your server is as simple as this:
CMySocketServer server(
   "+OK POP3 server ready\r\n",
   INADDR_ANY, 5001, 10, 10, 1024);
Your code can then do whatever it likes and the socket server runs on its own threads. When you are
finished, simply tell the server to stop and it will shut down.
In the next article we
address the issue of moving the business logic out of the IO thread pool and
into a thread pool of its own so that long operations don't block the IO threads.
- 21st May 2002 - Initial revision.
- 27th May 2002 - Added pause/resume functionality to all servers and the server shutdown program. Use CSocket to protect from resource leaks when creating the listening socket. Refactored the Socket and CIOBuffer classes so that common list management code is now in CNodeList and common user data code is now in COpaqueUserData.
- 29th May 2002 - Linting and general code cleaning
- 18th June 2002 - Removed call to ReuseAddress() during the creation of the listening socket as it is not required - Thanks to Alun Jones for pointing this out to me.
- 28th June 2002 - Adjusted how we handle socket closure and added the graceful shutdown and socket closure sections to the article.
- 30th June 2002 - Removed the requirement for users to subclass the worker thread class. All of the work
can now be done by simply subclassing the socket server class.
- 15th July 2002 - Socket closure notifications now occur when the server shuts down whilst there are active connections. SocketServer can now be set to ensure read and write packet sequences.
- 12th August 2002 - Removed the race condition in socket closure - Thanks to David McConnell for pointing this out.
Derived class can receive connection reset and connection error notifications. Socket provides a means to determine if
send/receive are connected. General code cleaning and lint issues.
Other articles in the series
A reusable socket server class
Business logic processing in a socket server
Speeding up socket server connections with AcceptEx
Handling multiple pending socket read and write operations
Testing socket servers with C# and .Net
A high performance TCP/IP socket server COM component for VB