Solution for TCP/IP client socket message boundary problem

Alex Talazar

Oct 10, 2005

Solution for unprotected TCP/IP message boundary problem.

Download source - 22.6 Kb

Introduction

One of the biggest pitfalls for novice network programmers is using message-oriented communication with TCP connections. The most important point to remember about using TCP for network communication is that TCP does not respect message boundaries. This article demonstrates how to solve this problem.

A Typical TCP Unprotected Message Boundary

The typical novice network programmer, having just read about the benefits of TCP programming, proceeds to create a client/server application that passes messages back and forth between two devices on the network using TCP sockets. However, not realizing that stream protocols such as TCP disregard message boundaries, the programmer writes a series of Send() calls on the client, along with a corresponding series of Receive() calls on the server.

The critical drawback of this approach is that there is no guarantee that the data from each individual Send() call will be read by exactly one corresponding Receive() call.

The data read by the Receive() method does not come directly from the network. Rather, Receive() reads data from the TCP buffer internal to the system. As new TCP packets are received from the network, their data is placed sequentially into the TCP buffer. When Receive() is called, it reads whatever data is available in the TCP buffer (up to the size of the buffer supplied by the caller), not just the first packet’s worth. Several messages may therefore arrive in a single call, and a single message may be split across several calls.
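The effect can be observed with a short Python sketch. It is only an illustration: socket.socketpair() stands in for a real TCP client and server, and the message texts are made up. How many chunks the reader gets is not guaranteed, which is exactly the point; only the order of the bytes is preserved.

```python
import socket

# socketpair() returns two connected stream sockets in one process.
# They stand in for a TCP client and server (an assumption of this demo).
client, server = socket.socketpair()

client.sendall(b"first message")
client.sendall(b"second message")
client.close()  # signals end of stream to the reader

chunks = []
while True:
    chunk = server.recv(4096)  # returns whatever is buffered, up to 4096 bytes
    if not chunk:              # an empty result means the peer closed the connection
        break
    chunks.append(chunk)
server.close()

# The number of chunks may be one, two, or more; the stream content is fixed.
received = b"".join(chunks)
print(len(chunks), received)
```

Two send calls do not imply two receive calls, so the receiver cannot tell from the calls alone where one message ends and the next begins.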

Solving the TCP Message Problem

There are three common ways to distinguish messages sent via TCP:

Always send fixed-sized messages
Send the message size with each message
Use a marker system to separate messages

The easiest but most costly way to solve the TCP message problem is to create a protocol that always transmits messages of a fixed size. It is costly because every message, however short, must be padded to that size. Because all messages are the same size, the receiving TCP program knows without doubt when the entire message has been received from the remote device.
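A minimal Python sketch of the fixed-size approach follows. MESSAGE_SIZE, pack_fixed, recv_exact and recv_fixed are hypothetical names chosen for this illustration, and NUL padding is an assumption; it only works if payloads never end in NUL bytes themselves.

```python
import socket

MESSAGE_SIZE = 16  # hypothetical fixed size agreed on by both sides


def pack_fixed(payload: bytes) -> bytes:
    """Pad a payload with NUL bytes up to MESSAGE_SIZE."""
    if len(payload) > MESSAGE_SIZE:
        raise ValueError("payload does not fit in a fixed-size message")
    return payload.ljust(MESSAGE_SIZE, b"\0")


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Keep calling recv() until exactly `count` bytes have arrived."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed in the middle of a message")
        data.extend(chunk)
    return bytes(data)


def recv_fixed(sock: socket.socket) -> bytes:
    """Read one fixed-size message and strip the NUL padding."""
    return recv_exact(sock, MESSAGE_SIZE).rstrip(b"\0")


# Demo over a connected socket pair standing in for a TCP connection.
client, server = socket.socketpair()
client.sendall(pack_fixed(b"hello") + pack_fixed(b"My Message"))
first = recv_fixed(server)
second = recv_fixed(server)
client.close()
server.close()
```

The loop in recv_exact is the important part: a single recv() call may return fewer bytes than requested, so the reader has to keep asking until the whole message is in.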

Another way to delimit TCP messages is a message marker system. This system ends each message with a terminating character (the marker) that specifies the end of the message. As data is received from the socket, it is checked character by character for the occurrence of the marker character. The marker must never occur inside the message data itself.
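As a rough Python sketch of the marker approach, assume a newline is used as the marker. MARKER and split_messages are hypothetical names for this illustration. The function returns the complete messages found so far plus any trailing partial message, which the caller has to keep until more data arrives.

```python
MARKER = b"\n"  # hypothetical terminator; must never occur inside a message


def split_messages(buffer: bytes):
    """Return (complete_messages, leftover) for the bytes received so far."""
    # split() scans for the marker; the last piece is whatever follows
    # the final marker, i.e. an incomplete message (possibly empty).
    *complete, leftover = buffer.split(MARKER)
    return complete, leftover


messages, rest = split_messages(b"one\ntwo\nthr")
print(messages, rest)  # two complete messages and the partial b"thr"
```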

This article focuses on the second approach: sending the message size with each message.

There are many ways to include the message size within the message packet. The simplest way is to create a text representation of the message size and place it at the beginning of the message. For example, “My Message” is 10 bytes long, so the text value 10, zero-padded to a fixed width of six digits, is placed before the message. The transmitted message then looks like this:

000010My Message

The “000010” at the front of the message indicates how many bytes are in the actual message. The receiving device can read the first 6 bytes of the message and immediately know how many more bytes to read to obtain the complete message.
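The six-digit text header can be sketched in Python as follows. frame and read_length are hypothetical helper names; the only thing taken from the article is the header layout of six ASCII digits followed by the message bytes.

```python
HEADER_SIZE = 6  # six ASCII digits, as in the "000010" example


def frame(payload: bytes) -> bytes:
    """Put a zero-padded six-digit length in front of the payload."""
    if len(payload) > 999_999:
        raise ValueError("payload too long for a six-digit length header")
    return f"{len(payload):06d}".encode("ascii") + payload


def read_length(header: bytes) -> int:
    """Convert the six header bytes back into the payload length."""
    if len(header) != HEADER_SIZE or not header.isdigit():
        raise ValueError("malformed length header")
    return int(header.decode("ascii"))


framed = frame(b"My Message")
length = read_length(framed[:HEADER_SIZE])
body = framed[HEADER_SIZE:HEADER_SIZE + length]
print(framed)  # b'000010My Message'
```

A fixed-width header matters here: if the width varied, the receiver would face a second boundary problem in finding where the length ends.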

The client program issues a number of messages and places them into the TCP buffer to be sent to the server.

The TCP driver sends all of them one after another to the target machine.

The server’s driver receives them and places them into the system buffer. The system buffer might not be big enough to accommodate all of the messages, so it holds only as much as fits; the remaining messages, or parts of messages, wait until there is room in the buffer.

The program reads all the available information from the system buffer and parses it. Every message has a header with a length field, so the parser knows exactly where each message begins and where it ends. All complete messages are placed into a queue. If the parser finds a partial message at the end of the data, it places it in a special storage area. Finally, it returns the first message in the queue to the program.

Each time the program calls Receive(), it first checks whether a message is available in the queue and, if so, returns it.

Once the queue is empty, the program reads the system buffer again and receives the information that could not be accommodated the last time. It places the storage contents in front of the newly read data and parses the result.
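The receive logic described above (a queue of complete messages plus a storage area for a trailing partial message) can be sketched in Python roughly like this. It is not the code of the sample application: MessageParser, feed, next_message and receive are hypothetical names, and the header is assumed to be the six-digit text length shown earlier.

```python
from collections import deque

HEADER_SIZE = 6  # six ASCII digits holding the message length


class MessageParser:
    def __init__(self):
        self._queue = deque()  # complete messages waiting to be returned
        self._storage = b""    # partial message left over from the last read

    def feed(self, data: bytes) -> None:
        """Parse newly read bytes, with the stored partial data placed in front."""
        buffer = self._storage + data
        pos = 0
        while len(buffer) - pos >= HEADER_SIZE:
            length = int(buffer[pos:pos + HEADER_SIZE].decode("ascii"))
            end = pos + HEADER_SIZE + length
            if end > len(buffer):
                break  # the body has not fully arrived yet
            self._queue.append(buffer[pos + HEADER_SIZE:end])
            pos = end
        self._storage = buffer[pos:]  # keep the incomplete tail for next time

    def next_message(self):
        """Return the oldest queued message, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


def receive(sock, parser, bufsize=4096):
    """Return one whole message, reading the socket only when the queue is empty."""
    while True:
        message = parser.next_message()
        if message is not None:
            return message
        data = sock.recv(bufsize)
        if not data:
            raise ConnectionError("connection closed before a full message arrived")
        parser.feed(data)


# Two reads that cut the second message in the middle of its header.
parser = MessageParser()
parser.feed(b"000005hello0000")
parser.feed(b"10My Message")
first = parser.next_message()
second = parser.next_message()
```

Keeping the parser separate from the socket makes the split-header case easy to exercise without a network, as the last lines show.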

The sample application is based on a Socket class taken from the internet. Unfortunately, I cannot provide more information about the source of the Socket class.