Click here to Skip to main content
Click here to Skip to main content
Go to top

TCP/IP Protocol Design: Message Framing

, 25 Jun 2009
Rate this:
Please Sign up or sign in to vote.
Why a single "Send" does not necessarily result in a single "Receive", and how to compensate.

(This post is part of the TCP/IP .NET Sockets FAQ.)

The Problem

One of the most common beginner mistakes for people designing protocols for TCP/IP is that they assume that message boundaries are preserved. For example, they assume a single "Send" will result in a single "Receive".

Some TCP/IP documentation is partially to blame. Many people read about how TCP/IP preserves packets - splitting them up when necessary and re-ordering and re-assembling them on the receiving side. This is perfectly true; however, a single "Send" does not send a single packet.

Local machine (loopback) testing confirms this misunderstanding, because usually when client and server are on the same machine, they communicate quickly enough that single "sends" do in fact correspond to single "receives". Unfortunately, this is only a coincidence.

This problem usually manifests itself when attempting to deploy a solution to the Internet (increasing latency between client and server) or when trying to send larger amounts of data (requiring fragmentation). Unfortunately, at this point, the project is usually in its final stages, and sometimes the application protocol has even been published!

True story: I once worked for a company that developed custom client/server software. The original communications code had made this common mistake. However, they were all on dedicated networks with high-end hardware, so the underlying problem only happened very rarely. When it did, the operators would just chalk it up to "that buggy Windows OS" or "another network glitch" and reboot. One of my tasks at this company was to change the communication to include a lot more information; of course, this caused the problem to manifest regularly, and the entire application protocol had to be changed to fix it. The truly amazing thing is that this software had been used in countless 24x7 automation systems for 20 years; it was fundamentally broken and no one noticed.

The Solution, Part 1 - Understanding

First, one must understand the abstraction of TCP/IP. From the application's perspective, TCP operates on streams of data, never packets. Repeat this mantra three times: "TCP does not operate on packets of data. TCP operates on streams of data."

There is no way to send a packet of data over TCP; that function call does not exist. Rather, there are two streams in a TCP connection: an incoming stream and an outgoing stream. One may read from the incoming stream by calling a "receive" method, and one may write to the outgoing stream by calling a "send" method. If one side calls "send" to send 5 bytes, and then calls "send" to send 5 more bytes, then there are 10 bytes that are placed in the outgoing stream. The receiving side may decide to read them one at a time from its receiving stream if it so wishes (calling "receive" 10 times), or it may wait for all 10 bytes to arrive and then read them all at once with a single call to "receive".

Sending data to the TCP stream is rather easy; all one has to do is call "send", and the appropriate bytes are queued to the outgoing stream. Receiving data from the TCP stream is a bit more tricky, because the "receive N bytes" operation will wait until at least one byte and at most N bytes arrive on the incoming stream before it returns. Note that the "receive N bytes" operation will complete even if it doesn't read all N bytes, giving the application a chance to act on partial data while the rest of the data bytes are in transit. In the real world, very few programs can process partial receives; almost all programs need a buffer to store partial receives until they have enough data to do meaningful work.

To repeat: TCP operates on streams, not on packets. However, most application protocols are based on the idea of "messages"; for example, a client may send a "Lookup X" message to the server, and the server will respond with an "X Data" or "X Not Found" message. Since TCP operates on streams, one must design a "message framing" protocol that will wrap the messages sent back and forth.

The Solution, Part 2 - Design

There are two approaches commonly used for message framing: length prefixing and delimiters.

Length prefixing prepends each message with the length of that message. The format (and length) of the length prefix must be explicitly stated; "4-byte signed little-endian" (i.e., "int" in C#) is a common choice. To send a message, the sending side first converts the message to a byte array and then sends the length of the byte array followed by the byte array itself.

Receiving a length-prefixed message is harder, because of the possibility of partial receives. First, one must read the length of the message into a buffer until the buffer is full (e.g., if using "4-byte signed little-endian", this buffer is 4 bytes). Then one allocates a second buffer and reads the data into that buffer. When the second buffer is full, then a single message has arrived, and one goes back to reading the length of the next message.

Delimiters are more complex to get right. When sending, any delimiter characters in the data must be replaced, usually with an escaping function. The receiving code cannot predict the incoming message size, so it must append all received data onto the end of a receiving buffer, growing the buffer as necessary. When a delimiter is found, the receiving side can apply an unescaping function to the receiving buffer to get the message. If the messages will never contain delimiters, then one may skip the escaping/unescaping functions.

A Brief Security Note

Whether using length-prefixing or delimiters, one must include code to prevent denial of service attacks. Length-prefixed readers can be given a huge message size; delimiting readers can be given a huge amount of data without delimiters. Either of these may result in an OutOfMemoryException, so one must include a maximum message size "sanity check" in the socket reading code.

The Solution, Part 3 - Code

A code sample for using length-prefixing is in its own blog post at http://nitoprograms.blogspot.com/2009/04/sample-code-length-prefix-message.html.

Another decent code example of length prefixing is on Jon Cole's blog, although he assumes all the messages are just ASCII strings.

Yet another example of length prefixing is in the Nito.Async library: the Nito.Async.Sockets.SocketPacketProtocol class can be used to send or receive length-prefixed binary messages. It is written to use the Nito.Async socket classes, but the same code concepts translate well to the .NET Socket class.

(This post is part of the TCP/IP .NET Sockets FAQ.)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Stephen Cleary
Software Developer (Senior) Nito Programs
United States United States
Stephen Cleary is a computer programmer living in Northern Michigan. He enjoys reading and is actively involved in his church. On October 4th, 2008, he married Amanda Snell, and he is happy about that. Smile | :)
 
Programming blog: http://nitoprograms.blogspot.com/
(including the TCP/IP .NET Sockets FAQ: http://nitoprograms.blogspot.com/2009/04/tcpip-net-sockets-faq.html )
CodePlex project leads on:
Nito Async Library: http://nitoasync.codeplex.com/
Nito MVVM Library: http://nitomvvm.codeplex.com/
Personal home page: http://www.stephencleary.com/

Comments and Discussions

 
GeneralMy vote of 5 PinmemberShakti Misra1-Jan-12 20:56 
GeneralMy vote of 5 PinmemberKim Togo28-May-11 20:12 
GeneralGood advice PinmemberDiamonddrake15-Oct-09 14:03 
Generalexcellent article! Pinmemberpaulstandard25-Jun-09 5:46 
Generali liked this post Pinmembercs_dilo25-Jun-09 4:10 
GeneralThanks a million PinmemberJean-Paul Mikkers24-Jun-09 8:37 
GeneralAnother 'gotcha'--"send" may not send everything Pinmembersupercat922-Jun-09 10:14 
It is common (and wrong!) to assume that if a request to send 'n' bytes doesn't return an error, it actually sent 'n' bytes. When a socket (or serial port for that matter!) indicates that it is ready to send, it really means that it is ready to send one byte. It may be ready to send much more than that, but there is no guarantee. Code like:
  mySock.Send(somedata, 4, SocketFlags.None)
  mySock.Send(moredata, 8, SocketFlags.None)
will usually send 12 bytes, but if there are only 7 bytes of data available in the transmit buffer, the last five bytes will get dropped. It's thus often useful to have a wrapper method which will loop until all data is actually transmitted (e.g. something like:
Function DoSend(mySock as Socket, dat() as Byte, ByteCount As Integer)
  Dim bytesSent As Integer = 0
  While(ByteCount > 0)
    Ret = mySock.Send(dat, bytesSent, byteCount, SocketFlags.None)
    If Ret <=0 Then Throw New WhateverException
    bytesSent += Ret
    byteCount -= Ret
  End While
  Return bytesSent
End Function
if no special timeout handling is required). A nuisance, but better than having weird and bizarre problems with missing data.
GeneralRe: Another 'gotcha'--"send" may not send everything PinmemberKevinAG22-Jun-09 15:56 
GeneralRe: Another 'gotcha'--"send" may not send everything PinmemberStephen Cleary23-Jun-09 3:22 
GeneralRe: Another 'gotcha'--"send" may not send everything Pinmembersupercat925-Jun-09 5:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140905.1 | Last Updated 25 Jun 2009
Article Copyright 2009 by Stephen Cleary
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid