Introduction
This is a proprietary VoIP project that sends and receives audio data over TCP. It is an extension of my first article, Play or Capture Audio Sound. Send and Receive as Multicast (RTP). Unlike that application, this one streams the audio data over TCP instead of multicast, so no data is lost and the stream can travel across subnets and routers. The audio codec is U-Law, and the sample rate is selectable from 5000 to 44100 Hz.
Note !!! This is a proprietary project. You cannot use my servers or clients with any standardized servers or clients; I do not use standards like RTCP or SDP.
Background
Because of network jitter and clock differences, you have to use Jitter-Buffers to compensate for the irregular data transfer. The Jitter-Buffer size is set per server, so all of its clients use the same amount. One Jitter-Buffer entry represents one data packet inside the TCP stream. The server starts playing when the Jitter-Buffer is half full; you can watch this in the progress bar shown for each client. The more Jitter-Buffers you configure, the more delay you get. You can run the TCPStreamer as a client or as a server, and one server can handle one or more clients.
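To get a feeling for the resulting delay, here is a small sketch (not part of the project) that estimates the startup latency from the configured buffer count, assuming the 20 ms packet interval used throughout the project:
//Rough estimate only (not from the project sources): playback starts when the
//Jitter-Buffer is half full, so the configured buffer count translates directly
//into startup delay, assuming one packet per 20 ms.
private static int EstimateStartDelayMs(int jitterBufferCount)
{
    const int packetIntervalMs = 20; //interval used throughout this project
    return (jitterBufferCount / 2) * packetIntervalMs;
}
//Example: EstimateStartDelayMs(10) returns 100 ms, EstimateStartDelayMs(20) returns 200 ms.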
Note !!! Use the same sound settings (SamplesPerSecond) for client and server.
The TCPStreamer as Client

Running as a client, you can connect to a server instance. Choose your microphone and playback device, and click the microphone or speaker button to mute it. After the client has connected, the speaker combo box changes to a progress bar showing the amount of buffered incoming data.
The TCPStreamer as Server

Running as a server, you wait for one or more clients to connect. Choose your microphone and playback device. Each client can be muted individually (speaker and microphone).
Using the Code
The project consists of the following assemblies:
- TCPStreamer.exe (main application)
- TCPClient.dll (TCP client wrapper helper)
- TCPServer.dll (TCP server wrapper helper)
- WinSound.dll (sound recording and playing)
I could send the data directly from the sound card to the network, but I decided to put it into a Jitter Buffer first, because some sound devices (especially on laptops) do not deliver the sound data at equal time intervals. With a Jitter Buffer, I ensure that data is sent every 20 ms; the disadvantage is a bigger delay.
private void OnDataReceivedFromSoundcard_Server(Byte[] data)
{
    //Split the recorded data into 20 ms chunks
    int bytesPerInterval = WinSound.Utils.GetBytesPerInterval((uint)m_Config.SamplesPerSecondServer, m_Config.BitsPerSampleServer, m_Config.ChannelsServer);
    int count = data.Length / bytesPerInterval;
    int currentPos = 0;
    for (int i = 0; i < count; i++)
    {
        //Copy the next chunk
        Byte[] partBytes = new Byte[bytesPerInterval];
        Array.Copy(data, currentPos, partBytes, 0, bytesPerInterval);
        currentPos += bytesPerInterval;
        //Convert it to an RTP packet and queue it in the Jitter Buffer
        WinSound.RTPPacket rtp = ToRTPPacket(partBytes, m_Config.BitsPerSampleServer, m_Config.ChannelsServer);
        m_JitterBufferServerRecording.AddData(rtp);
    }
}
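WinSound.Utils.GetBytesPerInterval is part of the author's WinSound library. Its implementation is not shown here, but for a 20 ms interval it presumably boils down to something like the following sketch (an assumption, not the library code):
//Sketch of what GetBytesPerInterval presumably computes for a 20 ms interval
//(assumption; the real implementation lives in WinSound.dll):
private static int BytesPer20MsInterval(uint samplesPerSecond, int bitsPerSample, int channels)
{
    //samples per 20 ms * bytes per sample * channels
    return (int)(samplesPerSecond / 50) * (bitsPerSample / 8) * channels;
}
//Example: 8000 Hz, 16 bit, mono => 160 * 2 * 1 = 320 linear bytes per packet.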
When creating an RTP packet, most fields such as the CSRC count or the version stay the same. After every sent RTP packet, I only have to increase the SequenceNumber and the Timestamp. Before that, I convert the linear data to the compressed U-Law format to reduce network traffic.
private WinSound.RTPPacket ToRTPPacket(Byte[] linearData, int bitsPerSample, int channels)
{
    //Compress the linear data to U-Law
    Byte[] mulaws = WinSound.Utils.LinearToMulaw(linearData, bitsPerSample, channels);

    //Most header fields stay constant
    WinSound.RTPPacket rtp = new WinSound.RTPPacket();
    rtp.Data = mulaws;
    rtp.CSRCCount = m_CSRCCount;
    rtp.Extension = m_Extension;
    rtp.HeaderLength = WinSound.RTPPacket.MinHeaderLength;
    rtp.Marker = m_Marker;
    rtp.Padding = m_Padding;
    rtp.PayloadType = m_PayloadType;
    rtp.Version = m_Version;
    rtp.SourceId = m_SourceId;

    //SequenceNumber and Timestamp are increased for every packet and
    //reset to 0 when their value range is exceeded
    try
    {
        rtp.SequenceNumber = Convert.ToUInt16(m_SequenceNumber);
        m_SequenceNumber++;
    }
    catch (Exception)
    {
        m_SequenceNumber = 0;
    }
    try
    {
        rtp.Timestamp = Convert.ToUInt32(m_TimeStamp);
        m_TimeStamp += mulaws.Length;
    }
    catch (Exception)
    {
        m_TimeStamp = 0;
    }

    return rtp;
}
private void OnJitterBufferServerDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
    //Send the RTP packet to every connected client that is not muted
    Byte[] rtpBytes = rtp.ToBytes();
    List<NF.ServerThread> list = new List<NF.ServerThread>(m_Server.Clients);
    foreach (NF.ServerThread client in list)
    {
        if (client.IsMute == false)
        {
            client.Send(m_PrototolClient.ToBytes(rtpBytes));
        }
    }
}
In order to send and receive data over TCP, I use a simple proprietary protocol: before each data block, I write a 32-bit length field. Later, when I receive the data stream, this tells me how to split it back into packets.

public Byte[] ToBytes(Byte[] data)
{
    Byte[] bytesLength = BitConverter.GetBytes(data.Length);
    Byte[] allBytes = new Byte[bytesLength.Length + data.Length];
    Array.Copy(bytesLength, allBytes, bytesLength.Length);
    Array.Copy(data, 0, allBytes, bytesLength.Length, data.Length);
    return allBytes;
}
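As a short worked example of the resulting framing (the numbers are assumptions: 8000 Hz, 16 bit, mono and a 12-byte minimum RTP header):
//Worked example with assumed values (8000 Hz, 16 bit, mono, 12 byte RTP header):
int linearBytes = 8000 / 50 * 2 * 1; //320 bytes of linear PCM per 20 ms
int mulawBytes = linearBytes / 2;    //160 bytes after U-Law compression (16 bit -> 8 bit)
int rtpBytes = 12 + mulawBytes;      //172 bytes, assuming a minimal RTP header
int wireBytes = 4 + rtpBytes;        //176 bytes on the TCP stream after the 4 byte length prefix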
The reverse path is receiving the data from the network, in this case for every connected client. The first step is to extract the packets from the stream with the help of my own protocol.
private void OnServerDataReceived(NF.ServerThread st, Byte[] data)
{
    if (m_DictionaryServerDatas.ContainsKey(st))
    {
        ServerThreadData stData = m_DictionaryServerDatas[st];
        if (stData.Protocol != null)
        {
            stData.Protocol.Receive_LH(st, data);
        }
    }
}
With the help of the length field, I know where a packet starts and ends.
public void Receive_LH(Object sender, Byte[] data)
{
    m_DataBuffer.AddRange(data);
    if (m_DataBuffer.Count > m_MaxBufferLength)
    {
        m_DataBuffer.Clear();
    }
    //Wait until at least the 4 byte length header has arrived
    if (m_DataBuffer.Count < 4)
    {
        return;
    }
    Byte[] bytes = m_DataBuffer.Take(4).ToArray();
    int length = BitConverter.ToInt32(bytes, 0);
    if (length > m_MaxBufferLength)
    {
        m_DataBuffer.Clear();
    }
    //Extract complete messages as long as the buffer holds header plus payload
    while (m_DataBuffer.Count >= length + 4)
    {
        Byte[] message = m_DataBuffer.Skip(4).Take(length).ToArray();
        if (DataComplete != null)
        {
            DataComplete(sender, message);
        }
        m_DataBuffer.RemoveRange(0, length + 4);
        //Read the length of the next message, if its header is already complete
        if (m_DataBuffer.Count >= 4)
        {
            bytes = m_DataBuffer.Take(4).ToArray();
            length = BitConverter.ToInt32(bytes, 0);
        }
        else
        {
            break;
        }
    }
}
Before playing the data on the sound card, I put it into a further Jitter Buffer. This is necessary because of the irregular network traffic, especially over the Internet. The larger the Jitter Buffer, the larger the delay.
private void OnProtocolDataComplete(Object sender, Byte[] bytes)
{
    WinSound.RTPPacket rtp = new WinSound.RTPPacket(bytes);
    if (rtp.Data != null)
    {
        JitterBuffer.AddData(rtp);
    }
}
Finally, the data is ready to be played on the sound card. Before that, I convert the U-Law data back to linear data, because a sound device can only play linear samples.
private void OnJitterBufferDataAvailable(Object sender, WinSound.RTPPacket rtp)
{
    if (IsMuteAll == false && IsMute == false)
    {
        Byte[] linearBytes = WinSound.Utils.MuLawToLinear(rtp.Data, BitsPerSample, Channels);
        Player.PlayData(linearBytes, false);
    }
}
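WinSound.Utils.MuLawToLinear is part of the WinSound library and is not listed in this article. For reference, a standard G.711 U-Law expansion of a single byte looks roughly like this (a sketch of the well-known algorithm, not the library code):
//Reference sketch of the standard G.711 U-Law expansion for one byte
//(not the WinSound implementation):
private static short MuLawByteToLinear(byte mulaw)
{
    mulaw = (byte)~mulaw;                  //U-Law bytes are stored inverted
    int sign = mulaw & 0x80;
    int exponent = (mulaw >> 4) & 0x07;
    int mantissa = mulaw & 0x0F;
    int sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (short)(sign != 0 ? -sample : sample);
}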
I implemented my own Jitter Buffer as a queue of RTP packets. Data can be added at any time and is then consumed by a high-frequency timer function (every 20 ms).
public void AddData(RTPPacket packet)
{
    if (m_Overflow == false)
    {
        if (m_Buffer.Count <= m_MaxRTPPackets)
        {
            m_Buffer.Enqueue(packet);
        }
        else
        {
            m_Overflow = true;
        }
    }
}
The Jitter Buffer handles the data every 20 milliseconds. To get such an exact timer, you cannot use the normal .NET timers, so I used the timer functions from the Win32 kernel32 and winmm libraries. Before starting a timer, I set the resolution to the best value the system can offer; this can range from 1 millisecond upwards. Better than 1 millisecond is not possible with Windows.
[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceCounter")]
public static extern bool QueryPerformanceCounter(out long lpPerformanceCount);
[DllImport("Kernel32.dll", EntryPoint = "QueryPerformanceFrequency")]
public static extern bool QueryPerformanceFrequency(out long lpFrequency);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeSetEvent")]
public static extern UInt32 TimeSetEvent(UInt32 msDelay, UInt32 msResolution,
TimerEventHandler handler, ref UInt32 userCtx, UInt32 eventType);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeKillEvent")]
public static extern UInt32 TimeKillEvent(UInt32 timerId);
[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueue")]
public static extern IntPtr CreateTimerQueue();
[DllImport("kernel32.dll", EntryPoint = "DeleteTimerQueue")]
public static extern bool DeleteTimerQueue(IntPtr TimerQueue);
[DllImport("kernel32.dll", EntryPoint = "CreateTimerQueueTimer")]
public static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer, IntPtr TimerQueue,
DelegateTimerProc Callback, IntPtr Parameter, uint DueTime, uint Period, uint Flags);
[DllImport("kernel32.dll")]
public static extern bool DeleteTimerQueueTimer(IntPtr TimerQueue,
IntPtr Timer, IntPtr CompletionEvent);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeGetDevCaps")]
public static extern MMRESULT TimeGetDevCaps(ref TimeCaps timeCaps, UInt32 sizeTimeCaps);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeBeginPeriod")]
public static extern MMRESULT TimeBeginPeriod(UInt32 uPeriod);
[DllImport("winmm.dll", SetLastError = true, EntryPoint = "timeEndPeriod")]
public static extern MMRESULT TimeEndPeriod(UInt32 uPeriod);
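The declarations above are used roughly as follows. This is a simplified sketch of how the timer could be started and stopped; the TIME_PERIODIC constant, the TimeCaps.wPeriodMin field and the callback signature follow the usual winmm definitions and are assumptions here, since the project's own wrapper is not listed completely:
//Simplified sketch (assumptions: TIME_PERIODIC = 1, TimeCaps exposes wPeriodMin in ms).
//The project already declares TimerEventHandler for the DllImport above; the usual signature is:
public delegate void TimerEventHandler(UInt32 id, UInt32 msg, ref UInt32 userCtx, UInt32 rsv1, UInt32 rsv2);

private const UInt32 TIME_PERIODIC = 1;
private UInt32 m_TimerId;
private UInt32 m_UserData;
private TimerEventHandler m_Handler; //kept in a field so the delegate is not garbage collected
private TimeCaps m_Caps;

private void StartTimer()
{
    //Request the best resolution the system can offer
    TimeGetDevCaps(ref m_Caps, (UInt32)System.Runtime.InteropServices.Marshal.SizeOf(typeof(TimeCaps)));
    TimeBeginPeriod(m_Caps.wPeriodMin);
    //Start a periodic 20 ms multimedia timer
    m_Handler = new TimerEventHandler(OnTimerElapsed);
    m_TimerId = TimeSetEvent(20, m_Caps.wPeriodMin, m_Handler, ref m_UserData, TIME_PERIODIC);
}

private void StopTimer()
{
    TimeKillEvent(m_TimerId);
    TimeEndPeriod(m_Caps.wPeriodMin);
}

private void OnTimerElapsed(UInt32 id, UInt32 msg, ref UInt32 userCtx, UInt32 rsv1, UInt32 rsv2)
{
    OnTimerTick(); //the Jitter Buffer tick shown below
}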
The Jitter Buffer is designed to start delivering data when it is half full. After an overflow or underflow, the buffer tries to get back to this level.
private void OnTimerTick()
{
    if (DataAvailable != null)
    {
        if (m_Buffer.Count > 0)
        {
            //After an overflow, AddData refuses new packets until the buffer has drained to half of maximum
            if (m_Overflow)
            {
                if (m_Buffer.Count <= m_MaxRTPPackets / 2)
                {
                    m_Overflow = false;
                }
            }
            //After an underflow, do not play anything until the buffer has filled up to half of maximum again
            if (m_Underflow)
            {
                if (m_Buffer.Count < m_MaxRTPPackets / 2)
                {
                    return;
                }
                else
                {
                    m_Underflow = false;
                }
            }
            m_LastRTPPacket = m_Buffer.Dequeue();
            DataAvailable(m_Sender, m_LastRTPPacket);
        }
        else
        {
            //The buffer ran empty: remember the underflow so playback pauses until it is half full again
            m_Overflow = false;
            if (m_LastRTPPacket != null && m_Underflow == false)
            {
                if (m_LastRTPPacket.Data != null)
                {
                    m_Underflow = true;
                }
            }
        }
    }
}
This project does not use any heavyweight libraries or extensions, so it can be used to learn the basics of manipulating sound data and performing network operations. Feel free to extend and improve it for your needs.
History
- 31.05.2012 - Added.
- 03.05.2013 - Added duplex connections. Removed File-Player.
- 09.05.2013 - Changed tip to article.