The rapid spread of Voice over IP (VoIP) technology is easy to see today. VoIP solutions appear in more and more fields, and one possible use of them is building a simple Internet telephony program. For these reasons, I decided to build my own VoIP telephone application based on the following knowledge and requirements:
The code should support the latest stable .NET technologies, be written in C#, and be easy to obfuscate.
The two essential protocols for VoIP calls are SIP and H.323. Both can establish audio-visual communication between the participants with the help of other protocols.
I decided to use SIP because it is easier to implement and its communication processes are easier to understand. Moreover, SIP does not inherit anything from the PSTN network. Detailed comparisons of the two protocols are available elsewhere.
After this design decision, it was obvious to use the SIP, SDP, RTP and RTCP protocols. SIP (Session Initiation Protocol) is used for creating sessions between the parties. SDP (Session Description Protocol) is used for describing multimedia communication sessions. RTP (Real-time Transport Protocol) defines the delivery of media data. A call is built up by using SIP first, then SDP, then RTP.
In the first phase of my experiment, I decided to write my own softphone. I started with a minimal implementation of the SIP protocol, then developed a minimal representation of SIP messages (in other words, the SIP headers found in an average SIP message, such as Via, Contact, From, To and Call-ID). After that, I successfully established a call at the INVITE level.
Up to this point, progress was easy and quick, but then I had to face two problems. First, INVITE messages that arrived in certain situations had no effect. Second, the SDP protocol, which negotiates the media, and the RTP protocols, which carry the media, were still missing or waiting for implementation. As a consequence, I envisioned the following software architecture:
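To make the "minimal representation of SIP messages" more concrete, here is a small sketch of what composing a bare INVITE request could look like. It is not the code from my experiment and not part of any SDK: the class name, the user names, the addresses and the tag and branch values are all made up for illustration, and a real INVITE would normally also carry an SDP body describing the media.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of any SDK.
static class SipInviteSketch
{
    // Composes a bare-bones INVITE with the headers an average SIP request carries.
    public static string BuildInvite(string fromUser, string toUser, string domain,
                                     string localHost, string callId)
    {
        var sb = new StringBuilder();
        // Request line: method, request URI, protocol version.
        sb.Append("INVITE sip:" + toUser + "@" + domain + " SIP/2.0\r\n");
        // The branch value is a placeholder; it must start with "z9hG4bK".
        sb.Append("Via: SIP/2.0/UDP " + localHost + ":5060;branch=z9hG4bK-example\r\n");
        sb.Append("Max-Forwards: 70\r\n");
        sb.Append("From: <sip:" + fromUser + "@" + domain + ">;tag=example-tag\r\n");
        sb.Append("To: <sip:" + toUser + "@" + domain + ">\r\n");
        sb.Append("Call-ID: " + callId + "\r\n");
        sb.Append("CSeq: 1 INVITE\r\n");
        sb.Append("Contact: <sip:" + fromUser + "@" + localHost + ":5060>\r\n");
        // No SDP body in this sketch, so the content length is zero.
        sb.Append("Content-Length: 0\r\n");
        // An empty line ends the header section.
        sb.Append("\r\n");
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // All values below are placeholders.
        Console.Write(SipInviteSketch.BuildInvite(
            "alice", "bob", "example.com", "192.0.2.10", "a84b4c76e66710"));
    }
}
```

Even this small message hints at why a complete implementation takes time: every header has its own grammar, and the responses, retransmissions and dialog state still have to be handled.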
| UserAgent  |
|SIP/SDP | RTP  |
| Network layer |
Since only two fifths of this software architecture was ready, I started to search for components on the Internet. There are fine SDP implementations available, but the RTP implementations also include their own network communication, which makes it difficult to use them in a uniform way. I was working on the first problem when I found a SIP guide called "A Hitchhiker's Guide to the Session Initiation Protocol". That was the point when I decided to give up building the application from my own components.
In the second phase of my experiment, I looked for third-party components that offer a complete solution for handling the SIP protocol. Most of the SDKs available on the Internet do not meet the above requirements: they are difficult to use, they require too much technical knowledge, or they are wrapped COM objects.
As the result of my search, I found the solution offered by Ozeki. Ozeki VoIP SIP SDK provides an easy-to-use interface; furthermore, it helps testing with a mock softphone object. In the mock-up phase of the software development life cycle, this makes testing the relevant components and models much easier. The random events it generates resemble reality and create realistic situations without making a real phone call.
In the third phase of my experiment, I got to know the selected component and started to use it. Now I would like to summarize the results and experiences. Since the aim of this article is not to present Windows sound management, sound handling is shown in a deliberately simplified way. Based on the sample code presented on the Ozeki VoIP SIP SDK website, you can get transparent, simple, easy-to-use code with the help of a component that is able to handle VoIP calls programmatically.
Ozeki VoIP SIP SDK
To keep it simple, I am going to show you a program that leaves out the GUI and the technical details of handling the audio device. The first problem is avoided by showing the SDK usage in a console application; the second is avoided by immediately sending back the received audio data. Thus, the code focuses on event handling and on introducing the objects the application is built from.
In order to do so, we need to be familiar with the available tools. At the center of the abstraction is IPhoneCall. You can find more information about IPhoneCall in the documentation available on the Ozeki website. A listener can be attached to PhoneCall objects in a way that is similar to the Observer pattern, although attaching and detaching must be done by the programmer with the help of extension methods such as DetachListener. Additionally, all event types are handled uniformly through a single listener interface:
public interface IPhoneCallListener
{
    void CallErrorOccured(object sender, VoIPEventArgs<CallError> e);
    void CallStateChanged(object sender, VoIPEventArgs<CallState> e);
    void DtmfReceived(object sender, VoIPEventArgs<DTMF> e);
    void MediaDataReceived(object sender, VoIPEventArgs<VoIPMediaData> e);
    void PlainMediaDataReceived(object sender, VoIPEventArgs<EncodedMediaData> e);
}
During the lifetime of a phone call object, various situations can occur. They are listed in IPhoneCallListener. The method names speak for themselves, so they are not discussed here; the example below gives some guidance.
class PongCallListener : IPhoneCallListener
public void DtmfReceived(object sender, VoIPEventArgs<DTMF> e)
var dtmf = e.Item;
var call = (PhoneCall)sender;
In this example, we create a simple PhoneCallListener object. It sends some of the received information back to the other party as soon as it arrives; DTMF signals are one example.
public void CallErrorOccured(object sender, VoIPEventArgs<CallError> e)
var call = (PhoneCall)sender;
Console.WriteLine("Call error occurred: " + e.Item);
If an error occurs during call setup, the cause of the error is written to the screen.
public void MediaDataReceived(object sender, VoIPEventArgs<VoIPMediaData> e)
var call = (PhoneCall)sender;
Media data has arrived in raw PCM format. The SDK can handle not just audio but other media types as well. Here, we simply send the received data back to the sender, which should come as quite a surprise to them.
public void CallStateChanged(object sender, VoIPEventArgs<CallState> e)
var call = (PhoneCall)sender;
Console.WriteLine("Call state changed: " + e.Item);
if (e.Item > CallState.InCall)
The state of the call may change. If the new state is past InCall, the call has ended somehow: either we hung up or the other party did. In that case, it is worth removing the PhoneCallListener object from the PhoneCall object with the extension method mentioned above.
public void PlainMediaDataReceived(object sender, VoIPEventArgs<EncodedMediaData> e)
This handler is called when the data arrives from the caller in encoded form, which happens only if the IPhoneCall.PlainMediaData property was used while the application was running. In this example, we do nothing here.
Accordingly, what we mostly deal with is the object implementing the IPhoneCall interface and the IPhoneCallListener objects attached to it. This gives us plenty of creative freedom.
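The attach and detach idea can be sketched without the SDK. The types below (DemoCall, IDemoCallListener, AttachDemoListener, DetachDemoListener) are stand-ins invented for this illustration; they are not the Ozeki types and only mimic the shape of the pattern: a call exposes events, and a pair of extension methods wires one listener object to them in a single step.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; these are NOT the Ozeki SDK types.
interface IDemoCallListener
{
    void StateChanged(DemoCall call, string state);
}

class DemoCall
{
    public event Action<DemoCall, string> StateChanged;

    // Notifies every attached listener about a new state.
    public void Raise(string state)
    {
        var handler = StateChanged;
        if (handler != null) handler(this, state);
    }
}

static class DemoCallExtensions
{
    // Subscribes the listener's method to the call's event.
    public static void AttachDemoListener(this DemoCall call, IDemoCallListener listener)
    {
        call.StateChanged += listener.StateChanged;
    }

    // Unsubscribes the same method; delegates compare equal by target and method.
    public static void DetachDemoListener(this DemoCall call, IDemoCallListener listener)
    {
        call.StateChanged -= listener.StateChanged;
    }
}

// Records every state it is told about.
class RecordingListener : IDemoCallListener
{
    public readonly List<string> Seen = new List<string>();

    public void StateChanged(DemoCall call, string state)
    {
        Seen.Add(state);
    }
}

static class Program
{
    static void Main()
    {
        var call = new DemoCall();
        var listener = new RecordingListener();

        call.AttachDemoListener(listener);
        call.Raise("Ringing");       // recorded
        call.DetachDemoListener(listener);
        call.Raise("Completed");     // not recorded, the listener is detached

        Console.WriteLine(listener.Seen.Count);
    }
}
```

The benefit of this shape is that one listener object groups all reactions to a call in one place, and the same instance can be attached to any number of calls.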
We also need to create telephone calls. In order to do this, we need a program.
static Dictionary<string,IPhoneCall> Calls;
static PongCallListener FunnyCallListener;
The program keeps the active calls in a Dictionary, and we use only a single listener instance (FunnyCallListener).
static void Main(string[] args)
ISoftPhone SoftPhone = new SoftPhone("", 5000, 8000, 5060);
SoftPhone.IncommingCall += (SoftPhone_IncommingCall);
IPhoneLine PhoneLine = null;
FunnyCallListener = new PongCallListener();
Calls = new Dictionary<string, IPhoneCall>();
We instantiate a SoftPhone object that handles the calls. If we are bored of testing our application with real calls, we can use the ArbSoftPhone that comes with the SDK, which creates random situations. Once the SoftPhone is ready, we subscribe to the incoming call event, which is raised when there is an incoming call, just as its name suggests. The event arguments carry only the call object.
We also need a telephone line; this is the IPhoneLine interface. Here, I would like to add that the SDK can handle multiple parallel lines with multiple parallel calls on each of them. Then we instantiate our CallListener object, which we are going to attach to every call while the program is running. The Calls dictionary maps strings to the call objects.
Console.Write("Display name: "); string displayName = Console.ReadLine();
Console.Write("Username: "); string username = Console.ReadLine();
Console.Write("Register name: "); string registerName = Console.ReadLine();
Console.Write("Register password: "); string registerPassword = Console.ReadLine();
Console.Write("Domain server: "); string domainServer = Console.ReadLine();
We read the registration information from the user.
string[] domains = domainServer.Split(':');
int port = 5060;
if (domains.Length == 2)
    port = Int32.Parse(domains[1]);
SIPAccount account = new SIPAccount(true, displayName, username, registerName,
    registerPassword, domains[0], port);
PhoneLine = SoftPhone.CreateAndRegisterPhoneLine(account);
We check whether the given domain contains a port; if not, we use the default port 5060 for the registration. To register, we create a SIPAccount object and use it to request an IPhoneLine object from the SoftPhone. The registration procedure is started on this IPhoneLine object.
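The snippet above trusts the user's input. As a slightly more defensive variant, the following sketch separates host and port and falls back to 5060 when no port is given. DomainParser and TryParse are names invented for this example, and the sketch assumes a plain "host" or "host:port" string; IPv6 literals, for instance, are not handled.

```csharp
using System;

// Hypothetical helper for illustration; not part of the SDK.
static class DomainParser
{
    public const int DefaultSipPort = 5060;

    // Splits "host" or "host:port"; returns false when the input is malformed.
    public static bool TryParse(string input, out string host, out int port)
    {
        host = null;
        port = DefaultSipPort;
        if (string.IsNullOrWhiteSpace(input)) return false;

        string[] parts = input.Trim().Split(':');
        if (parts.Length > 2 || parts[0].Length == 0) return false;

        if (parts.Length == 2)
        {
            int parsed;
            // Reject non-numeric ports and values outside the valid port range.
            if (!int.TryParse(parts[1], out parsed) || parsed < 1 || parsed > 65535)
                return false;
            port = parsed;
        }

        host = parts[0];
        return true;
    }
}

static class Program
{
    static void Main()
    {
        string host;
        int port;
        // "sip.example.com" is a placeholder server name.
        if (DomainParser.TryParse("sip.example.com:5070", out host, out port))
            Console.WriteLine(host + " " + port);
    }
}
```

With a helper like this, a typo in the port no longer ends the program with a FormatException; the caller can simply ask the user again.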
Then comes the fun part...
string statement = Console.ReadLine().Trim();
...until we are bored of it. We read a string from the keyboard. If it is "exit" we quit; if it is something else...
if (PhoneLine.RegisteredInfo == PhoneLineInformation.RegistrationSucceded)
IPhoneCall Call = SoftPhone.CreateCallObject(
    PhoneLine, statement, FunnyCallListener
);
Call.CallStateChanged += (Call_CallStateChanged);
... we check whether a call object is already associated with the received string. If so, nothing happens; if not, we check whether our telephone line has registered successfully. If it has, we create a call object with the SoftPhone, which starts calls only on our single telephone line. The call is placed to the typed phone number, which is stored in a property of the call object.
We store the created object under the typed phone number and subscribe to the call state changes, so that when the other party hangs up we get a notification and can remove the call from Calls.
Then we start the call. We keep on repeating this until we are bored of it.
foreach (IPhoneCall call in Calls.Values)
On exit, we hang up every active call.
After that, we also need event handlers. They handle the incoming calls and are also responsible for removing the call object from the dictionary when a call ends.
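The bookkeeping around the Calls dictionary can be sketched on its own. DemoCall and CallRegistry below are stand-ins invented for this illustration, not SDK types; they only show the three operations the program relies on: start a call unless one is already tracked for that number, remove a call when it ends, and hang up everything on exit.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a call object; NOT an SDK type.
class DemoCall
{
    public string DialInfo { get; private set; }
    public bool HungUp { get; private set; }

    public DemoCall(string dialInfo) { DialInfo = dialInfo; }
    public void HangUp() { HungUp = true; }
}

// Hypothetical registry that tracks one active call per dialed number.
class CallRegistry
{
    private readonly Dictionary<string, DemoCall> calls =
        new Dictionary<string, DemoCall>();

    public int Count { get { return calls.Count; } }

    // Returns the new call, or null when a call to this number is already tracked.
    public DemoCall TryStart(string number)
    {
        if (calls.ContainsKey(number)) return null;
        var call = new DemoCall(number);
        calls.Add(number, call);
        return call;
    }

    // Called when a call has ended; returns false if the number was not tracked.
    public bool Remove(string number)
    {
        return calls.Remove(number);
    }

    // Hangs up every tracked call, then forgets them all.
    public void HangUpAll()
    {
        foreach (DemoCall call in calls.Values)
            call.HangUp();
        calls.Clear();
    }
}

static class Program
{
    static void Main()
    {
        var registry = new CallRegistry();
        registry.TryStart("1001");                          // placeholder number
        Console.WriteLine(registry.TryStart("1001") == null); // already tracked
        registry.HangUpAll();
        Console.WriteLine(registry.Count);
    }
}
```

One thing to watch for in a real program: a Dictionary must not be modified while it is being enumerated. If hanging up a call synchronously triggers a handler that removes the entry from the same dictionary, the foreach loop will throw an InvalidOperationException, so it is safer to iterate over a copy of the values in that case.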
static void SoftPhone_IncommingCall(object sender, VoIPEventArgs<IPhoneCall> e)
e.Item.CallStateChanged += (Call_CallStateChanged);
We immediately attach our CallListener to the incoming call and add the call to Calls. Then we subscribe the state change handler as well, and automatically accept the incoming call.
static void Call_CallStateChanged(object sender, VoIPEventArgs<CallState> e)
if (e.Item > CallState.InCall)
IPhoneCall call = sender as IPhoneCall;
if (call == null)
call.CallStateChanged -= (Call_CallStateChanged);
If the received CallState is greater than InCall, the call has ended, and this is the event we are interested in. We remove the call object from Calls according to its dial info, and then we detach the CallListener from it, as well as the Call_CallStateChanged event handler.
CallState is an enum whose values are ordered along the stages of a call, up to Completed. That is why the comparison operators can be used on it: every value that signals the end of the call is larger than InCall.
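The ordering trick can be illustrated with a made-up enum. DemoCallState and its members are assumptions for this sketch and do not reproduce the SDK's CallState (of which only InCall and Completed are named in this article); the point is only that, once the values are declared in lifecycle order, a single comparison tells whether a call is over.

```csharp
using System;

// Hypothetical states, declared in the order a call passes through them.
// These are NOT the members of the SDK's CallState enum.
enum DemoCallState
{
    Created = 0,
    Ringing = 1,
    InCall = 2,
    // Everything after InCall means the call is over.
    Completed = 3,
    Cancelled = 4
}

static class DemoCallStateExtensions
{
    // A call has ended when its state lies past InCall in the declared order.
    public static bool IsEnded(this DemoCallState state)
    {
        return state > DemoCallState.InCall;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(DemoCallState.Ringing.IsEnded());
        Console.WriteLine(DemoCallState.Completed.IsEnded());
    }
}
```

The comparison is compact, but it depends entirely on the declaration order of the enum. Wrapping it in a small helper such as IsEnded keeps that assumption in one place if the ordering ever changes.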
The example shows how simply and quickly a fairly complex application that handles phone calls can be developed. Extending it depends only on the
To summarize, implementing your own VoIP SoftPhone is time consuming and requires a lot of energy. Therefore, it is more efficient to use existing components. After studying many SDKs, I found the solution offered by Ozeki the most understandable. As the examples showed, it is defined in interfaces and implemented in classes, and it handles phone calls in the easiest way. As Albert Einstein said: "Everything should be made as simple as possible, but no simpler."
You do not need to worry about the implementation details. Everything the programmer can do with a telephone call is centered on the call object, and its state is defined in one place. Therefore, your plans can be achieved easily, because you do not need to get lost in technical details. It is as easy as one, two, three: I ask for a telephone and for one or more lines, and I call or I am called. Note that this article relies heavily on the documentation published on the Ozeki website, where more information can be found. I can recommend this solution to everyone.
- 18th March, 2011: Initial post