Introduction

I often use RDP or VNC to remotely manage computers within a closed network, but what if, for example, I wanted to log in to my home computer from my place of work? I couldn't risk leaving commonly known ports such as 3389 or 5900 open on the internet side of the home router, or running programs on those ports that reveal their presence by returning data. Even when the home router has no ports open to the internet, the logs show external port scan attempts that run for days at a time.

One solution is to use PuTTY and an OpenSSH server to tunnel the traffic over port 22. A user requires the correct password or private key to gain access. Reducing the open ports to just one reduces the area of vulnerability, but it shows everyone the running version of the SSH server, as a connection to port 22 will receive the SSH identification string. This might not be a problem given long user passwords and a secure SSH server. However, a user has to be careful which version they run and keep up to date on patches, as different SSH implementations have different security issues; see for example: SSH.html. Security vulnerabilities have also been found in similar tunneling programs (example: Stunnel.html). For some older versions, the underlying OpenSSL implementations have memory corruption and other security problems (and the OpenSSL source code is fairly complicated and hard to follow).

What is written here is an implementation of secure authenticated tunneling which I have tried to make as simple as possible. There are no dependencies on third party libraries and the protocol does not make use of complicated PKI techniques.

Implementation Description

The tunneling protocol designed for this program makes use of a selected AES key shared between the server and client. The server can store multiple separate keys, but each client holds only one. Both the client and the server must prove to each other that they possess this key. When a client first connects to the server, the server requires the immediate transmission of some authentication data as the first step in establishing a session key. If no data is sent within an allocated time frame, the server passes the connection to another thread which sends a RST to forcibly close the connection after a period of time. The reason for doing this is that, by turning off SO_LINGER before calling closesocket, the local sockets are not left in a TIME_WAIT state, and because RST rather than FIN is sent, to the outside it looks more like a valid connection was never made. That helps prevent sockets being consumed by a large number of remote connection attempts, and waiting for a few seconds before closing may help slow down the incoming rate since the remote peer usually waits for acknowledgement. It is the next best thing to silently dropping the connection without sending any notification, if that were possible. It is important to give out as little information as possible until a connection has been authenticated.

Header Format

The header placed at the beginning of a data packet consists of an 8 byte checksum of the header, an 8 byte checksum of the payload data after the header, a 1 byte version number, 5 bytes of random padding, a 2 byte TCP port value, a 4 byte error code and a 4 byte payload data length, for a total of 32 bytes. The header format is common across all data transmissions for programming simplicity. The checksums are positioned so that the initial shared key is not encrypting known plaintext for the first encryption block (because the encryption initialization vectors start at known values). By using a header that divides the data into packets of definite length, the server side can be certain it has received all of the data to be sent on to the local connection, synchronized to the sequence of data reads by the client. The client program allows for up to 1 MB in a single recv from an application, and it is assumed this is adequate for any data being tunneled. Whatever data is queued locally, up to this amount, is read. No length information is read from within the data being tunneled, which is treated as opaque without any interpretation.

8 bytes	8 bytes	1 byte	5 bytes	2 bytes	4 bytes	4 bytes
Header Checksum	Data Checksum	Version Number	Random Padding	Port Number	Error Code	Data Length

Header Checksum - checksum of the header excluding first 8 bytes
Data Checksum - checksum of data following header
Version Number - version of the header format
Port Number - port that the client is requesting the server connect to locally
Error Code - any local socket connection error
Data Length - length of the data following header in bytes
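The field layout above can be sketched as a small serialization routine. This is only an illustration: the names TunnelHeader, SerializeHeader and ParseHeader are hypothetical, and the little-endian byte order for the multi-byte fields is an assumption, since the article does not state one.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory view of the 32-byte header described above.
// Field names and little-endian byte order are assumptions for illustration.
struct TunnelHeader {
    std::array<std::uint8_t, 8> headerChecksum; // checksum of header bytes 8..31
    std::array<std::uint8_t, 8> dataChecksum;   // checksum of the payload
    std::uint8_t  version;                      // header format version
    std::array<std::uint8_t, 5> padding;        // random filler
    std::uint16_t port;                         // port the server connects to locally
    std::uint32_t errorCode;                    // local socket connection error
    std::uint32_t dataLength;                   // payload length in bytes
};

constexpr std::size_t kHeaderSize = 8 + 8 + 1 + 5 + 2 + 4 + 4;
static_assert(kHeaderSize == 32, "header is 32 bytes");
static_assert(kHeaderSize % 16 == 0, "header must be whole AES blocks");

// Write the low `bytes` bytes of value, least significant byte first.
inline void PutLE(std::uint8_t* out, std::uint32_t value, std::size_t bytes) {
    for (std::size_t i = 0; i < bytes; ++i)
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Read `bytes` bytes (at most 4), least significant byte first.
inline std::uint32_t GetLE(const std::uint8_t* in, std::size_t bytes) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}

// Lay the fields out at fixed offsets, independent of compiler struct padding.
std::array<std::uint8_t, kHeaderSize> SerializeHeader(const TunnelHeader& h) {
    std::array<std::uint8_t, kHeaderSize> out{};
    std::memcpy(&out[0], h.headerChecksum.data(), 8);
    std::memcpy(&out[8], h.dataChecksum.data(), 8);
    out[16] = h.version;
    std::memcpy(&out[17], h.padding.data(), 5);
    PutLE(&out[22], h.port, 2);
    PutLE(&out[24], h.errorCode, 4);
    PutLE(&out[28], h.dataLength, 4);
    return out;
}

TunnelHeader ParseHeader(const std::array<std::uint8_t, kHeaderSize>& in) {
    TunnelHeader h{};
    std::memcpy(h.headerChecksum.data(), &in[0], 8);
    std::memcpy(h.dataChecksum.data(), &in[8], 8);
    h.version = in[16];
    std::memcpy(h.padding.data(), &in[17], 5);
    h.port = static_cast<std::uint16_t>(GetLE(&in[22], 2));
    h.errorCode = GetLE(&in[24], 4);
    h.dataLength = GetLE(&in[28], 4);
    return h;
}
```

Writing each field at an explicit offset avoids relying on compiler structure packing, which would otherwise have to be forced for a layout like this one where a 2 byte field starts at offset 22.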

Key Negotiation Sequence

The first data transfer between client and server is the key negotiation sequence, sent in the payload data:

Client Sends: Encrypted(sha256(challenge1) + challenge1 + padding, Shared Key)
Server Verifies: challenge1 is not in previous list and client time is within +/- 1 day of current time
Server Sends: Encrypted(sha256(challenge1 + challenge2) + challenge2 + padding, Shared Key)
Client/Server Key Derivation Function: Session Key = KDF(challenge1 + challenge2 + Shared Key)

Authentication Process

The Key Negotiation Sequence is the first step in the authentication process. The client first appends a random challenge of 24 bytes in length to the 8 byte FILETIME structure of the current date and time on the computer. A SHA-256 digest of this 32 byte challenge is then created and the challenge appended to produce a 64 byte authentication value. Random data is appended to this value to pad the length to 480 bytes. The reason for the padding is to allow for extending data in future versions without signaling to an eavesdropper that the transmitted data has been extended. Also, because the encryption mode is CBC, adding random data alters the initialization vectors that chain into later transmissions. Once the server has added its own random data, only the first data transmission of the key negotiation could potentially be replayed successfully to the server at a later time.

The 480 byte value generated for authentication is then appended to the header, and the 512 byte result is encrypted using AES in CBC mode. Prior to session key generation the initial key which is used to decrypt and authenticate the header and data is the shared 256-bit AES key.
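As a rough sketch of how the 480 byte client value could be assembled before encryption: the function name BuildClientPayload is hypothetical, the SHA-256 routine and the random source are passed in as callbacks rather than taken from the project, and the little-endian encoding of the FILETIME value is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Digest = std::array<std::uint8_t, 32>;
// Stand-ins for the project's SHA-256 routine and random generator (assumed).
using DigestFn = std::function<Digest(const std::uint8_t*, std::size_t)>;
using RandomByteFn = std::function<std::uint8_t()>;

constexpr std::size_t kHeaderSize = 32;
constexpr std::size_t kChallengeSize = 32;  // 8 byte time + 24 random bytes
constexpr std::size_t kPayloadSize = 480;
static_assert(kHeaderSize + kPayloadSize == 512, "packet is 512 bytes");
static_assert((kHeaderSize + kPayloadSize) % 16 == 0, "whole AES blocks");

// Layout: sha256(challenge) | challenge | random padding up to 480 bytes.
std::vector<std::uint8_t> BuildClientPayload(std::uint64_t fileTime,
                                             const DigestFn& sha256,
                                             const RandomByteFn& randomByte) {
    std::array<std::uint8_t, kChallengeSize> challenge{};
    // First 8 bytes: the timestamp, least significant byte first (assumed).
    for (std::size_t i = 0; i < 8; ++i)
        challenge[i] = static_cast<std::uint8_t>(fileTime >> (8 * i));
    // Remaining 24 bytes: the random part of the challenge.
    for (std::size_t i = 8; i < kChallengeSize; ++i)
        challenge[i] = randomByte();

    const Digest digest = sha256(challenge.data(), challenge.size());

    std::vector<std::uint8_t> payload;
    payload.reserve(kPayloadSize);
    payload.insert(payload.end(), digest.begin(), digest.end());
    payload.insert(payload.end(), challenge.begin(), challenge.end());
    // Random padding hides any future growth of the meaningful data.
    while (payload.size() < kPayloadSize)
        payload.push_back(randomByte());
    return payload;
}
```

With the 32 byte header in front, the packet comes to 512 bytes, which is exactly 32 AES blocks, so the whole of this first transmission sits on block boundaries.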

The server decrypts the header first and verifies the header checksum. This ensures the server is not potentially reading a large amount of data before authenticating the data length. Then the rest of the data is read according to the length value in the header, and a final checksum calculated over the data and matched with the value in the header. The connection is forcibly closed if the checksums do not match.

Having retrieved and verified the data following the header, the server checks that the 24 byte random challenge sent by the client is not from a list of previous challenges which have been verified, and that the date and time value are within one day either side of the current date and time. The reason for this is to prevent a replay of data from a previous session to the server. By fixing a portion of the client challenge to the date and time a small list of prior challenges can be stored and compared against for replay attacks, rather than requiring a huge list. Also, by checking the random part of the challenge along with the date and time, this enables comparison of the time component of previous challenges to a day resolution, rather than milliseconds which would require near perfect time synchronization between client and server.
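The replay check can be pictured with a minimal sketch like the one below. The class name ReplayGuard and its methods are hypothetical, and the pruning rule is an assumption about one way the list could be kept small; the article does not say how old entries are discarded.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>

using Nonce = std::array<std::uint8_t, 24>;

// FILETIME counts 100-nanosecond intervals, so one day is 864,000,000,000 ticks.
constexpr std::uint64_t kTicksPerDay = 24ULL * 60 * 60 * 10000000ULL;

class ReplayGuard {
public:
    // Accept a challenge only if its time is within one day of the server
    // time and its random part has not been seen before; remember it if so.
    bool Accept(std::uint64_t clientTime, const Nonce& nonce,
                std::uint64_t serverTime) {
        const std::uint64_t diff = clientTime > serverTime
                                       ? clientTime - serverTime
                                       : serverTime - clientTime;
        if (diff > kTicksPerDay)
            return false;                       // outside the time window
        return seen_.emplace(nonce, clientTime).second;  // false on replay
    }

    // Entries older than the window would fail the time check anyway,
    // so they can be dropped without weakening the replay protection.
    void Prune(std::uint64_t serverTime) {
        for (auto it = seen_.begin(); it != seen_.end();) {
            if (it->second + kTicksPerDay < serverTime)
                it = seen_.erase(it);
            else
                ++it;
        }
    }

    std::size_t Size() const { return seen_.size(); }

private:
    std::map<Nonce, std::uint64_t> seen_;  // random part -> client timestamp
};
```

The time window is what bounds the list: a captured challenge that has aged out of the list is also too old to pass the date and time comparison.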

Once the client data has been verified, the server generates its own random 32 byte challenge, which is then appended to a digest of the 32 byte client and server challenges together. This reply packet is constructed and padded to 512 bytes as before and sent back encrypted to the client. Including the client challenge in the digest input proves to the client that the server has decrypted the current client challenge. The session key between the client and the server is then calculated from the challenges produced by the client and server plus the current shared encryption key.

The session key derivation function is formed by concatenating the client and server challenges and the current encryption key into a 96 byte value. By analogy this forms a kind of keyed password value used in password based key derivation functions. A SHA-256 digest of this value is then created. This initial value is then input into a loop, where the SHA-256 digest of the previous value is created and XOR-ed with the previous value to produce the next value for 1000 iterations. The intermediate XOR step is to prevent degeneracy of values arising from the repeated application of a digest function on a digest (not that this is expected to occur for the SHA-256 algorithm however).
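The loop described above can be sketched as follows. The function name DeriveSessionKey is hypothetical and the SHA-256 routine is passed in as a callback (the project ships its own implementation), so this shows the structure of the derivation rather than the author's exact code.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

using Digest = std::array<std::uint8_t, 32>;
// Stand-in for the project's SHA-256 routine (assumed signature).
using DigestFn = std::function<Digest(const std::uint8_t*, std::size_t)>;

// v0 = H(clientChallenge | serverChallenge | sharedKey)
// v(i+1) = H(v(i)) XOR v(i), repeated `iterations` times.
Digest DeriveSessionKey(const Digest& clientChallenge,
                        const Digest& serverChallenge,
                        const Digest& sharedKey,
                        const DigestFn& sha256,
                        int iterations = 1000) {
    // Concatenate the three 32 byte inputs into the 96 byte seed.
    std::array<std::uint8_t, 96> seed{};
    std::copy(clientChallenge.begin(), clientChallenge.end(), seed.begin());
    std::copy(serverChallenge.begin(), serverChallenge.end(), seed.begin() + 32);
    std::copy(sharedKey.begin(), sharedKey.end(), seed.begin() + 64);

    Digest value = sha256(seed.data(), seed.size());
    for (int i = 0; i < iterations; ++i) {
        Digest next = sha256(value.data(), value.size());
        // Mix the previous value back in rather than hashing a bare digest.
        for (std::size_t j = 0; j < next.size(); ++j)
            next[j] ^= value[j];
        value = next;
    }
    return value;  // 32 bytes, used as the 256-bit session key
}
```

Both sides can run this independently once the challenges have been exchanged, so the session key itself never crosses the wire.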

The 256-bit session key is good for the lifetime of the socket connection. The remainder of the authentication process is to verify the header and data checksums match each received packet decrypted by the session key. The data encryption mode is AES CBC, with CFB mode used for any data not on an encryption block boundary (ECB mode is not used since it may reveal large scale bit sequence patterns in the encrypted output and doesn't secure the order of encryption blocks). There are two separate CBC initialization vectors, one for each of the send-receive and receive-send channels, which are also chained between transmissions to prevent the re-ordering of encrypted packets to the client or server.
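To make the split between the two modes concrete, here is a small sketch of the arithmetic only (no encryption is performed). The names ModeSplit and SplitByBlock are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kAesBlockSize = 16;

// How many bytes of a buffer fall on whole AES blocks (CBC) and how many
// are left over at the end (CFB).
struct ModeSplit {
    std::size_t cbcBytes;
    std::size_t cfbBytes;
};

inline ModeSplit SplitByBlock(std::size_t length) {
    ModeSplit split;
    split.cfbBytes = length % kAesBlockSize;   // trailing partial block
    split.cbcBytes = length - split.cfbBytes;  // whole blocks
    return split;
}
```

For example, a 1000 byte payload would have 992 bytes encrypted in CBC mode and the final 8 bytes in CFB mode, while the 512 byte key negotiation packet needs no CFB remainder at all.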

The server program is protected by having to know a shared symmetric key in order to make a successful connection. There can be multiple symmetric keys stored in the server depending on the chosen configuration. When a connection is made the server iterates through its list of known keys until it finds a match on the checksum of the decrypted header (this is a fast process). For multiple keys, the theoretical key strength of the server key is reduced by a factor equating to the number of these keys stored in the server. A limit is set on how many unsuccessful connection attempts can be made within a set period from a particular IP address by storing a blacklist IP address counter on the server, which protects the server key from guessing attempts.

Access to the client program running in the operating system environment is somewhat protected by asking for a password when a socket connection is made to it. The client is also better protected when set to listen for incoming connections on a loop-back interface. An option is given to suppress any further password prompts for the identified process instance (executable name and process identifier) that has initiated the socket connection, where possible, or the source IP address, which is useful for programs that automatically retry a broken connection like Microsoft Terminal Services Client.

Comparing this symmetric key application to a public key system, there is little difference in protecting access to the server, because access to an unprotected client private key would gain access to the server in much the same way that access to a symmetric key does. The difference is the potential to impersonate the server to clients if a key shared between clients is compromised. The convenience gained in having a server public key is that the server can identify itself without exposing its key. However, if a key configuration is chosen where each client has its own symmetric key, then each client already knows the identity of the server using that shared key, which is trusted by the client, and does not require a server public key or PKI framework to validate the server (assuming the shared key is not compromised by the client in an insecure environment). Of course it is easy to imagine a situation where a client is used in an insecure environment and the shared key is somehow copied out, so from then on the client using the same shared key is vulnerable to a man-in-the-middle setup where internet traffic is re-routed to a bogus server using that shared key. Whether this is a realistic scenario or not is another matter (if the client was compromised in that way then possibly any client software application would be compromised in that environment; the main difference is that with a symmetric key in software the compromise can extend beyond the insecure environment). On the server side, in the case of using PKI to verify clients, verifying clients against a CA still requires revocation lists to be maintained, which is more complex than having a simple list of permitted client keys. If it is assumed that access to the server is fully secured, then it is easier to maintain a symmetric client key list on the server.
If the server happened to be secured against write access but not secured against read access by unauthenticated users, then there would however be a definite security advantage to using public keys to verify clients. In the interests of simplicity, the program does not control per user permissions on accessing the server but simply tunnels protocols such as RDP or VNC that can be used to gain access and manage keys stored in the server.

The current "key rollover" option available to change server keys is that the user can add or remove keys and restart the installed service while connected remotely through the client. This is done by tunneling a remote desktop to the server and using the key generation function in the server dialog to generate a new key and copy that across if required to the client from the remote desktop. Or a new key can be copied from the client to the server dialog within remote desktop. Once a new key is copied to the client tunneling program any new connections made through the client use the new key, and the existing connection to the server is still preserved. This enables the user to restart the installed service from the server dialog within the remote desktop session to load new keys, whereupon all existing connections are terminated. In the case of RDP the terminal services client will simply reconnect to the server via the client, but most versions of VNC clients will just drop the connection once it is closed. Alternatively, multiple instances of the installed service in separate directories with different key sets can coexist on the same computer (listening on different ports) and the connection switched over once the appropriate access rule for a new port is set on the server firewall or router (the program does not require identical external and internal NAT ports).

Program Description

The program is a single Win32 application used for both client and server, which can be compiled in either Visual Studio 6.0 or .NET. The application uses the visibility of the "Window Station" to detect whether it is running as a server service or as the client application. The client is used to either edit the server configuration or as an application to remotely access a server depending on where it is run from.

Double-clicking to run the program brings up the client application, which shows a log view of client activity:

Client Log View

The file menu provides the options to edit configuration and install the tunneling service locally:

Client View Server Configuration Disabled

In this case the options related to server configuration are shown as disabled because the run as administrator option was not chosen.

Using the server configuration options enables server keys to be created, which the clients use to establish a connection, and to set the name of the service and the listening address and port. Also the destination IP address for outgoing server connections is set in the server configuration:

Server Configuration Dialog

The random data for key generation is obtained mainly from the CryptGenRandom function and mouse cursor position. The random key which is generated by clicking the key icon button is XOR-ed with previous keys in the list and the hex encoded value is outputted to the edit box, where it can be modified before adding to the list of server keys. It is tempting to modify the output of any random number sequences that don't look random where repeating numbers are seen. For example, just like the black and red of a roulette wheel, there can be short sequential runs that don't appear evenly spread but the long term 0 and 1 distribution is still even and otherwise random (consistently even distributions across short samples are actually less random).

When the application is used to remotely access a server, the client configuration options enable the shared access key to be set, along with the external IP address and port of the remote server, the local port that applications use to establish a connection with the client, and the forwarding port that traffic is sent to from the server at the other end:

Client Configuration Dialog

An access password can also be set to encrypt the client configuration and to prompt for a password when a socket connection is made to the client application. If a password is typed in the password box above then the user will be asked for a password whenever starting the program as the client application, and whenever a new socket connection is made if the password prompt option is checked. The password connection prompt shows either the executable name or source IP address that has initiated the socket connection:

Password Prompt Dialog

Using the Code

Most of the code is in one C++ file, with files for the AES and SHA-256 routines and precompiled headers. Encryption code is from www.matrixssl.org licensed under the GPL. Most of the C++ code could be described as written in C making use of C++ syntax, since functions are global and there are no class patterns.

The handling of socket connections is on a per thread basis rather than per object. In the code each thread contains a single socket handle and the thread lifetime is related to the lifetime of each socket connection. This is in contrast to a more complex object model with arrays of socket handles, which may be more appropriate for a server expected to handle a large number of simultaneous sockets.

The main entry point to the application is WinMain. This function calls RegisterClassEx to register a Window Class Name that is unique to every executable path location so the application window for an application instance can be found when limiting the program running as the client to a single instance:

const char * path = GetCommandLine();
...
strncat(szWindowClass, path, sizeof(szWindowClass) - strlen(szWindowClass) - 1);
...
sha256Digest((unsigned char *)szWindowClass, strlen(szWindowClass), digest);
BytesToHexLower(digest, 32, szWindowClass);

Some command line parameters are processed to display options related to passing the access password on the command line, which comes in handy when auto-starting the application as the client when logging on:

if(strnicmp(p, "pass", 4) == 0 || strnicmp(p, "password", 8) == 0)
{
    ...
}

The following determines from the Window Station whether the application is running as the client or as an installed service, and requires that the installed service have the "allow service to interact with desktop" option cleared in order to work as expected:

USEROBJECTFLAGS uof = {0};
GetUserObjectInformation(GetProcessWindowStation(), UOI_FLAGS, &uof, sizeof(USEROBJECTFLAGS), NULL);
if (uof.dwFlags & WSF_VISIBLE)
{
    // visible window station: running as the client application
    ...
}

All of the configuration parameters are stored concatenated together and, in the case of the client data file client.txt, encrypted under a key derived from a user password, where the key is 1000 iterations of the SHA-256 digest of the password XOR-ed together. The server data is encrypted under a Crypto-API machine key. The format of storage of the configuration parameters is a means to an end, since the data is in encrypted form and not designed to be read outside of the application. If it was stored in clear text then a structured format like XML would be better for external editing, however including a large XML library could create difficult to debug problems, if a library happened to corrupt heap memory for example.

There is a preprocessor definition USE_WEAK_CHECKSUM in the code, which switches the data checksum calculation from the first 8 bytes of a SHA-256 digest to a simple add without carry checksum. This decreases CPU load as opposed to having a cryptographically stronger digest to more effectively protect against injection of plaintext.
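The weak checksum itself is not shown in the article. As an illustration only, one plausible reading of "add without carry" is a byte-wise XOR folded into 8 bytes; the function name XorFoldChecksum and this interpretation are both assumptions, not the author's confirmed code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed interpretation: XOR (addition without carry) of the data,
// folded into an 8 byte running value.
std::array<std::uint8_t, 8> XorFoldChecksum(const std::uint8_t* data,
                                            std::size_t length) {
    std::array<std::uint8_t, 8> sum{};
    for (std::size_t i = 0; i < length; ++i)
        sum[i % sum.size()] ^= data[i];
    return sum;
}
```

A checksum of this kind is cheap, but it is insensitive to changes such as swapping two 8 byte groups of the data, which is the sort of weakness the truncated SHA-256 digest avoids.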

Running the application client

If running the application as the client the following is used to determine on a per directory basis if there is another running instance of this application, and if so the program un-hides the window of the previous running instance and then exits. This is required for each application to store its own configuration file client.txt in the application directory:

handle = CreateMutex(NULL, FALSE, szWindowClass);
exists = GetLastError() != ERROR_SUCCESS;

After reading the encrypted configuration data from the client.txt file if one exists, a test is made to see if it can be decrypted using a blank password or the password passed on the command line. If it can't, the user is prompted for the correct password. Processing will only continue after the client configuration file is decrypted. The actual decryption test is done within the dialog procedure of the DialogBoxParam function, so errors can be displayed to the user for incorrect password entry while still keeping the password dialog visible. The data is read into the g_clientparams variable. If the configuration file server.txt exists then the name of the installed service (if it exists) is read from the file. The server.txt file is encrypted under a machine key using CryptProtectData so it can be read by the installed service and doesn't require a password to access. This is why the function for encrypting the client data is called EncryptClientConfigData, while the equivalent function for the server data is called EncodeServerConfigData.

Once this is done, the main window is created along with a client socket listener thread and the main Windows message loop is started. The function of the visible dialog window is to provide a way to edit some configuration data, restart the service if installed, and view a running log of any error messages related to the operation of the client socket listener thread and any client threads that are connected to a server instance (as the server uses OutputDebugString to log output, the server log can be accessed by running the free DebugView program from Sysinternals, noting that DebugView must be run as administrator in order to capture global debug output on Windows 7). When running the application as the client under Windows 7, it is necessary to choose the run as administrator option (elevated privilege) when performing any actions related to administration of an installed server, such as editing server configuration or restarting the service. This is because elevated privileges are required to control a service and, if the application has been copied to a path under program files, to write the configuration. If running under program files as the client without elevation, Windows 7 will save client.txt to a virtualized location (under the same location in Windows XP a restricted user will simply be unable to save the file). The application is run with reduced privilege by default since the manifest does not contain requestedExecutionLevel = requireAdministrator.

The client socket listener thread is the ClientListenerThread function, which spends most of the time waiting for a socket to be returned from the accept function. When the ClientListenerThread thread is started, it hides the main application dialog and activates an icon in the taskbar if the listening socket can be created for the configured address and port, with the port appearing in the tooltip. The ClientListenerThread function checks the name and process identifier of the process associated with a socket connection using the Windows IP Helper function GetExtendedTcpTable. This function is loaded dynamically as it is not available on all operating system versions. If the user has entered the password for this process instance and selected the remember password option, or has deselected the password prompt option, then no password prompt is presented prior to initiating a connection to the remote server. Otherwise a password prompt is presented, which also gives the user the option to cancel the connection. If the connection is allowed, a ClientSocketThread thread is created to handle the incoming connection.

The ClientSocketThread thread count is stored in global variable g_clientthreadcount to provide a way to monitor the threads without having to maintain a synchronized list of thread handles. The InterlockedIncrement and InterlockedDecrement functions are used to update this thread count from within the constructor and destructor of an object created at function scope in ClientSocketThread.
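A minimal sketch of this counting pattern is shown below. To keep it portable it uses std::atomic as a stand-in for InterlockedIncrement and InterlockedDecrement, and the names g_threadCount and ThreadCountGuard are hypothetical.

```cpp
#include <atomic>

// Hypothetical stand-in for g_clientthreadcount; std::atomic plays the role
// of the Interlocked* functions used in the actual program.
std::atomic<long> g_threadCount{0};

// Object created at function scope in the thread function: the count goes up
// on entry and comes back down on every exit path, including early returns.
class ThreadCountGuard {
public:
    ThreadCountGuard() { ++g_threadCount; }
    ~ThreadCountGuard() { --g_threadCount; }
    ThreadCountGuard(const ThreadCountGuard&) = delete;
    ThreadCountGuard& operator=(const ThreadCountGuard&) = delete;
};

// Shape of a socket thread body using the guard.
void SocketThreadBody() {
    ThreadCountGuard guard;
    // ... handle the socket connection; any return path decrements the count
}
```

Tying the decrement to a destructor means the count stays correct however the thread function exits, which is the main attraction over decrementing by hand before each return.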

Running the application service

If running the application as an installed service, the server parameters are read from the server.txt file if it exists. The data is read into the g_serverparams variable. If server.txt does not exist then a random server key is generated and written to the file encrypted under the local machine key. The StartServiceCtrlDispatcher function is called to start the service. The ServiceMain function is the entry point for the service; it accepts socket connections from the client application component and checks the service stop signal via an event handle. Provided an incoming rule has been created in the Windows Firewall for the application, the installed service should be able to receive incoming connections from a client.

The server socket listener ServiceMain spends most of the time waiting for a socket to be returned from the accept function. The ServiceMain function checks incoming connection IP addresses against a blacklist and also clears this blacklist around every 24 hours. Clearing the entire list at intervals is simpler than entry tracking based on an individual timer value for each IP address. If an incoming IP address is not found in the blacklist and data is available on the socket then a ServerSocketThread thread is created to handle the connection. Otherwise a ServerCloseSocketThread thread is created to forcibly close the connection after a period of time has elapsed. (Note that CreateThread is used instead of _beginthreadex in the code; this means no compile time error is issued if the single-threaded CRT is selected by mistake in the project settings.) If the service receives a shutdown signal, the hStopServiceEvent event is set, followed by the closing of the hSocketListener socket handle. This causes the accept call in the ServiceMain function to return an error, and the state of hStopServiceEvent indicates to ServiceMain that it should exit.
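The blacklist behaviour can be sketched as a failure counter per address that is wiped as a whole at a fixed interval. The class name Blacklist, the failure threshold and the use of seconds for time are all assumptions for illustration; the article does not give the actual limits.

```cpp
#include <cstdint>
#include <map>

class Blacklist {
public:
    // maxFailures and clearIntervalSeconds are illustrative parameters.
    Blacklist(int maxFailures, std::uint64_t clearIntervalSeconds,
              std::uint64_t startSeconds = 0)
        : maxFailures_(maxFailures),
          clearInterval_(clearIntervalSeconds),
          lastClear_(startSeconds) {}

    // Count one failed authentication attempt from this IPv4 address.
    void RecordFailure(std::uint32_t ip) { ++failures_[ip]; }

    // Called for each incoming connection. The whole list is cleared once
    // the interval has elapsed, instead of timing each entry individually.
    bool IsBlocked(std::uint32_t ip, std::uint64_t nowSeconds) {
        if (nowSeconds - lastClear_ >= clearInterval_) {
            failures_.clear();
            lastClear_ = nowSeconds;
        }
        const auto it = failures_.find(ip);
        return it != failures_.end() && it->second >= maxFailures_;
    }

private:
    int maxFailures_;
    std::uint64_t clearInterval_;
    std::uint64_t lastClear_;
    std::map<std::uint32_t, int> failures_;  // address -> failed attempts
};
```

The trade-off of clearing everything at once is that an address blocked shortly before the clear is released early, in exchange for not having to store or check a timestamp per entry.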

The ServerSocketThread thread count is stored in global variable g_serverthreadcount to provide a way to monitor the threads without having to maintain a synchronized list of thread handles. The InterlockedIncrement and InterlockedDecrement functions are used to update this thread count from within the constructor and destructor of an object created at function scope in ServerSocketThread. The ServiceMain function only calls the socket accept function when g_serverthreadcount indicates the number of threads is below a certain value.

Problems and Limitations

There is an option which can be set in the client to automatically reconnect on loss of connection, but this is usually not recommended because the state of whatever protocol is being tunneled tends to get out of synchronization with the network connection, and after a new connection is established an application may become unresponsive. For example, the RDP Terminal Services Client application by itself seems to silently handle network connection problems such as zero bytes received with no winsock error, which the tunneling program cannot otherwise distinguish from a close on the socket connection. If the Terminal Services Client application is notified that the connection has been broken by closing the local socket connection, then the user is notified and reconnected to the server, but silent reconnection without that notification does not work as expected.
In the VNC protocol the client first waits for server data to be sent when a connection is made, whereas the RDP Terminal Services Client sends data from the client end first. This application was written initially for making a secure connection over the internet just using RDP. Consequently the client part of the program would not make a connection to the server until it had some data to transmit, and this worked for RDP. After switching to VNC and wondering why the client would sit around and never make a connection, looking up the protocol specification for VNC showed this mistake. Now the program makes a connection immediately and so can receive the necessary data from the server that allows the VNC client application to work, and the tunneling program is able to handle other protocols as well.
Initially in the ClientSocketThread and ServerSocketThread functions, which are the main send-receive functions for handling data transmission, a non-zero timeout (around 1ms) was set in the WaitForSingleObject calls for the main data polling loops. However, when the timeout is set to zero, WaitForSingleObject does not enter a wait state if the event is not signaled, yet its return value still indicates whether the event is signaled, so a zero timeout can be used. A non-zero timeout just adds the overhead of a thread context switch and a delay. Since the RDP protocol is efficient in minimizing data transfer, the overhead was not very noticeable, but when tunneling VNC the tiling updates were slowed considerably. So the WaitForSingleObject timeout was set to zero, and with more time spent in the socket select function, responsiveness and performance improved with very little increase in CPU load.
The compiler sets a limit on the stack size for a thread, stored in the executable header and by default 1MB. This can also be controlled via a parameter in CreateThread. Initially the program allocated all of the data buffers for a socket on the stack, which worked until buffer sizes were increased to 1MB. In fact, had buffer size been increased to a value less than but close to 1MB, the program was in danger of mysterious failure somewhere down the track if this stack limit was exceeded by another function call. So the fixed sized buffers were replaced with an auto-allocating heap memory buffer class called CharHeapBuffer. Where much less than 1MB of stack memory is required, such as in the ServerCloseSocketThread function, the STACK_SIZE_PARAM_IS_A_RESERVATION option of CreateThread is used to set stack size to 65K.
To handle data not on an AES block boundary, rather than using a plaintext padding scheme any remainder data is encrypted in CFB mode, therefore within the program any partial decryption or encryption of data must be on AES block boundaries. Consequently for the decryption of the header for checksum validation, the header size must be a multiple of the AES block size 16.
Protocols such as FTP that contain IP address and port information within the data transfer are not handled correctly.
IPv6 is not supported due to the way the Winsock functions have been used.
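Returning to the stack size limitation above, the idea of moving large socket buffers from the stack to the heap can be sketched as follows. This is not the author's CharHeapBuffer class; the name HeapBuffer and its interface are hypothetical and only illustrate an automatically freed, growable heap buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical heap-backed buffer: memory comes from the heap instead of the
// thread stack and is released automatically when the object goes out of scope.
class HeapBuffer {
public:
    explicit HeapBuffer(std::size_t size = 0) { Resize(size); }

    // Grow the allocation when needed, preserving the existing contents.
    // Newly allocated memory is zero-initialized.
    void Resize(std::size_t size) {
        if (size > capacity_) {
            std::unique_ptr<char[]> bigger(new char[size]());
            if (data_)
                std::memcpy(bigger.get(), data_.get(), size_);
            data_ = std::move(bigger);
            capacity_ = size;
        }
        size_ = size;
    }

    char* Data() { return data_.get(); }
    std::size_t Size() const { return size_; }

private:
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

A thread function can then declare something like HeapBuffer recvBuffer(1024 * 1024) as a local variable, using only a few bytes of stack regardless of the buffer size.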

History

4 September 2013 - Removed use of IsUserAnAdmin function for updating file menu
3 September 2013 - Key generation dialog: fixed incorrect buffer index range when xor-ing new key
3 September 2013 - First release