With increasing levels of online data surveillance, user activity tracking and user profiling, threats to online data security and user privacy continue to worry most of "us", the normal users. But can we do anything to protect our privacy when even the most widespread tools seem to fail at this goal? The answer is "yes". The following video inspired me to try to contribute:
ProtonMail, Min. 8:50
After some months of hard work, the result is the Android app shown below, a solution which actually works!
And to be honest, it works really well, for almost no money besides a few bucks for cables and adapters:
AC4PGC (Audio Chat for Pretty Good Concealing) - "SECRET Audio Chatting" While Videoconferencing!
AC4PGC (Audio Chat for Pretty Good Concealing) - Screenshot of Loop Test
Steganography, a "game changer" in this context, has not yet become widespread. The practice of steganography (hiding data in an innocuous carrier medium) has a lot of potential to contribute in a meaningful way towards mitigating some of these concerns. The present idea uses audio steganography methods for embedding a covert message within a digital audio signal transmitted over an analog channel used as a data-diode. Two devices, A and B, are connected only over, e.g., their speaker and microphone interfaces (speaker1 --> mic2, mic1 <-- speaker2). This results in two separate physical channels, each allowing controlled unidirectional data transmission and behaving like a data-diode. They provide the basis for independent transmission and reception channels which will not allow any infection or leakage of data.
Figure 1: Device Connection
Warning: Connecting audio interfaces as shown in Figure 1 may damage your device! Although I experienced no problems when doing all sorts of things with several different devices, care must be taken with regard to the allowed input levels of the microphone/line-in interfaces. Modern devices adapt their impedance automatically depending on the detected audio signal level, so in the general case no problems will occur.
The Idea in Short
Low-rate, full-duplex bidirectional communication over a covert channel is established by transmitting data in superposition over voice, the carrier signal. While data is superposed over voice in the time domain, in the frequency domain it is allocated to its own frequency range, complementary to (non-overlapping with) the frequencies used by voice. This constructs a side channel with its own characteristics, e.g., low signal power (which allows placing the coding signal at noise level). The following arrangement shows the general idea in a basic configuration:
Figure 2: Basic Configuration
Devices B and C, possibly separated by a long distance, communicate with each other by transmitting bidirectional audio information over an open network (a public switched telephone network (PSTN), the internet, or any other network) using standard communication protocols like VoIP. Devices B and C could be two laptops holding a videoconference or teleconference using Skype or, even better, qTox. Devices B and C could also be mobile phones or landline telephones. Devices A and D could be two smartphones. The main assumption is that all these devices make use of standard interfaces and technologies and are vulnerable to exploits of all sorts. Concealed communication between devices A and D is achieved by “plugging” them between the audio peripherals (e.g. speaker and microphone) and devices B and C, as shown in Figure 2, so that the audio carrier can be used as a transmission medium for secret messages. In addition, the secret messages can be encrypted. Devices A and D each have a display to show the secret messages and an input interface to enter them (e.g. a smartphone touch-screen combining both). Before connection establishment, devices A and D shall be put offline in order to prevent any information leakage. This can be done, e.g., by turning off the digital interfaces if available, such as WLAN, LAN, Bluetooth and USB.
While offline, devices A and D offer “almost” air-gap conditions, which increases security. These devices are connected to B and C only via analog audio interfaces, thus implementing two data-diodes which prevent any attacks from outside or leakage of information. That is, during session establishment, the generated or typed keys cannot be sent to an attacker. This is a kind of “enhanced” end-to-end encryption, which is one of the major advantages of this method.
Alternatively, assuming the possibility to connect other “trusted networks or devices”, devices A and D can be used to receive and send the message from/to other sources, offering for example the following services:
- Repeater (digital-audio with embedded message converted to analog audio).
- Gateway (data adapter, socket-communication with message converted to analog-audio with the embedded message). This could be used to extend existent chat applications.
- Tunnel (protocol embedder, low-rate-protocol to analog-audio). In this case, the low-rate-protocol is the message.
- Adapter (proprietary protocol to convert Digital Files stored in media into analog-audio). In this case, the binary file is the message.
Each of these alternatives has its own advantages (listed in the same order as the services):
- Repeater: increased anonymity, geographical range extension
- Gateway: increased anonymity, geographical range extension, reuse of existing applications (chat)
- Tunnel: increased anonymity, geographical range extension, reuse of existing solutions
- Adapter: increased anonymity, reuse of existing solutions (file system)
The following diagram shows an example:
Figure 3: Extended Configuration
In the example presented in Figure 3, the stego-device D offers services (repeater, gateway, tunnel, adapter) to device E located in a trusted network. Depending on the service used, the “enhanced” end-to-end encryption is realized between devices A and E or between devices A and D. In this example, the source and recipient of secret messages are devices A and E, or the “users” working on these devices.
In AC4PGC, the "Gateway Mode" is already implemented. It forwards messages received over audio to sockets and vice versa. That is, the following connections are supported:
- audio <--> audio
- audio <-- (gateway) --> sockets (as shown in Figure 3 between devices A, D and E)
- sockets <-- (gateway) -- (gateway) --> sockets
- sockets <--> sockets
Due to possible loss, corruption or repetition of data during transmission and reception, as well as the need to exchange a public key in each session, a simple communication protocol is required. Besides the data field, the telegrams of the stego-protocol consist at least of the following fields:
- telegram type (types: key-exchange, chat, acknowledge,..)
When required, the chat information is retransmitted. In idle mode, when no data is transmitted, the protocol instead transmits dummy telegrams with pseudo-random bits as a countermeasure against steganalysis. On correct reception of data, the receiver replies with a positive acknowledge. On timeout of a retransmission timer or reception of a “negative” acknowledge, the sender retransmits the last telegram. The complete communication protocol is transmitted as embedded steganographic data, hidden in the carrier. In order to increase the defense against steganalysis, the telegram bits can be “scrambled” every time according to a pseudo-random generator, which is initialized with the session key calculated after exchanging the public keys at startup. This feature keeps the statistical distribution of the audio signal at “normal” levels. The public cryptographic keys can be exchanged during connection establishment using Diffie-Hellman. Each side obtains the same session key as a function of the other side's public key and its own private key:
K = (g^a)^b mod p = (g^b)^a mod p
Everything is based on a previous agreement on a prime (p) and a generator (g). Each side generates its private key (a or b) and calculates its public key from it (g^a mod p or g^b mod p).
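The key exchange and the scrambling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in AC4PGC: the parameters p = 23 and g = 5 are textbook toy values (a real system would use a standardized group of at least 2048 bits), the function names and the telegram counter are hypothetical, and `random.Random` stands in for a generator that would have to be cryptographically secure in practice.

```python
import random
import secrets

# Toy public parameters (assumption): textbook values, far too small for real use.
P = 23  # prime modulus p
G = 5   # generator g (5 is a primitive root modulo 23)


def make_keypair():
    """Return (private, public) with public = g^private mod p."""
    private = secrets.randbelow(P - 3) + 2  # 2 <= private <= p - 2
    return private, pow(G, private, P)


def session_key(own_private, peer_public):
    """K = peer_public^own_private mod p, identical on both sides."""
    return pow(peer_public, own_private, P)


def scramble(bits, key, telegram_counter):
    """XOR the telegram bits with a keystream seeded by the session key.

    The counter (hypothetical) gives every telegram a different keystream.
    XOR is its own inverse, so the same call also descrambles.
    NOTE: random.Random is NOT cryptographically secure; it only
    illustrates the idea of a keyed pseudo-random generator.
    """
    rng = random.Random(f"{key}:{telegram_counter}")
    return [b ^ rng.getrandbits(1) for b in bits]


# Both devices generate key pairs and exchange only the public parts.
a_private, a_public = make_keypair()
d_private, d_public = make_keypair()
key_a = session_key(a_private, d_public)
key_d = session_key(d_private, a_public)

telegram = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
sent = scramble(telegram, key_a, telegram_counter=7)
received = scramble(sent, key_d, telegram_counter=7)
```

Since both sides arrive at the same K, `received` equals `telegram` again, while the bits that actually travel over the channel look different for every counter value.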
The following diagram shows the details of message embedding:
Figure 4: Architecture Overview
In Figure 4, a rough overview of the stego-embedding and encryption is presented. The input message mA is entered with a keyboard or touch-screen and then immediately encrypted with the encryption key (Key 4). The result is mAe. The encrypted message mAe is then encoded by the channel/stego-algorithm, which uses Key 3. Key 3 consists of the specific “settings” used for stego-embedding (like threshold values). The values of Key 3 and Key 4 may be derived from the session key exchanged during connection establishment. A simple steganographic algorithm based on multiple-FSK (Frequency Shift Keying) converts each of the bits in the input message (mAe) to different frequencies within the bandwidth of the carrier signal. In order to avoid distortions introduced by the carrier signal, a “dedicated/exclusive” and sufficiently small frequency range can be used which does not overlap with the frequencies used by the voice. Because the voice occupies the complete bandwidth, this requires that the embedding frequency range is removed from the carrier signal with a band-stop filter FcA. Advanced embedding techniques may take the carrier signal cA into consideration in the process of embedding. That is, the embedding process may depend on the current value of the carrier signal. The correction factor SA (Stego-Amplitude) is selected according to the expected or measured channel conditions, especially the Signal-to-Noise ratio (SNR).
The SA value is selected so that the embedded stego-signal is close to the noise level. Depending on the technique used, the embedded stego-signal can even be below the noise level. Then, with the help of a correlation function, the message can be recovered. For standard applications without “a-priori” knowledge of the channel conditions, a value of SA = MAX_AMP/1000 shows good results (that is, a relative factor of 0.001 if amplitudes are normalized so that MAX_AMP = 1, which is how SA is used in the formulas below). With this value, the embedding is not perceptible to the human ear and it can still be recovered out of the noise present in the carrier signal. As explained before, the carrier signal cA is filtered with a “band-stop” filter FcA, which removes the frequency range used for coding the stego message, giving cAf. Then, it is multiplied by the factor (1-SA), which scales the signal so that, when added to the stego-signal, the sum can never exceed the maximum level and saturate. With this, the signal output to the speaker interface of the device is:
XA = cAf*(1-SA) + mAeS*SA
When considering in addition some noise added in the communication channel, we have:
YA = cAf*(1-SA) + mAeS*SA + nA
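The two formulas above can be tried out with a few lines of Python. This is a minimal sketch, not the app's implementation: it assumes amplitudes normalized to MAX_AMP = 1, a sample rate of 8000 Hz, a 440 Hz tone as a stand-in for the already filtered voice cAf, and a single 3000 Hz tone as a stand-in for the coded signal mAeS. All names and values are chosen for illustration only.

```python
import math

SAMPLE_RATE = 8000    # Hz (assumption)
MAX_AMP = 1.0         # amplitudes normalized to [-1, 1] (assumption)
SA = MAX_AMP / 1000   # stego amplitude factor suggested in the text


def tone(freq_hz, n_samples):
    """Full-scale sine tone, used here as a stand-in test signal."""
    return [MAX_AMP * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def mix(carrier_filtered, stego, sa=SA):
    """XA = cAf*(1-SA) + mAeS*SA, computed sample by sample."""
    return [c * (1 - sa) + m * sa for c, m in zip(carrier_filtered, stego)]


def add_channel_noise(signal, noise):
    """YA = XA + nA, computed sample by sample."""
    return [x + n for x, n in zip(signal, noise)]


c_af = tone(440.0, 800)     # stand-in for the band-stop filtered voice
m_aes = tone(3000.0, 800)   # stand-in for the FSK-coded stego signal
x_a = mix(c_af, m_aes)
peak = max(abs(sample) for sample in x_a)
```

Because both inputs are bounded by MAX_AMP and the two weights add up to 1, `peak` cannot exceed MAX_AMP, which is the "never saturate" property described above.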
It is important to note that the channel noise is an advantage and a disadvantage at the same time. If well implemented, the steganographic modifications embedded at noise level will survive the transmission and will be detected correctly at the recipient. In that case, the channel noise offers good concealment, so the attacker is not able to distinguish between natural noise and embedding noise. On the other hand, the channel noise may be too high or there may be other channel disturbances which affect the hidden communication. In that case, we rely on the steganographic protocol described above, which takes care to retransmit data if required.
We must not forget that all carrier signals will inevitably have some added noise, the most common being “pink noise”. This “indistinguishable” noise has not been explicitly shown in Figure 4 and is just considered to be part of cA. At the top of Figure 4, the inverse process (decoding and extraction) is shown, which demonstrates how the message can be recovered.
The “band-pass” filter FYB-pass gives as a result YBf, which contains only the frequency range of the input signal YB where the embedded information was transmitted by the other device:
YB = cBf*(1-SA) + mBeS*SA + nB
YBf = mBeS*SA + nB’
Then, YBf is multiplied by the factor 1/SA, giving back mBeS’, a “very approximate” version of mBeS. mBeS’ can then be stego-decoded and decrypted to give back the original message mB. As mentioned before, the actual implementation will not transmit mB directly, as shown in the simplified overview, but will instead transmit a telegram containing mB. The telegram supports error detection and correction, assuring consistency, data integrity and timeliness. On error detection, the recipient can send back a “negative acknowledge” or simply do nothing and wait for the retransmission timeout on the sender side to expire. Finally, the “optional” use of filter FYB-stop results in YB’ ~ Voice B, which is output to the speaker. Even without the filter FYB-stop, a human user is not able to perceive the embedded signal or the missing frequencies in the voice B.
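To make the extraction step more tangible, here is a small round-trip sketch in Python. It is not the algorithm used in the app. It assumes a binary FSK with two hypothetical tones (3000 Hz and 3200 Hz), 20 ms per bit at 8000 Hz, two fixed tones as a stand-in for the filtered voice, and Gaussian noise. The Hann-windowed correlation in `band_power` plays the role of the band-pass filter FYB-pass plus the correlation function mentioned earlier. Because only the powers at the two tones are compared, the scaling by 1/SA is not needed here.

```python
import math
import random

SAMPLE_RATE = 8000        # Hz (assumption)
SAMPLES_PER_BIT = 160     # 20 ms per bit (assumption)
F0, F1 = 3000.0, 3200.0   # hypothetical FSK tones for bit 0 and bit 1
SA = 0.001                # stego amplitude factor from the text


def fsk_modulate(bits):
    """mBeS: one full-scale tone burst per bit (F0 for 0, F1 for 1)."""
    out = []
    for bit in bits:
        freq = F1 if bit else F0
        out.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                   for n in range(SAMPLES_PER_BIT))
    return out


def received_signal(bits, noise_sigma=0.0005, seed=1):
    """YB = cBf*(1-SA) + mBeS*SA + nB with a two-tone stand-in for the voice."""
    rng = random.Random(seed)
    out = []
    for n, m in enumerate(fsk_modulate(bits)):
        t = n / SAMPLE_RATE
        voice = (0.5 * math.sin(2 * math.pi * 440 * t)
                 + 0.5 * math.sin(2 * math.pi * 1000 * t))
        out.append(voice * (1 - SA) + m * SA + rng.gauss(0.0, noise_sigma))
    return out


def band_power(block, freq_hz):
    """Power of a Hann-windowed block at one frequency (narrow band-pass)."""
    i_sum = q_sum = 0.0
    size = len(block)
    for n, x in enumerate(block):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / size)
        phase = 2 * math.pi * freq_hz * n / SAMPLE_RATE
        i_sum += window * x * math.cos(phase)
        q_sum += window * x * math.sin(phase)
    return i_sum * i_sum + q_sum * q_sum


def fsk_demodulate(signal):
    """Decide each bit by comparing the power at F1 with the power at F0."""
    bits = []
    for start in range(0, len(signal) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        block = signal[start:start + SAMPLES_PER_BIT]
        bits.append(1 if band_power(block, F1) > band_power(block, F0) else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
y_b = received_signal(message)
decoded = fsk_demodulate(y_b)
```

In this setup, the stego signal is a thousand times weaker than the voice stand-in and of the same order as the noise, yet it is still recovered, because the correlation concentrates on a very narrow frequency range. With real voice, a proper band-stop filter on the sender side is what keeps this range free.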
Data Throughput vs. Channel Capacity
This idea considers full-duplex audio communication over VoIP with 64 kbps in each direction, which is the typical communication link used by most systems based on VoIP. As a reference, the current App version works with 16-bit telegrams transmitted in 341 ms -> 16/0.341 ~ 47 bps. This results in a fraction of roughly 1/1361 of the total bandwidth/capacity being used for steganographic information (64000/47 ~ 1361), which is a usual figure for stego-applications. In fact, the stego-capacity is even lower: with 64000/2722, only about 23 bps of data payload are actually transmitted, that is, 1 bit of stego data for every 2722 bits of audio data. This is an inevitable characteristic of this method, which is why it is most appropriate for applications where data transmission has a low rate.
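The figures above are easy to check. The following lines simply redo the arithmetic with the numbers given in the text; the variable names are my own. Note that the ratio 1361 comes from the rounded rate of 47 bps, while the unrounded rate gives 1364.

```python
AUDIO_RATE_BPS = 64_000   # VoIP audio rate per direction
TELEGRAM_BITS = 16        # telegram size of the current app version
TELEGRAM_TIME_S = 0.341   # transmission time of one telegram

# Gross stego rate: 16 bits every 341 ms, about 46.9 bps.
gross_bps = TELEGRAM_BITS / TELEGRAM_TIME_S

# Audio bits spent per stego bit, with the unrounded and the rounded rate.
ratio_exact = AUDIO_RATE_BPS / gross_bps   # about 1364
ratio_rounded = AUDIO_RATE_BPS / 47        # about 1361.7

# Payload rate when 1 stego data bit is carried per 2722 audio bits.
payload_bps = AUDIO_RATE_BPS / 2722        # about 23.5 bps

print(f"gross: {gross_bps:.1f} bps, payload: {payload_bps:.1f} bps")
```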
As explained above, in the stego-devices we shall avoid the use of standard digital interfaces like WLAN, LAN, Bluetooth and USB, all of which are known to be vulnerable to exploits. This dramatically reduces the number of measures required when compared with devices which are online. Ironically, in the era of "digitalization", a solution based on the "good old analog communication" seems to overcome many of the security problems we face when dealing with digital communication. The following figure presents the layers involved in this idea and shows some of its advantages. Layers 1 up to 7 are realized in the stego-device, offering a multiplicity of keys. Layers 7 to 9 are in the host device, which is unaware of all layers "below". As far as it is concerned, it only transports the voice (audio carrier).
Figure 5: Overview of Layers
As usual, we shall assume that all protocols, even the "proprietary" stego-protocol, are open and only the “keys” are secret. That is, security shall only depend on the keys and not on the algorithms.
- Key 1: end-to-end encryption (example: Skype)
- Key 2: stego bits are scrambled according to a pseudo-random generator
- Key 3: stego embedding according to “fine-tuned” settings agreed on communication setup (which is a kind of "key")
- Key 4: “enhanced” end-to-end encryption of message
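As mentioned earlier, Key 3 and Key 4 may be derived from the session key, and the scrambler behind Key 2 is initialized with it as well. One possible way to obtain independent values from a single session key is to run it through a keyed hash with a different label per purpose. The sketch below is an assumption of mine and not taken from the app: the labels, the key length and the function names are purely illustrative. Key 1 is not included, because it belongs to the host application.

```python
import hashlib
import hmac


def derive_key(session_key: bytes, label: str, length: int = 16) -> bytes:
    """Purpose-specific subkey: HMAC-SHA256(session_key, label), truncated."""
    digest = hmac.new(session_key, label.encode("utf-8"), hashlib.sha256).digest()
    return digest[:length]


def derive_stego_keys(session_key: bytes) -> dict:
    """Derive Key 2, Key 3 and Key 4 (hypothetical labels) from one session key."""
    return {
        "key2_scrambler_seed": derive_key(session_key, "scrambler"),
        "key3_embedding_settings": derive_key(session_key, "embedding"),
        "key4_message_encryption": derive_key(session_key, "encryption"),
    }


keys = derive_stego_keys(b"example session key")
```

Both sides compute the same three values from the shared session key, and knowing one of them does not directly reveal the others.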
In short, some of the main points of the method presented in this article are:
- Additional device “plugged” between audio peripherals and “unsafe” communication device
- Simple communication protocol with error detection and correction
- Bits of telegrams of communication protocol scrambled as a measure against steganalysis
- Stego/channel-encoding based on multi-FSK in a reduced frequency range:
  - Audio carrier pre-filtered to remove components in coding-frequency-range
  - Audio carrier added to stego-signal under consideration of relative amplitudes (depending on SNR)
The proposed embedding technique is certainly not the "strongest" against steganalysis, but it works well, providing a good compromise between complexity, real-time behavior, audio quality and robustness. This and many other aspects of the idea can be improved in the future.
Why Do We Need Something like AC4PGC?
Providers of chat applications offer “end-to-end” encryption as the ultimate measure against violation of privacy. Unfortunately, end-to-end encryption is only as secure as the end-nodes, and most end-nodes suffer from massive vulnerability problems. Infections with simple exploits like “key-loggers”, or even tools which take periodic “screen-shots”, give easy access to the initial encryption key, thus making the end-to-end encrypted communication useless. By combining audio steganography, “enhanced” end-to-end encryption based on the use of additional hardware, and two separate physical audio interfaces hosting a communication protocol, a concealed data exchange is achieved which protects not only the information itself but also the fact that a communication is taking place.
- The following cannot be compromised during session:
  - Geolocation of users
  - IP of users
  - Identity of users
  - Quantity of communication
  - Message length
  - The fact that “this technique” is being used
- Solution based on “ubiquitous” and cheap technologies (audio interfaces), making it accessible for everyone.
- Solution can be used with virtually any communication device which has an audio interface, e.g. telephone, mobile phone, laptop, desktop PC, tablet.
- In Germany, a large number of unused and outdated smartphones (from a total of over 100 million) are available for use as additional hardware. Therefore, besides the low-cost audio cable and the chat application, no investment is needed. The user will surely be happy to give the nice old device a meaningful use.
- Additional features like, e.g., Gateway functionality allow increased flexibility, reuse of existing infrastructure and extension of current chat applications.
Using the Code
This information will be available soon, together with the code, in Part II of this article.
Points of Interest
Borrowing some words from Andy Yen (see link above):
"What we have here is just the first step, but it shows that with improving technology privacy doesn't have to be difficult, it doesn't have to be disruptive. Ultimately, privacy depends on each and everyone of us. And we have to protect it now because our online data is more than just a fractions of ones and zeros. It's actually a lot more than that. It's our lives, our personal stories, our friends, our families, and in some ways also our hopes and aspirations. So, now it's the time for us to stand up and say: yes, we do want to live in a world with online privacy. And yes, we can work together to turn this vision into reality!".
- 2019.07.08: Part I of article posted