14,978,482 members
Articles / Programming Languages / C#
Article
Posted 22 Jan 2005

71.4K views
37 bookmarked

# Steganography 14 - What Text Lists, GIF Images, and HTML Pages have in common

Rate me:
A simple way to hide binary data in any kind of list

## Introduction

What do a GIF image, an HTML page, and a shopping list have in common? They all are or contain lists that don't require a specific order of elements. The colours in the GIF image's palette can have any order, just as the attributes in an HTML tag, or the words on your shopping list on the kitchen table.

In every list with `count` elements, we can hide `count`-1 bits just by sorting the items. No tricks, no complicated formulas. Let us begin with a simple list of words, then proceed to GIF image palettes, and at last to HTML documents. The algorithm is always the same, because all lists are the same.

The first step is to force the list items into a specific order, which can be alphabetically, or a customized sorting. Then the sorted list is re-sorted, depending on the bits of the secret message.

If you know the default sorting, you can sort the list again and compare the stegano-list with the sorted list. The differences tell you everything about the message-bits that produced the word order:

## Line by Line

You don't really need a computer for list-steganography. Take a piece of paper and a pencil, and write down nine animals. No, this is not the beginning of a psycho test, it is a carrier document. The words need a standard sorting we can refer to when re-sorting, so we sort them alphabetically:

1. Bird
2. Cat
3. Dinosaur
4. Dog
5. Fish
6. Horse
7. Rabbit
8. Sheep
9. Unicorn

Without any mathematical tricks, nine list entries can hide 9-1 = 8 bits of information, for example, the ASCII character 'c' (99 or 01100011). Each list item (except the last one) represents one bit. For each 1 bit, we move the first item from the original list into the new list, for 0 bits, we move one of the other list items into the new list. Starting with the highest bit, the process works like that:

```Original List        New List

Bird                 ---
Cat                  ---
Dinosaur             ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

Bird                 Dinosaur
Cat                  ---
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '1' - Move the first item

---                  Dinosaur
Cat                  Bird
---                  ---
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  ---
Fish                 ---
Horse                ---
Rabbit               ---
Sheep                ---
Unicorn              ---

Hide '0' - Move any item but the first one

---                  Dinosaur
---                  Bird
---                  Cat
Dog                  Rabbit
Fish                 ---
Horse                ---
---                  ---
Sheep                ---
Unicorn              ---

... ... and so on ... ...

Hide '1' - Move the first item

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
Sheep                Horse
---                  ---

No not-first item for a zero-bit left,
that means, no capacity for more bits.
Copy the last item, anyway.

---                  Dinosaur
---                  Bird
---                  Cat
---                  Rabbit
---                  Unicorn
---                  Fish
---                  Dog
---                  Horse
---                  Sheep```

The new list contains the same items as the original list and an additional sub-content which only we know about. The hidden byte can be read again, if we work through the same process the other way round.

```Sorted List          Carrier List

Bird                 Dinosaur
Cat                  Bird
Dinosaur             Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Dinosaur' is not the first item in the sorted list, so the hidden bit was '0'.
Note that down, and remove the item from both lists.

Bird                 ---
Cat                  Bird
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Bird' from the carrier list is the first item in the sorted list => '01'.

---                  ---
Cat                  ---
---                  Cat
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Cat' from the carrier list is the first item in the sorted list => '011'.

---                  ---
---                  ---
---                  ---
Dog                  Rabbit
Fish                 Unicorn
Horse                Fish
Rabbit               Dog
Sheep                Horse
Unicorn              Sheep

'Dinosaur' is not the first item in the sorted list => '0110'.
... ... and so on for all animals ... ...```

The Latin alphabet is not a good default sorting, because everybody would try that one first. I suggest you mix up the letters and write a customized alphabet, before you sort your list. We can do the same sorting thing in C#:

C#
```public StringCollection Hide(String[] lines, Stream message, String alphabet)
{
//sort the lines according to a custom alphabet
StringCollection originalList = Utilities.SortLines(lines, alphabetFileName);
StringCollection resultList = new StringCollection();

bool messageBit = false;
int listElementIndex = 0;
Random random = new Random();

//for each byte of the message
while (messageByte > -1) {
//for each bit
for (int bitIndex=0; bitIndex<8; bitIndex++) {

//decide which line is going to be the next one in the new list

messageBit = ((messageByte & (1 << bitIndex)) > 0) ? true : false;

if (messageBit) {
//pick the first line from the remaining original list
listElementIndex = 0;
}else{
//pick any line but the first one
listElementIndex = random.Next(1, originalList.Count);
}

//move the line from old list to new list

originalList.RemoveAt(listElementIndex);
}

//repeat this with the next byte of the message
}

//copy unused list elements, if there are any
foreach (String s in originalList) {
}

return resultList;
}```

Given a list and an alphabet, reverting the process is easy:

C#
```public Stream Extract(String[] lines, String alphabet)
{
//initialize empty writer for the message
BinaryWriter messageWriter = new BinaryWriter(new MemoryStream());

StringCollection carrierList = new StringCollection();
carrierList.RemoveAt(carrierList.Count - 1);

//sort -> get original list
StringCollection originalList = Utilities.SortLines(lines, alphabetFileName);
String[] unchangeableOriginalList = new String[originalList.Count];
originalList.CopyTo(unchangeableOriginalList, 0);

int messageBit = 0;
int messageBitIndex = 0;
int messageByte = 0;

foreach (String s in carrierList) {

//decide which bit the entry's position hides

if (s == originalList[0]) {
messageBit = 1;
}else{
messageBit = 0;
}

//remove the item from the sorted list
originalList.Remove(s);

//add the bit to the message
messageByte += (byte)(messageBit << messageBitIndex);

messageBitIndex++;
if (messageBitIndex > 7) {
//append the byte to the message
messageWriter.Write((byte)messageByte);
messageByte = 0;
messageBitIndex = 0;
}
}

//return message stream
messageWriter.Seek(0, SeekOrigin.Begin);
return messageWriter.BaseStream;
}```

## Colourful Bits

Every indexed bitmap contains a list that can be abused just that way. These two palettes belong to the same GIF picture, one is the original, the other one carries a hidden text of 31 characters:

Again, the first thing we need is a palette with a default sorting. In this example, we will sort the colours by their ARGB values.

C#
```public Bitmap Hide(Bitmap image, Stream message)
{
//list the palette entries an integer values
int[] colors = new int[image.Palette.Entries.Length];
for (int n = 0; n < colors.Length; n++) {
colors[n] = image.Palette.Entries[n].ToArgb();
}

//initialize empty list for the resulting palette
ArrayList resultList = new ArrayList(colors.Length);

//initialize and fill list for the sorted palette
ArrayList originalList = new ArrayList(colors);
originalList.Sort();```

Many pixels are linked to the palette entries, and we will have to change those, too. So, whenever we move a colour to the new palette, we should map the old index to the new index.

C#
```//initialize list for the mapping of old indices to new indices
SortedList oldIndexToNewIndex = new SortedList(colors.Length);
```

Now that the lists are finished, we can dive into the message and move one colour for each bit...

C#
```Random random = new Random();
int listElementIndex = 0;
bool messageBit = false;

//for each byte of the message
while (messageByte > -1) {
//for each bit
for (int bitIndex = 0; bitIndex < 8; bitIndex++) {

//decide which color is going to be the next one in the new palette

messageBit = ((messageByte & (1 < bitIndex)) > 0) ? true : false;

if (messageBit) {
listElementIndex = 0;
}else{
listElementIndex = random.Next(1, originalList.Count);
}
```

...but don't forget from which position in the original palette the entries came! There are thousands of pixels waiting for updated colour indices.

C#
```           //log change of index for this color
int originalPaletteIndex = Array.IndexOf(colors,
originalList[listElementIndex]);
if( ! oldIndexToNewIndex.ContainsKey(originalPaletteIndex)) {
//add mapping, ignore if the original palette
//contains more than one entry for this color
}

//move the color from old palette to new palette
originalList.RemoveAt(listElementIndex);
}

//repeat this with the next byte of the message
}
//copy unused palette entries
foreach (object obj in originalList) {
int originalPaletteIndex = Array.IndexOf(colors, obj);
}

//create new image
Bitmap newImage = CreateBitmap(image, resultList, oldIndexToNewIndex);
return newImage;
}```

The corresponding `Extract` method works very similar to a combination of the `Hide` and the `Extract` methods for text lists. It sorts the palette, reconstructs the message, and then removes the message from the image (returns an image with a palette, in default sorting). This is the palette from the example above the code, after the message has been extracted and removed:

## Tags full of Text Lists

An HTML document itself is no sortable list, the tags must stay in their order. But inside the tags, there are attributes, and they can be sorted. That means, every tag in an HTML page can store `Attributes.Count`-1 bits. Most tags have just enough attributes for one or two bits, but we can distribute the message's bits over all tags of the page.

C#
```public void Hide(String sourceFileName, String destinationFileName,
Stream message, String alphabet) {
//initializations skipped
// ... ... ...

//list the HTML tags
HtmlTagCollection tags = FindTags(htmlDocument);

//loop over tags
foreach (HtmlTag tag in tags) {

//write beginning of the tag
insertTextBuilder.Remove(0, insertTextBuilder.Length);
insertTextBuilder.AppendFormat("<{0}", tag.Name);

//list attribute names
String[] attributeNames = new String[tag.Attributes.Count];
for (int n = 0; n < attributeNames.Length; n++) {
attributeNames[n] = tag.Attributes[n].Name;
}```

From here, the code is nearly the same as in the other `Hide` methods:

C#
```//get default sorting
StringCollection originalList = Utilities.SortLines(attributeNames, alphabet);
StringCollection resultList = new StringCollection();

if (tag.Attributes.Count > 1) {
//the tag has capacoty for one or more bits

for (int n = 0; n < attributeNames.Length - 1; n++) {

//get next bit of the message

bitIndex++;
if (bitIndex == 8) {
bitIndex = 0;
if (messageByte > -1) {
}
}

if (messageByte > -1) {

//decide which attribute is going
//to be the next one in the new tag

messageBit =
((messageByte & (1 << bitIndex)) > 0) ? true : false;

if (messageBit) {
listElementIndex = 0;
}else{
listElementIndex = random.Next(1, originalList.Count);
}
}else{
listElementIndex = 0;
}

//move the attribute from old list to new list
originalList.RemoveAt(listElementIndex);
}
}

if (originalList.Count > 0) {
//add the last element - it never hides data
}
```

The sorted attributes have to be written back into the document. Most of the code has been copied from the previous article in this series, the one only about HTML attributes.

C#
```       HtmlTag.HtmlAttribute attribute;
foreach (String attributeName in resultList) {
attribute = tag.Attributes.GetByName(attributeName);

insertTextBuilder.Append(" ");
if (attribute.Value.Length > 0) {
insertTextBuilder.AppendFormat("{0}={1}",
attribute.Name, attribute.Value);
}else{
insertTextBuilder.Append(attributeName);
}
}

//replace old tag with new tag
//.. ... ...
}```

Have you noticed that the three `Hide` methods do nearly the same? The `Extract` methods are just as similar. Please look them up in the complete source code, if you don't have a binary allergy, yet. ;-)

## The Demo Application

The application consists of three "half duplex dialogs". You can either enter an empty carrier as a message for hiding, or a carrier for extracting. For text lists and images, you can see the result immediately.

In the dialogs for text lists and HTML documents, you can select an "alphabet file". That is a text file with a customized alphabet. See testdata/demo.txt for a possible custom alphabet.

For any other IDE than Visual Studio 2005/C# Express or SharpDevelop, please unzip the .NET 1.1 version, open an empty project, and add all code files.

## Share

 Software Developer Germany
Corinna lives in Hanover/Germany and works as a C# developer.

 First Prev Next
 Does not work on my 16 colors gif picture PShchegolevatykh19-Apr-13 6:19 PShchegolevatykh 19-Apr-13 6:19
 Re: Great work Corinna John23-Jun-10 23:08 Corinna John 23-Jun-10 23:08
 Algorithms manistago19-Mar-10 2:33 manistago 19-Mar-10 2:33
 Thnx debasish_don5-Jan-09 23:24 debasish_don 5-Jan-09 23:24
 Great, like it vidyaputra21-Mar-08 4:36 vidyaputra 21-Mar-08 4:36
 I just came from Steganography13... E. del Ayre14-Mar-08 20:18 E. del Ayre 14-Mar-08 20:18
 Wow, You're right... This is really wicked stuff... I instantly fell in love with the idea... Sad, that this has been around for sometime, and I got to read only now... Hah, got my 5! think fast, be brave and dont stop.
 Re: I just came from Steganography13... Corinna John15-Mar-08 1:36 Corinna John 15-Mar-08 1:36
 Nnnnice! Markus Mayer1-Aug-06 9:34 Markus Mayer 1-Aug-06 9:34
 Under-utilization of storage capabilities chipmeister28-Jun-05 12:31 chipmeister 28-Jun-05 12:31
 Re: Under-utilization of storage capabilities Corinna John28-Jun-05 19:12 Corinna John 28-Jun-05 19:12
 Shopping lists Paul Watson21-Feb-05 4:02 Paul Watson 21-Feb-05 4:02
 GIFs dshorter126-Jan-05 10:11 dshorter1 26-Jan-05 10:11
 Re: GIFs Corinna John26-Jan-05 19:02 Corinna John 26-Jan-05 19:02
 Last Visit: 31-Dec-99 18:00     Last Update: 31-Jul-21 9:59 Refresh 1