Posted 14 Nov 2004

Steganography 13 - Hiding Binary Data in HTML Documents

Corinna John, 13 Mar 2008
Some ideas on how to hide binary data in text documents


The last twelve articles dealt only with hiding binary data in binary files. It's getting boring, isn't it? Let's take the first text format you can think of right now and hide binary data in such a document. You are reading an HTML page at this very moment, so HTML is our file format for this article!

Find a Hiding Place

We cannot insert anything into an HTML document: whatever we insert would be visible either in the browser or in the source text as useless clutter. But the order of a tag's attributes can be changed without changing the visible document or the file's size.

<span class="bigText" style="color:#0088ff">
         Text with a CSS class and special color
</span>

<span style="color:#0088ff" class="bigText">
         Do you see the difference?
</span>

The example above shows two variations of the same content. Let's define a very simple key from it:

Key Attribute        Corresponding Attribute
class                style

if( class-attribute before style-attribute ){
    the tag encodes a "1"-bit
} else {
    the tag encodes a "0"-bit
}

With this key, every tag that carries both class and style stands for one bit. We need 80 such text spans to hide 10 characters (10 x 8 bits) of secret text. That is a lot of carrier text for very little secret text. Fortunately, HTML documents contain more common attribute combinations, especially if we use old HTML with inline formatting instead of CSS. Here are a few examples. Key attribute first means "1", corresponding attribute first means "0".
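The rule above can be sketched in a few lines of C#. This is only an illustration, not the article's implementation: the class name `AttributeOrderDemo` and the method `ReadBit` are made up for this example, and the search assumes that every attribute is preceded by a single space.

```csharp
using System;

public static class AttributeOrderDemo
{
    // Returns 1 if keyAttribute appears before correspondingAttribute,
    // 0 if it appears after it, and -1 if the tag does not contain both.
    // Assumption: each attribute is written as " name=" (one leading space).
    public static int ReadBit(string tag, string keyAttribute, string correspondingAttribute)
    {
        int keyPos = tag.IndexOf(" " + keyAttribute + "=", StringComparison.OrdinalIgnoreCase);
        int corrPos = tag.IndexOf(" " + correspondingAttribute + "=", StringComparison.OrdinalIgnoreCase);
        if (keyPos < 0 || corrPos < 0)
        {
            return -1; // not a usable hiding place
        }
        return keyPos < corrPos ? 1 : 0;
    }
}
```

Called with the two span tags from the example above and the couple class/style, the first tag should yield 1 and the second 0.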

Key Attribute        Corresponding Attribute

A Short Example

The carrier documents must be quite long, because every tag can only hide a few bits. The home page of contains just enough attributes to hide 16 ASCII characters. Anyway, a short example document with hiding places for three bytes should be enough. Would you expect secrets in that page?

Above, you see a typical homepage of a bird fanatic who has never heard of HTML 4 and uses a WYSIWYG editor he found on an old magazine CD. The page begins like this:

  <head>
      <title>Canary Birds</title>
      <meta name="author" content="Peter Miller">
      <style>
             .bigText{ font-size:14px; font-weight:bold; }
      </style>
  </head>
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
                       alink="#FF0000" vlink="#FF0000">
      <div align="center" width="50%">
           <span class="bigText" style="color:#0088ff">
             The Finches who got their Name from Islands
             which got their Name from Dogs
           </span>
      </div>

There are five useful attribute couples:

Key Attribute        Corresponding Attribute
name                 content
text                 bgcolor
alink                vlink
align                width
class                style

Each couple occurs only once, so the first part of the document can hide only five bits. Let's go on with the rest of the page:

      <table width="60%" height="100" cellpadding="4" cellspacing="0"
                bgcolor="white" align="center">
             <tr>
                 <td align="right" valign="middle">
                     <img src="exampleImage.jpg" width="164" height="116"
                             alt="Yellow Bird" title="Yellow Bird" border="0">
                 </td>
                 <td align="left" valign="top">
                     The most canaries are yellow, even though they can have
                     all thinkable patterns of
                     <span class="bigText"
                            style="color:#ffffff; background:#000000">white</span>,
                     <span class="bigText" style="color:#bb0000">red</span> and
                     <span class="bigText" style="color:#888888">grey</span>.
                     <a href="#" target="_blank">click here to see photos.</a>
                 </td>
             </tr>
             <tr>
                 <td align="right" valign="top">
                     Male birds are great singers.
                     <a href="#" target="_blank">click here to listen to a sample.</a>
                 </td>
                 <td align="left" valign="middle">
                     <img src="exampleImage2.jpg" width="164" height="176"
                         alt="Singing Bird" title="A Canary is singing" border="0">
                 </td>
             </tr>
             <tr>
                 <td align="left" valign="top">
                     You cannot keep canaries in a cage all day long.
                     They can get sick, if you don't let them fly.
                 </td>
                 <td align="left" valign="top">
                     Another big mistake is to keep one canary alone.
                     Every birds need at least one partner,
                     loneliness can lead to bad disorders.
                 </td>
             </tr>
             <tr>
                 <td colspan="2">
                     <img src="exampleImage3.jpg" width="194" height="35"
                             alt="Feather" title="A Canary Feather" border="0">
                 </td>
             </tr>
      </table>

In this part of the document, additional attribute couples are possible:

Key Attribute        Corresponding Attribute
width                height
src                  alt
title                border
cellpadding          cellspacing
bgcolor              align
align                valign
href                 target

The combination of width and height occurs four times, which gives a capacity of four bits. src and alt appear three times, for three more bits, and title and border add another three. cellpadding/cellspacing occurs only once, just like bgcolor/align, which adds two bits. align/valign adds capacity for six bits, and href/target adds three. Together with the five bits from above, the document has enough capacity to hide 26 bits: three characters and two unused bits.
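The capacity arithmetic can be checked with a small sketch. The occurrence counts are copied from the paragraph above; the class name `CapacityDemo` and its members are invented for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapacityDemo
{
    // Occurrences per attribute couple, as counted in the text above.
    public static readonly Dictionary<string, int> CoupleCounts = new Dictionary<string, int>
    {
        { "couples in the first part of the page", 5 },
        { "width/height", 4 },
        { "src/alt", 3 },
        { "title/border", 3 },
        { "cellpadding/cellspacing", 1 },
        { "bgcolor/align", 1 },
        { "align/valign", 6 },
        { "href/target", 3 }
    };

    // Every occurrence of a couple hides exactly one bit.
    public static int TotalBits()
    {
        return CoupleCounts.Values.Sum();
    }
}
```

`TotalBits()` gives the capacity in bits; integer division by 8 gives the number of whole characters, and the remainder gives the unused bits.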

Three characters are not enough for a long letter, but enough to say "no!", or, in ASCII values, "110 111 033" ("01101110 01101111 00100001"). Let's go through the document and find the first tag with a useable attribute couple...
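To double-check the bit pattern, here is a minimal sketch that turns an ASCII message into the bit string used in this walk-through. It assumes that bits are written most significant bit first, which matches the example but is not confirmed by the Hide method shown later; `BitStringDemo` and `ToBitString` are names made up for this sketch.

```csharp
using System;
using System.Text;

public static class BitStringDemo
{
    // Converts each ASCII character into eight bits (most significant bit first),
    // separating the bytes with a space.
    public static string ToBitString(string message)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(message);
        StringBuilder builder = new StringBuilder();
        foreach (byte b in bytes)
        {
            if (builder.Length > 0)
            {
                builder.Append(' ');
            }
            builder.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        }
        return builder.ToString();
    }
}
```

For the message "no!" this should reproduce the three bytes listed above.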

<meta name="author" content="Peter Miller">

name/content is "1", content/name is "0".
We have to re-order the attributes, to hide a value of "0":

<meta content="Peter Miller" name="author">

One bit is done. Next bit...

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

text/bgcolor is "1", bgcolor/text is "0".
alink/vlink is "1", vlink/alink is "0".
We want to hide "1" and "1", so no changes to this line are required.

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

... and so on... for every bit, we check one attribute couple and swap the two attributes if their order does not match the bit. Image tags can carry up to three bits, if the deprecated attributes are present as well:

<img src="exampleImage.jpg" width="164" height="116" alt="Yellow Bird"
       title="Yellow Bird" border="0">

We want to hide "010".
The first key attribute in this tag is "src",
so we take the corresponding attribute "alt".
The bit to hide is "0", the combination for "0" is alt/src,
so we place the "alt"-attribute before the "src"-attribute.

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
        title="Yellow Bird" border="0">

The next key attribute is "width", the corresponding attribute is "height".
Now, the bit to hide is "1", so we leave "height" after "width".
The third key attribute is "title", and its corresponding attribute is "border".
To hide a "0", we move "title" behind "border".

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
           border="0" title="Yellow Bird">
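The swap step from this walk-through can be sketched as follows. This is a simplified illustration, not the code from the source archive: it works on a plain list of name/value pairs, and `ReorderDemo` and `EncodeBit` are hypothetical names. Unlike the walk-through, it exchanges the two attributes' positions instead of moving one next to the other; the relative order, which is all that carries the bit, comes out the same.

```csharp
using System;
using System.Collections.Generic;

public static class ReorderDemo
{
    // Makes the relative order of two attributes encode one bit:
    // key before corresponding = 1, corresponding before key = 0.
    // Returns false if the tag does not contain both attributes.
    public static bool EncodeBit(List<KeyValuePair<string, string>> attributes,
                                 string key, string corresponding, int bit)
    {
        int keyIndex = attributes.FindIndex(a => a.Key == key);
        int corrIndex = attributes.FindIndex(a => a.Key == corresponding);
        if (keyIndex < 0 || corrIndex < 0)
        {
            return false;
        }

        bool keyIsFirst = keyIndex < corrIndex;
        bool keyMustBeFirst = (bit == 1);
        if (keyIsFirst != keyMustBeFirst)
        {
            // exchange the two attributes, all others keep their places
            KeyValuePair<string, string> temp = attributes[keyIndex];
            attributes[keyIndex] = attributes[corrIndex];
            attributes[corrIndex] = temp;
        }
        return true;
    }
}
```

Applied to the image tag above with the bits 0, 1 and 0 for src/alt, width/height and title/border, the attribute order becomes alt, width, height, src, border, title.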

No more Examples, Show the Implementation!

Alright, first we need two classes to store HTML tags and their attributes. Attributes don't have many properties, they only have a name and a value. Each attribute in a tag can be used for only one message bit. The program has to mark it as already handled.

public class HtmlAttribute {
        private String name;
        private String value;
        private bool handled;

        public String Name {
            get { return name; }
        }

        public String Value {
            get { return this.value; }
            set { this.value = value; }
        }

        public bool Handled {
            get { return handled; }
            set { this.handled = value; }
        }

        public HtmlAttribute(String name) {
            this.name = name.ToLower();
            this.value = String.Empty;
            handled = false;
        }
}

An HTML tag has a name and a number of attributes. The constructor searches the tag's text for attributes and their values.

public class HtmlTag {
        public int beginPosition;
        public int endPosition;
        private String name;

        public int BeginPosition {
            get { return beginPosition; }
            set { beginPosition = value; }
        }

        public int EndPosition {
            get { return endPosition; }
            set { endPosition = value; }
        }

        public String Name {
            get { return name; }
        }

        private HtmlAttributeCollection attributes;
        public HtmlAttributeCollection Attributes{
            get{ return attributes; }
        }

        public HtmlTag(String text, int beginPosition, int endPosition) {
            //... complicated lines for splitting tags into attributes ...
            //... you better read it in the full source code ...
        }
}
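The constructor's parsing code is not shown here, so the following is only one possible way to split a tag into attributes, using a regular expression. It is not the parser from the source archive. The names `TagParserDemo` and `ParseAttributes` are assumptions, and the sketch keeps the quotes as part of the value because the Hide method further down writes attributes back as name=value without adding quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagParserDemo
{
    // Matches name="value", name='value' or name=value.
    private static readonly Regex AttributePattern = new Regex(
        @"([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)");

    // Returns the attributes of one tag in document order.
    // Names are lower-cased; values are kept exactly as written, quotes included.
    // Attributes without a value are ignored in this sketch.
    public static List<KeyValuePair<string, string>> ParseAttributes(string tagText)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributePattern.Matches(tagText))
        {
            result.Add(new KeyValuePair<string, string>(
                match.Groups[1].Value.ToLower(),
                match.Groups[2].Value));
        }
        return result;
    }
}
```

For the meta tag of the example page, this should return the two pairs name and content, in that order.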

The Hide method lists all HTML tags, and then loops over the tags and their attributes. Attributes that have already been handled are ignored. If an attribute is still fresh and unused, the method looks it up in the key table...

/// <summary>Hide a message in an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="destinationFileName">Path and name to save the resulting HTML document</param>
/// <param name="message">The message to hide</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Hide(String sourceFileName,
       String destinationFileName,
       Stream message,
       DataTable keyTable)
{
    //read the carrier document
    StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
    String htmlDocument = reader.ReadToEnd();
    reader.Close();

    message.Position = 0;

    //list the HTML tags
    HtmlTagCollection tags = FindTags(htmlDocument);

    StringBuilder insertTextBuilder = new StringBuilder();
    DataRow[] rows;
    HtmlAttribute secondAttribute;
    int offset = 0;
    int bitIndex = 7;
    int messageByte = 0;

    foreach (HtmlTag tag in tags) {

        insertTextBuilder.Remove(0, insertTextBuilder.Length);
        insertTextBuilder.AppendFormat("<{0}", tag.Name);

        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                rows =
                  keyTable.Select(String.Format("firstAttribute = '{0}'",
                                  attribute.Name));

... If the program finds the attribute's name in the first key column, it is a primary key attribute and its secondary key attribute is looked up in the attribute collection of the current tag. If the secondary key attribute exists, we have found a key attribute couple and are able to hide one bit.

if (rows.Length > 0) {

    //find corresponding attribute
    secondAttribute = FindAttribute(

    if (secondAttribute != null) {

        if (bitIndex == 7) {
            //get next message byte
            bitIndex = 0;
            messageByte = message.ReadByte();
        } else {
            //next bit
            bitIndex++;
        }

        //change the attributes' order

        //mark both attributes as handled
        attribute.Handled = true;
        secondAttribute.Handled = true;
    }
}

If the attribute was not a primary key attribute, it can be a secondary key attribute. That means, it will be handled later on, together with its primary key attribute. If the attribute is not found in any key column, it is not meant to be used and must be copied into the new tag as it is.

        if (!attribute.Handled) {
            //The attribute is not a primary key attribute.
            //Is it a secondary key attribute?
            bool copyAttribute = false;
            rows =
              keyTable.Select(String.Format("secondAttribute = '{0}'",
                              attribute.Name));

            if(rows.Length > 0){
                //if the corresponding first attribute
                //does not exist in
                //this tag or has already been used,
                //this attribute will not be used and must be copied.
                HtmlAttribute firstAttribute = FindAttribute(

                if (firstAttribute == null) {
                    copyAttribute = true;
                } else {
                    copyAttribute = firstAttribute.Handled;
                }
            }
            else if (rows.Length == 0) {
                //this attribute is not part
                //of the key and must be copied.
                copyAttribute = true;
            }

            if (copyAttribute) {
                //copy unused attribute
                insertTextBuilder.AppendFormat(
                    @" {0}={1}",
                    attribute.Name, attribute.Value);

                attribute.Handled = true;
            }
        }

At this point, you see why we saved the start and end positions with every tag. When we are finished with a tag's attributes, we have to replace the old tag with the new one. In case a few white spaces got lost on the way, we compare the old length with the new length and add the difference to a running offset. Applying that offset to the positions of all following tags means they are still found, even though they have moved.

        //replace old tag with new tag

        tag.BeginPosition += offset;
        tag.EndPosition += offset;

        String insertText = insertTextBuilder.ToString();
        int newLength = insertText.Length;
        if (newLength > 0) {
            int oldLength = tag.EndPosition - tag.BeginPosition;
            htmlDocument = htmlDocument.Remove(tag.BeginPosition, oldLength);
            htmlDocument = htmlDocument.Insert(tag.BeginPosition, insertText);

            offset += (newLength - oldLength);
        }

        if (messageByte < 0) {
            break; //finished
        }
    }

    //save the new document
    StreamWriter writer = new StreamWriter(destinationFileName);
    writer.Write(htmlDocument);
    writer.Close();
}
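The offset bookkeeping is easier to follow in isolation. The sketch below replaces several spans whose positions refer to the original text and corrects each position by the length difference accumulated so far. `OffsetDemo` and `ReplaceSpans` are illustrative names, and the tuple layout (begin, end, new text) is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

public static class OffsetDemo
{
    // Replaces spans given as (begin, end, newText). The positions refer to the
    // ORIGINAL document and must be sorted by begin; 'end' is exclusive.
    public static string ReplaceSpans(string document,
                                      List<Tuple<int, int, string>> replacements)
    {
        int offset = 0; // how far later positions have shifted so far
        foreach (Tuple<int, int, string> r in replacements)
        {
            int begin = r.Item1 + offset;
            int oldLength = r.Item2 - r.Item1;
            document = document.Remove(begin, oldLength).Insert(begin, r.Item3);
            offset += r.Item3.Length - oldLength;
        }
        return document;
    }
}
```

If the first replacement is one character shorter than the original tag, the second tag is found one position further to the left, which is exactly what the offset variable in the Hide method accounts for.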

How to Reconstruct the Message

Extracting a message is much easier, because we need not care about unused attributes. Loop through the tags and attributes, find a primary key attribute, get its corresponding attribute, and compare the positions, that's all.

/// <summary>Extract a hidden message from an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="message">Empty stream for the message</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Extract(String sourceFileName, Stream message, DataTable keyTable) {

    // ... read the carrier document ...
    // ... list the HTML tags ...
    // ... declarations ...

    foreach (HtmlTag tag in tags) {
        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                rows =
                   keyTable.Select(String.Format("firstAttribute = '{0}'",
                                   attribute.Name));
                if (rows.Length > 0) {

                    //find corresponding attribute
                    secondAttribute = FindAttribute(

                    if (secondAttribute != null) {

                        attributePosition = htmlDocument.IndexOf(

                        secondAttributePosition = htmlDocument.IndexOf(

                        //compare the attributes' positions
                        messageByte = ExtractBit(

Like in the previous articles, the Extract methods expect to find the message's length, before the actual message begins. Because of a document's limited capacity, the length value is only one byte long, not four.

                        //next bit
                        if (bitIndex == 7) {
                            bitIndex = 0;

                            if ((message.Length == 1) && (messageLength == 0)) {
                                //read length
                                message.Position = 0;
                                BinaryReader binaryReader =
                                              new BinaryReader(message);
                                messageLength = binaryReader.ReadByte();
                                reader = null;
                                message.Position = 0;
                            }
                            else if ((messageLength > 0) &&
                                     (message.Length == messageLength)) {
                                break; //finished
                            }

                        } else {
                            bitIndex++;
                        }

                        //mark both attributes as handled
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
     // ... skip attributes, exit when finished, and so on ...
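The one-byte length prefix can be illustrated independently of the HTML handling. The following sketch packs a sequence of extracted bits into bytes, treats the first byte as the message length, and stops once that many bytes have been collected. It assumes most-significant-bit-first order, which the fragments above do not confirm; `LengthPrefixDemo` and `Decode` are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class LengthPrefixDemo
{
    // Packs bits into bytes (most significant bit first, an assumption),
    // reads the first byte as the message length and returns the message bytes.
    public static byte[] Decode(IEnumerable<int> bits)
    {
        List<byte> message = new List<byte>();
        int length = -1;   // -1 means: length byte not read yet
        int current = 0;
        int bitCount = 0;

        foreach (int bit in bits)
        {
            current = (current << 1) | (bit & 1);
            bitCount++;
            if (bitCount < 8)
            {
                continue;  // byte not complete yet
            }

            if (length < 0)
            {
                length = current;           // first byte: message length
            }
            else
            {
                message.Add((byte)current); // following bytes: message
            }
            current = 0;
            bitCount = 0;

            if (message.Count == length)
            {
                break;     // finished, ignore any remaining bits
            }
        }
        return message.ToArray();
    }
}
```

Because the length is a single byte, a message in this scheme can be at most 255 bytes long, which fits the small capacity of a typical document.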

Building a Key

The key is no longer a binary file; it is a table of attributes. You should build your key with the key editor and save it to an XML file. The *.zip archive contains two example files, which may be useful as key templates.
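If you prefer to build a key in code, a sketch could look like this. The column names firstAttribute and secondAttribute are taken from the Select calls in the Hide method; everything else (the class `KeyTableDemo`, the table name "key", and the assumption that a plain DataTable XML file is acceptable) is made up for this example and may differ from the format the key editor writes.

```csharp
using System;
using System.Data;

public static class KeyTableDemo
{
    // Builds a small key table with three attribute couples.
    public static DataTable BuildKey()
    {
        DataTable key = new DataTable("key"); // WriteXml needs a table name
        key.Columns.Add("firstAttribute", typeof(string));
        key.Columns.Add("secondAttribute", typeof(string));
        key.Rows.Add("class", "style");
        key.Rows.Add("width", "height");
        key.Rows.Add("src", "alt");
        return key;
    }

    // Saves the key together with its schema, so it can be loaded again.
    public static void Save(DataTable key, string path)
    {
        key.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Loads a key that was written by Save.
    public static DataTable Load(string path)
    {
        DataTable key = new DataTable();
        key.ReadXml(path);
        return key;
    }
}
```

A lookup such as `BuildKey().Select("firstAttribute = 'class'")` then returns the row whose secondAttribute is "style", in the same way the Hide and Extract methods query the key.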


  • 14th November, 2004: Initial post
  • 13th March, 2008: Article updated - bug fixed in source archive


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Corinna John
Software Developer
Germany
Corinna lives in Hanover/Germany and works as a C#/Delphi developer.


Comments and Discussions

Re: Slight Improvement
Andrew C Armstrong, 22-Nov-04 8:15
Hmm, interesting. You could extend that further, and rather than just use the src attribute, say that the first bit stored in any <img> tag marks whether or not the image holds further bits. Then you can use the same technique in other referencing tags, such as <a>. Choosing the first bit has an additional advantage - if there is only one attribute in a tag, which is common with <a> tags, you can use any form of encoding to ensure that you store that one bit. For example, <a href="url"> is 0, while <a href="url" >, with an additional space at the end, is 1. This allows you to spread secret text over an entire website, but prevent a decoding agent from taking inappropriate links (e.g. those that leave the site).
If a generated page (say with PHP or ASP.NET) links back to itself, you could code an entire message, irrelevant of size, by changing the secret text in the page each time it is generated, though you'd have to use cookies or session state to keep track of where in the message the decoder is. So, a casual observer would just find a page that has one dud link, whereas a decoder bot could find an entire message.

On a more algorithmic note, a better form of the algorithm above changes the way attributes are arranged to improve overall efficiency, by using as many different permutations as possible. The principle, again, is not notably different to Corinna's original idea, but is a tad more pedantic about optimality.

Consider a tag with five attributes. If we think of it as a tag with five vacant slots, and then iterate through each attribute in order (in the case of the algorithm above, that's alphabetic order, but again, it doesn't matter, as long as there's an order)-

The first attribute can be placed in one of five slots - this gives us 2 (complete) bits of storage, thus the first two bits of the tag can be stored with the placement of the first attribute, i.e. if the first two bits to encode are 01, the first attribute goes in the second slot (or the slot at index 1).

The second can be placed in one of four slots, ignoring the one taken up by the first attribute, and so encodes the next two bits.

The third attribute then has three available slots, storing one bit, the fourth attribute stores another bit, and the last attribute has only one place to go, so it doesn't store anything.

Et voila, the five attribute <body> tag which stored 4 bits above now stores 6 bits, which is the maximum you can store intact.

And an algorithm to do this-

Sort the attributes, according to your sort order, into an array called attr_strings
n is the number of attributes
Create an array of n integers called attr, initialised to -1
Then iterate through the attributes
for(int i = 0; i < n; i++)
    b is the number of bits to encode with this attribute = trunc(log2(n - i))
    s is the number of the free slot in which to place the current attribute = the next b bits to encode, parsed into an integer
    offset is the number of previously filled slots to 'skip' when placing this attribute = 0
    for(j = 0; j <= s + offset; j++)
        if(attr[j] > -1)
            offset++; (That is, skip a slot that is not free)
    next j
    attr[j - 1] = i; (j is now one past the chosen free slot)
next i
Reconstruct the tag using attr & attr_strings as before

Decode by doing the reverse - if the first attribute is in the second free slot, it must represent 01. If the second attribute is in the third free slot (actually the fourth overall slot, as it gets shunted along one by the first attribute in the second overall slot) it represents 10.

I knew I'd work it out eventually. The above can be explained/proven with binary trees and the like, but suffice to say that it works.
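For readers who want to try the slot idea, here is a minimal C# sketch of the scheme described in this comment. It works on attribute indices only (index 0 is the first attribute in sort order); the class `SlotEncodingDemo` and its methods are names chosen for this illustration, and missing bits are padded with zeros, which is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class SlotEncodingDemo
{
    // Number of whole bits that a choice among 'freeSlots' positions can carry.
    private static int BitsFor(int freeSlots)
    {
        int bits = 0;
        while ((1 << (bits + 1)) <= freeSlots) bits++;
        return bits;
    }

    // Places the sorted attributes into slots according to the bit queue.
    // result[slot] = index of the attribute stored in that slot.
    public static int[] Encode(int attributeCount, Queue<int> bits)
    {
        int[] slots = new int[attributeCount];
        for (int k = 0; k < attributeCount; k++) slots[k] = -1;

        for (int i = 0; i < attributeCount; i++)
        {
            int b = BitsFor(attributeCount - i);
            int s = 0;
            for (int k = 0; k < b; k++)
                s = (s << 1) | (bits.Count > 0 ? bits.Dequeue() : 0);

            // walk to the s-th free slot (0-based), skipping filled ones
            int j = 0;
            for (int free = 0; ; j++)
            {
                if (slots[j] != -1) continue;
                if (free == s) break;
                free++;
            }
            slots[j] = i;
        }
        return slots;
    }

    // Recovers the bits from the slot arrangement.
    public static List<int> Decode(int[] slots)
    {
        int n = slots.Length;
        int[] position = new int[n];
        for (int slot = 0; slot < n; slot++) position[slots[slot]] = slot;

        List<int> bits = new List<int>();
        bool[] taken = new bool[n];
        for (int i = 0; i < n; i++)
        {
            int b = BitsFor(n - i);
            int s = 0; // free slots in front of this attribute
            for (int j = 0; j < position[i]; j++) if (!taken[j]) s++;
            for (int k = b - 1; k >= 0; k--) bits.Add((s >> k) & 1);
            taken[position[i]] = true;
        }
        return bits;
    }
}
```

With five attributes and the bits 0, 1, 1, 0, 1, 0, the first attribute lands in the second slot and the second attribute in the fourth overall slot, as in the description above, and decoding the arrangement returns the same six bits.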


Will code for bandwidth and caffeine
Article Copyright 2004 by Corinna John