
Steganography 13 - Hiding Binary Data in HTML Documents

Corinna John, 13 Mar 2008
Some ideas on how to hide binary data in text documents


The last twelve articles only dealt with hiding binary data in binary files. That's getting boring, isn't it? Let's take the first text format you can think of right now and hide binary data in such a document. You are reading an HTML page at this very moment, so HTML is our file format for this article!

Find a Hiding Place

We cannot insert anything into an HTML document: whatever we insert would be visible either in the browser or in the source text, where it would stand out as useless clutter. But the order of the attributes inside a tag can be changed without changing the rendered document or the file's size.

<span class="bigText" style="color:#0088ff">
         Text with a CSS class and special color
</span>
<span style="color:#0088ff" class="bigText">
         Do you see the difference?
</span>

The example above shows two variations of the same content. Let's define a very simple key from it:

Key Attribute | Corresponding Attribute
class         | style

if( class-attribute before style-attribute ){
    the tag encodes a "1"-bit
}else{
    the tag encodes a "0"-bit
}

With this key, every combination of class and style stands for one bit. We need 80 text spans to hide 10 characters of secret text (10 characters × 8 bits), which is a lot of carrier text for very little secret text. Fortunately, HTML documents contain many more common attribute combinations, especially if they use old HTML with inline formatting instead of CSS. Here are a few examples; as before, key attribute first can mean "1" and corresponding attribute first can mean "0".

Key Attribute | Corresponding Attribute
width         | height
src           | alt
align         | valign
href          | target
A Short Example

The carrier documents must be quite long, because every tag can only hide a few bits. The home page of contains just enough attributes to hide 16 ASCII characters. Anyway, a short example document with hiding places for three bytes should be enough. Would you expect secrets in that page?

Above, you see the typical home page of a bird fanatic who has never heard of HTML 4 and uses a WYSIWYG editor he found on an old magazine CD. The page begins like this:

<html>
   <head>
      <title>Canary Birds</title>
      <meta name="author" content="Peter Miller">
      <style>
             .bigText{ font-size:14px; font-weight:bold; }
      </style>
   </head>
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
                       alink="#FF0000" vlink="#FF0000">
      <div align="center" width="50%">
           <span class="bigText" style="color:#0088ff">
             The Finches who got their Name from Islands
             which got their Name from Dogs
           </span>

There are five useful attribute couples:

Key Attribute | Corresponding Attribute
name          | content
text          | bgcolor
alink         | vlink
align         | width
class         | style

Each couple occurs only once, so the first part of the document can hide only five bits. Let's go on with the rest of the page:

      <table width="60%" height="100" cellpadding="4" cellspacing="0"
                bgcolor="white" align="center">

                 <td align="right" valign="middle">
                     <img src="exampleImage.jpg" width="164" height="116"
                             alt="Yellow Bird" title="Yellow Bird" border="0">
                 <td align="left" valign="top">
                     The most canaries are yellow, even though they can have
                     all thinkable patterns of
                     <span class="bigText"
                            style="color:#ffffff; background:#000000">white</span>,
                     <span class="bigText" style="color:#bb0000">red</span> and
                     <span class="bigText" style="color:#888888">grey</span>.
                     <a href="#" target="_blank">click here to see photos.</a>
                 <td align="right" valign="top">
                     Male birds are great singers.
                     <a href="#" target="_blank">click here to listen to a sample.</a>
                 <td align="left" valign="middle">
                     <img src="exampleImage2.jpg" width="164" height="176"
                         alt="Singing Bird" title="A Canary is singing" border="0">
                 <td align="left" valign="top">
                     You cannot keep canaries in a cage all day long.
                     They can get sick, if you don't let them fly.
                 <td align="left" valign="top">
                     Another big mistake is to keep one canary alone.
                     Every birds need at least one partner,
                     loneliness can lead to bad disorders.
                 <td colspan="2">
                     <img src="exampleImage3.jpg" width="194" height="35"
                             alt="Feather" title="A Canary Feather" border="0">

In this part of the document, additional attribute couples are possible:

Key Attribute | Corresponding Attribute
width         | height
src           | alt
title         | border
cellspacing   | cellpadding
bgcolor       | align
align         | valign
href          | target

The combination of width and height occurs four times, a capacity of four bits. src and alt appear three times, which gives three bits, and title and border add three more. cellspacing/cellpadding occurs only once, just like bgcolor/align, so that's another two bits. align/valign adds six bits, and href/target adds three. Together with the five bits from the first part, the document can hide 26 bits: three characters and two unused bits.
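The tally above is 5 + 4 + 3 + 3 + 1 + 1 + 6 + 3 = 26 bits, and 26 = 3 × 8 + 2. The hypothetical `Capacity.CountBits` below sketches the same counting rule in C#. It assumes that each tag is given as its attribute names in document order, that each key attribute has exactly one corresponding attribute, and that every attribute can join at most one couple (the job of the Handled flag in the real code).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the article's source code).
// Assumptions: each tag is given as its attribute names in document order,
// the key maps each key attribute to exactly one corresponding attribute,
// and every attribute joins at most one couple.
public static class Capacity
{
    public static int CountBits(IEnumerable<string[]> tags, IDictionary<string, string> key)
    {
        int bits = 0;
        foreach (string[] attributes in tags)
        {
            var handled = new HashSet<string>();   // plays the role of the Handled flag
            foreach (string name in attributes)
            {
                if (handled.Contains(name) || !key.TryGetValue(name, out string partner))
                    continue;   // already used, or not a key attribute
                if (handled.Contains(partner) || Array.IndexOf(attributes, partner) < 0)
                    continue;   // corresponding attribute missing or already used
                handled.Add(name);
                handled.Add(partner);
                bits++;         // one couple = one bit
            }
        }
        return bits;
    }
}
```

For a body tag with text/bgcolor and alink/vlink, an image tag with src/alt, width/height and title/border, and a cell with align/valign, this sketch counts 2 + 3 + 1 = 6 bits.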

Three characters are not enough for a long letter, but they are enough to say "no!", or, as ASCII values, "110 111 033" ("01101110 01101111 00100001"). Let's go through the document and find the first tag with a usable attribute couple...
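If you want to check such conversions yourself, this small hypothetical C# helper prints a text the way it is written above: decimal ASCII codes padded to three digits, and eight bits per character with the most significant bit first. Three characters take 3 × 8 = 24 of the 26 available bits.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the article's source code).
public static class AsciiBits
{
    // "no!" -> "110 111 033" (decimal ASCII codes, padded to three digits)
    public static string ToDecimal(string text) =>
        string.Join(" ", text.Select(c => ((int)c).ToString("D3")));

    // "no!" -> "01101110 01101111 00100001" (eight bits per character, most significant bit first)
    public static string ToBinary(string text) =>
        string.Join(" ", text.Select(c => Convert.ToString((int)c, 2).PadLeft(8, '0')));
}
```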

<meta name="author" content="Peter Miller">

name/content is "1", content/name is "0".
We have to reorder the attributes to hide a "0":

<meta content="Peter Miller" name="author">

One bit is done. Next bit...

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

text/bgcolor is "1", bgcolor/text is "0".
alink/vlink is "1", vlink/alink is "0".
We want to hide "1" and "1", so this tag needs no changes:

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

... and so on. For every bit, we may have to swap two attributes. Image tags can carry up to three bits if the deprecated attributes are present as well:

<img src="exampleImage.jpg" width="164" height="116" alt="Yellow Bird"
       title="Yellow Bird" border="0">

We want to hide "010".
The first key attribute in this tag is "src",
so we take the corresponding attribute "alt".
The bit to hide is "0", the combination for "0" is alt/src,
so we place the "alt"-attribute before the "src"-attribute.

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
        title="Yellow Bird" border="0">

The next key attribute is "width", the corresponding attribute is "height".
Now, the bit to hide is "1", so we put "height" after "width".
The third key attribute is "title", and its corresponding attribute is "border".
To hide a "0", we move "title" behind "border".

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
           border="0" title="Yellow Bird">
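Here is the reordering step as a hypothetical C# sketch. Like the walkthrough, it writes each couple at the position of its key attribute and copies every other attribute unchanged. Unlike the article's code, it works on plain (Name, Value) tuples instead of tag objects, assumes that attribute names are unique within a tag and that each key attribute has a single corresponding attribute, and always writes double-quoted values.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the article's source code).
// Assumptions: attributes arrive as (Name, Value) pairs in document order with
// unquoted values, names are unique within a tag, and the key maps each key
// attribute to exactly one corresponding attribute.
public static class AttributeOrderEncoder
{
    public static string EncodeTag(string tagName, IList<(string Name, string Value)> attributes,
                                   IDictionary<string, string> key, Queue<int> bits)
    {
        int n = attributes.Count;
        var used = new bool[n];        // the "Handled" flag
        var partnerOf = new int[n];    // index of the corresponding attribute, or -1
        var isPartner = new bool[n];   // written together with its key attribute
        var bitOf = new int[n];        // bit hidden by the couple whose key attribute is here

        // Pass 1: form couples in tag order while there are bits left to hide.
        for (int i = 0; i < n; i++)
        {
            partnerOf[i] = -1;
            if (used[i] || bits.Count == 0 || !key.TryGetValue(attributes[i].Name, out string partner))
                continue;
            int j = -1;
            for (int k = 0; k < n; k++)
                if (k != i && !used[k] && attributes[k].Name == partner) { j = k; break; }
            if (j < 0)
                continue;              // corresponding attribute missing: no couple here
            used[i] = used[j] = true;
            partnerOf[i] = j;
            isPartner[j] = true;
            bitOf[i] = bits.Dequeue();
        }

        // Pass 2: write each couple at its key attribute's position, copy the rest unchanged.
        var builder = new StringBuilder("<" + tagName);
        for (int i = 0; i < n; i++)
        {
            if (isPartner[i])
                continue;              // already written next to its key attribute
            int j = partnerOf[i];
            if (j < 0)
            {
                Append(builder, attributes[i]);
                continue;
            }
            // "1" = key attribute first, "0" = corresponding attribute first
            Append(builder, bitOf[i] == 1 ? attributes[i] : attributes[j]);
            Append(builder, bitOf[i] == 1 ? attributes[j] : attributes[i]);
        }
        return builder.Append('>').ToString();
    }

    static void Append(StringBuilder builder, (string Name, string Value) attribute) =>
        builder.Append(' ').Append(attribute.Name).Append("=\"").Append(attribute.Value).Append('"');
}
```

Fed with the key couples from this section and the bits 0, 1, 0, the sketch turns the original image tag into exactly the one in the last listing, and the meta tag with a "0" becomes `<meta content="Peter Miller" name="author">`.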

No more Examples, Show the Implementation!

Alright, first we need two classes to store HTML tags and their attributes. Attributes don't have many properties: just a name and a value. Each attribute in a tag can be used for only one message bit, so the program has to mark it as handled once it has been used.

public class HtmlAttribute {
        private String name;
        private String value;
        private bool handled;

        public String Name {
            get { return name; }
        }

        public String Value {
            get { return this.value; }
            set { this.value = value; }
        }

        public bool Handled {
            get { return handled; }
            set { this.handled = value; }
        }

        public HtmlAttribute(String name) {
            this.name = name.ToLower();
            this.value = String.Empty;
            handled = false;
        }
}

An HTML tag has a name and a number of attributes. The constructor searches the tag's text for attributes and their values.

public class HtmlTag {
        private int beginPosition;
        private int endPosition;
        private String name;

        public int BeginPosition {
            get { return beginPosition; }
            set { beginPosition = value; }
        }

        public int EndPosition {
            get { return endPosition; }
            set { endPosition = value; }
        }

        public String Name {
            get { return name; }
        }

        private HtmlAttributeCollection attributes;
        public HtmlAttributeCollection Attributes {
            get { return attributes; }
        }

        public HtmlTag(String text, int beginPosition, int endPosition) {
            //... complicated lines for splitting tags into attributes ...
            //... you'd better read them in the full source code ...
        }
}
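The attribute-splitting code of the HtmlTag constructor is left out above. If you want to experiment without the full source, here is a hypothetical, regex-based stand-in; it is not the article's parser. It assumes that one complete tag is passed as a string such as `<img src="a.jpg" alt="b">` and recognizes double-quoted, single-quoted, unquoted and bare attributes. Finding where a tag ends (comments, scripts, `>` inside values) is left to the caller.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the omitted attribute-splitting code.
// Assumption: 'tag' holds exactly one start tag, from "<" to ">".
public static class TagParser
{
    static readonly Regex TagNamePattern = new Regex(@"^<\s*([A-Za-z][\w:-]*)");

    // Group 1: name; group 2: "double-quoted"; group 3: 'single-quoted'; group 4: unquoted value.
    static readonly Regex AttributePattern = new Regex(
        @"([A-Za-z_:][\w:.-]*)(?:\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s""'=<>`]+)))?");

    public static string TagName(string tag) =>
        TagNamePattern.Match(tag).Groups[1].Value.ToLowerInvariant();

    // Attribute names are lower-cased, like in HtmlAttribute; values lose their quotes.
    public static List<(string Name, string Value)> Attributes(string tag)
    {
        var result = new List<(string Name, string Value)>();
        Match nameMatch = TagNamePattern.Match(tag);
        if (!nameMatch.Success)
            return result;
        string rest = tag.Substring(nameMatch.Length);   // skip "<" and the tag name
        foreach (Match m in AttributePattern.Matches(rest))
        {
            string value = m.Groups[2].Success ? m.Groups[2].Value
                         : m.Groups[3].Success ? m.Groups[3].Value
                         : m.Groups[4].Value;            // empty for a bare attribute
            result.Add((m.Groups[1].Value.ToLowerInvariant(), value));
        }
        return result;
    }
}
```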

The Hide method lists all HTML tags and then loops over the tags and their attributes. Attributes that have already been handled are skipped. If an attribute is still unused, the method looks it up in the key table...

/// <summary>Hide a message in an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="destinationFileName">Path and name to save the resulting HTML document</param>
/// <param name="message">The message to hide</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Hide(String sourceFileName,
       String destinationFileName,
       Stream message,
       DataTable keyTable)
{
    //read the carrier document
    StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
    String htmlDocument = reader.ReadToEnd();
    reader.Close();

    message.Position = 0;

    //list the HTML tags
    HtmlTagCollection tags = FindTags(htmlDocument);

    StringBuilder insertTextBuilder = new StringBuilder();
    DataRow[] rows;
    HtmlAttribute secondAttribute;
    int offset = 0;
    int bitIndex = 7;
    int messageByte = 0;

    foreach (HtmlTag tag in tags) {

        insertTextBuilder.Remove(0, insertTextBuilder.Length);
        insertTextBuilder.AppendFormat("<{0}", tag.Name);

        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                rows =
                  keyTable.Select(String.Format("firstAttribute = '{0}'",
                                                attribute.Name));
... If the program finds the attribute's name in the first key column, the attribute is a primary key attribute, and its secondary key attribute is looked up in the attribute collection of the current tag. If the secondary key attribute exists, we have found a key attribute couple and can hide one bit.

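                if (rows.Length > 0) {

                    //find corresponding attribute in this tag
                    //(FindAttribute's arguments are omitted in this excerpt)
                    secondAttribute = FindAttribute(...);

                    if (secondAttribute != null) {

                        if (bitIndex == 7) {
                            //get next message byte
                            bitIndex = 0;
                            messageByte = message.ReadByte();
                        } else {
                            //next bit
                            bitIndex++;
                        }

                        //change the attributes' order
                        //... append both attributes to insertTextBuilder in the order
                        //    that encodes the current bit (see the full source code) ...

                        //mark both attributes as handled
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
                    }
                }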

If the attribute was not a primary key attribute, it may be a secondary key attribute. In that case it will be handled later, together with its primary key attribute. If the attribute is not found in either key column, it is not meant to be used and must be copied into the new tag unchanged.

                if (!attribute.Handled) {
                    //The attribute is not a primary key attribute.
                    //Is it a secondary key attribute?
                    bool copyAttribute = false;
                    rows =
                      keyTable.Select(String.Format("secondAttribute = '{0}'",
                                                    attribute.Name));

                    if (rows.Length > 0) {
                        //if the corresponding first attribute
                        //does not exist in
                        //this tag or has already been used,
                        //this attribute will not be used and must be copied.
                        //(FindAttribute's arguments are omitted in this excerpt)
                        HtmlAttribute firstAttribute = FindAttribute(...);

                        if (firstAttribute == null) {
                            copyAttribute = true;
                        } else {
                            copyAttribute = firstAttribute.Handled;
                        }
                    }
                    else {
                        //this attribute is not part
                        //of the key and must be copied.
                        copyAttribute = true;
                    }

                    if (copyAttribute) {
                        //copy unused attribute
                        insertTextBuilder.AppendFormat(
                            @" {0}={1}",
                            attribute.Name, attribute.Value);

                        attribute.Handled = true;
                    }
                }
            }
        }

At this point you can see why we saved the start and end positions with every tag. When we have finished a tag's attributes, we replace the old tag with the new one. In case a few white spaces got lost on the way, we compare the old length with the new length and add the difference to an offset, so that all following tags are still found at their shifted positions.

        //replace old tag with new tag

        tag.BeginPosition += offset;
        tag.EndPosition += offset;

        String insertText = insertTextBuilder.ToString();
        int newLength = insertText.Length;
        if (newLength > 0) {
            int oldLength = tag.EndPosition - tag.BeginPosition;
            htmlDocument = htmlDocument.Remove(tag.BeginPosition, oldLength);
            htmlDocument = htmlDocument.Insert(tag.BeginPosition, insertText);

            offset += (newLength - oldLength);
        }

        if (messageByte < 0) {
            break; //finished
        }
    }

    //save the new document with the same encoding that was used to read it
    StreamWriter writer = new StreamWriter(destinationFileName, false, Encoding.Default);
    writer.Write(htmlDocument);
    writer.Close();
}
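The offset bookkeeping is easier to see in isolation. The hypothetical `TagReplacer` below is not the article's code: it takes tag positions from the original document (begin inclusive, end exclusive, in ascending and non-overlapping order), replaces each tag, and shifts every later position by the accumulated length difference, which is what `offset` does in Hide.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the article's source code).
// Assumptions: positions refer to the ORIGINAL document, Begin is inclusive,
// End is exclusive, and the tags are given in ascending, non-overlapping order.
public static class TagReplacer
{
    public static string Replace(string document,
                                 IEnumerable<(int Begin, int End, string NewText)> replacements)
    {
        var builder = new StringBuilder(document);
        int offset = 0;   // how far later tags have moved because of earlier replacements
        foreach (var (begin, end, newText) in replacements)
        {
            int oldLength = end - begin;
            builder.Remove(begin + offset, oldLength);
            builder.Insert(begin + offset, newText);
            offset += newText.Length - oldLength;
        }
        return builder.ToString();
    }
}
```

For example, replacing `<x>` (positions 2 to 5) with `<xxx>` and `<yy>` (positions 7 to 11) with `<z>` in `ab<x>cd<yy>ef` gives `ab<xxx>cd<z>ef`; without the offset, the second replacement would start two characters too early.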

How to Reconstruct the Message

Extracting a message is much easier, because we need not care about unused attributes. Loop through the tags and attributes, find a primary key attribute, get its corresponding attribute and compare their positions. That's all.

/// <summary>Extract a hidden message from an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="message">Empty stream for the message</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Extract(String sourceFileName, Stream message, DataTable keyTable) {

    // ... read the carrier document ...
    // ... list the HTML tags ...
    // ... declarations ...

    foreach (HtmlTag tag in tags) {
        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                rows =
                   keyTable.Select(String.Format("firstAttribute = '{0}'",
                                                 attribute.Name));
                if (rows.Length > 0) {

                    //find corresponding attribute
                    //(arguments of the helper calls are omitted in this excerpt)
                    secondAttribute = FindAttribute(...);

                    if (secondAttribute != null) {

                        attributePosition = htmlDocument.IndexOf(...);

                        secondAttributePosition = htmlDocument.IndexOf(...);

                        //compare the attributes' positions
                        messageByte = ExtractBit(...);

As in the previous articles, the Extract method expects to find the message's length before the actual message begins. Because of a document's limited capacity, the length is stored in one byte instead of four.

                        //next bit
                        if (bitIndex == 7) {
                            bitIndex = 0;

                            if ((message.Length == 1) && (messageLength == 0)) {
                                //read length
                                message.Position = 0;
                                BinaryReader binaryReader =
                                              new BinaryReader(message);
                                messageLength = binaryReader.ReadByte();
                                reader = null;
                                message.Position = 0;
                            }
                            else if ((messageLength > 0) &&
                                     (message.Length == messageLength)) {
                                break; //finished
                            }

                        } else {
                            bitIndex++;
                        }

                        //mark both attributes as handled
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
     // ... skip attributes, exit when finished, and so on ...
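The reverse direction, turning extracted bits back into bytes, can be sketched like this. The class `MessageAssembler` is hypothetical, and it assumes that the bits arrive most significant bit first, which may not be the order the article's ExtractBit helper uses; only the one-byte length prefix is taken from the text. Note that the length byte costs capacity too: hiding "no!" this way takes 4 × 8 = 32 bits.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of the article's source code).
// Assumptions: the first byte holds the message length, and every byte is
// stored most significant bit first.
public static class MessageAssembler
{
    public static byte[] Assemble(IEnumerable<int> bits)
    {
        var bytes = new List<byte>();
        int current = 0, count = 0;
        int length = -1;                          // unknown until the first byte is complete
        foreach (int bit in bits)
        {
            current = (current << 1) | (bit & 1);
            if (++count < 8)
                continue;
            if (length < 0)
                length = current;                 // first byte: message length
            else
                bytes.Add((byte)current);
            current = 0;
            count = 0;
            if (bytes.Count == length)
                break;                            // finished, ignore remaining bits
        }
        return bytes.ToArray();
    }

    // Inverse direction for testing: length byte, then the message, MSB first.
    public static List<int> ToBits(byte[] message)
    {
        var all = new List<byte> { (byte)message.Length };
        all.AddRange(message);
        var bits = new List<int>();
        foreach (byte b in all)
            for (int i = 7; i >= 0; i--)
                bits.Add((b >> i) & 1);
        return bits;
    }
}
```

Because the length has to fit into one byte, this layout cannot describe messages longer than 255 bytes; the `(byte)` cast in ToBits would silently truncate the length.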

Building a Key

The key is no longer a binary file; it is a table of attribute couples. Build your key with the key editor and save it to an XML file. The *.zip archive contains two example files, which may be useful as key templates.
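To show the shape of such a table, here is a hypothetical C# sketch of a key built in code instead of with the key editor. The column names `firstAttribute` and `secondAttribute` are taken from the Select filters in Hide and Extract; the rows are the couples listed at the beginning of this article. Whether the key editor's XML files have exactly this layout is an assumption.

```csharp
using System.Data;

// Hypothetical sketch of a key table (not the article's key editor).
// Column names come from the Select filters in Hide and Extract.
public static class KeyTableSketch
{
    public static DataTable Create()
    {
        var table = new DataTable("Key");   // WriteXml needs a table name
        table.Columns.Add("firstAttribute", typeof(string));
        table.Columns.Add("secondAttribute", typeof(string));

        string[,] couples =
        {
            { "class", "style" }, { "width", "height" }, { "src", "alt" },
            { "align", "valign" }, { "href", "target" }
        };
        for (int i = 0; i < couples.GetLength(0); i++)
            table.Rows.Add(couples[i, 0], couples[i, 1]);
        return table;
    }
}
```

`KeyTableSketch.Create().Select("firstAttribute = 'src'")` returns one row whose `secondAttribute` is "alt". To store such a table, the standard ADO.NET calls `table.WriteXml("key.xml", XmlWriteMode.WriteSchema)` and `table.ReadXml("key.xml")` can be used.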


  • 14th November, 2004: Initial post
  • 13th March, 2008: Article updated - bug fixed in source archive


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Corinna John
Software Developer
Germany
Corinna lives in Hannover/Germany (CeBIT City) and works as a Delphi developer, though her favorite language is C#.

Comments and Discussions

Slight Improvement (Andrew C Armstrong, 21-Nov-04 11:30)
Thinking about increasing the secret text efficiency, I've thought of a possible improvement to your attribute arrangement system.
My first thought was that there's a lot of overlap: for example, <body text="#000000" bgcolor="#FFFFFF" link="#FF0000" alink="#FF0000" vlink="#FF0000"> is the same as <body link="#FF0000" alink="#FF0000" vlink="#FF0000" text="#000000" bgcolor="#FFFFFF">, which means there's already one bit we've lost. But going further, there are 5! (or 120) permutations of the five attributes, which is just short of 7 bits (log2 120 ≈ 6.9). So we have at least 6 bits to play with in this tag alone.
The first thing to do is, rather than defining individual pairs of attributes, to assign each attribute an ordinal, so that we can say that attribute A is greater or less than attribute B. Thus, if alink has a lower ordinal than link:
alink="#FF0000" link="#FF0000" ==> 0
link="#FF0000" alink="#FF0000" ==> 1
A very simple way of assigning ordinals, and thus pair orders, is alphabetic order: a pair of attributes in alphabetic order means 0, in reverse order means 1.
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000" alink="#FF0000" vlink="#FF0000">   ==>   1010
text="#000000"      >   bgcolor="#FFFFFF"   ==> 1
bgcolor="#FFFFFF" <   link="#FF0000"      ==> 0
link="#FF0000"      >   alink="#FF0000"      ==> 1
alink="#FF0000"   <   vlink="#FF0000"      ==> 0
<body link="#FF0000" alink="#FF0000" vlink="#FF0000" text="#000000" bgcolor="#FFFFFF">   ==>   1011
link="#FF0000"   >   alink="#FF0000"      ==> 1
alink="#FF0000"   <   vlink="#FF0000"      ==> 0
vlink="#FF0000"   >   text="#000000"         ==> 1
text="#000000"   >   bgcolor="#FFFFFF"   ==> 1
And, because every attribute takes part, every HTML tag with n attributes can store n-1 bits.
And a quick algorithm to implement this system:
Sort all the attributes within a tag into alphabetical order in an array called attr_strings.
n is the number of bits we can store (number of attributes minus 1)
create an array of n+1 integers called attr, and two integers, min=0 and max=0
set attr[0] to 0
for(i = 1; i <= n; i++)
      if bit[i] = 1
            min = min - 1
            attr[i] = min
      else
            max = max + 1
            attr[i] = max
next i
for(i = 0; i <= n; i++)
      attr[i] -= min
use attr as an array of indexes in attr_strings, and reassemble the tag
Therefore, to encode 0101 with our <body> tag:
Sort the attributes
attr_strings = { alink="#FF0000", bgcolor="#FFFFFF", link="#FF0000", text="#000000", vlink="#FF0000" }
min = 0
max = 0
attr = {0,0,0,0,0}
iteration 1:
bit[1] = 0, so
      max = 1
      attr = {0,1,0,0,0}
iteration 2:
bit[2] = 1, so
      min = -1
      attr = {0,1,-1,0,0}
iteration 3:
bit[3] = 0, so
      max = 2
      attr = {0,1,-1,2,0}
iteration 4:
bit[4] = 1, so
      min = -2
      attr = {0,1,-1,2,-2}
Subtract min (-2) from each attr to give
attr = {2,3,1,4,0}
Then reconstruct the tag-
Tag = "<body " + attr_strings[2] + " " + attr_strings[3] + " " + attr_strings[1] + " " + attr_strings[4] + " " + attr_strings[0] + ">";
Thus, Tag = <body link="#FF0000" text="#000000" bgcolor="#FFFFFF" vlink="#FF0000" alink="#FF0000">
Which equates to 0101.
Note that you needn't use alphabetic order, any ordering will suffice, just as long as both the encoder and decoder are using the same order.
This algorithm is still far from the mathematical optimum, but it's a start, and it's pretty simple, as it doesn't really deviate from your original idea. Taking into account all the attributes across the entire file, you could fit quite a lot into a single document, especially if you double up by including/removing quotation marks on attribute values (though that would need to ignore attributes where quotation marks are necessary).
Thanks for the idea!
Any comments?
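The ordinal scheme from this comment can be sketched in C# as follows. The class name `OrdinalOrder` is hypothetical, and it uses the ordinal string order of the complete attribute text as the shared ordering (any order works as long as encoder and decoder agree). It follows the min/max idea described in the comment: a "1" bit gives the next attribute a new lowest rank, a "0" bit a new highest rank, and decoding only compares neighbours.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the ordinal scheme (not part of the article's source code).
// n attributes carry n-1 bits; ordering = ordinal comparison of the attribute text.
public static class OrdinalOrder
{
    public static List<string> Encode(IEnumerable<string> attributes, IList<int> bits)
    {
        List<string> sorted = attributes.OrderBy(a => a, StringComparer.Ordinal).ToList();
        if (bits.Count != sorted.Count - 1)
            throw new ArgumentException("A tag with n attributes carries exactly n-1 bits.");

        var rank = new int[sorted.Count];   // rank[0] = 0
        int min = 0, max = 0;
        for (int i = 0; i < bits.Count; i++)
            rank[i + 1] = bits[i] == 1 ? --min : ++max;   // 1 = step down, 0 = step up

        // Shift ranks so the smallest becomes 0, then use them as indexes into the sorted list.
        return rank.Select(r => sorted[r - min]).ToList();
    }

    public static List<int> Decode(IList<string> ordered)
    {
        var bits = new List<int>();
        for (int i = 0; i + 1 < ordered.Count; i++)
            bits.Add(string.CompareOrdinal(ordered[i], ordered[i + 1]) > 0 ? 1 : 0);
        return bits;
    }
}
```

With the five body attributes and the bits 0, 1, 0, 1, the sketch reproduces the comment's result: link, text, bgcolor, vlink, alink. Decoding the original attribute order gives 1010, as in the first example.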

Article Copyright 2004 by Corinna John