
Introduction

The last twelve articles only dealt with hiding binary data in binary files. It's getting boring, isn't it? Let's take the first text format you can think of right now and hide binary data in such a document. You are reading an HTML page right now, so HTML is our file format for this article!

Find a Hiding Place

We cannot insert anything into an HTML document: whatever we insert would be visible either in the browser or in the source text as useless clutter. But the order of a tag's attributes can be changed without changing the rendered page or the file's size.

<span class="bigText" style="color:#0088ff">
         Text with a CSS class and special color
</span>

<span style="color:#0088ff" class="bigText">
         Do you see the difference?
</span>

The example above shows two variations of the same content. Let's define a very simple key from it:

Key Attribute   | Corresponding Attribute
----------------|------------------------
class           | style


if( class-attribute before style-attribute ){
    the tag encodes a "1"-bit
}
else{
    the tag encodes a "0"-bit
}
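The rule can be tried out on raw tag text. The following C# sketch is not from the article's source code; the class and method names are made up for illustration. It simplifies the search: an attribute is located as its name preceded by whitespace and followed by "=", which keeps "align" from matching inside "valign" but would be fooled by an attribute value that contains text like " style=". It returns null when a tag lacks one of the two attributes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeOrderKey
{
    // Hypothetical helper: returns 1 if keyAttribute comes before correspondingAttribute
    // in the tag, 0 if it comes after, or null if the tag lacks one of them.
    public static int? ReadBit(string tagText, string keyAttribute, string correspondingAttribute)
    {
        int keyPosition = FindAttribute(tagText, keyAttribute);
        int correspondingPosition = FindAttribute(tagText, correspondingAttribute);
        if (keyPosition < 0 || correspondingPosition < 0)
        {
            return null; // this tag carries no bit for this key pair
        }
        return keyPosition < correspondingPosition ? 1 : 0;
    }

    // Position of "name=" preceded by whitespace, so that "align" is not found inside "valign".
    private static int FindAttribute(string tagText, string name)
    {
        Match match = Regex.Match(tagText, @"\s" + Regex.Escape(name) + @"\s*=",
                                  RegexOptions.IgnoreCase);
        return match.Success ? match.Index : -1;
    }
}
```

For the two spans above, ReadBit(..., "class", "style") returns 1 for the first one and 0 for the second.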

With this key, every tag that contains both class and style carries one bit. To hide 10 characters of secret text (80 bits), we need 80 text spans. That's a lot of carrier text for very little secret text. Fortunately, HTML documents contain more common attribute pairs, especially old HTML that uses inline formatting instead of CSS. Here are a few examples; as before, key attribute first can mean "1" and corresponding attribute first "0".

Key Attribute   | Corresponding Attribute
----------------|------------------------
width           | height
src             | alt
align           | valign
href            | target

A Short Example

The carrier documents must be quite long, because every tag can only hide a few bits. The home page of pc-errors.de contains just enough attributes to hide 16 ASCII characters. For this article, though, a short example document with hiding places for three bytes will do. Would you expect secrets in that page?

Above, you see the typical homepage of a bird fanatic who has never heard of HTML 4 and uses a WYSIWYG editor he found on an old magazine CD. The page begins like this:

<html>
<head>
      <title>Canary Birds</title>
      <meta name="author" content="Peter Miller">

      <style>
             .bigText{ font-size:14px; font-weight:bold; }
      </style>
</head>
<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
                       alink="#FF0000" vlink="#FF0000">
      <div align="center" width="50%">
           <h1>Canaries</h1>
           <span class="bigText" style="color:#0088ff">
             The Finches who got their Name from Islands
             which got their Name from Dogs
           </span>
      </div>

There are five useful attribute pairs:

Key Attribute   | Corresponding Attribute
----------------|------------------------
name            | content
text            | bgcolor
alink           | vlink
align           | width
class           | style

Each pair occurs only once, so the first part of the document can hide only five bits. Let's go on with the rest of the page:

      <table width="60%" height="100" cellpadding="4" cellspacing="0"
                bgcolor="white" align="center">

             <tr>
                 <td align="right" valign="middle">
                     <img src="exampleImage.jpg" width="164" height="116"
                             alt="Yellow Bird" title="Yellow Bird" border="0">
                 </td>
                 <td align="left" valign="top">
                     Most canaries are yellow, even though they can have
                     all thinkable patterns of
                     <span class="bigText"
                            style="color:#ffffff; background:#000000">white</span>,
                     <span class="bigText" style="color:#bb0000">red</span> and
                     <span class="bigText" style="color:#888888">grey</span>.
                     <a href="#" target="_blank">click here to see photos.</a>
                 </td>
             </tr>
             <tr>
                 <td align="right" valign="top">
                     Male birds are great singers.
                     <a href="#" target="_blank">click here to listen to a sample.</a>
                 </td>
                 <td align="left" valign="middle">
                     <img src="exampleImage2.jpg" width="164" height="176"
                         alt="Singing Bird" title="A Canary is singing" border="0">
                 </td>
             </tr>
             <tr>
                 <td align="left" valign="top">
                     You cannot keep canaries in a cage all day long.
                     They can get sick, if you don't let them fly.
                 </td>
                 <td align="left" valign="top">
                     Another big mistake is to keep one canary alone.
                     Every bird needs at least one partner,
                     loneliness can lead to bad disorders.
                 </td>
             </tr>
             <tr>
                 <td colspan="2">
                     <img src="exampleImage3.jpg" width="194" height="35"
                             alt="Feather" title="A Canary Feather" border="0">
                 </td>
             </tr>
      </table>
</body>
</html>

In this part of the document, additional attribute pairs are possible:

Key Attribute   | Corresponding Attribute
----------------|------------------------
width           | height
src             | alt
title           | border
cellspacing     | cellpadding
bgcolor         | align
align           | valign
href            | target

The combination of width and height occurs four times (the table and three images), a capacity of four bits. src/alt and title/border appear three times each (once per image), adding six bits. cellspacing/cellpadding and bgcolor/align occur only once each (in the table tag), adding two bits. align/valign adds six bits (one per table cell that has both), and href/target appears twice, adding two bits. The class/style pair from the first part occurs three more times in the colored spans, adding three bits. Together with the five bits from the first part, the document can hide 28 bits: three characters and four unused bits.
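The counting above can be automated. This is a minimal C# sketch, not taken from the article's source: it works on attribute names only, uses the key pairs from the two tables, and, like the Hide method, lets every attribute take part in at most one pair. Unlike the Hide listing, which only looks at the first matching key row, it tries every row for a key attribute, so align pairs with width in the div and with valign in the table cells. The ExampleTags array is a hand-made transcription of the example page's tags.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapacitySketch
{
    // Key pairs from the two tables: (key attribute, corresponding attribute).
    private static readonly (string Key, string Corresponding)[] KeyPairs =
    {
        ("name", "content"), ("text", "bgcolor"), ("alink", "vlink"), ("align", "width"),
        ("class", "style"), ("width", "height"), ("src", "alt"), ("title", "border"),
        ("cellspacing", "cellpadding"), ("bgcolor", "align"), ("align", "valign"), ("href", "target")
    };

    // Bits one tag can hide: walk the attributes in document order and pair each unused
    // key attribute with an unused corresponding attribute; each attribute is used once.
    public static int Capacity(IList<string> attributes)
    {
        var used = new HashSet<string>();
        int bits = 0;
        foreach (string name in attributes)
        {
            if (used.Contains(name)) continue;
            foreach (var pair in KeyPairs)
            {
                if (pair.Key == name && attributes.Contains(pair.Corresponding)
                    && !used.Contains(pair.Corresponding))
                {
                    used.Add(name);
                    used.Add(pair.Corresponding);
                    bits++;
                    break;
                }
            }
        }
        return bits;
    }

    // Attribute names of the example page's tags, in document order.
    private static readonly string[] Img = { "src", "width", "height", "alt", "title", "border" };
    private static readonly string[] Td = { "align", "valign" };
    private static readonly string[] Span = { "class", "style" };
    private static readonly string[] Link = { "href", "target" };

    private static readonly string[][] ExampleTags =
    {
        new[] { "name", "content" },                              // <meta>
        new[] { "text", "bgcolor", "link", "alink", "vlink" },    // <body>
        new[] { "align", "width" },                               // <div>
        Span,                                                     // heading <span>
        new[] { "width", "height", "cellpadding", "cellspacing", "bgcolor", "align" }, // <table>
        Td, Img, Td, Span, Span, Span, Link,                      // first row
        Td, Link, Td, Img,                                        // second row
        Td, Td,                                                   // third row
        new[] { "colspan" }, Img                                  // fourth row
    };

    public static int TotalExampleCapacity() => ExampleTags.Sum(tag => Capacity(tag));
}
```

Running TotalExampleCapacity gives the example page's total capacity; dividing by eight gives the number of whole characters it can hold.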

Three characters are not enough for a long letter, but enough to say "no!", or, in decimal ASCII values, 110 111 33 (binary "01101110 01101111 00100001"). Let's go through the document and find the first tag with a usable attribute pair...
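Before going through the document, the character-to-bit conversion can be checked with a tiny C# sketch (the class name is made up). It assumes plain ASCII text and writes each byte most significant bit first, which is how the binary values above are written.

```csharp
using System;
using System.Linq;
using System.Text;

public static class MessageBits
{
    // Bits of an ASCII message, most significant bit first, one group of eight per character.
    public static string ToBitString(string message)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(message);
        return string.Join(" ", bytes.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }
}
```

MessageBits.ToBitString("no!") returns "01101110 01101111 00100001".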

<meta name="author" content="Peter Miller">

name/content is "1", content/name is "0".
We have to re-order the attributes to hide a value of "0":

<meta content="Peter Miller" name="author">

One bit is done. Next bit...

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

text/bgcolor is "1", bgcolor/text is "0".
alink/vlink is "1", vlink/alink is "0".
We want to hide "1" and "1", so no changes to this tag are required.

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000"
       alink="#FF0000" vlink="#FF0000">

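... and so on. For every bit, the two attributes of a pair are put into the order that encodes it, which means swapping them whenever they are in the wrong order. Image tags can carry up to three bits if they also have a title attribute and the deprecated border attribute: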

<img src="exampleImage.jpg" width="164" height="116" alt="Yellow Bird"
       title="Yellow Bird" border="0">

We want to hide "010".
The first key attribute in this tag is "src",
so we take the corresponding attribute "alt".
The bit to hide is "0", the combination for "0" is alt/src,
so we place the "alt"-attribute before the "src"-attribute.

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
        title="Yellow Bird" border="0">

The next key attribute is "width", the corresponding attribute is "height".
Now the bit to hide is "1", so "height" stays after "width".
The third key attribute is "title", and its corresponding attribute is "border".
To hide a "0", we move "title" behind "border".

<img alt="Yellow Bird" src="exampleImage.jpg" width="164" height="116"
           border="0" title="Yellow Bird">
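The walkthrough can be condensed into a short C# sketch that works on attribute names only. It is a simplified illustration, not the article's HideBit method: the names are hypothetical, the key is a dictionary from key attribute to corresponding attribute, and it assumes no attribute belongs to more than one key pair. Like the walkthrough, it writes both attributes of a pair next to each other at the position of the key attribute; once the message bits run out, each remaining pair keeps its original relative order.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeReorderSketch
{
    // Returns the new attribute order for one tag. 'key' maps each key attribute to its
    // corresponding attribute; 'bits' supplies the message bits (1 = key attribute first).
    public static List<string> Reorder(IList<string> attributes,
                                       IDictionary<string, string> key,
                                       Queue<int> bits)
    {
        var present = new HashSet<string>(attributes);
        var owners = new Dictionary<string, string>(); // corresponding -> key attribute
        foreach (var pair in key) owners[pair.Value] = pair.Key;

        var result = new List<string>();
        var done = new HashSet<string>();

        foreach (string name in attributes)
        {
            if (done.Contains(name)) continue;

            if (key.TryGetValue(name, out string partner) && present.Contains(partner))
            {
                // A key attribute with its partner in the same tag carries one bit.
                // Without message bits left, keep the pair's original order.
                bool keyFirst = bits.Count > 0
                    ? bits.Dequeue() == 1
                    : attributes.IndexOf(name) < attributes.IndexOf(partner);
                result.Add(keyFirst ? name : partner);
                result.Add(keyFirst ? partner : name);
                done.Add(name);
                done.Add(partner);
            }
            else if (owners.TryGetValue(name, out string owner) && present.Contains(owner))
            {
                // A corresponding attribute whose key attribute comes later:
                // it is written together with that key attribute.
            }
            else
            {
                result.Add(name); // not part of a usable pair: copy unchanged
                done.Add(name);
            }
        }
        return result;
    }
}
```

With the key { src: alt, width: height, title: border } and the bits 0, 1, 0, the image tag's order src width height alt title border becomes alt src width height border title, the same result as the walkthrough.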

No more Examples, Show the Implementation!

Alright, first we need two classes to store HTML tags and their attributes. Attributes don't have many properties; they only have a name and a value. Each attribute in a tag can be used for only one message bit, so the program has to mark it as handled once it has been used.

public class HtmlAttribute {
        private String name;
        private String value;
        private bool handled;

        public String Name {
            get { return name; }
        }

        public String Value {
            get { return this.value; }
            set { this.value = value; }
        }

        public bool Handled {
            get { return handled; }
            set { this.handled = value; }
        }

        public HtmlAttribute(String name) {
            this.name = name.ToLower();
            this.value = String.Empty;
            handled = false;
        }
}

An HTML tag has a name and a number of attributes. The constructor searches the tag's text for attributes and their values.

public class HtmlTag {
        private int beginPosition;
        private int endPosition;
        private String name;

        public int BeginPosition {
            get { return beginPosition; }
            set { beginPosition = value; }
        }

        public int EndPosition {
            get { return endPosition; }
            set { endPosition = value; }
        }

        public String Name {
            get { return name; }
        }

        private HtmlAttributeCollection attributes;
        public HtmlAttributeCollection Attributes{
            get{ return attributes; }
        }

        public HtmlTag(String text, int beginPosition, int endPosition) {
            //... complicated lines for splitting tags into attributes ...
            //... see the full source code for them ...
        }
}
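The constructor's parsing code is only in the full source. To give an idea of what it has to do, here is a much simpler, regex-based C# sketch; it is not the article's implementation, and it ignores comments, scripts and other cases a real HTML parser has to handle. It lower-cases names as HtmlAttribute does and keeps the quotes around each value, an assumption made here so that a tag can be written back with the same length.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagParserSketch
{
    // "<name ...>" or "<name .../>": group 1 is the tag name, group 2 the attribute text.
    private static readonly Regex TagPattern =
        new Regex(@"^<\s*([A-Za-z][A-Za-z0-9]*)(.*?)/?>$", RegexOptions.Singleline);

    // A name, optionally followed by = and a double-quoted, single-quoted or unquoted value.
    private static readonly Regex AttributePattern =
        new Regex(@"([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\s*=\s*(""[^""]*""|'[^']*'|[^\s""'>]+))?");

    public static List<KeyValuePair<string, string>> Parse(string tagText, out string tagName)
    {
        Match tag = TagPattern.Match(tagText.Trim());
        if (!tag.Success)
        {
            throw new ArgumentException("Not a start tag: " + tagText);
        }
        tagName = tag.Groups[1].Value.ToLowerInvariant();

        var attributes = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributePattern.Matches(tag.Groups[2].Value))
        {
            // Lower-case names; values keep their quotes (empty if the attribute has no value).
            attributes.Add(new KeyValuePair<string, string>(
                match.Groups[1].Value.ToLowerInvariant(), match.Groups[2].Value));
        }
        return attributes;
    }
}
```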

The Hide method lists all HTML tags, and then loops over the tags and their attributes. Attributes that have already been handled are skipped. If an attribute is still unused, the method looks it up in the key table...

/// <summary>Hide a message in an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="destinationFileName">Path
///         and name to save the resulting HTML document</param>
/// <param name="message">The message to hide</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Hide(String sourceFileName,
       String destinationFileName,
       Stream message,
       DataTable keyTable)
{
    //read the carrier document
    StreamReader reader = new StreamReader(sourceFileName, Encoding.Default);
    String htmlDocument = reader.ReadToEnd();
    reader.Close();

    message.Position = 0;

    //list the HTML tags
    HtmlTagCollection tags = FindTags(htmlDocument);

    StringBuilder insertTextBuilder = new StringBuilder();
    DataRow[] rows;
    HtmlAttribute secondAttribute;
    int offset = 0;
    int bitIndex = 7;
    int messageByte = 0;

    foreach (HtmlTag tag in tags) {

        insertTextBuilder.Remove(0, insertTextBuilder.Length);
        insertTextBuilder.AppendFormat("<{0}", tag.Name);

        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                //(single quotes in names must be doubled in DataTable.Select filters)
                rows =
                  keyTable.Select(String.Format("firstAttribute = '{0}'",
                  attribute.Name.Replace("'", "''")));

... If the program finds the attribute's name in the first key column, it is a primary key attribute, and its secondary key attribute is looked up in the attribute collection of the current tag. If the secondary key attribute exists, we have found a key attribute pair and can hide one bit.

                if (rows.Length > 0) {

                    //find corresponding attribute
                    secondAttribute = FindAttribute(
                                    rows[0]["secondAttribute"].ToString(),
                                    tag.Attributes);

                    if (secondAttribute != null) {

                        if (bitIndex == 7) {
                            //get next message byte
                            bitIndex = 0;
                            messageByte = message.ReadByte();
                        } else {
                            //next bit
                            bitIndex++;
                        }

                        //change the attributes' order
                        HideBit(messageByte,
                                bitIndex,
                                attribute,
                                secondAttribute,
                                insertTextBuilder);

                        //mark both attributes as handled
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
                    }
                }

If the attribute is not a primary key attribute, it may be a secondary key attribute. In that case it is written later, together with its primary key attribute, unless the primary attribute is missing from this tag or has already been used; then the secondary attribute is copied unchanged. If the attribute is not found in any key column, it carries no data and is copied into the new tag as it is.

                if (!attribute.Handled) {
                    //The attribute is not a primary key attribute.
                    //Is it a secondary key attribute?
                    bool copyAttribute = false;
                    rows =
                      keyTable.Select(String.Format("secondAttribute = '{0}'",
                      attribute.Name.Replace("'", "''")));

                    if(rows.Length > 0){
                        //if the corresponding first attribute
                        //does not exist in
                        //this tag or has already been used,
                        //this attribute will not be used and must be copied.
                        HtmlAttribute firstAttribute = FindAttribute(
                                      rows[0]["firstAttribute"].ToString(),
                                      tag.Attributes);

                        if (firstAttribute == null) {
                            copyAttribute = true;
                        }else{
                            copyAttribute = firstAttribute.Handled;
                        }
                    }

                    else if (rows.Length == 0) {
                        //this attribute is not part
                        //of the key and must be copied.
                        copyAttribute = true;
                    }

                    if (copyAttribute) {
                        //copy unused attribute
                        insertTextBuilder.AppendFormat(
                            @" {0}={1}",
                            attribute.Name, attribute.Value);

                        attribute.Handled = true;
                    }
                }
            }
        }

At this point, you see why we saved the start and end positions with every tag. When we're finished with a tag's attributes, we replace the old tag with the new one. In case a few whitespace characters got lost on the way, we compare the old and new lengths and keep a running offset, so all following tags are still found, even though they have moved.

        //replace old tag with new tag

        tag.BeginPosition += offset;
        tag.EndPosition += offset;

        String insertText = insertTextBuilder.ToString();
        int newLength = insertText.Length;
        if (newLength > 0) {
            int oldLength = tag.EndPosition - tag.BeginPosition;
            htmlDocument = htmlDocument.Remove(tag.BeginPosition, oldLength);
            htmlDocument = htmlDocument.Insert(tag.BeginPosition, insertText);

            offset += (newLength - oldLength);
        }

        if (messageByte < 0) {
            break; //finished
        }
    }

    //save the new document with the same encoding it was read with
    StreamWriter writer = new StreamWriter(destinationFileName, false, Encoding.Default);
    writer.Write(htmlDocument);
    writer.Close();
}
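The offset bookkeeping at the end of Hide is easy to get wrong, so here it is in isolation. A small C# sketch with hypothetical names: spans are given as positions in the original text (end exclusive) in ascending, non-overlapping order, and each replacement shifts all later spans by the difference between new and old length, as the offset variable does in the Hide method.

```csharp
using System;
using System.Collections.Generic;

public static class TagReplaceSketch
{
    // Replaces spans given as [Begin, End) positions in the ORIGINAL text.
    // 'offset' tracks how far later spans have moved because of earlier replacements.
    public static string ReplaceSpans(string document,
                                      IEnumerable<(int Begin, int End, string NewText)> spans)
    {
        int offset = 0;
        foreach (var span in spans)
        {
            int begin = span.Begin + offset;
            int oldLength = span.End - span.Begin;
            document = document.Remove(begin, oldLength).Insert(begin, span.NewText);
            offset += span.NewText.Length - oldLength;
        }
        return document;
    }
}
```

Replacing "<x>" (positions 3 to 6) with the longer "<xx>" in "AAA<x>BBB<y>" moves "<y>" one position to the right, and the offset makes the second replacement still hit it.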

How to Reconstruct the Message

Extracting a message is much easier, because we don't have to care about unused attributes. Loop through the tags and attributes, find a primary key attribute, get its corresponding attribute, and compare their positions. That's all.
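The core of the extraction, reading one bit per key pair from the order of the attribute names, fits in a few lines. This C# sketch is an illustration with made-up names, not the article's Extract/ExtractBit code: it works on already-parsed attribute names instead of searching the document text with IndexOf, and it assumes that no attribute belongs to more than one key pair.

```csharp
using System;
using System.Collections.Generic;

public static class AttributeOrderDecoder
{
    // Reads the bits hidden in one tag: for every key attribute whose corresponding
    // attribute is also present, key attribute first means 1, otherwise 0.
    public static List<int> ReadBits(IList<string> attributes, IDictionary<string, string> key)
    {
        var bits = new List<int>();
        var done = new HashSet<string>();
        foreach (string name in attributes)
        {
            if (done.Contains(name)) continue;
            if (key.TryGetValue(name, out string partner))
            {
                int partnerIndex = attributes.IndexOf(partner);
                if (partnerIndex < 0) continue; // partner missing: this pair carries no bit
                bits.Add(attributes.IndexOf(name) < partnerIndex ? 1 : 0);
                done.Add(name);
                done.Add(partner);
            }
        }
        return bits;
    }
}
```

For the image tag from the walkthrough, alt src width height border title, this reads the bits 0, 1, 0 again.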

/// <summary>Extract a hidden message from an HTML document</summary>
/// <param name="sourceFileName">Path and name of the HTML document</param>
/// <param name="message">Empty stream for the message</param>
/// <param name="keyTable">DataTable with the key attributes</param>
public void Extract(String sourceFileName, Stream message, DataTable keyTable) {

    // ... read the carrier document ...
    // ... list the HTML tags ...
    // ... declarations ...

    foreach (HtmlTag tag in tags) {
        foreach (HtmlAttribute attribute in tag.Attributes) {

            if (!attribute.Handled) { //attribute has not been used, yet

                //find key row for this attribute
                //(single quotes in names must be doubled in DataTable.Select filters)
                rows =
                   keyTable.Select(String.Format("firstAttribute = '{0}'",
                   attribute.Name.Replace("'", "''")));
                if (rows.Length > 0) {

                    //find corresponding attribute
                    secondAttribute = FindAttribute(
                                    rows[0]["secondAttribute"].ToString(),
                                    tag.Attributes);

                    if (secondAttribute != null) {

                        attributePosition = htmlDocument.IndexOf(
                                          attribute.Name,
                                          tag.BeginPosition);

                        secondAttributePosition = htmlDocument.IndexOf(
                                                secondAttribute.Name,
                                                tag.BeginPosition);

                        //compare the attributes' positions
                        messageByte = ExtractBit(
                                    attributePosition,
                                    secondAttributePosition,
                                    messageByte,
                                    bitIndex,
                                    message);

As in the previous articles, the Extract method expects the message's length to come before the actual message. Because of a document's limited capacity, the length value is only one byte long instead of four, so a message can be at most 255 bytes long.

                        //next bit
                        if (bitIndex == 7) {
                            bitIndex = 0;

                            if ((message.Length == 1) && (messageLength == 0)) {
                                //read length
                                message.Position = 0;
                                BinaryReader binaryReader =
                                              new BinaryReader(message);
                                messageLength = binaryReader.ReadByte();
                                reader = null;
                                message.SetLength(0);
                                message.Position = 0;
                            }
                            else if ((messageLength > 0) &&
                                     (message.Length == messageLength)) {
                                break; //finished
                            }

                        } else {
                            bitIndex++;
                        }

                        //mark both attributes as handled
                        attribute.Handled = true;
                        secondAttribute.Handled = true;
                    }
                }
     // ... skip attributes, exit when finished, and so on ...
}
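The Extract listing expects a one-byte length in front of the message. The Hide listing does not show where that byte comes from; this C# sketch assumes the caller builds the message stream with the length already prepended (the helper names are hypothetical). Keep in mind that the length byte also uses eight bits of the carrier's capacity.

```csharp
using System;
using System.IO;

public static class LengthPrefixSketch
{
    // Prepends a one-byte length, so the message can be at most 255 bytes long.
    public static MemoryStream WithLengthPrefix(byte[] message)
    {
        if (message.Length > byte.MaxValue)
        {
            throw new ArgumentException("A one-byte length prefix allows at most 255 bytes.");
        }
        var stream = new MemoryStream();
        stream.WriteByte((byte)message.Length);
        stream.Write(message, 0, message.Length);
        stream.Position = 0; // ready to be passed to Hide
        return stream;
    }

    // Reverse operation: read the length byte, then exactly that many message bytes.
    public static byte[] ReadWithLengthPrefix(Stream stream)
    {
        int length = stream.ReadByte();
        if (length < 0) throw new EndOfStreamException("Missing length byte.");
        var message = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = stream.Read(message, read, length - read);
            if (n == 0) throw new EndOfStreamException("Message is shorter than its length byte.");
            read += n;
        }
        return message;
    }
}
```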

Building a Key

The key is no longer a binary file; it is a table of attribute pairs. Build your key with the key editor and save it to an XML file. The *.zip archive contains two example files that may be useful as key templates.
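If you would rather build a key table in code, a DataTable with the two column names used in the Select filters above (firstAttribute and secondAttribute) is what the Hide and Extract methods query. This C# sketch is an assumption, not the key editor's code; the table name "Key" is made up, and the XML it writes is not guaranteed to match the editor's file format.

```csharp
using System;
using System.Data;

public static class KeyTableSketch
{
    private const string TableName = "Key"; // hypothetical; WriteXml needs a table name

    // Builds a key table with the column names the Select filters in Hide and Extract use.
    public static DataTable CreateKeyTable(params (string First, string Second)[] pairs)
    {
        var table = new DataTable(TableName);
        table.Columns.Add("firstAttribute", typeof(string));
        table.Columns.Add("secondAttribute", typeof(string));
        foreach (var pair in pairs)
        {
            table.Rows.Add(pair.First, pair.Second);
        }
        return table;
    }

    // WriteSchema stores the column definitions next to the rows,
    // so Load can rebuild the table from the file alone.
    public static void Save(DataTable table, string fileName)
    {
        table.WriteXml(fileName, XmlWriteMode.WriteSchema);
    }

    public static DataTable Load(string fileName)
    {
        var table = new DataTable(TableName);
        table.ReadXml(fileName);
        return table;
    }
}
```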

History

Corinna
kukurikapu
13:44 28 Aug '08  
i would just like to ask what the significance of your HTML steg really is, and in what particular part of a system we can apply it? and another favor, can you please give the algorithm of your program? you know, beginner's dilemma! peace out! thanks in advance.


explosion of possibilities
Darchangel
5:39 18 Mar '08  
On many websites, especially corporate ones, there are headers that get used on all pages which load tons of CSS and JavaScript. Furthermore, these headers often have you load code that isn't used for the page you're on. It's there because the same header is used throughout the site and it contains all of the scripts and style definitions for that site.
1) There are tons of items here to play with, reorder, etc.
2) If you know what you're doing, CSS can be written very concisely with no unnecessary repetition. The corollary to this is that by doing the opposite, you can greatly increase the amount of data you can hide. Regarding secrecy, large sloppy CSS should arouse no suspicion, since that's how CSS often looks when it is WYSIWYG-generated or written by someone not very good at it.
3) Again, it's likely that for any given page there will be CSS present that won't be used for that page. So it should also arouse no suspicion if you add CSS that is used on no page at all.

Haven't given much thought about whether your methods would be compatible with JavaScript. If they are, you can apply all of the above to JS as well.
plz help me
p.razi
20:31 16 Mar '08  
Hi,

I saw your beautiful programs and enjoyed them, thank you very much for them. You are really an expert girl in the programmers' world (it's rare, and it's an honor, because I'm a girl too).
But I have one question about HTML steganography: in your program, when we want to save a web page, it doesn't save completely. That means the folder that holds the images (and so on) of the web page is not created, and with this problem it can't be used on the real internet. If you can solve this problem, please change it. Thank you for your attention.
Great
merlin981
5:18 14 Mar '08  
A very interesting idea. Thank you for showing us other ways of hiding data. I hadn't considered using an html (or other text file) before. This has really got me thinking.

5 from me, great job. Love the key combination ideas.



Syntax Error Reply its urgent
karthikeyan.net
15:21 12 Mar '08  
{"Syntax error: Missing operand after 'events' operator."}

In the Line
rows = keyTable.Select("firstAttribute = '" + attribute.Name + "'");
Re: Syntax Error Reply its urgent
Corinna John
23:04 12 Mar '08  
Thanks for the hint! There's a bug in the HtmlAttribute class. The quotation marks in attribute names like "document.getelementbyid('sectionmenu')" must be escaped before they're used in SQL queries.

I've just submitted an update to CodeProject, the fixed code will arrive on this page in a few days. If you need the bugfix today, please tell me where to email it.

_____________________________________________________
This statement is false.

Re: Syntax Error Reply its urgent
Corinna John
12:22 13 Mar '08  
Everything's fine. Thanks to our great CP editors!
The fixed code has just been posted. Please re-download the archive and try again.

_____________________________________________________
This statement is false.

please reply its urgent
suvarna_bvb
1:33 11 May '07  
hello ma'am,
in our project we are getting the error

'System.Windows.Forms.Application' does not contain a definition for 'EnableRTLMirroring'

so please tell us how to solve this error. if we comment this line out, we don't get an HTML output file like the input file...
so please help us
thank you



Re: please reply its urgent
Corinna John
5:19 11 May '07  
You are using an old version of VisualStudio. Newer versions generate that line by default.
Comment it out and you'll be fine. If the application doesn't work then, there must be a different problem.

____________________________________
There is no proof for this sentence.

Steganography Vs data hiding. (Attack?)
mahdavi110
22:35 2 Jul '06  
your method is a data hiding method and does not consider attacks that are able to reveal the existence of a message.

1- Security of such methods is based on security by obscurity. Knowing your method, a simple statistical analysis can reveal that there is a hidden message in it.

2- If somebody considers hiding data within HTML pages in a way that the browser view is intact, there is lots of space that can be used for hiding data,
for example in comment fields or by capitalizing characters of attribute names
(vlink for hiding 00000 and VlInK for hiding 10101).

4- steganography needs a stego key, and it is not described how to use a stego key in your method. Can somebody apply a stego key with your method or not? As you know better than anyone else, a stego key is used to spread the hidden data within the cover medium.

5- Actually I am searching for steganography in texts. Do you know some methods that can do it for me (with steganography considerations)? Any kind of comment is very useful.


Re: Steganography Vs data hiding. (Attack?)
Corinna John
1:55 3 Jul '06  
mahdavi110 wrote:
a simple statistical analysis can reveal that there is a hidden message

How would such a statistical analysis work, could you please post an example?

mahdavi110 wrote:
capitalizing characters of attribute names
1. Which HTML generator would write such a chaos?
2. A web designer who writes that style should be fired.
=> If there are uncommon capitals within a word, it is 100% sure that something is wrong with the document.

mahdavi110 wrote:
steganography needs a stego key
There are several ways of applying a key. For example, you could use only certain tags or attributes.

mahdavi110 wrote:
Do you know some methods that can do it for me

Sorry, I'm not Google.

_____________________________________________________________________________
I don't expect too much, all I want is your vote for Halbsichtigkeit.

Suggestion for Optimization
alexiev_nikolay
11:02 17 Apr '06  
I read most of your steganography articles, but somehow I skipped this one. So now, when I read it, I was thinking about some improvements and, most important, how to save a larger message.

There is a way to use all the attributes, and even to use every attribute to save one bit. We can use the alphabetical order to determine which attribute is supposed to be first.

For example:
let's pretend that the numbers 1, 2, 3 are attributes and that alphabetically they are in the same order, 1, 2, 3. So in this example I'm using every attribute for storing 1 bit. Only the first one is skipped - it is used for comparing.

here I store all the possible 2 bits with 3 attributes.
00 - 123
01 - 132
10 - 213
11 - 321

I didn't try this with a complex example, but I'm pretty sure this strategy should work.



Re: Suggestion for Optimization
Corinna John
15:15 17 Apr '06  
Good idea - somebody else already told me.
Your suggestion works for any list, I tried to implement it there:
http://www.codeproject.com/csharp/steganodotnet14.asp
_________________________________
Please inform me about my English mistakes, as I'm still trying to learn your language!

Re: Suggestion for Optimization
mahdavi110
21:44 2 Jul '06  
First idea that may come to mind:

generally, if you can find a set of N attributes that are permutable, you can hide a message as long as log2(N!) bits.

But finding such sets may be difficult. But we may construct some additional attributes when needed. For example, if a text tag just has font and color attributes, we may add size and ... other tags to it while taking care not to change the appearance.
So we may find these long sets and hide more bits.

If we can find a set of 10 attributes, we may hide 21 bits.
How ... I am.

Let me know your idea.
please help me
Anonymous
5:49 20 Jun '05  
dear Corinna..
eep!
Dead Skin Mask
4:03 8 Jun '05  
this might just be the craziest CP article i've read...
seriously... like... wow...

(really good though)
Re: eep!
Corinna John
6:30 8 Jun '05  
Does that mean.. *sniff* ... you have not read part 14, yet?
It does the same, only a lot crazier.

_________________________________
Please inform me about my English mistakes, I still try to learn your language!

Re: eep!
Dead Skin Mask
6:42 8 Jun '05  
actually i've read them all now.. i think.
...
i demand more!

Re: eep!
Corinna John
6:57 8 Jun '05  
Thanks for your interest.
I'm happy about every CPian who can read 15 articles without becoming as crazy as I am.


Dead Skin Mask wrote: i demand more!
Hmmm... that's not easy. Before I can post more, I have to write more, and before I can write more, I have to discover more. Usually, an article needs much time, and the longest parts are "waiting for the next idea" and "finishing all other things I planned".

_________________________________
Please inform me about my English mistakes, I still try to learn your language!

Re: eep!
Dead Skin Mask
21:51 8 Jun '05  
yeh i understand.

i'm just demanding more because your articles are actually worth reading from start to finish and not just reading the intro and conclusions like i do with most of them now.

in any case, i hope there's more..
But C# Express 2005 won't open...
JonoRingading
15:42 3 Dec '04  
Hi Coco,
Do you know of any way to provide source for C# v1.1? Partial and Static aren't part of production yet...
also, if spaces (not non-breaking spaces) are maintained when sent by your webserver (they are on mine...), you could use varying spaces to encode quite a few bits... place 1 to 16 spaces between any tags....<input            type="text"      name="TheName"         size=      "15" LENGTH      = "6"   >
most people wouldn't notice.
The Key would be a guide to look inside which tags and how the spaces are interpreted. the encryptor would first clean out all unnecessary spaces, then pad the spaces to add the bits to the HTML.



Cheers,
John R. Hanson
Re: But C# Express 2005 won't open...
Corinna John
8:42 26 Dec '04  
Here is a .NET 1.1 version:
http://www.binary-universe.net/articles/13/steganodotnet13_src_sd.zip
The solution and project files are for SharpDevelop. VisualStudio fans may create a new solution and add all files.

_________________________________
Vote '1' if you're too lazy for a discussion

Re: really interesting but...
Corinna John
9:37 23 Nov '04  
No and yes... yesterday I thought a stegano-webserver or a pseudo-SOAP-formatter would be of nearly no interest to anyone out there. But now that Andrew pointed out a few improvements, maybe the next article will be on hiding a meta-stream in website content. All references in every HTML page can link to more carrier documents with more hidden data... Every website can contain a full meta-website. We'll need a browser plugin to view the hidden webs...
Damn it, I'll never finish this series, it's going to be my companion for years!

_________________________________
Vote '1' if you're too lazy for a discussion

Re: really interesting but...
Andrew C Armstrong
9:55 23 Nov '04  
Wow, I inspired someone. Or gave them more work to do, I'm not sure... Though on the flip side, this article has given me a great idea for my current project in Combinatorial Optimisation, and converting combinations into permutations. So you see, this really is useful, if not specifically for steganography.

I'm also poking around with a linguistic engine, initially for semantic analysis and translation, but given the amount of redundancy in natural language, you could possibly use it for steganography - and then you're hiding data in the very way sentences are worded.

But first, I have to get it working...

Andrew

Will code for bandwidth and caffeine
Slight Improvement
Andrew C Armstrong
12:30 21 Nov '04  
While thinking about how to increase the capacity for secret text, I've come up with a possible improvement to your attribute arrangement system.

My first thought was that there's a lot of overlap: for example, <body text="#000000" bgcolor="#FFFFFF" link="#FF0000" alink="#FF0000" vlink="#FF0000"> is the same as <body link="#FF0000" alink="#FF0000" vlink="#FF0000" text="#000000" bgcolor="#FFFFFF">, which means there's already one bit we've lost. Going further, there are 5! (or 120) permutations of the five attributes, which is just short of 7 bits (log2(120) ≈ 6.91). So we have at least 6 bits to play with in this tag alone.
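The capacity arithmetic above can be checked with a few lines of Python; `permutation_capacity` is a hypothetical helper name, and it counts only whole bits, i.e. floor(log2(n!)):

```python
import math

def permutation_capacity(n_attrs):
    """Whole bits storable by choosing one of n_attrs! attribute orders."""
    # floor(log2(n!)) computed exactly with integers: position of n!'s top bit
    return math.factorial(n_attrs).bit_length() - 1

for n in range(2, 7):
    orders = math.factorial(n)
    print(n, "attributes:", orders, "orders, log2 =",
          round(math.log2(orders), 2), "->", permutation_capacity(n), "whole bits")
```

For the five-attribute <body> tag this gives 120 orders and 6 whole bits, which is the "at least 6 bits" figure.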

The first thing to do is, rather than define individual pairs of attributes, assign each attribute an ordinal, so that we can say that attribute A is greater or less than attribute B. Thus, if alink has a lower ordinal than link-

alink="#FF0000" link="#FF0000" ==> 0
link="#FF0000" alink="#FF0000" ==> 1

A very simple way of assigning ordinals, and thus pair orders, is alphabetic order: a pair of attributes in alphabetic order means 0, in reverse order means 1.

Thus,

<body text="#000000" bgcolor="#FFFFFF" link="#FF0000" alink="#FF0000" vlink="#FF0000">   ==>   1010

Because-

text="#000000"      >   bgcolor="#FFFFFF"   ==> 1
bgcolor="#FFFFFF" <   link="#FF0000"      ==> 0
link="#FF0000"      >   alink="#FF0000"      ==> 1
alink="#FF0000"   <   vlink="#FF0000"      ==> 0

Whereas,

<body link="#FF0000" alink="#FF0000" vlink="#FF0000" text="#000000" bgcolor="#FFFFFF">   ==>   1011
link="#FF0000"   >   alink="#FF0000"      ==> 1
alink="#FF0000"   <   vlink="#FF0000"      ==> 0
vlink="#FF0000"   >   text="#000000"         ==> 1
text="#000000"   >   bgcolor="#FFFFFF"   ==> 1

And, because every adjacent pair of attributes carries a bit, every HTML tag with n attributes can store n-1 bits.
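Reading the bits back is just a pairwise comparison of neighbouring attribute names. A minimal Python sketch, assuming double-quoted attribute values; `decode_tag` and its optional `order` list (for a shared key other than alphabetic order) are hypothetical names:

```python
import re

# Assumption: attribute values are double-quoted (or absent) and contain no '>'.
ATTR_NAME = re.compile(r'\s([A-Za-z_:][-\w:.]*)(?:\s*=\s*"[^"]*")?')

def attribute_names(tag):
    """Attribute names of a start tag, lower-cased, in document order."""
    start = re.match(r'<\w+', tag).end()          # skip "<tagname"
    return [m.group(1).lower() for m in ATTR_NAME.finditer(tag, start)]

def decode_tag(tag, order=None):
    """Read n-1 bits from a tag with n attributes: for each adjacent pair,
    ascending order -> 0, descending order -> 1. `order` is an optional shared
    list of names; without it, plain alphabetic order is the key."""
    names = attribute_names(tag)
    rank = (lambda name: order.index(name)) if order else (lambda name: name)
    return [1 if rank(a) > rank(b) else 0 for a, b in zip(names, names[1:])]

print(decode_tag('<body text="#000000" bgcolor="#FFFFFF" link="#FF0000" '
                 'alink="#FF0000" vlink="#FF0000">'))
```

Both <body> examples above decode to the bit strings given for them (1010 and 1011).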

And here's a quick algorithm to implement this system-

Sort all the attributes within a tag into alphabetical order in an array called attr_strings.
n is the number of bits we can store (number of attributes minus 1)
create an array of n+1 integers called attr, and two integers, min=0 and max=0
set attr[0] to 0
for(i = 0; i < n; i++)
      if bit[i] == 1
            min--
            attr[i+1] = min
      else
            max++
            attr[i+1] = max
next i

for(i = 0; i <= n; i++)
      attr[i] -= min
use attr as an array of indexes in attr_strings, and reassemble the tag
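The same steps as a short Python sketch. `encode_tag`, `attr_name` and the optional `order` key are hypothetical names; attributes are assumed to be passed as name="value" strings, and the pseudocode's min and max are called `lo` and `hi` to avoid shadowing Python built-ins:

```python
def attr_name(attribute):
    """'alink="#FF0000"' -> 'alink' (the part that decides the ordinal)."""
    return attribute.split("=", 1)[0].lower()

def encode_tag(tag_name, attributes, bits, order=None):
    """Arrange the attributes so adjacent pairs spell out `bits`.

    A tag with n attributes stores exactly n-1 bits: an ascending pair means 0,
    a descending pair means 1. `order` is an optional shared list of attribute
    names; without it, alphabetic order of the names is used."""
    rank = (lambda a: order.index(attr_name(a))) if order else attr_name
    attr_strings = sorted(attributes, key=rank)
    if len(bits) != len(attr_strings) - 1:
        raise ValueError("a tag with n attributes stores exactly n-1 bits")
    lo = hi = 0          # the pseudocode's min and max
    attr = [0]           # attr[0] = 0
    for bit in bits:
        if bit == 1:
            lo -= 1      # next attribute ranks below every one placed so far
            attr.append(lo)
        else:
            hi += 1      # next attribute ranks above every one placed so far
            attr.append(hi)
    indexes = [a - lo for a in attr]   # shift so the indexes run 0..n
    return "<%s %s>" % (tag_name, " ".join(attr_strings[i] for i in indexes))

body = ['text="#000000"', 'bgcolor="#FFFFFF"', 'link="#FF0000"',
        'alink="#FF0000"', 'vlink="#FF0000"']
print(encode_tag("body", body, [0, 1, 0, 1]))
```

Each new index is either a fresh minimum or a fresh maximum, so every value is distinct and after the shift they are exactly 0..n, one index per sorted attribute.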

Therefore, to encode 0101 with our <body> tag:

Sort the attributes
attr_strings = { alink="#FF0000", bgcolor="#FFFFFF", link="#FF0000", text="#000000", vlink="#FF0000" }
min = 0
max = 0
attr = {0,0,0,0,0}

iteration 1:
bit[1] = 0, so
      max = 1
      attr = {0,1,0,0,0}

iteration 2:
bit[2] = 1, so
      min = -1
      attr = {0,1,-1,0,0}

iteration 3:
bit[3] = 0, so
      max = 2
      attr = {0,1,-1,2,0}

iteration 4:
bit[4] = 1, so
      min = -2
      attr = {0,1,-1,2,-2}

Subtract min (-2) from each attr to give
attr = {2,3,1,4,0}

Then reconstruct the tag-
Tag = "<body "+attr_strings[2]+" "+attr_strings[1]+" "+attr_strings[3]+" "+attr_strings[0]+" "+attr_strings[4]+">";
Thus, Tag= "<body link="#FF0000" text="#000000" bgcolor="#FFFFFF" vlink="#FF0000 alink="#FF0000">"

Which equates to 0101.

Note that you needn't use alphabetic order, any ordering will suffice, just as long as both the encoder and decoder are using the same order.

This algorithm is still far from the mathematical optimum, but it's a start, and it's pretty simple, as it doesn't stray far from your original idea. Taking into account all the attributes across the entire file, you could fit quite a lot into a single document, especially if you double up by including or removing quotation marks on attribute values (though that would need to skip attributes where quotation marks are necessary).
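The quotation-mark trick could look roughly like this Python sketch. The mapping (quoted = 0, unquoted = 1) and the names `set_quoting` / `read_quoting` are assumptions; the "quotes are necessary" test follows the HTML Living Standard rule for unquoted values (not empty, no whitespace, ", ', =, <, > or `). Strict HTML 4.01 allowed even fewer characters unquoted, so a value like #FF0000 would technically need quotes there, even though browsers accept it:

```python
import re

# Characters that force a value to stay quoted (HTML unquoted-value rule).
NEEDS_QUOTES = re.compile(r'[\s"\'=<>`]')

def can_toggle(value):
    """True if the value may legally appear without quotes."""
    return value != "" and not NEEDS_QUOTES.search(value)

def set_quoting(attribute, bit):
    """Hypothetical mapping: quoted value = 0, unquoted value = 1.
    Returns (attribute, used); used is False when the value must stay quoted."""
    name, value = attribute.split("=", 1)
    value = value.strip('"')
    if not can_toggle(value):
        return '%s="%s"' % (name, value), False
    return ("%s=%s" % (name, value) if bit else '%s="%s"' % (name, value)), True

def read_quoting(attribute):
    """Return the hidden bit, or None if this attribute cannot carry one."""
    name, value = attribute.split("=", 1)
    if not can_toggle(value.strip('"')):
        return None
    return 0 if value.startswith('"') else 1

print(set_quoting('color="#FF0000"', 1), set_quoting('alt="A bird"', 1))
```

Because this bit lives in the value rather than the position, it is independent of the attribute order and can be combined with the permutation bits.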

Thanks for the idea!

Any comments?

Andrew



Last Updated 13 Mar 2008 | Copyright © CodeProject, 1999-2010