Posted 5 Jul 2011

Data URI Image Extractor

This short article will show an easy way to extract HTML data URI images and convert the HTML to use external images.


HTML allows raw image data to be embedded directly in the document so that separate image files are not needed. This can speed up HTTP transfers in many cases, but there are compatibility issues, especially with Internet Explorer up to and including version 8. This short article shows an easy way to extract these images and convert the HTML to use external images.

Data URI

Normally images are included using this syntax:

<img src="Image1.png">

The data URI syntax, however, allows images to be embedded directly, which reduces the number of HTTP requests and lets a page be saved to disk as a single file. While this article deals only with the img tag, the method can be applied to other tags as well. Here is an example of data URI usage:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
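
As a rough illustration of how such a value is put together, the sketch below splits a base64 data URI into its MIME type and its decoded bytes. The class and method names (DataUriDemo, TryParse) are hypothetical and not part of the article's project, and the sketch assumes the simple "data:<type>;base64,<payload>" shape with no extra parameters such as a charset.

```csharp
using System;

static class DataUriDemo {
  // Splits a base64 data URI into its MIME type and decoded bytes.
  // Returns false if the text does not look like a base64 data URI.
  // Assumption: no extra parameters (such as charset) precede ";base64,".
  public static bool TryParse(string aUri, out string aMimeType, out byte[] aData) {
    aMimeType = null;
    aData = null;
    const string xScheme = "data:";
    const string xMarker = ";base64,";
    if (aUri == null || !aUri.StartsWith(xScheme, StringComparison.Ordinal)) {
      return false;
    }
    int xMarkerPos = aUri.IndexOf(xMarker, StringComparison.Ordinal);
    if (xMarkerPos == -1) {
      return false;
    }
    // Everything between "data:" and ";base64," is the MIME type.
    aMimeType = aUri.Substring(xScheme.Length, xMarkerPos - xScheme.Length);
    // Everything after the marker is the Base64 payload.
    // A malformed payload still throws FormatException here.
    aData = Convert.FromBase64String(aUri.Substring(xMarkerPos + xMarker.Length));
    return true;
  }

  static void Main() {
    // "SGVsbG8=" is the Base64 encoding of the ASCII text "Hello".
    string xMime;
    byte[] xData;
    if (TryParse("data:text/plain;base64,SGVsbG8=", out xMime, out xData)) {
      Console.WriteLine(xMime + ": " + xData.Length + " bytes"); // text/plain: 5 bytes
    }
  }
}
```

An ordinary file reference such as Image1.png does not start with "data:", so TryParse returns false for it.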

More information on data URIs is available here:

SeaMonkey 2.1

Most editors do not use the data URI syntax. However, starting with SeaMonkey 2.1 Composer (Mozilla HTML Editor), images which are dragged and dropped are imported using this syntax. This is quite a bad change in my opinion, especially since it is not obvious and because it is a change in behavior from 2.0. In my case, I made a large HTML file with over 50 images before I discovered it was not linking them, but instead embedding them.


Amazingly, there are quite a few online utilities that convert images to the data URI format, but I could not find one that does the reverse. Because I did not want to hand-edit my document, I wrote a quick utility to extract the images to disk and change the HTML to use external images. This allows the document to be loaded by any standard browser, including Internet Explorer 8.

About the Source Code

The source code is targeted at my specific need and has a lot of limitations. I have published it, however, so that it is available as a foundation for you to expand on should you have the same need.

  • The parsing is very basic, but works fine with SeaMonkey output.
  • It only supports PNG format currently.
  • There is no exception handling.
  • The code has not been optimized in any way.
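
One way the PNG-only limitation might be lifted is to match the data URI prefix with a regular expression and pick a file extension from the captured subtype. This is only a sketch under assumptions: the names ImageTypeDemo, ExtensionFor and FirstImageExtension are hypothetical, and the extension table is illustrative rather than complete.

```csharp
using System;
using System.Text.RegularExpressions;

static class ImageTypeDemo {
  // Matches the start of a base64 image data URI and captures the subtype,
  // for example "png" in "data:image/png;base64,".
  static readonly Regex sPrefix =
    new Regex(@"data:image/([A-Za-z0-9.+-]+);base64,");

  // Maps a MIME subtype to a file extension.
  // Assumption: this short table is illustrative only; unknown types get ".bin".
  public static string ExtensionFor(string aSubtype) {
    switch (aSubtype.ToLowerInvariant()) {
      case "png": return ".png";
      case "jpeg": return ".jpg";
      case "gif": return ".gif";
      default: return ".bin";
    }
  }

  // Returns the extension for the first image data URI found in aHtml,
  // or null if the HTML contains none.
  public static string FirstImageExtension(string aHtml) {
    Match xMatch = sPrefix.Match(aHtml);
    return xMatch.Success ? ExtensionFor(xMatch.Groups[1].Value) : null;
  }

  static void Main() {
    string xHtml = "<img src=\"data:image/jpeg;base64,AAAA\">";
    Console.WriteLine(FirstImageExtension(xHtml)); // .jpg
  }
}
```

In a full extractor, the match position and length would take the place of the fixed prefix string, so that the Base64 payload starts right after the matched text.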


ImageExtract is a console application and accepts one parameter: the HTML file to use as input. The images are written to the same directory as the input file, and the new HTML file gets a -New suffix. So if the input is index.html, the output HTML will be index-New.html.

Source Code

I have made the project available for download, but it is quite simple. It is a C# .NET Console application. For easy viewing, here is the class:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace ImageExtract {
  class Program {
    // NOTE - This program is rough and dirty - I designed
    // it to accomplish an urgent task. I have not built in 
    // normal error handling etc.
    // It also has not been optimized at all
    // and certainly is not very efficient.
    // It also assumes all images are png files.
    static void Main(string[] aArgs) {
      string xSrcPathname = aArgs[0];
      string xPath = Path.GetDirectoryName(xSrcPathname);
      string xDestPathname = Path.Combine(xPath, 
             Path.GetFileNameWithoutExtension(xSrcPathname) + "-New.html");
      int xImgIdx = 0;

      Console.WriteLine("Processing " + Path.GetFileName(xSrcPathname));
      string xSrc = File.ReadAllText(xSrcPathname);
      var xDest = new StringBuilder();

      string xStart = @"data:image/png;base64,";
      string xB64;
      int x = 0;  // Start of the current data URI
      int y = 0;  // Closing quote of the current src attribute
      int z = 0;  // Start of the HTML not yet copied to the output
      do {
        x = xSrc.IndexOf(xStart, z);
        if (x == -1) {
          // No more embedded images
          break;
        }
        // Write out preceding HTML
        xDest.Append(xSrc.Substring(z, x - z));

        // Get the Base64 string
        y = xSrc.IndexOf('"', x + 1);
        xB64 = xSrc.Substring(x + xStart.Length, y - x - xStart.Length);
        // Convert the Base64 string to binary data
        byte[] xImgData = System.Convert.FromBase64String(xB64);

        string xImgName;
        // Get Image name and replace it in the HTML
        // We don't want to overwrite images that might already exist on disk,
        // so cycle till we find a non used name
        do {
          xImgIdx++;
          xImgName = "Image" + xImgIdx.ToString("0000") + ".png";
        } while (File.Exists(Path.Combine(xPath, xImgName)));

        Console.WriteLine("Extracting " + xImgName);

        // Write image name into HTML
        xDest.Append(xImgName);
        // Write the binary data to disk
        File.WriteAllBytes(Path.Combine(xPath, xImgName), xImgData);

        // Continue from the closing quote so it is copied with the next chunk
        z = y;
      } while (true);
      // Write out remaining HTML
      xDest.Append(xSrc.Substring(z));

      // Write out result
      File.WriteAllText(xDestPathname, xDest.ToString());
      Console.WriteLine("Output to " + Path.GetFileName(xDestPathname));
    }
  }
}
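
Since the listing above has no exception handling, the following sketch shows two checks that could be added as a starting point: validating the command-line argument, and decoding Base64 without letting a malformed string crash the run. The names SafeDecodeDemo, ValidateArgs and TryDecode are hypothetical and not part of the downloadable project.

```csharp
using System;
using System.IO;

static class SafeDecodeDemo {
  // Decodes Base64 text, returning null instead of throwing when the
  // input is missing or malformed.
  public static byte[] TryDecode(string aB64) {
    if (aB64 == null) {
      return null;
    }
    try {
      return Convert.FromBase64String(aB64);
    } catch (FormatException) {
      return null;
    }
  }

  // Checks the command-line arguments.
  // Returns an error message, or null when the arguments look usable.
  public static string ValidateArgs(string[] aArgs) {
    if (aArgs == null || aArgs.Length != 1) {
      return "Usage: ImageExtract <file.html>";
    }
    if (!File.Exists(aArgs[0])) {
      return "File not found: " + aArgs[0];
    }
    return null;
  }

  static void Main(string[] aArgs) {
    string xError = ValidateArgs(aArgs);
    if (xError != null) {
      Console.WriteLine(xError);
      return;
    }
    Console.WriteLine("Arguments look usable: " + aArgs[0]);
  }
}
```

In the extractor loop, a null result from TryDecode could be treated as "skip this image and copy the original src text unchanged", though that policy is a design choice rather than something the original utility does.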


  • 5th July, 2011: Initial version


This article, along with any associated source code and files, is licensed under The BSD License


About the Author

Chad Z. Hower, a.k.a. Kudzu
"Programming is an art form that fights back"

Formerly the Regional Developer Adviser (DPE) for Microsoft Middle East and Africa, he was responsible for 85 countries spanning 4 continents and 10 time zones. Now Chad is a Microsoft MVP.

Chad is the chair of several popular open source projects including Indy and Cosmos (C# Open Source Managed Operating System).

Chad is the author of the book Indy in Depth and has contributed to several other books on network communications and general programming.

Chad has lived in Canada, Cyprus, Switzerland, France, Jordan, Russia, Turkey, and the United States. Chad has visited more than 60 countries, visiting most of them several times.

Last Updated 5 Jul 2011
Article Copyright 2011 by Chad Z. Hower aka Kudzu