How to export a blogspot blog to HTML/GitHub with C#

Qwertie, 25 Aug 2014, LGPL3
A LINQ-to-XML example

I finally did it: I bought LINQPad's Code Completion so I could write C# scripts easily. Now, sure, I could have used C# script for free, but... wait, why didn't I do that?

Anyway, in my previous post I explained in detail how to set up a blog on GitHub. Now it's time to convert my old Blogger blog to the shiny new GitHub format.

To export your blog from Blogger, log into Blogger, go to your blog control panel, go to the Settings | Other tab, and click "Export Blog".

You get an XML file that is basically in Atom format. It's hard to understand because it doesn't have any line breaks (apart from the line breaks in your blog template, which just serve to distract you). It's a <feed> root element containing a bunch of <entry> elements, some of which contain your posts and others of which contain metadata.
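To get a feel for that structure before running the full export, a small sketch like the following can help. It parses a hand-written sample (not real Blogger output) and reports which entries look like posts. The class and method names are made up for illustration, and the Atom namespace URI is an assumption you should check against the xmlns attribute on your own <feed> element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class FeedPeek
{
    // Assumed Atom namespace; verify against the xmlns attribute of your <feed>.
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // True if any <category> child has a term containing "#post".
    public static bool IsPost(XElement entry)
    {
        return entry.Elements(Atom + "category")
                    .Select(c => (string)c.Attribute("term") ?? "")
                    .Any(term => term.Contains("#post"));
    }

    static void Main()
    {
        // Hand-written sample with made-up category terms.
        string sample =
            "<feed xmlns='http://www.w3.org/2005/Atom'>" +
            "<entry><category term='http://example.invalid/kind#settings'/><title>A setting</title></entry>" +
            "<entry><category term='http://example.invalid/kind#post'/><title>Hello</title></entry>" +
            "</feed>";

        XDocument doc = XDocument.Parse(sample);
        foreach (XElement entry in doc.Root.Elements(Atom + "entry"))
            Console.WriteLine("{0}: post={1}", (string)entry.Element(Atom + "title"), IsPost(entry));
    }
}
```

Casting an XAttribute to string returns null when the attribute is missing, which is why the sketch uses the cast rather than .Value.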

Here's the code I came up with to export to GitHub. Just paste this into LINQPad or whatever, change the filepath to point to your xml file, and run it!

void Main()
{
    string filepath = @"C:\Downloads\Blog.xml";
    string text = File.ReadAllText(filepath);
    XDocument doc = XDocument.Parse(text);

    // Use XNamespaces to deal with those pesky "xmlns" attributes.
    // The underscore represents the default namespace (Atom). If nothing matches,
    // compare both URIs with the xmlns attributes on the <feed> element of your file.
    var _ = XNamespace.Get("http://www.w3.org/2005/Atom");
    var app = XNamespace.Get("http://purl.org/atom/app#");

    var posts = doc.Root.Elements(_+"entry")
        // An <entry> is either a post, or some bit of metadata no one cares about.
        // Exclude entries that don't have a child like <category term="...#post"/>
        .Where(entry => entry.Elements(_+"category")
            .Any(c => ((string)c.Attribute("term") ?? "").Contains("#post")))
        // Exclude any entries with an <app:draft> element except <app:draft>no</app:draft>
        .Where(entry => !entry.Descendants(app+"draft").Any(draft => draft.Value != "no"));

    var outfolder = Path.Combine(Path.GetDirectoryName(filepath), Path.GetFileNameWithoutExtension(filepath));
    Directory.CreateDirectory(outfolder); // does nothing if the folder already exists

    foreach (var entry in posts)
    {
        // Extract data from XML
        DateTime published = DateTime.Parse(entry.Element(_+"published").Value);
        DateTime updated = DateTime.Parse(entry.Element(_+"updated").Value);
        string title = entry.Element(_+"title").Value;
        string content = entry.Element(_+"content").Value;
        // Casting a missing attribute to string gives null, so ?? can supply the default
        string type = (string)entry.Element(_+"content").Attribute("type") ?? "html";
        XElement empty = new XElement("empty");
        XAttribute emptA = new XAttribute("empty","");
        string originalLink = ((entry.Elements(_+"link")
            .FirstOrDefault(e => (string)e.Attribute("rel") == "alternate") ?? empty)
            .Attribute("href") ?? emptA).Value;
        string outFileName = string.Format("{0:yyyy-MM-dd}-{1}.{2}", published,
               Path.GetFileNameWithoutExtension(originalLink), type);
        var outPath = Path.Combine(outfolder, outFileName);

        if (content.Count(c => c == '\n') <= 3)
            content = AddLineBreaks(content); // optional

        // Write output file (partial HTML for Jekyll)
        using (StreamWriter output = File.CreateText(outPath)) {
            output.WriteLine("---"); // start of Jekyll front-matter
            output.WriteLine("title: \"{0}\"", title.Replace("\"", "\\\""));
            output.WriteLine("layout: post");
            output.WriteLine("# Pulled from Blogger. Last updated there on: {0:yyyy-MM-dd}", updated);
            output.WriteLine("---"); // end of Jekyll front-matter
            if (originalLink != "")
                output.WriteLine("<small><p><i>This post was imported from "+
                 "<a href='{0}'>blogspot</a>.</i></p></small>", originalLink);
            output.WriteLine("{% raw %}"); // Disable Jekyll/Liquid
            output.WriteLine(content);
            output.WriteLine("{% endraw %}");
        }
    }
}

It will create a folder named after the xml file, and inside that folder it will create an html file for each post, named after the publish date and the last part of the post's original URL (yyyy-MM-dd-post-name.html).
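As a rough illustration of how one of those names comes together, here is a standalone sketch. PostFileName.For is a hypothetical helper that mirrors the string.Format call in the script, and the URL is made up. It formats with the invariant culture, which is my addition, so that the year comes out the same regardless of the machine's calendar settings.

```csharp
using System;
using System.Globalization;
using System.IO;

static class PostFileName
{
    // Hypothetical helper: date prefix + last URL segment (minus extension) + content type.
    public static string For(DateTime published, string originalLink, string type)
    {
        string slug = Path.GetFileNameWithoutExtension(originalLink);
        return string.Format(CultureInfo.InvariantCulture,
            "{0:yyyy-MM-dd}-{1}.{2}", published, slug, type);
    }

    static void Main()
    {
        // Made-up URL for illustration.
        Console.WriteLine(For(new DateTime(2014, 8, 25),
            "http://example.blogspot.com/2014/08/my-first-post.html", "html"));
        // prints "2014-08-25-my-first-post.html"
    }
}
```

Note that an entry with no "alternate" link ends up with an empty slug, so its file is named with the date alone.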


These filenames are in the correct format for Jekyll, so if you're moving to GitHub, just move all these files to your /_posts folder, commit, and you're done! If you want "proper" HTML files, modify the code above to produce proper code like <html><head>...</head>... instead of Jekyll front-matter.
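If you go the "proper" HTML route, the change could look something like this sketch. HtmlPage.Wrap is a hypothetical helper, not part of my script; it assumes the title is plain text (so it gets HTML-encoded) and the post body is already HTML (so it is inserted as-is).

```csharp
using System;
using System.Net;

static class HtmlPage
{
    // Hypothetical helper: wraps a post body in a complete HTML document.
    public static string Wrap(string title, string content)
    {
        // Assumption: the title is plain text and the body is already HTML.
        string safeTitle = WebUtility.HtmlEncode(title);
        return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n" +
               "<title>" + safeTitle + "</title>\n</head>\n<body>\n" +
               "<h1>" + safeTitle + "</h1>\n" + content + "\n</body>\n</html>\n";
    }

    static void Main()
    {
        Console.Write(Wrap("Tips & tricks", "<p>Hello</p>"));
    }
}
```

In the export script you would write the result of a call like Wrap(title, content) to the output file in place of the front-matter lines and the raw/endraw markers.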

Oh, by the way, Blogger's exported HTML contains no line breaks at all in your posts. So I wrote this little method to add some line breaks at appropriate spots:

string AddLineBreaks(string content)
{
    var sb = new StringBuilder(content.Length + 100);

    bool pre = false, fail;
    for (UString rest = content; !rest.IsEmpty;) {
        // Track whether we are inside a <pre> block
        if (rest.StartsWith("<pre")) pre = true;
        if (rest.StartsWith("</pre")) pre = false;
        bool s;
        if ((s = rest.StartsWith("<br />")) || rest.StartsWith("<br/>")) {
            // Inside <pre> a plain newline is enough; elsewhere keep the tag and add a newline
            sb.Append(pre ? "\n" : "<br/>\n");
            rest = rest.Substring(s ? 6 : 5);
            continue;
        }
        // Start a new line before block-level opening tags and images...
        if (rest.StartsWith("<li>") || rest.StartsWith("<p>") || rest.StartsWith("<tr>") || rest.StartsWith("<pre>") || rest.StartsWith("<blockquote>") || rest.StartsWith("<img"))
            sb.Append('\n');
        // ...and before these closing tags
        if (rest.StartsWith("</ul>") || rest.StartsWith("</ol>") || rest.StartsWith("</blockquote>"))
            sb.Append('\n');
        // Copy one character and advance
        char c = (char)rest.PopFront(out fail);
        if (!fail) sb.Append(c);
    }
    return sb.ToString();
}

This relies on UString in my Loyc.Essentials.dll library though (it's a kind of string slice). If you want to use this function, download LoycCore from NuGet.
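If you would rather not add that dependency, the same idea can be sketched with a plain string and an index. This is an untested-on-real-exports variation, not the code I actually ran; the class and helper names are made up.

```csharp
using System;
using System.Text;

static class LineBreaks
{
    // Tags that get a newline inserted in front of them.
    static readonly string[] BreakBefore = {
        "<li>", "<p>", "<tr>", "<pre>", "<blockquote>", "<img",
        "</ul>", "</ol>", "</blockquote>"
    };

    // True if 'prefix' occurs in 's' exactly at index 'i'.
    static bool At(string s, int i, string prefix)
    {
        return string.CompareOrdinal(s, i, prefix, 0, prefix.Length) == 0;
    }

    public static string Add(string content)
    {
        var sb = new StringBuilder(content.Length + 100);
        bool pre = false;
        int i = 0;
        while (i < content.Length) {
            if (At(content, i, "<pre")) pre = true;
            if (At(content, i, "</pre")) pre = false;
            bool longForm = At(content, i, "<br />");
            if (longForm || At(content, i, "<br/>")) {
                // Inside <pre> a newline replaces the tag; elsewhere the tag is kept.
                sb.Append(pre ? "\n" : "<br/>\n");
                i += longForm ? 6 : 5;
                continue;
            }
            if (Array.Exists(BreakBefore, tag => At(content, i, tag)))
                sb.Append('\n');
            sb.Append(content[i++]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Add("a<br />b<p>c</p>"));
    }
}
```

With this version, the call in the export script would become LineBreaks.Add(content).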

The code was good enough for me, and hopefully it will be good enough for you... but I don't know if images work (Blogger certainly doesn't include the image files themselves in the export file). Note: by default Blogger has an "implicit line breaks" feature that converts newlines to <br/> for you. I turned that feature off because it often screws up the formatting of nontrivial posts; if you left it on, I'm not sure whether the HTML that Blogger gives you in the XML file includes those auto-inserted <br/> tags.


This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)


About the Author

Software Developer
Canada
Since I started programming when I was 11, I wrote the SNES emulator "SNEqr", the FastNav mapping component, LLLPG, and LES: XML for code, among other things. Now I'm old.

In my spare time I work on the Language of your choice (Loyc) initiative, which is about investigating ways to improve interoperability between programming languages, and includes Enhanced C# and LeMP, its Lexical Macro Processor.

Last Updated 26 Aug 2014
Article Copyright 2014 by Qwertie