A Simple Parser That Converts HTML from OneNote to Markdown

colinfang, 13 Jan 2013, CPOL
OneNote2Markdown converts the HTML generated from OneNote to Markdown, which any online Markdown parser can then translate into cleaner, normalized HTML.
This is an old version of the currently published tip/trick.

Introduction

OneNote2Markdown converts the HTML generated from OneNote (by sending the page to Word first and saving it as HTML) to Markdown, which any online Markdown parser can then translate into cleaner, normalized HTML.
  • Written in F#, the tool handles only normal paragraphs, headings, links, lists, inline code and code blocks.
  • The input HTML file must be named "input.html"; the tool then generates "output.txt".
  • You can get the source code from here. HtmlAgilityPack is required to compile it.
  • The example pack contains a sample article in docx, HTML and Markdown formats, which gives a basic demonstration of how the tool works.

Background

I tend to take notes in OneNote. When I first tried to submit an article composed in OneNote, manually adapting it to the template here was a real hassle. So I decided to write a parser that would automate most of the formatting work for me.

Implementation Overview

Headings

  • Determines the heading level of a paragraph by checking its font-size and color CSS properties, as well as whether it has a <b> or <i> ancestor.
    match font, color, node with
    | Some "16.0pt", Some "#17365D", (HasAncestor "b" true)   -> H 1
    | Some "13.0pt", Some "#366092", (HasAncestor "b" true)   -> H 2
    | Some "11.0pt", Some "#366092", (HasAncestor "b" true)  & (HasAncestor "i" false) -> H 3
    | Some "11.0pt", Some "#366092", (HasAncestor "b" true)  & (HasAncestor "i" true)  -> H 4
    | Some "11.0pt", Some "#366092", (HasAncestor "b" false) & (HasAncestor "i" false) -> H 5
    | Some "11.0pt", Some "#366092", (HasAncestor "b" false) & (HasAncestor "i" true)  -> H 6
    | _ -> Normal 
  • In the current version, headings are shifted down one level: h(n) in OneNote -> h(n+1) in Markdown, because there is no heading 1 on this site.
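The ancestor check and the one-level heading shift described above can be sketched as follows. This is only an illustration under stated assumptions: the `Node` record is a hypothetical stand-in for an HtmlAgilityPack node, and `hasAncestor`, `describe` and `headingPrefix` are names made up for this example. The tool's actual definition of `HasAncestor` may differ.

```fsharp
// Hypothetical stand-in for an HTML node; the real tool works on HtmlAgilityPack nodes.
type Node = { Name: string; Parent: Node option }

// True if any ancestor of `node` has the given tag name.
let rec hasAncestor (tag: string) (node: Node) : bool =
    match node.Parent with
    | None -> false
    | Some p -> p.Name = tag || hasAncestor tag p

// Parameterized active pattern: in `HasAncestor "b" true`, "b" is the argument
// and `true` is the pattern the boolean result is matched against.
let (|HasAncestor|) (tag: string) (node: Node) = hasAncestor tag node

// Small demonstration of combining two ancestor checks with `&`.
let describe (node: Node) =
    match node with
    | (HasAncestor "b" true) & (HasAncestor "i" true) -> "bold italic"
    | (HasAncestor "b" true) & (HasAncestor "i" false) -> "bold only"
    | _ -> "other"

// Assumed rendering of the shift: h(n) in OneNote becomes h(n+1) in Markdown.
let headingPrefix (oneNoteLevel: int) =
    String.replicate (oneNoteLevel + 1) "#" + " "
```

For example, a `span` node nested inside a `b` node is described as "bold only", and `headingPrefix 1` yields the prefix for a second-level Markdown heading.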
Code
  • Any text whose font is Consolas is treated as code; all other text is not.
  • Distinguishes between inline code and code blocks.
  • Preserves the indentation in code blocks.
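One possible way to separate inline code from code blocks is sketched below. The article does not say how the tool makes this decision, so the rule used here is an assumption: a paragraph made up entirely of Consolas runs is a code block line, and Consolas runs mixed with other text are inline code. The `Run` type and both function names are hypothetical.

```fsharp
// Hypothetical model of one paragraph: text runs tagged with their font family.
type Run = { Text: string; Font: string }

let isCode (run: Run) = run.Font = "Consolas"

// Assumption: all-Consolas paragraphs are code block lines, anything else is
// ordinary text that may contain inline code.
let renderParagraph (runs: Run list) : string =
    if not (List.isEmpty runs) && List.forall isCode runs then
        // Four leading spaces mark a Markdown code block; whitespace already
        // present in the text is kept, which preserves the indentation.
        "    " + (runs |> List.map (fun r -> r.Text) |> String.concat "")
    else
        runs
        |> List.map (fun r -> if isCode r then "`" + r.Text + "`" else r.Text)
        |> String.concat ""
```

A mixed paragraph such as "Call " (Calibri) followed by "printfn" (Consolas) comes out as text with one backtick-quoted word, while a paragraph containing only Consolas text is indented as a block line.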
Lists
  • Distinguishes between ordered and unordered lists.
  • Handles nested lists correctly.
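The Markdown side of list handling can be sketched as below. The `ListItem` shape is hypothetical and assumes the nesting depth and list kind have already been recovered from the HTML; how the tool extracts them is not described in the article.

```fsharp
// Hypothetical flattened list item: nesting depth (0 = top level), kind, and text.
type ListItem = { Depth: int; Ordered: bool; Text: string }

let renderItem (item: ListItem) : string =
    // Four spaces of indentation per nesting level.
    let indent = String.replicate item.Depth "    "
    // Markdown renumbers ordered lists itself, so "1." can be used for every item.
    let marker = if item.Ordered then "1." else "*"
    indent + marker + " " + item.Text

let renderList (items: ListItem list) : string =
    items |> List.map renderItem |> String.concat "\n"
```

An unordered item at depth 0 followed by an ordered item at depth 1 renders as a `*` line and an indented `1.` line beneath it.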
Links
  • Checks whether a piece of text belongs to a link by looking for an <a> tag among its ancestors.
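The ancestor lookup for links might look roughly like this. Again this is a sketch, not the tool's code: `Node` is a hypothetical stand-in for an HtmlAgilityPack node, and `tryFindLink` and `renderText` are illustrative names.

```fsharp
// Hypothetical stand-in for an HTML node (the real tool uses HtmlAgilityPack).
type Node = { Name: string; Href: string option; Parent: Node option }

// Walk up the ancestors and return the href of the nearest <a>, if any.
let rec tryFindLink (node: Node) : string option =
    match node.Parent with
    | None -> None
    | Some p when p.Name = "a" -> p.Href
    | Some p -> tryFindLink p

// Text under an <a> ancestor becomes a Markdown link; other text is unchanged.
let renderText (text: string) (node: Node) : string =
    match tryFindLink node with
    | Some href -> "[" + text + "](" + href + ")"
    | None -> text
```

In this sketch an `<a>` ancestor without an `href` is treated the same as having no link at all, so its text is emitted unchanged.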

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

colinfang

United Kingdom

Last Updated 13 Jan 2013
Article Copyright 2013 by colinfang