A Simple Parser That Converts HTML from OneNote to Markdown

colinfang, 13 Jan 2013, CPOL
OneNote2Markdown converts the HTML generated from OneNote to Markdown, which any online Markdown parser can then translate into cleaner, normalized HTML.
This is an old version of the currently published tip/trick.

Introduction

OneNote2Markdown converts the HTML generated from OneNote (by sending the page to Word first and saving it as HTML) to Markdown, which any online Markdown parser can then translate into cleaner, normalized HTML.
  • Written in F#, the tool handles only normal paragraphs, headings, links, lists, inline code and code blocks.
  • The input HTML file has to be named "input.html"; the tool then generates "output.txt".
  • You can get the source code from here. Compiling it requires HtmlAgilityPack.
  • The example pack contains a sample article in docx, HTML and Markdown formats, which gives a basic demonstration of how the tool works.

Background

I tend to take notes in OneNote. When I first tried to submit an article composed in OneNote, adapting it manually to the template here was a real hassle. So I decided to write a parser that would automate most of the formatting work for me.

Implementation Overview

Headings

  • Determines the heading type of a paragraph by checking its font-size and color CSS properties, as well as whether it has a <b> or <i> ancestor.
match font, color, node with
| Some "16.0pt", Some "#17365D", (HasAncestor "b" true)   -> H 1
| Some "13.0pt", Some "#366092", (HasAncestor "b" true)   -> H 2
| Some "11.0pt", Some "#366092", (HasAncestor "b" true)  & (HasAncestor "i" false) -> H 3
| Some "11.0pt", Some "#366092", (HasAncestor "b" true)  & (HasAncestor "i" true)  -> H 4
| Some "11.0pt", Some "#366092", (HasAncestor "b" false) & (HasAncestor "i" false) -> H 5
| Some "11.0pt", Some "#366092", (HasAncestor "b" false) & (HasAncestor "i" true)  -> H 6
| _ -> Normal
  • In the current version, heading n in OneNote maps to heading n+1 in Markdown, because heading 1 is not used on this site.
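The match expression above relies on a HasAncestor pattern whose definition is not shown in this tip. A minimal sketch of how it could be written as a parameterized active pattern follows. The Node record, the ParagraphKind type and the hasAncestor, classify and toMarkdown names are assumptions made so the example compiles without HtmlAgilityPack; they are not taken from the tool's source.

```fsharp
// Hypothetical stand-in for an HTML node; the real tool reads HtmlAgilityPack nodes.
type Node = { Tag: string; Parent: Node option }

// Assumed result type for the classification.
type ParagraphKind =
    | H of int
    | Normal

// True when any ancestor of the node has the given tag name.
let rec hasAncestor (tag: string) (node: Node) : bool =
    match node.Parent with
    | None -> false
    | Some parent -> parent.Tag = tag || hasAncestor tag parent

// Parameterized active pattern, so `HasAncestor "b" true` reads as
// "the node has a <b> ancestor".
let (|HasAncestor|) (tag: string) (node: Node) = hasAncestor tag node

// Only the first three rules from the article are repeated here.
let classify (font: string option) (color: string option) (node: Node) : ParagraphKind =
    match font, color, node with
    | Some "16.0pt", Some "#17365D", (HasAncestor "b" true) -> H 1
    | Some "13.0pt", Some "#366092", (HasAncestor "b" true) -> H 2
    | Some "11.0pt", Some "#366092", ((HasAncestor "b" true) & (HasAncestor "i" false)) -> H 3
    | _ -> Normal

// OneNote heading n becomes Markdown heading n + 1.
let toMarkdown (kind: ParagraphKind) (text: string) : string =
    match kind with
    | H n -> String.replicate (n + 1) "#" + " " + text
    | Normal -> text
```

With these definitions, a 16.0pt, #17365D run nested inside a <b> element is classified as H 1 and rendered with two leading # characters.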
Code
  • Text set in Consolas is treated as code; all other text is not.
  • Distinguishes between inline code and code blocks.
  • Preserves the indentation in code blocks.
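The tip does not say how inline code is told apart from code blocks, so the sketch below makes an assumption: a paragraph made up entirely of Consolas runs is a code block, while Consolas runs mixed with other text are inline code. The Run record and the isCode and renderParagraph names are hypothetical.

```fsharp
// Hypothetical model of a paragraph: a list of text runs, each with its font name.
type Run = { Font: string; Text: string }

// The article's rule: Consolas text is code.
let isCode (run: Run) : bool = run.Font = "Consolas"

// Assumption: an all-Consolas paragraph is a code block, anything else is
// ordinary text that may contain inline code.
let renderParagraph (runs: Run list) : string =
    if not (List.isEmpty runs) && List.forall isCode runs then
        // Four leading spaces mark a Markdown code block. The run text is
        // kept as-is, so its own leading whitespace (indentation) survives.
        "    " + (runs |> List.map (fun r -> r.Text) |> String.concat "")
    else
        // Wrap only the Consolas runs in backticks.
        runs
        |> List.map (fun r -> if isCode r then "`" + r.Text + "`" else r.Text)
        |> String.concat ""
```

A real implementation would also need to cope with backticks inside the code text, which this sketch ignores.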
Lists
  • Distinguishes between ordered and unordered lists.
  • Handles nested lists correctly.
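One way nested lists could be written out as Markdown is sketched below. It assumes the list structure has already been recovered from the HTML into a small tree; the ListKind, Item and MdList types and the renderList function are illustrative names only, and the four-space indent per nesting level is a common Markdown convention rather than something the tip specifies.

```fsharp
// Hypothetical list model; the real tool derives this structure from the HTML.
type ListKind =
    | Ordered
    | Unordered

type Item = { Text: string; Children: MdList option }
and MdList = { Kind: ListKind; Items: Item list }

// Renders a list as Markdown lines, indenting each nesting level by four spaces.
let rec renderList (depth: int) (lst: MdList) : string list =
    lst.Items
    |> List.mapi (fun i item ->
        let marker =
            match lst.Kind with
            | Ordered -> sprintf "%d." (i + 1)
            | Unordered -> "*"
        let line = String.replicate (depth * 4) " " + marker + " " + item.Text
        let nested =
            match item.Children with
            | Some child -> renderList (depth + 1) child
            | None -> []
        line :: nested)
    |> List.concat
```

Calling renderList 0 on the top-level list and joining the result with newlines gives the Markdown text for the whole list.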
Links
  • Checks whether a piece of text is part of a link by looking for an <a> tag among its ancestors.
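The ancestor check for links can be sketched as follows. The LinkNode record and the tryFindHref and renderText names are assumptions for illustration; the real tool works on HtmlAgilityPack nodes and may read the href attribute differently.

```fsharp
// Hypothetical stand-in for an HTML node that may carry an href attribute.
type LinkNode = { Tag: string; Href: string option; Parent: LinkNode option }

// Walks up the parent chain and returns the href of the nearest <a> ancestor
// that has one, if any.
let rec tryFindHref (node: LinkNode) : string option =
    match node.Parent with
    | None -> None
    | Some { Tag = "a"; Href = Some href } -> Some href
    | Some parent -> tryFindHref parent

// Text under an <a> ancestor becomes a Markdown link; other text is unchanged.
let renderText (text: string) (node: LinkNode) : string =
    match tryFindHref node with
    | Some href -> sprintf "[%s](%s)" text href
    | None -> text
```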

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

colinfang
United Kingdom
Article Copyright 2013 by colinfang