Posted 17 Jan 2013
Licenced CPOL

Dead Simple HTML Sanitizer

Tawani Anyangwe, 18 Jan 2013
A dead simple HTML Sanitizer (and HTML Parser) you can use to clean user HTML input.


This article introduces a dead simple HTML Sanitizer which you can use to clean up user-entered HTML or uploaded HTML documents.


Html Sanitizer WPF Demo



One of our systems features a Document Production module that lets users upload (and save) custom HTML documents, which other users can then download. The problem was that some users kept adding "unsafe" script tags (and other XSS vulnerabilities) to their documents, which we had to sanitize.

Note: I know of the Microsoft Anti-Cross Site Scripting Library but decided to write my own since adding a new reference to the project was out of the question.

Using the code

// Sample usage
const string input = "<scriPt>alert(0)</Script>This is the game <SCRIPT>";
var output = HtmlSanitizer.Sanitize(input);
Assert.AreEqual("This is the game ", output); 

Parse the HTML

You can also just parse the HTML document.

// Parsing a malformed HTML document
var input = System.IO.File.ReadAllText("myfile.htm");
var doc = HtmlParser.Parse(input);
Assert.AreEqual(2, doc.ChildNodes.Count);

Tidy the HTML

You can also just tidy the HTML content.

// Tidy a malformed HTML document
var input = "<input type=checkbox value=ON checked>";
var output = HtmlParser.Tidy(input);
Assert.AreEqual("<input type=\"checkbox\" value=\"ON\" checked=\"checked\"/>", output); 

Points of Interest/References

  1. The XML Viewer used in this article was taken from A Simple XML Document Viewer Control.
  2. This code has not been tested against extremely malformed HTML, so please be careful how you use it.
  3. You can always change the list of unsafe tags and attributes to meet your requirements.
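
To make point 3 concrete, here is a minimal sketch of one way to keep the unsafe tag and attribute lists in a single editable place. It is not taken from the article's download: the `UnsafeLists` class, its members, and the particular tags and attributes listed are all assumptions made for illustration, and the real sanitizer may organize its lists differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the article's source): it only illustrates
// keeping the unsafe tag/attribute lists in one place so they are easy to change.
public static class UnsafeLists
{
    // Case-insensitive, so "scriPt" and "SCRIPT" both match "script".
    public static readonly HashSet<string> Tags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "script", "iframe", "object", "embed" };

    // Attributes treated as unsafe in addition to the "on*" event handlers.
    public static readonly HashSet<string> Attributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "style", "srcdoc", "formaction" };

    public static bool IsUnsafeTag(string name)
    {
        return name != null && Tags.Contains(name.Trim());
    }

    // Assumption: any attribute whose name starts with "on" is an event handler.
    public static bool IsUnsafeAttribute(string name)
    {
        if (string.IsNullOrWhiteSpace(name)) return false;
        name = name.Trim();
        return Attributes.Contains(name)
            || name.StartsWith("on", StringComparer.OrdinalIgnoreCase == null
                ? StringComparison.Ordinal
                : StringComparison.OrdinalIgnoreCase);
    }
}
```

With a layout like this, tightening the rules is a one-line change, for example `UnsafeLists.Tags.Add("form");`, after which `UnsafeLists.IsUnsafeTag("FORM")` returns true. A blacklist only blocks what it names, so anything not listed passes through untouched.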


This is the first revision of the article.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Tawani Anyangwe
United States
No Biography provided

Comments and Discussions

Bug: Error on BR tags and at fix attribute value
Pelle Penna, 28-Jan-14 0:40
Question: Awesome. Just what I needed.
nakash20503-Oct-13 5:19
Last Updated 18 Jan 2013
Article Copyright 2013 by Tawani Anyangwe