A managed wrapper for the HTML Tidy library
A managed C++ wrapper for a small part of the HTML Tidy C library
(For the latest changes, please see the history section at the end of the article)
Introduction
This is a small library, still at a very early stage, that provides a native .NET way of accessing the functions of the HTML Tidy library.
HTML Tidy is an open source C library for checking and generating clean XHTML/HTML. In other words: you can hand malformed HTML to the library and it will do its best to repair the errors and remove unnecessary items and tags from the HTML.
The Library
There already is a way to access the library from .NET, namely through the ATL wrapper by Charles Reitzel (its sources are in a SourceForge CVS repository). However, you need to register the COM ActiveX control before you can use it.
To avoid this registration requirement, I created a C++/CLI wrapper around the original HTML Tidy C library. This wrapper is a normal library that you can use in your .NET applications by simply adding a reference to it.
Please note that my library does not really deserve to be called a "library" yet, because so far it consists of just one single function.
The reason I am publishing it here anyway is that I want to make the basic idea available as early as possible to anyone in the same situation as me (needing a .NET wrapper for HTML Tidy). It is rather easy to take my library as a starting point and add the functions you need. I did the core work; you simply add the functions you like.
Of course, I will gradually add more functions to the library as my requirements grow. I also encourage you to enhance it yourself and send me your code so that I can include it.
The underlying C library
It was a pleasure to compile the original HTML Tidy C library. I opened the provided Visual Studio .NET project file, built it for both Debug and Release, and received no errors and no warnings. Amazing! I have never had such a seamless experience compiling third-party C/C++ libraries.
Using my .NET library
The library currently has one function to call:
public string CleanHtml( string html );
Simply pass a string and get a cleaned up string back. Easy, isn't it?
An example usage could be:
    using ( HtmlTidy tidy = new HtmlTidy() )
    {
        string html = @"
            <html>
            <head>
            <meta http-equiv=""Content-Type"" content=""text/html; charset=utf-16"">
            </head>
            <body>
            <p>Hello, <b><i>With German</b></i>: ÄÖÜ. Some Chinese: 讪.</p>
            <body>
            </html>
            ";

        string s = tidy.CleanHtml( html, HtmlTidyOptions.ConvertToXhtml );
        Console.WriteLine( s );
    }
As you can see, you simply pass the string to the function. The example uses an overload that additionally takes an options argument; currently there is only one option, but more will be added in the future.
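For the single-argument form CleanHtml( string html ), a minimal sketch could read an HTML file, clean it and write the result to a second file. The file names and the CleanFile class are made up for illustration only. Because this article does not name the namespace that contains HtmlTidy, you may have to add a matching using directive, and the project needs a reference to the wrapper library.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical console program; requires a reference to the HtmlTidy
// wrapper library described in this article.
class CleanFile
{
    static void Main( string[] args )
    {
        // Assumed file names, overridable from the command line.
        string inputPath = args.Length > 0 ? args[0] : "input.html";
        string outputPath = args.Length > 1 ? args[1] : "output.html";

        // Read the possibly malformed HTML into a .NET string.
        string dirty = File.ReadAllText( inputPath, Encoding.UTF8 );

        // HtmlTidy is disposable, so wrap it in a using block
        // to release the native resources deterministically.
        using ( HtmlTidy tidy = new HtmlTidy() )
        {
            // Single-argument overload: no extra options.
            string clean = tidy.CleanHtml( dirty );

            File.WriteAllText( outputPath, clean, Encoding.UTF8 );
        }

        Console.WriteLine( "Cleaned HTML written to " + outputPath );
    }
}
```

The sketch assumes UTF-8 files; if your input uses a different encoding, pass the appropriate Encoding to File.ReadAllText and File.WriteAllText.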
Redistributing
When you redistribute the library, please make sure that the Microsoft CRT runtime DLLs "msvcr80.dll", "msvcm80.dll" and "msvcp80.dll" are distributed along with it. These DLLs are usually found in the folder "C:\Program Files\Microsoft Visual Studio 8\VC\redist\x86\Microsoft.VC80.CRT".
History
- 2007-01-14
Added the section about redistributing the CRT library.
- 2007-01-12
First version published.