HtmlDocument Introspection in Treeview






Inspecting an HtmlDocument in a TreeView, showing its HTML, forms, links, images and CSS

Introduction
After my article XML Introspection and TreeView, I took a look at the WebBrowser component and discovered that it has a Document property of type HtmlDocument (webBrowser1.Document). This is a good way to get information about a web page without parsing the HTML yourself: the WebBrowser component does the parsing for you.
Background
Here I present a small application that displays a web page (in the screenshot, the CodeProject page) and extracts information from its HtmlDocument (a tree of HtmlElement objects).
This information is shown in a TreeView, with a preview on the right and the selected element's properties in a PropertyGrid at the bottom right.
Using the code
Enter a URL in the text box and press the Go button.
When the web page has finished loading, the webBrowser1_DocumentCompleted event handler is called.
It then collects all the HTML tags of the body, as well as the forms, links, images and CSS.
There is one method for each type:
private void FillTree(HtmlElement hElmFather, TreeNodeHtmlElm t, TreeNodeHtmlElm.TypeNode type)

private void FillTreeForm(HtmlDocument doc, TreeNodeHtmlElm t)
{
    // doc.Forms is an HtmlElementCollection: add each form under the node t
    foreach (HtmlElement form in doc.Forms)
    {
        FillTree(form, t, TreeNodeHtmlElm.TypeNode.Form);
    }
}
private void FillTreeLink(HtmlDocument doc, TreeNodeHtmlElm t)
// For each link, the URL is read with: string textToAdd = e.GetAttribute("href"); where e is an HtmlElement
private void FillTreeImage(HtmlDocument doc, TreeNodeHtmlElm t)
// For each image, the URL is read with: string textToAdd = e.GetAttribute("src");
In both cases a temporary array is used so that the same image or link is not added twice.
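The duplicate check itself isn't shown in the article. Below is a minimal sketch, assuming a HashSet<string> in place of the temporary array (a swapped-in choice, not the author's code). The hypothetical UniqueValues.Collect keeps the first occurrence of each URL in document order. It takes plain strings so it runs without a WebBrowser; in the application, the input would be the href or src values read with GetAttribute.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original application.
public static class UniqueValues
{
    // Returns each non-empty value once, in the order it was first seen.
    public static List<string> Collect(IEnumerable<string> values)
    {
        // Ordinal comparison: URLs that differ only in case are kept as distinct.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string v in values)
        {
            // HtmlElement.GetAttribute returns an empty string when the
            // attribute is missing, so empty (and null) values are skipped.
            if (string.IsNullOrEmpty(v)) continue;
            // HashSet.Add returns false if the value was already present.
            if (seen.Add(v)) result.Add(v);
        }
        return result;
    }
}
```

In FillTreeLink, the input could be built by reading e.GetAttribute("href") for each HtmlElement e in doc.Links, and in FillTreeImage by reading "src" for each element of doc.Images. HashSet.Add gives a constant-time membership test, where scanning an array costs a pass over every URL already seen.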
private void FillTreeCss(HtmlDocument doc, TreeNodeHtmlElm t)
For the CSS, the test is:
if (e.TagName.ToLower() == "link")
{
    if (e.GetAttribute("rel").ToLower() == "stylesheet")
    {
        // e is a <link rel="stylesheet"> element, so it is added to the CSS node
    }
}
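The ToLower() comparison above works in most cases, but it depends on the current culture (under a Turkish culture, "LINK".ToLower() does not produce "link"). It also misses rel values that list more than one link type. The following is a hedged alternative, not the author's code. The hypothetical CssLinkFilter.IsStylesheetLink compares with StringComparison.OrdinalIgnoreCase and treats rel as a whitespace-separated token list, so "alternate stylesheet" also matches. Whether alternate stylesheets belong in the CSS node is a design choice.

```csharp
using System;

// Hypothetical helper, not part of the original application.
public static class CssLinkFilter
{
    // True for <link> elements whose rel attribute contains the "stylesheet" token.
    public static bool IsStylesheetLink(string tagName, string rel)
    {
        // Culture-independent, case-insensitive tag check.
        if (!string.Equals(tagName, "link", StringComparison.OrdinalIgnoreCase))
            return false;
        if (string.IsNullOrEmpty(rel))
            return false;

        // rel can hold several space-separated link types, e.g. "alternate stylesheet".
        string[] tokens = rel.Split(new[] { ' ', '\t', '\n', '\r', '\f' },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (string.Equals(token, "stylesheet", StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

In FillTreeCss the nested test would become a single call: if (CssLinkFilter.IsStylesheetLink(e.TagName, e.GetAttribute("rel"))) { ... }.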
The information is thus structured in a TreeView: each node of the tree is a TreeNodeHtmlElm, a class derived from TreeNode.
Points of Interest
I found it interesting to explore a web page this way; it offers a different view of one.
I had a problem with the TreeView: the node text was too long, and the application became really slow whenever tooltips appeared, so I limited the text to 100 characters:
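public TreeNodeHtmlElm(HtmlElement elm, TypeNode t) : base()
{
    type = t;
    mHtmlElement = elm;
    try
    {
        if (string.IsNullOrEmpty(elm.OuterText))
        {
            // No text content: show the markup instead
            Text = elm.OuterHtml;
        }
        else if (elm.OuterText.Length > 100)
        {
            // Long labels make tooltips very slow: keep the first 100 characters
            Text = elm.OuterText.Substring(0, 100);
        }
        else
        {
            Text = elm.OuterText;
        }
    }
    catch (Exception)
    {
        // If reading the element's text fails, fall back to an empty label
        Text = "";
    }
}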
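In the constructor, only OuterText is truncated. When an element has no text, the full OuterHtml becomes the label, and that can still be very long. Below is a minimal sketch of a helper that applies the same 100-character limit in both cases. NodeLabel.Make is a hypothetical name, and it takes plain strings so it can be tested without an HtmlElement.

```csharp
// Hypothetical helper, not part of the original application.
public static class NodeLabel
{
    // Same limit as the constructor above.
    public const int MaxLength = 100;

    // Uses OuterText when the element has any, otherwise OuterHtml,
    // and truncates whichever one is chosen to maxLength characters.
    public static string Make(string outerText, string outerHtml, int maxLength = MaxLength)
    {
        string text = string.IsNullOrEmpty(outerText) ? outerHtml : outerText;
        if (text == null) return "";
        return text.Length > maxLength ? text.Substring(0, maxLength) : text;
    }
}
```

Inside the try block, the whole if/else could then be replaced by Text = NodeLabel.Make(elm.OuterText, elm.OuterHtml);.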
When you click a tree node, the application shows a preview of that piece of HTML in the window at the top right.
You can also right-click to open a context menu, from which you can save (SaveTreeNodeHtml) the text of the subnodes.
Saving does not work for images: only the image URL is saved, not the image itself. A future version could download and save the images, and likewise the CSS files.
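SaveTreeNodeHtml itself isn't shown in the article. The following illustration is a depth-first walk that gathers the Text of a node and all its subnodes into one string, indenting each level by two spaces. TextNode and SubtreeText are hypothetical stand-ins so the sketch runs without WinForms. With a real TreeNode, you would recurse over node.Nodes in the same way.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for TreeNodeHtmlElm: only Text and the children matter here.
public class TextNode
{
    public string Text { get; set; }
    public List<TextNode> Nodes { get; } = new List<TextNode>();
    public TextNode(string text) { Text = text; }
}

public static class SubtreeText
{
    // Returns the text of root and every descendant, one node per line.
    public static string Collect(TextNode root)
    {
        var sb = new StringBuilder();
        Append(root, 0, sb);
        return sb.ToString();
    }

    // Pre-order traversal: write the node, then recurse into its children.
    private static void Append(TextNode node, int depth, StringBuilder sb)
    {
        sb.Append(' ', depth * 2).Append(node.Text).Append('\n');
        foreach (TextNode child in node.Nodes)
            Append(child, depth + 1, sb);
    }
}
```

The resulting string could then be written with File.WriteAllText(path, text), using a path chosen in a SaveFileDialog. Saving the images themselves would need an extra download step for each URL, which this sketch does not attempt.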
Please take a look at my other pages:
http://www.cmb-soft.com/ : a CSS editor
My homepage: http://vidalcharles.free.fr/
I'm looking for a job; if anybody has a job offer, please email me at charles.vidal(at)gmail.com. Thanks.