
HtmlDocument Introspection in Treeview

8 Feb 2009
HtmlDocument introspection in a TreeView, showing the HTML, forms, links, images and CSS of a web page.

HtmlIntrospection

Introduction

After my article XML Introspection and TreeView, I took a look at the WebBrowser component and discovered that it exposes an HtmlDocument property (webBrowser1.Document). This is a good way to get information about a web page without parsing the HTML yourself: the WebBrowser component does it for you.

Background 

I want to present a small application that displays a web page (the CodeProject page in the screenshot) and extracts information from its HtmlDocument (a tree of HtmlElement objects).

This information is shown in a TreeView, with a preview of the selected element at the top right and its properties in a PropertyGrid at the bottom right.

Using the code 

Enter a URL in the text box and press the Go button.

When the web page is loaded, the event handler webBrowser1_DocumentCompleted is called.

The handler collects all the HTML tags of the body, plus the forms, links, images and CSS.
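
To make the flow above concrete, here is a minimal sketch of a form that navigates to a URL and reads the HtmlDocument once loading has finished. It is not the article's code: the class IntrospectionForm and the helper AddGroup are hypothetical names, plain TreeNode objects stand in for TreeNodeHtmlElm, and the ReadyState check is an assumption about how to skip the intermediate events that pages with frames can raise.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, for illustration only.
public class IntrospectionForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly TreeView treeView1 = new TreeView { Dock = DockStyle.Left, Width = 300 };

    public IntrospectionForm(string url)
    {
        Controls.Add(webBrowser1);
        Controls.Add(treeView1);
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
        webBrowser1.Navigate(url); // what pressing the Go button would trigger
    }

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event may be raised more than once for pages with frames;
        // only proceed when the whole page reports Complete (assumption).
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

        HtmlDocument doc = webBrowser1.Document;
        if (doc == null) return;

        treeView1.Nodes.Clear();
        AddGroup("Forms", doc.Forms, "action");
        AddGroup("Links", doc.Links, "href");
        AddGroup("Images", doc.Images, "src");
    }

    // Adds one parent node per category and one child node per element.
    private void AddGroup(string title, HtmlElementCollection elements, string attribute)
    {
        TreeNode group = treeView1.Nodes.Add(title);
        foreach (HtmlElement elm in elements)
        {
            group.Nodes.Add(elm.GetAttribute(attribute));
        }
    }
}
```

Running it only needs Application.Run(new IntrospectionForm("http://www.codeproject.com/")) from an [STAThread] Main method.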

For each type there is a method. They all rely on FillTree, which receives an element, the tree node to fill and the type of node:

    private void FillTree(HtmlElement hElmFather, TreeNodeHtmlElm t, TreeNodeHtmlElm.TypeNode type)

FillTreeForm walks the Forms collection of the document:

    private void FillTreeForm(HtmlDocument doc, TreeNodeHtmlElm t)
    {
        System.Collections.IEnumerator en = doc.Forms.GetEnumerator();
        while (en.MoveNext())
        {
            FillTree((HtmlElement)en.Current, t, TreeNodeHtmlElm.TypeNode.Form);
        }
    }

FillTreeLink finds all the links by reading the href attribute:

    private void FillTreeLink(HtmlDocument doc, TreeNodeHtmlElm t)
    // string textToAdd = e.GetAttribute("href"); where e is an HtmlElement

FillTreeImage finds all the images by reading the src attribute:

    private void FillTreeImage(HtmlDocument doc, TreeNodeHtmlElm t)
    // string textToAdd = e.GetAttribute("src");

In both cases a temporary array is used so that the same image or link is not added twice.

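
The duplicate check can be sketched as follows. The article uses a temporary array; this sketch swaps in a HashSet<string>, which makes the "already seen" test a single call. The names UrlFilter and Distinct are hypothetical and do not come from the download.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class UrlFilter
{
    // Returns the URLs in first-seen order, skipping empty values and repeats.
    public static List<string> Distinct(IEnumerable<string> urls)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string url in urls)
        {
            if (string.IsNullOrEmpty(url)) continue;
            // HashSet.Add returns false when the value is already present.
            if (seen.Add(url)) result.Add(url);
        }
        return result;
    }
}
```

Inside FillTreeLink or FillTreeImage, the strings passed in would be the values returned by e.GetAttribute("href") or e.GetAttribute("src").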
FillTreeCss looks for the style sheets:

    private void FillTreeCss(HtmlDocument doc, TreeNodeHtmlElm t)

For the CSS, the test is:

    if (e.TagName.ToLower() == "link")
    {
        if (e.GetAttribute("rel").ToLower() == "stylesheet")
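
The same test can be written as a small stand-alone predicate, which is easier to try out without a loaded page. This is only a sketch: the names CssLinkTest and IsStylesheetLink are hypothetical, and the helper takes plain strings where the article's code reads e.TagName and e.GetAttribute("rel"). Like the original test, it compares rel for exact equality, so a value such as "alternate stylesheet" is not matched.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CssLinkTest
{
    // True when the tag is <link> and its rel attribute is "stylesheet",
    // ignoring case in both values.
    public static bool IsStylesheetLink(string tagName, string rel)
    {
        return string.Equals(tagName, "link", StringComparison.OrdinalIgnoreCase)
            && string.Equals(rel, "stylesheet", StringComparison.OrdinalIgnoreCase);
    }
}
```

Using string.Equals with OrdinalIgnoreCase avoids the two ToLower() calls and also copes with a null attribute value.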
		

The information is thus structured in a TreeView; each node of the tree is an instance of the class TreeNodeHtmlElm, which derives from TreeNode.

Points of Interest 

I found it interesting to explore a web page in this way; it is a different way of looking at one.

I had a problem with the TreeView: the text of some nodes was huge, and the application became really slow when tooltips appeared, so I limit the node text to 100 characters:

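    public TreeNodeHtmlElm(HtmlElement elm, TypeNode t) : base()
    {
        type = t;
        mHtmlElement = elm;
        try
        {
            if (string.IsNullOrEmpty(elm.OuterText))
            {
                // No visible text: fall back to the markup itself.
                Text = elm.OuterHtml;
            }
            else if (elm.OuterText.Length > 100)
            {
                // Keep only the first 100 characters so tooltips stay fast.
                Text = elm.OuterText.Substring(0, 100);
            }
            else
            {
                Text = elm.OuterText;
            }
        }
        catch (Exception)
        {
            // If the element's text cannot be read, leave the node label empty.
            Text = "";
        }
    }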
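
The label logic can also be pulled out into a helper that works on plain strings. The sketch below is a variation, not the article's code: NodeText and Build are hypothetical names, and unlike the constructor above it also caps the OuterHtml fallback at 100 characters, on the assumption that long markup would slow down tooltips in the same way.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class NodeText
{
    public const int MaxLength = 100;

    // Prefers the visible text, falls back to the markup,
    // and caps the result at MaxLength characters.
    public static string Build(string outerText, string outerHtml)
    {
        string text = string.IsNullOrEmpty(outerText) ? outerHtml : outerText;
        if (string.IsNullOrEmpty(text)) return "";
        return text.Length > MaxLength ? text.Substring(0, MaxLength) : text;
    }
}
```

In the constructor this would be called as Text = NodeText.Build(elm.OuterText, elm.OuterHtml);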

If you click on a tree node, the application shows a preview of that piece of HTML in the window at the top right.
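
One possible way to wire up such a preview is sketched below. How the download actually does it is not shown in the article, so everything here is an assumption: the class PreviewForm, the control names, the use of a second WebBrowser whose DocumentText is set to the element's OuterHtml, and the use of the node's Tag property to carry the HtmlElement (the article's TreeNodeHtmlElm keeps it in mHtmlElement instead).

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, for illustration only.
public class PreviewForm : Form
{
    private readonly WebBrowser previewBrowser = new WebBrowser { Dock = DockStyle.Fill };
    private readonly PropertyGrid propertyGrid1 = new PropertyGrid { Dock = DockStyle.Bottom };
    private readonly TreeView treeView1 = new TreeView { Dock = DockStyle.Left, Width = 300 };

    public PreviewForm()
    {
        Controls.Add(previewBrowser);
        Controls.Add(propertyGrid1);
        Controls.Add(treeView1);
        treeView1.AfterSelect += treeView1_AfterSelect;
    }

    private void treeView1_AfterSelect(object sender, TreeViewEventArgs e)
    {
        // Assumption: each node carries its HtmlElement in the Tag property.
        var elm = e.Node.Tag as HtmlElement;
        if (elm == null) return;

        // Render the fragment on its own and expose the element's properties.
        previewBrowser.DocumentText = elm.OuterHtml ?? "";
        propertyGrid1.SelectedObject = elm;
    }
}
```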

You can right-click a node to open a context menu and save (SaveTreeNodeHtml) the text of its subnodes.

This does not work for images: only the URL of the image is saved, not the image itself. In another version it could be interesting to download and save the image, and the same goes for the CSS.

Please take a look at my other pages:

http://www.cmb-soft.com/ (a CSS editor)

My homepage http://vidalcharles.free.fr/

I'm looking for a job; if anybody has a job offer, please email me at charles.vidal(at)gmail.com. Thanks.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
