HtmlDocument Introspection in Treeview


Feb 9, 2009

CPOL


HtmlDocument introspection in a TreeView, showing the HTML, forms, links, images and CSS of a web page

HtmlIntrospection

Introduction

After my article XML Introspection and TreeView, I took a look at the WebBrowser component and discovered that it has a Document property of type HtmlDocument (webBrowser1.Document). This is a good way to get information about a web page without parsing the HTML yourself: the WebBrowser component does the parsing for you.

Background 

Here I want to present a small application that displays a web page (the CodeProject page in the screenshot) and extracts information from its HtmlDocument (a tree of HtmlElement objects).

It shows this information in a TreeView, a preview of the selected element on the right, and the element's properties in a PropertyGrid (bottom right).

Using the code 

Enter a URL in the text box and press the Go button.

When the web page has finished loading, the webBrowser1_DocumentCompleted event handler is called.

It then collects all the HTML tags of the body, as well as the forms, links, images, and CSS.

For each type there is a method:

private void FillTree(HtmlElement hElmFather, TreeNodeHtmlElm t, TreeNodeHtmlElm.TypeNode type)

private void FillTreeForm(HtmlDocument doc, TreeNodeHtmlElm t)
{
    // doc.Forms holds every <form> element of the page;
    // each one is added under node t as a Form node.
    foreach (HtmlElement form in doc.Forms)
    {
        FillTree(form, t, TreeNodeHtmlElm.TypeNode.Form);
    }
}
private void FillTreeLink(HtmlDocument doc, TreeNodeHtmlElm t)
// To find all links: string textToAdd = e.GetAttribute("href"); where e is an HtmlElement

private void FillTreeImage(HtmlDocument doc, TreeNodeHtmlElm t)
// To find all images: string textToAdd = e.GetAttribute("src");

Each time, a temporary array is used so that the same image or link is not added twice.
private void FillTreeCss(HtmlDocument doc, TreeNodeHtmlElm t)

For the CSS, the test is:

if (e.TagName.ToLower() == "link")
{
    if (e.GetAttribute("rel").ToLower() == "stylesheet")
    {
        // e is a <link rel="stylesheet"> element, i.e. an external CSS file
    }
}
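The link, image and CSS filtering described above can be expressed on plain strings, which makes it easy to check without a WebBrowser. This is a minimal sketch, not the article's code: the class HtmlResourceFilters and its methods are hypothetical names, a HashSet<string> stands in for the temporary array used to skip duplicates, and IsStylesheetLink compares case-insensitively and accepts any rel token list containing "stylesheet" (so rel="alternate stylesheet" also matches, which the exact comparison above would not).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers mirroring the filtering in FillTreeLink, FillTreeImage
// and FillTreeCss. They take plain strings, e.g. values read with
// HtmlElement.GetAttribute("href"), ("src") or ("rel") and HtmlElement.TagName.
public static class HtmlResourceFilters
{
    // Returns each non-empty URL once, in the order it first appears.
    public static List<string> DistinctUrls(IEnumerable<string> urls)
    {
        // URLs can be case-sensitive, so compare them ordinally.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string url in urls)
        {
            // GetAttribute returns an empty string when the attribute is absent.
            if (string.IsNullOrEmpty(url))
                continue;
            // HashSet.Add returns false when the URL was already seen.
            if (seen.Add(url))
                result.Add(url);
        }
        return result;
    }

    // True for <link rel="stylesheet" ...>, ignoring case. The rel attribute
    // is a space-separated list of tokens, so each token is checked.
    public static bool IsStylesheetLink(string tagName, string rel)
    {
        if (!string.Equals(tagName, "link", StringComparison.OrdinalIgnoreCase))
            return false;
        if (string.IsNullOrEmpty(rel))
            return false;
        string[] tokens = rel.Split(new[] { ' ', '\t', '\n', '\r', '\f' },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (string.Equals(token, "stylesheet", StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

In the form, the inputs would presumably come from looping over doc.Links reading GetAttribute("href"), over doc.Images reading GetAttribute("src"), and over doc.GetElementsByTagName("link") passing e.TagName and e.GetAttribute("rel").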

The information is thus structured in a TreeView; each node of the TreeView is an instance of the class TreeNodeHtmlElm, which derives from TreeNode.
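The article does not list the body of FillTree. Assuming it recurses over each element's children, so that the TreeView mirrors the shape of the HTML tree, the walk can be sketched as below. DomNode and TreeItem are hypothetical stand-ins for HtmlElement and TreeNodeHtmlElm that keep only the members the walk needs, so the sketch runs without WinForms.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for HtmlElement: only TagName and Children are modelled.
public class DomNode
{
    public string TagName { get; }
    public List<DomNode> Children { get; } = new List<DomNode>();

    public DomNode(string tagName, params DomNode[] children)
    {
        TagName = tagName;
        Children.AddRange(children);
    }
}

// Hypothetical stand-in for TreeNodeHtmlElm / TreeNode: a label plus child nodes.
public class TreeItem
{
    public string Text { get; }
    public List<TreeItem> Nodes { get; } = new List<TreeItem>();

    public TreeItem(string text) { Text = text; }
}

public static class TreeBuilder
{
    // Depth-first walk: creates one TreeItem per element and attaches each
    // element's children under it, in document order.
    public static void FillTree(DomNode father, TreeItem parent)
    {
        var item = new TreeItem(father.TagName);
        parent.Nodes.Add(item);
        foreach (DomNode child in father.Children)
            FillTree(child, item);
    }

    // Counts all nodes below (not including) the given item.
    public static int CountDescendants(TreeItem item)
    {
        int count = 0;
        foreach (TreeItem child in item.Nodes)
            count += 1 + CountDescendants(child);
        return count;
    }
}
```

With the real types, the walk would presumably start from webBrowser1.Document.Body, iterate HtmlElement.Children, and add TreeNodeHtmlElm instances to t.Nodes.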

Points of Interest 

I found it interesting to explore a web page this way; it offers a different way of looking at one.

I had a problem with the TreeView: the node text was too long, and the application became really slow whenever tooltips appeared, so I limited the text to 100 characters:

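            public TreeNodeHtmlElm(HtmlElement elm, TypeNode t) : base()
            {
                type = t;
                mHtmlElement = elm;
                try
                {
                    // Prefer the element's text; fall back to its HTML when it has no text.
                    string label = string.IsNullOrEmpty(elm.OuterText) ? elm.OuterHtml : elm.OuterText;
                    label = label ?? "";
                    // Long labels make TreeView tooltips very slow: keep at most 100 characters.
                    Text = label.Length > 100 ? label.Substring(0, 100) : label;
                }
                catch (Exception)
                {
                    // If reading OuterText/OuterHtml fails, show an empty label.
                    Text = "";
                }
            }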
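The label logic in the constructor can be pulled out into a pure function, which makes the 100-character limit easy to test on its own. This is a minimal sketch; NodeLabel and Make are hypothetical names, and this version applies the limit to the OuterHtml fallback as well as to OuterText.

```csharp
// Hypothetical helper computing a TreeView label from an element's
// OuterText and OuterHtml values.
public static class NodeLabel
{
    // The article limits node text to 100 characters.
    public const int MaxLength = 100;

    // Uses OuterText when it is non-empty, otherwise OuterHtml, then truncates
    // the result so long labels do not slow down the TreeView tooltips.
    public static string Make(string outerText, string outerHtml, int maxLength = MaxLength)
    {
        string text = string.IsNullOrEmpty(outerText) ? outerHtml : outerText;
        if (text == null)
            return string.Empty;
        return text.Length > maxLength ? text.Substring(0, maxLength) : text;
    }
}
```

Inside the constructor, the try block could then shrink to Text = NodeLabel.Make(elm.OuterText, elm.OuterHtml);.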

If you click a tree node, the application shows a preview of that piece of HTML in the window at the top right.

If you right-click, a context menu appears from which you can save (SaveTreeNodeHtml) the Text of the subnodes.

Saving does not work for images: only the image URL is saved, not the image itself. In a future version it could be interesting to download and save the images, and the same goes for the CSS.

Please take a look at my other pages:
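Saving the Text of the subnodes amounts to a depth-first walk that writes one label per node. The article does not show SaveTreeNodeHtml, so this is only a sketch under assumptions: SimpleNode is a hypothetical stand-in for TreeNode (Text plus Nodes), the output format (one line per node, two spaces of indentation per level, "\n" line endings) is a choice made here, and SubnodeExport.Save is a hypothetical helper, not the article's method.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for a TreeNode: a Text label plus child nodes.
public class SimpleNode
{
    public string Text { get; }
    public List<SimpleNode> Nodes { get; } = new List<SimpleNode>();

    public SimpleNode(string text, params SimpleNode[] children)
    {
        Text = text;
        Nodes.AddRange(children);
    }
}

public static class SubnodeExport
{
    // Collects the Text of every node below 'node' (depth-first, in order),
    // one per line, indented by two spaces per level.
    public static string CollectText(SimpleNode node)
    {
        var sb = new StringBuilder();
        Append(node, 0, sb);
        return sb.ToString();
    }

    private static void Append(SimpleNode node, int depth, StringBuilder sb)
    {
        foreach (SimpleNode child in node.Nodes)
        {
            sb.Append(' ', depth * 2).Append(child.Text).Append('\n');
            Append(child, depth + 1, sb);
        }
    }

    // Writes the collected text to a file (UTF-8, no byte order mark).
    public static void Save(SimpleNode node, string path)
    {
        File.WriteAllText(path, CollectText(node));
    }
}
```

Because only the Text of each node is written, an image node contributes its URL and nothing more, which matches the limitation described above.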

http://www.cmb-soft.com/, a CSS editor

My homepage http://vidalcharles.free.fr/

I'm looking for a job; if anybody has a job offer, please email me at charles.vidal(at)gmail.com. Thanks.