Introduction
After my article "XML Introspection and TreeView", I took a look at the WebBrowser component and discovered that it exposes an HtmlDocument property (webBrowser1.Document). This is a good way to get information about a web page without parsing the HTML yourself: the WebBrowser component does it for you.
Background
Here I present a small application that displays a web page (the CodeProject page in the screenshot) and extracts information from its HtmlDocument (a tree of HtmlElement objects).
The elements are shown in a TreeView, a preview appears at the top right, and the properties of the selected element are displayed in a PropertyGrid at the bottom right.
Using the code
Enter a URL in the text box and press the Go button.
When the web page has loaded, the event handler webBrowser1_DocumentCompleted is called.
It collects the HTML tags of the body, the forms, the links, the images, and the CSS.
For each type there is a method:
private void FillTree(HtmlElement hElmFather, TreeNodeHtmlElm t, TreeNodeHtmlElm.TypeNode type)
private void FillTreeForm(HtmlDocument doc, TreeNodeHtmlElm t)
{
    // Add every form of the document under the node t.
    foreach (HtmlElement form in doc.Forms)
    {
        FillTree(form, t, TreeNodeHtmlElm.TypeNode.Form);
    }
}
private void FillTreeLink(HtmlDocument doc, TreeNodeHtmlElm t)
private void FillTreeImage(HtmlDocument doc, TreeNodeHtmlElm t)
Each time, a temporary array is used so that the same image or link is not added twice.
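The duplicate check described above could be sketched as follows. This is only an illustration, not the article's code: it uses a HashSet instead of the temporary array, and the class name UrlDeduplicator and the method Distinct are hypothetical. It assumes the URLs (for example the href or src attribute values) have already been read from the elements as strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original application.
public static class UrlDeduplicator
{
    // Returns the URLs in their original order, keeping only the
    // first occurrence of each one and skipping null or empty entries.
    public static List<string> Distinct(IEnumerable<string> urls)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string url in urls)
        {
            if (string.IsNullOrEmpty(url))
            {
                continue;
            }
            // HashSet.Add returns false when the value is already present.
            if (seen.Add(url))
            {
                result.Add(url);
            }
        }
        return result;
    }
}
```

A set lookup avoids rescanning an array for every new element, which may matter on pages with many links.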
private void FillTreeCss(HtmlDocument doc, TreeNodeHtmlElm t)
For the CSS, the test is:
if (e.TagName.ToLower() == "link")
{
    if (e.GetAttribute("rel").ToLower() == "stylesheet")
    {
        // e is a link to a stylesheet
    }
}
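As a possible variation on this test, the check can be written as a small function that works on plain strings. This is a sketch under assumptions: the names CssLinkTest and IsStylesheetLink are hypothetical. Unlike the exact string comparison above, it also accepts rel values made of several space-separated tokens, such as "alternate stylesheet".

```csharp
using System;

// Hypothetical helper, not part of the original application.
public static class CssLinkTest
{
    // True when the tag is <link> and one of the whitespace-separated
    // tokens of its rel attribute is "stylesheet" (case-insensitive).
    public static bool IsStylesheetLink(string tagName, string rel)
    {
        if (!string.Equals(tagName, "link", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }
        if (string.IsNullOrEmpty(rel))
        {
            return false;
        }
        // A null separator array makes Split cut on whitespace.
        string[] tokens = rel.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (string.Equals(token, "stylesheet", StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}
```

In the application it would be called as CssLinkTest.IsStylesheetLink(e.TagName, e.GetAttribute("rel")).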
The information is thus organized in a TreeView, and each node is an instance of the class TreeNodeHtmlElm, which derives from TreeNode.
Points of Interest
I found it interesting to explore a web page this way; it is a different way of looking at one.
I had a problem with the TreeView: the text of a node could be very long, and the application became really slow when tooltips appeared, so I limit the text to 100 characters:
public TreeNodeHtmlElm(HtmlElement elm, TypeNode t) : base()
{
    type = t;
    mHtmlElement = elm;
    try
    {
        if (string.IsNullOrEmpty(elm.OuterText))
        {
            // No visible text: fall back to the raw HTML of the element.
            Text = elm.OuterHtml;
        }
        else if (elm.OuterText.Length > 100)
        {
            // Keep only the first 100 characters so tooltips stay fast.
            Text = elm.OuterText.Substring(0, 100);
        }
        else
        {
            Text = elm.OuterText;
        }
    }
    catch (Exception)
    {
        Text = "";
    }
}
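The label logic of this constructor can also be isolated in a small function, which makes it easy to try without a browser control. This is a minimal sketch with hypothetical names (NodeLabel, For, MaxLength). It differs from the constructor above in one respect: it also truncates the OuterHtml fallback, which the original code leaves at full length.

```csharp
using System;

// Hypothetical helper, not part of the original application.
public static class NodeLabel
{
    // Maximum number of characters shown for a tree node.
    public const int MaxLength = 100;

    // Picks outerText when it is non-empty, otherwise outerHtml,
    // and cuts the result to at most MaxLength characters.
    public static string For(string outerText, string outerHtml)
    {
        string text = string.IsNullOrEmpty(outerText) ? outerHtml : outerText;
        if (text == null)
        {
            return "";
        }
        return text.Length > MaxLength ? text.Substring(0, MaxLength) : text;
    }
}
```

Inside the constructor this would be used as Text = NodeLabel.For(elm.OuterText, elm.OuterHtml).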
If you click a tree node, the application shows a preview of that piece of HTML in the top-right pane.
Right-clicking opens a context menu from which you can save (SaveTreeNodeHtml) the text of the subnodes.
This does not work for images: only the URL of the image is saved, not the image itself. In a future version it could be interesting to download and save the image, and likewise for the CSS.
Please take a look at my other pages:
http://www.cmb-soft.com/ (a CSS editor)
My homepage: http://vidalcharles.free.fr/
I'm looking for a job; if anybody has a job offer, please email me at charles.vidal(at)gmail.com. Thanks.