Introduction
Hello again!
In this article we will be working on HTML parsing using a single .NET Framework assembly.
The namespace is called mshtml. I'll be showing how this assembly and its objects can help with some basic tasks.
The assembly
The assembly is located in the folder below:
If you are deploying to a remote website or to a shared drive in another environment, just copy this assembly into the /bin folder.
First we need to add the assembly reference to our project.
Project Properties >> Add reference >> .NET >> Microsoft.mshtml
The HTML block
This is the simple HTML block that we will be working with:
Dim myHTML$ = _
vbCrLf & "<html>" _
& vbCrLf & "<body>" _
& vbCrLf & "<input type=""text"" name=""myTextBox""/>" _
& vbCrLf & "<input type=""button"" name=""myButton""/>" _
& vbCrLf & "<input type=""checkbox"" name=""myCheckBox"" class=""removeMe""/>" _
& vbCrLf & "</body>" _
& vbCrLf & "</html>"
As you can see, it is as simple as three inputs.
The objects
This assembly provides objects that represent our HTML tags. In this example I'll be focusing on the INPUT tag, using the object IHTMLInputElement.
There is also a generic object that gives us extra properties: IHTMLElement.
To access these objects, don't forget the following import:
Imports mshtml
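To make the two interfaces more concrete, here is a minimal sketch that only reads from the document: it lists the name and type of every INPUT tag. The module name InputLister and the function ListInputs are hypothetical and not part of the demo class, and the sketch assumes the project references Microsoft.mshtml as described above.

```vbnet
Imports System.Collections.Generic
Imports mshtml

Module InputLister
    ' Hypothetical helper, for illustration only:
    ' returns one "name (type)" entry per INPUT tag found in the markup.
    Function ListInputs(ByVal html As String) As List(Of String)
        ' Load the markup into an in-memory document
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(html)
        doc.close()

        ' Every element inside <body>, then only the INPUT tags
        Dim allElements As IHTMLElementCollection = CType(doc.body.all, IHTMLElementCollection)
        Dim inputs As IHTMLElementCollection = CType(allElements.tags("input"), IHTMLElementCollection)

        Dim result As New List(Of String)
        For Each field As IHTMLInputElement In inputs
            ' name and type come from the input-specific interface
            result.Add(field.name & " (" & field.type & ")")
        Next
        Return result
    End Function
End Module
```

Calling ListInputs(myHTML) with the block above should give one entry per input, which is a quick way to check what mshtml actually parsed before changing anything.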
The function
And here we have the full function that we will be using, with some comments :
Function parseMyHtml(ByVal htmlToParse$) As String
    ' Load the markup into an in-memory HTML document
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlToParse)
    htmlDocument.close()
    ' Every element inside <body>
    Dim allElements As IHTMLElementCollection = htmlDocument.body.all
    ' Set the text box value through the input-specific interface
    Dim myTextBox As IHTMLInputElement = allElements.item("myTextBox")
    myTextBox.value = "This is my text box!"
    ' Add an attribute through the generic element interface
    Dim myButton As IHTMLElement = allElements.item("myButton")
    myButton.setAttribute("onClick", "javascript:alert('This is the button!')")
    ' The same element, viewed as an input, to set its caption
    Dim myButton2 As IHTMLInputElement = allElements.item("myButton")
    myButton2.value = "Click me!"
    ' Style every input and remove the ones marked with the "removeMe" class
    Dim allInputs As IHTMLElementCollection = allElements.tags("input")
    Dim element As IHTMLElement
    For Each element In allInputs
        element.style.border = "1px solid red"
        element.style.fontFamily = "Verdana"
        If element.className = "removeMe" Then
            element.outerHTML = ""
        End If
    Next
    ' Return the whole document, starting at the <html> element
    Return htmlDocument.body.parentElement.outerHTML
End Function
In this function we can see several of these objects in use, along with their functions and properties.
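The loop in the function removes an element while the collection is still being enumerated. That works for this example, but it could be fragile if the collection changes under the enumerator. A more defensive variant, sketched here under that assumption, collects the matching elements first and removes them afterwards through IHTMLDOMNode.removeNode. The names ElementRemover and RemoveByClass are hypothetical and not part of the demo class.

```vbnet
Imports System.Collections.Generic
Imports mshtml

Module ElementRemover
    ' Hypothetical helper, for illustration only:
    ' removes every element in <body> whose class attribute equals cssClass.
    Function RemoveByClass(ByVal html As String, ByVal cssClass As String) As String
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(html)
        doc.close()

        Dim allElements As IHTMLElementCollection = CType(doc.body.all, IHTMLElementCollection)

        ' First pass: remember the matches without touching the document
        Dim toRemove As New List(Of IHTMLDOMNode)
        For Each element As IHTMLElement In allElements
            If element.className = cssClass Then
                toRemove.Add(CType(element, IHTMLDOMNode))
            End If
        Next

        ' Second pass: remove each match together with its children
        For Each node As IHTMLDOMNode In toRemove
            node.removeNode(True)
        Next

        Return doc.body.parentElement.outerHTML
    End Function
End Module
```

For the HTML block in this article, RemoveByClass(myHTML, "removeMe") would be the call that drops the checkbox. Splitting the work into two passes costs one extra list but keeps the enumeration independent of the removals.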
The result HTML
After calling the function and showing the result to the user, we get the following processed HTML block:
<HTML><HEAD></HEAD>
<BODY><INPUT style="BORDER-RIGHT: red 1px solid; BORDER-TOP: red 1px solid; BORDER-LEFT: red 1px solid;
BORDER-BOTTOM: red 1px solid; FONT-FAMILY: Verdana" value="This is my text box!" name=myTextBox>
<INPUT style="BORDER-RIGHT: red 1px solid; BORDER-TOP: red 1px solid; BORDER-LEFT: red 1px solid;
BORDER-BOTTOM: red 1px solid; FONT-FAMILY: Verdana" type=button value="Click me!"
name=myButton onClick="javascript:alert('This is the button!')"> </BODY></HTML>
The code looks a bit nasty because mshtml rewrites it: a shorthand such as the style border is expanded into one declaration per border side. But believe me, the browser will still recognize it. Note that the checkbox with the "removeMe" class is gone from the result.
Conclusion
After testing this assembly with another, much bigger page, almost 120 KB of pure HTML with JavaScript, CSS, images and some other stuff, the assembly proved very robust and didn't report any errors.
It is very useful when you have some HTML entered by a user and you need to change some of its properties.