Hello again!
In this article we will parse HTML using a single .NET Framework assembly.
The namespace is called mshtml. I'll show how this assembly and its objects can be helpful in some basic ways.
The assembly is located in the folder below:
If you are working on a remote website or a shared drive from another environment, just copy this assembly into the /bin folder.
First we need to add the assembly reference to our project.
Project Properties >> Add reference >> .NET >> Microsoft.mshtml
The HTML block
This is the simple HTML block that we will be working with:
Dim myHTML$ = _
vbCrLf & "<html>" _
& vbCrLf & "<body>" _
& vbCrLf & "<input type=""text"" name=""myTextBox""/>" _
& vbCrLf & "<input type=""button"" name=""myButton""/>" _
& vbCrLf & "<input type=""checkbox"" name=""myCheckBox"" class=""removeMe""/>" _
& vbCrLf & "</body>" _
& vbCrLf & "</html>"
As you can see, it is as simple as three inputs.
This assembly has objects that represent all of our HTML tags. In this example I'll focus on the INPUT tag, using the object:
But there is also a generic object that gives us extra properties, the:
To access those objects, don't forget the following:
And here is the full function that we will be using, with some comments:
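Function parseMyHtml(ByVal htmlToParse$) As String
    ' Create an in-memory document and load the HTML into it
    ' (the write/close calls are assumed; htmlToParse was otherwise unused)
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlToParse)
    htmlDocument.close()
    ' Every element inside <body>
    Dim allElements As IHTMLElementCollection = htmlDocument.body.all
    ' Look up the text box by name and set its value
    Dim myTextBox As IHTMLInputElement = allElements.item("myTextBox")
    myTextBox.value = "This is my text box!"
    ' The same button, first as the generic element, then as an input element
    Dim myButton As IHTMLElement = allElements.item("myButton")
    Dim myButton2 As IHTMLInputElement = allElements.item("myButton")
    myButton2.value = "Click me!"
    ' Style every <input> and remove the ones with the class "removeMe"
    Dim allInputs As IHTMLElementCollection = allElements.tags("input")
    Dim element As IHTMLElement
    For Each element In allInputs
        element.style.border = "1px solid red"
        element.style.fontFamily = "Verdana"
        If element.className = "removeMe" Then
            element.outerHTML = ""
        End If
    Next
    ' Assumed return value: the processed <body> markup
    Return htmlDocument.body.outerHTML
End Function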
In this function we can see some of these objects and their functions and properties in use.
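To show the same objects used for reading instead of changing values, here is a minimal sketch that lists the name and type of every input in a piece of HTML. The module name InputListingSketch and the helper ListInputs are hypothetical and not part of the demo class. The sketch assumes a console project that references Microsoft.mshtml, and it uses explicit casts so that it should also compile with Option Strict On.

```vb
Imports System
Imports System.Collections.Generic
Imports mshtml

Module InputListingSketch
    ' Hypothetical helper, not part of the article's demo class:
    ' returns "name (type)" for every <input> found in the given HTML.
    Function ListInputs(ByVal html As String) As List(Of String)
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(html)
        doc.close()

        ' body.all and tags() are exposed as Object, so cast them explicitly
        Dim allElements As IHTMLElementCollection = CType(doc.body.all, IHTMLElementCollection)
        Dim inputs As IHTMLElementCollection = CType(allElements.tags("input"), IHTMLElementCollection)

        Dim found As New List(Of String)
        For i As Integer = 0 To inputs.length - 1
            Dim field As IHTMLInputElement = CType(inputs.item(i), IHTMLInputElement)
            found.Add(field.name & " (" & field.type & ")")
        Next
        Return found
    End Function

    Sub Main()
        Dim html As String = "<html><body><input type=""text"" name=""myTextBox""/></body></html>"
        For Each entry As String In ListInputs(html)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

The index-based loop is a deliberate choice here: nothing is removed from the collection while it is being walked, so the indexes stay valid.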
The result HTML
After calling the function and showing the result to the user, we get the following processed HTML block:
<BODY><INPUT style="BORDER-RIGHT: red 1px solid; BORDER-TOP: red 1px solid; BORDER-LEFT: red 1px solid;
BORDER-BOTTOM: red 1px solid; FONT-FAMILY: Verdana" value="This is my text box!" name=myTextBox>
<INPUT style="BORDER-RIGHT: red 1px solid; BORDER-TOP: red 1px solid; BORDER-LEFT: red 1px solid;
BORDER-BOTTOM: red 1px solid; FONT-FAMILY: Verdana" type=button value="Click me!"
The code is a bit nasty because mshtml reprocesses every attribute. For example, it takes style:border and sets every single border side instead. But believe me, the browser will recognize it.
This is very useful when you have HTML entered by a user and you need to change some properties.
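As one illustration of that use case, the sketch below disables every input in user-supplied HTML before it is shown again. The module name DisableInputsSketch and the helper DisableInputs are hypothetical, and returning body.innerHTML is only one possible choice. Treat it as a sketch under those assumptions, not as the demo class itself.

```vb
Imports mshtml

Module DisableInputsSketch
    ' Hypothetical helper: disables every <input> in user-supplied HTML
    ' and returns the processed markup found inside <body>.
    Function DisableInputs(ByVal userHtml As String) As String
        Dim doc As IHTMLDocument2 = New HTMLDocumentClass()
        doc.write(userHtml)
        doc.close()

        ' body.all and tags() are exposed as Object, so cast them explicitly
        Dim allElements As IHTMLElementCollection = CType(doc.body.all, IHTMLElementCollection)
        Dim inputs As IHTMLElementCollection = CType(allElements.tags("input"), IHTMLElementCollection)

        For i As Integer = 0 To inputs.length - 1
            Dim field As IHTMLInputElement = CType(inputs.item(i), IHTMLInputElement)
            field.disabled = True
        Next

        Return doc.body.innerHTML
    End Function
End Module
```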