Small Web Agents using VB - Part II






Introduction
In the previous article, we saw a simple VB application that retrieves the HTML page at a particular URL. In this article, we will build a small web crawler that follows the links found on the page at a given URL.
- Setting up the Visual Basic Environment with required Components and Libraries:
- Open Visual Basic and create a new project (use Standard EXE).
- Select Project -> References from the main menu and add the following Microsoft Libraries:
- Microsoft HTML Object Library
- Add Microsoft Windows Common Controls to the toolbox as follows. Select Project -> Components from the main menu. The Components window will open. With the Controls tab selected, scroll down and check the box next to the component:
- Microsoft Windows Common Controls 6.x
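- Set up the UI for the Crawler. The code below uses a text box for the URL (txtURL), a Start button (cmdStart), a list box for the links (lstLinks), a Get Inner Links button (cmdGet) and a TreeView for the inner links (trvLinks).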
- Add the code for the Crawler:
- On click of the start button, populate the list box with all the links under the given URL:
Private Sub cmdStart_Click()
    '
    '1 will populate lstlinks with all the parent links
    'in the requested URL
    getLinks txtURL.Text, 1
    '
End Sub
- The getLinks function, based on the second parameter, populates either the listbox or the treeview. Here, since the parameter is 1, it populates the listbox with all the links under the URL:
Private Sub getLinks(strURL As String, iParentChild As Integer, _
                     Optional iParentNo As Integer)
    Dim objLink As HTMLLinkElement
    Dim objMSHTML As New MSHTML.HTMLDocument
    Dim objDoc As New MSHTML.HTMLDocument
    Dim objNode As Node

    'load the page passed in strURL, so that inner links are
    'fetched from their own URL and not from the one in txtURL
    Set objDoc = objMSHTML.createDocumentFromUrl(strURL, vbNullString)

    MousePointer = vbHourglass
    'wait until the document has finished loading
    While objDoc.readyState <> "complete"
        DoEvents
    Wend

    'get all Links
    For Each objLink In objDoc.links
        If iParentChild = 1 Then
            '1: add the link to the list box
            lstLinks.AddItem objLink
        ElseIf iParentChild = 2 Then
            '2: add the link as a child of tree node iParentNo
            'lstInnerLinks.AddItem objLink
            Set objNode = trvLinks.Nodes.Add(iParentNo, tvwChild)
            objNode.Text = objLink
            'objNode.Image = "leaf"
        End If
    Next
    MousePointer = vbDefault
End Sub
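The loop that waits for readyState to become "complete" never ends if the page never finishes loading. The following is only a sketch of one way to bound the wait. WaitForDocument and its lTimeoutSeconds parameter are hypothetical names that are not part of the original code, and the sketch assumes the same Microsoft HTML Object Library reference as above.

```vb
' Hypothetical helper (not in the original article): waits for the
' document to finish loading, but gives up after lTimeoutSeconds.
' Returns True if the document completed, False on timeout.
Private Function WaitForDocument(objDoc As MSHTML.HTMLDocument, _
                                 ByVal lTimeoutSeconds As Long) As Boolean
    Dim sngStart As Single

    ' Timer returns the number of seconds elapsed since midnight
    sngStart = Timer

    Do While objDoc.readyState <> "complete"
        ' keep the form responsive while waiting
        DoEvents

        ' Timer restarts at midnight; for simplicity a backwards
        ' jump is treated the same way as a timeout
        If Timer < sngStart Or Timer - sngStart > lTimeoutSeconds Then
            WaitForDocument = False
            Exit Function
        End If
    Loop

    WaitForDocument = True
End Function
```

Inside getLinks, the While ... Wend loop could then be replaced by a call such as If Not WaitForDocument(objDoc, 30) Then ..., where the 30 second limit is an arbitrary choice.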
- If the user wishes to go further down into some of the links, she/he can select those links and press the Get Inner Links button:
Private Sub cmdGet_Click()
    '
    Dim iCount As Integer
    'Dim objNode As New Node
    If lstLinks.SelCount = 0 Then
        '
        MsgBox "Please Select a Link"
        Exit Sub
    Else
        '
        'objNode.Text = lstLinks.Text
        'For iCount = 0 To lstLinks.ListCount - 1
        iCount = 0
        While iCount <= lstLinks.ListCount - 1
            If lstLinks.Selected(iCount) Then
                trvLinks.Nodes.Add , , , lstLinks.List(iCount)
                getLinks lstLinks.List(iCount), 2, trvLinks.Nodes.Count
                lstLinks.RemoveItem (iCount)
                lstLinks.Refresh
            Else
                iCount = iCount + 1
            End If
        Wend
        'Next
        '
    End If
    '
End Sub
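The handler above advances iCount only when the current item is not removed, because RemoveItem shifts every later item up by one position. A possible alternative, shown here only as a sketch, is to walk the list from the last item to the first so that a removal never disturbs the items still to be visited. AddSelectedLinksBackwards is a hypothetical name; the controls and the getLinks routine are the ones used in the article.

```vb
' Hypothetical alternative to the While loop in cmdGet_Click:
' iterate from the bottom of the list so that RemoveItem does not
' shift the items that are still to be examined.
Private Sub AddSelectedLinksBackwards()
    Dim iCount As Integer

    For iCount = lstLinks.ListCount - 1 To 0 Step -1
        If lstLinks.Selected(iCount) Then
            ' add the link as a new top-level node; being the newest
            ' node, its index equals Nodes.Count
            trvLinks.Nodes.Add , , , lstLinks.List(iCount)
            getLinks lstLinks.List(iCount), 2, trvLinks.Nodes.Count
            lstLinks.RemoveItem iCount
        End If
    Next
End Sub
```

One trade-off of this variant is that the selected links reach the tree in reverse list order, which may be a reason to keep the forward loop.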
- All the inner links will get populated inside the TreeView. Now, if the user wishes to drill down further, she/he can double-click on those URLs in the treeview:
Private Sub trvLinks_DblClick()
    '
    getLinks trvLinks.SelectedItem.Text, 2, trvLinks.SelectedItem.Index
    '
End Sub
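The double-click handler assumes that a node is selected, and it fetches the links again every time the same node is double-clicked. The following sketch shows one way such a handler might guard against both cases. It is a variation for illustration, not the article's code, and it relies on the Children property of the TreeView Node object, which holds the number of child nodes.

```vb
' Sketch of a more defensive version of the double-click handler.
Private Sub trvLinks_DblClick()
    Dim objNode As Node

    Set objNode = trvLinks.SelectedItem

    ' nothing selected, for example when the tree is still empty
    If objNode Is Nothing Then Exit Sub

    ' the node already has child links, so do not fetch them twice
    If objNode.Children > 0 Then Exit Sub

    getLinks objNode.Text, 2, objNode.Index
End Sub
```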
- Then finally the screen would look something like this:
History
- 12th August, 2002: Initial post