Licence CPOL
First Posted 12 Aug 2002

Small Web Agents using VB - Part II

By Gopi Subramanian | 12 Aug 2002 | Article

Introduction

In the previous article, we saw a simple VB application that retrieves the HTML page at a given URL. In this article, we will build a small web crawler that lists the links on a given page and lets the user follow them to further levels.

  1. Setting up the Visual Basic Environment with required Components and Libraries:
    • Open Visual Basic and create a new project (use Standard EXE).
    • Select Project -> References from the main menu and add the following Microsoft Libraries:
      • Microsoft HTML Object Library
    • Add Microsoft Windows Common Controls to the toolbox as follows. Select Project -> Components from the main menu. The Components window will open. With the Controls tab selected, scroll down and tick the check box next to the component:
      • Microsoft Windows Common Controls 6.x
  2. Set up the UI for the Crawler
    • Add a label, a text box for the URL (txtURL), two command buttons (cmdStart and cmdGet), a list box (lstLinks), and a TreeView control (trvLinks), arranged as below:

      [Screenshot: crawler form layout]

  3. Add the code for the Crawler:
    • When the Start button is clicked, populate the list box with all the links found at the given URL:
      Private Sub cmdStart_Click()
          ' 1 = populate lstLinks with all the parent links
          ' in the requested URL
          getLinks txtURL.Text, 1
      End Sub
    • The getLinks procedure populates either the list box or the TreeView, depending on its second parameter. Here the parameter is 1, so it fills the list box with all the links found at the URL:
      Private Sub getLinks(strURL As String, iParentChild As Integer, _
              Optional iParentNo As Integer)
          Dim objLink As HTMLLinkElement
          Dim objMSHTML As New MSHTML.HTMLDocument
          Dim objDoc As MSHTML.HTMLDocument
          Dim objNode As Node

          ' Load the page passed in strURL rather than the text box contents,
          ' so that inner links are fetched from their own URL
          Set objDoc = objMSHTML.createDocumentFromUrl(strURL, vbNullString)

          MousePointer = vbHourglass
          ' Wait until the document has finished loading
          While objDoc.readyState <> "complete"
              DoEvents
          Wend

          ' Walk every link in the document
          For Each objLink In objDoc.links
              If iParentChild = 1 Then
                  ' 1 = add the link to the list box
                  lstLinks.AddItem objLink
              ElseIf iParentChild = 2 Then
                  ' 2 = add the link as a child of tree node iParentNo
                  Set objNode = trvLinks.Nodes.Add(iParentNo, tvwChild)
                  objNode.Text = objLink
              End If
          Next
          MousePointer = vbDefault
      End Sub
    • If the user wants to go deeper into some of the links, they can select those links and press the Get Inner Links button:
      Private Sub cmdGet_Click()
          Dim iCount As Integer

          If lstLinks.SelCount = 0 Then
              MsgBox "Please Select a Link"
              Exit Sub
          End If

          iCount = 0
          While iCount <= lstLinks.ListCount - 1
              If lstLinks.Selected(iCount) Then
                  ' Add the selected link as a new root node, then hang its
                  ' inner links under that node
                  trvLinks.Nodes.Add , , , lstLinks.List(iCount)
                  getLinks lstLinks.List(iCount), 2, trvLinks.Nodes.Count
                  ' Removing the item shifts the next one into position iCount,
                  ' so the counter is not advanced here
                  lstLinks.RemoveItem iCount
                  lstLinks.Refresh
              Else
                  iCount = iCount + 1
              End If
          Wend
      End Sub
    • All the inner links are added to the TreeView. To drill down further, the user can double-click any of those URLs in the TreeView:
      Private Sub trvLinks_DblClick()
          ' Nothing to do if no node is selected
          If trvLinks.SelectedItem Is Nothing Then Exit Sub
          getLinks trvLinks.SelectedItem.Text, 2, trvLinks.SelectedItem.Index
      End Sub
    • The finished screen looks something like this:

      [Screenshot: crawler with the list box and TreeView populated]
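
The wait loop in getLinks polls readyState until the page reports "complete", so a page that never finishes loading would keep the loop running indefinitely. A minimal sketch of a bounded wait is shown here. WaitForDocument and the 30-second limit are hypothetical names and values chosen for illustration, not part of the original code, and the helper assumes the same Microsoft HTML Object Library reference added in step 1.

```vb
' Hypothetical helper, not part of the original article.
' Returns True if objDoc finished loading within sngTimeoutSecs seconds,
' False if the timeout expired first.
Private Function WaitForDocument(objDoc As MSHTML.HTMLDocument, _
        ByVal sngTimeoutSecs As Single) As Boolean
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer                  ' seconds elapsed since midnight
    Do While objDoc.readyState <> "complete"
        DoEvents                      ' keep the form responsive while waiting
        sngElapsed = Timer - sngStart
        ' Timer restarts from 0 at midnight; correct for that case
        If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400
        ' Give up; the function result is still its default, False
        If sngElapsed > sngTimeoutSecs Then Exit Function
    Loop
    WaitForDocument = True
End Function

' Possible use inside getLinks, in place of the While ... Wend loop
' (the 30-second limit is an assumption, adjust as needed):
'
'     If Not WaitForDocument(objDoc, 30) Then
'         MousePointer = vbDefault
'         MsgBox "Timed out loading " & strURL
'         Exit Sub
'     End If
```

Returning a Boolean leaves the caller to decide what to do on a timeout, for example skipping the URL and carrying on with the remaining selected links.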

History

  • 12th August, 2002: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Gopi Subramanian

Web Developer

India

Last Updated 13 Aug 2002
Article Copyright 2002 by Gopi Subramanian