Small Web Agents using VB - Part II

Gopi Subramanian, 12 Aug 2002, CPOL


In the previous article, we saw a simple VB application that retrieves the HTML page at a given URL. In this article, we will build a small web crawler that follows the links found on a given page.

  1. Set up the Visual Basic environment with the required components and libraries:
    • Open Visual Basic and create a new project (use Standard EXE).
    • Select Project -> References from the main menu and add the following Microsoft library:
      • Microsoft HTML Object Library
    • Add the Microsoft Windows Common Controls to the toolbox: select Project -> Components from the main menu to open the Components window. With the Controls tab selected, scroll down and tick the check box next to:
      • Microsoft Windows Common Controls 6.x
  2. Set up the UI for the crawler:
    • Add a label, a text box (txtURL), two command buttons (cmdStart and cmdGet), a list box (lstLinks), and a treeview control (trvLinks) as below:

      [Image: layout of the crawler form]

  3. Add the code for the Crawler:
    • When the Start button is clicked, populate the list box with all the links found at the given URL:
      Private Sub cmdStart_Click()
          '1 populates lstLinks with all the parent links
          'in the requested URL
          getLinks txtURL.Text, 1
      End Sub
    • Depending on its second parameter, the getLinks procedure populates either the list box (1) or the treeview (2). Here the parameter is 1, so it fills the list box with all the links found at the URL:
      Private Sub getLinks(strURL As String, iParentChild As Integer, _
          Optional iParentNo As Integer)
          'Declared As Object because the links collection can hold
          'both anchor and area elements; both expose href
          Dim objLink As Object
          Dim objMSHTML As New MSHTML.HTMLDocument
          Dim objDoc As MSHTML.HTMLDocument
          Dim objNode As Node
          'Load the URL passed in, not always the one in the text box
          Set objDoc = objMSHTML.createDocumentFromUrl(strURL, vbNullString)
          MousePointer = vbHourglass
          'Keep the UI responsive until the document has loaded
          While objDoc.readyState <> "complete"
              DoEvents
          Wend
          'get all Links
          For Each objLink In objDoc.links
              If iParentChild = 1 Then
                  lstLinks.AddItem objLink.href
              ElseIf iParentChild = 2 Then
                  'lstInnerLinks.AddItem objLink
                  Set objNode = trvLinks.Nodes.Add(iParentNo, tvwChild)
                  objNode.Text = objLink.href
                  'objNode.Image = "leaf"
              End If
          Next
          MousePointer = vbDefault
      End Sub
    • To drill further into some of the links, the user selects them in the list box and presses the Get Inner Links button:
      Private Sub cmdGet_Click()
          Dim iCount As Integer
          If lstLinks.SelCount = 0 Then
              MsgBox "Please Select a Link"
              Exit Sub
          End If
          iCount = 0
          While iCount <= lstLinks.ListCount - 1
              If lstLinks.Selected(iCount) Then
                  'Add the link as a root node, then hang its links under it
                  trvLinks.Nodes.Add , , , lstLinks.List(iCount)
                  getLinks lstLinks.List(iCount), 2, trvLinks.Nodes.Count
                  'Removing shifts later items up, so iCount stays put
                  lstLinks.RemoveItem iCount
              Else
                  iCount = iCount + 1
              End If
          Wend
      End Sub
    • All the inner links are added to the treeview. To drill down further, the user can double-click any URL in the treeview:
      Private Sub trvLinks_DblClick()
          'Ignore double-clicks when no node is selected
          If trvLinks.SelectedItem Is Nothing Then Exit Sub
          getLinks trvLinks.SelectedItem.Text, 2, trvLinks.SelectedItem.Index
      End Sub
    • Finally, the screen should look something like this:

      [Image: the crawler form after crawling, with links in the list box and treeview]
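
The links collection of a page usually contains more than fetchable pages, for example mailto: or javascript: links, and passing those to getLinks is unlikely to be useful. The following is a minimal sketch of a filter, assuming that only http and https URLs should be crawled. IsCrawlableLink is a hypothetical helper name and is not part of the original article.

```vb
' Hypothetical helper, not part of the original article.
' Returns True when strHref looks like a page the crawler can fetch.
Private Function IsCrawlableLink(ByVal strHref As String) As Boolean
    Dim strLower As String

    ' Normalise so the comparison ignores case and surrounding spaces
    strLower = LCase$(Trim$(strHref))

    ' Assumption: only http and https URLs are worth following;
    ' this skips mailto:, javascript:, ftp: and empty links
    IsCrawlableLink = (Left$(strLower, 7) = "http://") Or _
                      (Left$(strLower, 8) = "https://")
End Function
```

One way to use it would be to call it with each link's URL text inside the For Each loop of getLinks, and to add the link to lstLinks or trvLinks only when it returns True.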

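Because every selected or double-clicked link is fetched again, the same page can end up being crawled more than once. A minimal sketch of one way to track visited URLs follows, assuming a module-level Collection keyed by URL. The names mcolVisited and MarkVisited are hypothetical and do not appear in the original article.

```vb
' Hypothetical module-level state, not in the original article
Private mcolVisited As New Collection

' Returns True the first time strURL is seen, False on later calls.
Private Function MarkVisited(ByVal strURL As String) As Boolean
    On Error Resume Next
    Err.Clear
    ' Collection keys must be unique, so adding a URL that is
    ' already present raises a run-time error instead of adding it
    mcolVisited.Add strURL, strURL
    MarkVisited = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
End Function
```

A caller could wrap each getLinks call in If MarkVisited(url) Then ... End If. Note that Collection keys are compared without regard to case, so two URLs that differ only in letter case are treated as the same page here; that is a simplification of this sketch.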

  • 12th August, 2002: Initial post


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Gopi Subramanian
Web Developer
India

Last Updated 13 Aug 2002
Article Copyright 2002 by Gopi Subramanian