Posted 3 Oct 2013

Creating Link Extractor and Filter in C#: Part 1

How to extract all the links from a webpage using a web client.


This article shows how to extract all the links from a web page using a web client. By the end you will be able to create an application that extracts links from pages, which can then be filtered by whatever parameters you want. Let’s dive straight into the code.

Creating the Link Grabber

Before building the link grabber, it is a good idea to spell out the logic it will follow.

The logic is:

  • We need the URL of the page to crawl. We can read it from a TextBox.
  • Next, download the web page. We can use either a web client or a WebBrowser control for this.
  • Once we have the HTML document, the next step is to extract the links from it.
  • Most of the useful links are contained in the href attribute of the anchor (a) tags.
  • So we need the anchor elements of the page, which we can get with GetElementsByTagName("a").
  • That gives us the collection of all anchor elements.
  • Next, read the href attribute of each element and add it to a list. Here the list is a checked list box.
  • Now we have all the extracted links.
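
The same steps can be sketched without a form, using the web client option mentioned above. This is a minimal sketch under stated assumptions, not the article's implementation: the class LinkSketch and its methods Download and ExtractHrefs are hypothetical names, and a regular expression stands in for GetElementsByTagName("a"). The pattern only handles quoted href values and is not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class LinkSketch
{
    // Hypothetical helper: returns the href value of every <a> tag in the HTML.
    // The regular expression is a rough stand-in for GetElementsByTagName("a");
    // it only recognizes href values wrapped in single or double quotes.
    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        var pattern = new Regex(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
        {
            // Group 1 holds the text between the quotes.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    // Hypothetical helper: downloads the page source as a string with WebClient.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling LinkSketch.ExtractHrefs(LinkSketch.Download(url)) would return the raw href strings as they appear in the page source, so relative links stay relative. Pages that build their links with script will not show them this way, which is one reason to prefer the WebBrowser control.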

Now let’s turn this logic into code.

The Code

The following steps build the grabber.

  1. Open Visual Studio and choose "New Project".
  2. Choose "Visual C#" -> "Windows" -> "Windows Forms Application".
  3. Drop a text box from the Toolbox onto the form.
  4. Drop a button from the Toolbox onto the form and set its text to "grab".
  5. Add a CheckedListBox from the Toolbox onto the form.
  6. Double-click the button to generate the click handler.
  7. Add the following code for the click handler:
    using System;
    using System.Windows.Forms;

    namespace linkGrabber
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private void button1_Click(object sender, EventArgs e)
            {
                // Load the page in an off-screen WebBrowser.
                // Subscribe to the event before starting navigation.
                WebBrowser wb = new WebBrowser();
                wb.DocumentCompleted += wb_DocumentCompleted;
                wb.Url = new Uri(textBox1.Text);
            }

            void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                // The page has finished loading; pass its document on for extraction.
                HtmlDocument source = ((WebBrowser)sender).Document;
                extractLink(source);
            }

            private void extractLink(HtmlDocument source)
            {
                // Add the href of every anchor element to the checked list box.
                // checkedListBox1 is assumed to be the designer's default control name.
                HtmlElementCollection anchorList = source.GetElementsByTagName("a");
                foreach (HtmlElement item in anchorList)
                {
                    checkedListBox1.Items.Add(item.GetAttribute("href"));
                }
            }
        }
    }
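
One thing to watch in the click handler: new Uri(textBox1.Text) throws a UriFormatException when the text is not a well-formed absolute URI, for example "example.com" typed without a scheme. A possible guard is sketched here. The class UrlInput and the method TryParseHttpUrl are hypothetical names for illustration and are not part of the original code.

```csharp
using System;

static class UrlInput
{
    // Hypothetical helper: returns true only when 'text' parses as an absolute
    // URI whose scheme is http or https. It returns false instead of throwing.
    // 'uri' is null when parsing fails; callers should use it only on true.
    public static bool TryParseHttpUrl(string text, out Uri uri)
    {
        return Uri.TryCreate(text, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

In the handler, the result could decide whether to start navigation or to show a message asking for a full address such as http://example.com.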



That’s it. You now have a working link grabber. You can extend it further by adding a filter; the next part will show how to add a filter and how to download files. Thanks for reading, and don’t forget to comment and share.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Arpit Jain
India
Article Copyright 2013 by Arpit Jain