Hi everyone,
I'm currently working on a Windows Forms project that requires me to get some text from a website and display it within the program.

My public sub is below. It downloads the page source of the site in question as a byte array, decodes the bytes into a string, and displays the result in a multi-line textbox on the form.
VB
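Public Sub LoadSiteContent(ByVal url As String)
  ' WebClient implements IDisposable, so release it as soon as the download is done
  Using client As New WebClient()
    ' Download the raw page bytes and decode them as UTF-8
    Dim html As Byte() = client.DownloadData(url)
    Dim webString As String = System.Text.Encoding.UTF8.GetString(html)
    TextBox1.Text = webString
  End Using
End Sub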

This sub returns the entire page source, whereas I only want one specific paragraph from the site. Is there a way to cut the converted string down to just that paragraph, perhaps by using regular expressions or substrings?

I also have this import at the top of my class:
VB
Imports System.Net

Any response is greatly appreciated, thanks.
Comments
David Goebet 5-Dec-12 9:20am
   
So you want to read one specific piece of text out of the HTML source, right?
In that case you could search your string for a known phrase or tag.

For example, if the page contains "<span> News News News </span>",
you can search for "<span>" and read everything up to the next "</span>".
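
The search-and-read approach described in this comment could be sketched roughly as follows. This is only a minimal illustration: `ExtractBetween` is a hypothetical helper name, the sample string stands in for the downloaded page source, and it assumes the start marker appears exactly as typed (no attributes inside the tag).

```vb
Imports System

Module ExtractBetweenDemo
    ' Hypothetical helper (not from the original post): returns the text between
    ' the first startMarker and the next endMarker, or Nothing if either is missing.
    Function ExtractBetween(ByVal source As String,
                            ByVal startMarker As String,
                            ByVal endMarker As String) As String
        Dim startIndex As Integer = source.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase)
        If startIndex < 0 Then Return Nothing

        ' Skip past the start marker itself
        startIndex += startMarker.Length

        Dim endIndex As Integer = source.IndexOf(endMarker, startIndex, StringComparison.OrdinalIgnoreCase)
        If endIndex < 0 Then Return Nothing

        Return source.Substring(startIndex, endIndex - startIndex).Trim()
    End Function

    Sub Main()
        ' Assumed sample input standing in for the downloaded page source
        Dim sample As String = "<html><body><span> News News News </span></body></html>"
        Console.WriteLine(ExtractBetween(sample, "<span>", "</span>")) ' prints "News News News"
    End Sub
End Module
```

In the question's sub, the same helper could be applied to webString before assigning the result to TextBox1.Text. Checking for Nothing first avoids showing an empty box when the markers are not found.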

This is probably a bit more than you thought you wanted, but...
The process is called "web scraping", and there is a nice article about it here: Web Scraping in ASP.NET with Regular Expression Matching and XML Transformation[^] - it's in C#, but the code is easy to translate, and the description is very clear.
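
As a rough idea of what regular-expression matching might look like in VB, here is a small sketch. It is not the code from the linked article: `ExtractParagraphById`, the id value 'xyz', and the sample HTML are all assumptions made for illustration, and a regex like this only suits simple, predictable markup.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexExtractDemo
    ' Hypothetical helper: returns the inner text of the first <p> element whose
    ' id attribute equals the given value, or Nothing if there is no match.
    Function ExtractParagraphById(ByVal html As String, ByVal id As String) As String
        ' [""'] accepts either quote style around the id value;
        ' (.*?) lazily captures everything up to the first closing </p>
        Dim pattern As String = "<p[^>]*\bid\s*=\s*[""']" & Regex.Escape(id) & "[""'][^>]*>(.*?)</p>"

        ' Singleline lets "." match line breaks inside the paragraph
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If Not m.Success Then Return Nothing

        Return m.Groups(1).Value.Trim()
    End Function

    Sub Main()
        ' Assumed sample input standing in for the downloaded page source
        Dim sample As String = "<div><p>Other text</p><p id='xyz'>Target paragraph</p></div>"
        Console.WriteLine(ExtractParagraphById(sample, "xyz")) ' prints "Target paragraph"
    End Sub
End Module
```

If the page markup is irregular or nested, a regex becomes fragile quickly, which is one reason articles on scraping often combine it with a further parsing or transformation step.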
   
If there is fixed text before and after that paragraph on the web page (for example, a tag with id='xyz'), you can find those markers in the returned string and then extract the required paragraph from between them. I have done this in one of my applications, and I hope it will be helpful for you too. Please mark this as the answer if it helped.
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



