Hello all,

I have a string containing the HTML page source of a website.
I need a regular expression to extract some news items from that page source.
The website doesn't have an RSS feed, so I have to do it this way.

I think it'll be something like this:
VB
"(?<=(<div id=""newsItem"">)).*?(?=(</div>))"

But I'm very new to regular expressions; I've always steered away from them until now.

Can anyone help with this, please?

Any replies are greatly appreciated,
Tom.

Get a copy of Expresso and start writing and testing expressions. There's no better time to learn than when you need it!
Comments
Sergey Alexandrovich Kryukov 11-Jan-13 22:03pm    
Right advice. And the last sentence is just wise. My 5.
—SA
Hi,

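Your expression is correct, as long as the news text contains no line breaks and no nested <div> elements. Use Regex.Match to extract the news text from between the HTML tags. First, add this at the top of your code file: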
VB
Imports System.Text.RegularExpressions

Then, use this code to get the news from the HTML tags:
VB
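' Sample input: one news item wrapped in HTML tags
Dim newsAndHtmlTags As String = "<p><div id=""newsItem"">This is news!</div></p>"
' The lookbehind and lookahead match the text between the tags without including the tags themselves
Dim pattern As String = "(?<=(<div id=""newsItem"">)).*?(?=(</div>))"
' Regex.Match returns only the first match found in the input
Dim match As System.Text.RegularExpressions.Match = Regex.Match(newsAndHtmlTags, pattern)
' If nothing matched, match.Success is False and match.Value is an empty string
Dim news As String = match.Value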
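Regex.Match stops at the first news item, and by default "." does not match line breaks, so a real page with several multi-line items needs Regex.Matches with RegexOptions.Singleline. The sketch below shows one way to do that; the module name NewsScraper and the function ExtractNewsItems are hypothetical, it swaps the lookarounds for a named capture group (news), and it assumes the news divs contain no nested <div> elements (the lazy .*? would stop at the first inner </div>).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NewsScraper
    ' Hypothetical helper: returns the inner text of every <div id="newsItem">...</div> block.
    ' Assumes the news divs contain no nested <div> elements.
    Function ExtractNewsItems(pageSource As String) As List(Of String)
        ' The named group "news" captures the text between the tags, so no lookarounds are needed.
        Dim pattern As String = "<div id=""newsItem"">(?<news>.*?)</div>"

        ' Singleline lets "." match line breaks; IgnoreCase also accepts <DIV ...>.
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

        Dim items As New List(Of String)()
        ' Regex.Matches (plural) returns every occurrence, not just the first one.
        For Each m As Match In Regex.Matches(pageSource, pattern, options)
            ' Trim removes whitespace around the text, not line breaks inside it.
            items.Add(m.Groups("news").Value.Trim())
        Next
        Return items
    End Function

    Sub Main()
        ' Sample page source with two news items, the second spanning two lines.
        Dim html As String =
            "<div id=""newsItem"">First story</div>" & Environment.NewLine &
            "<div id=""newsItem"">Second" & Environment.NewLine & "story</div>"

        For Each item As String In ExtractNewsItems(html)
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

A named group keeps the pattern readable and makes it easy to add further groups later (for example a date or headline) without touching the lookarounds.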

Hope this helps.
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)