A web page contains strings like this:
<a href='someurl.html'><b>need to capture this text</b></a>
I need a regular expression that captures the text between '.html'><b>' and '</b></a>'.

Please help.
Updated 28-Sep-10 10:15am
Comments
[no name] 28-Sep-10 16:40pm    
I tried: .html'>([\S\s]+), but it captures from first '.html'>' to the LAST ''
Toli Cuturicu 28-Sep-10 16:46pm    
Never, ever, lock your questions again!
The locked questions may get deleted!

1 solution

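The trick is non-greedy (lazy) evaluation: (.*?) matches as few characters as possible, so the capture stops at the first '</b></a>' rather than running on to the last one.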

C#
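using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string strInput = @"<a href='someurl.html'><b>need to capture this text</b></a>between '.html'><b>' and '</b></a>";
        // [\s\S]*? is lazy: it matches any character, including newlines, but as few as possible.
        // Group 1 holds the text between <b> and the first </b></a>.
        var objMatch = Regex.Match(strInput, @"<a\s+href=[\s\S]*?><b>([\s\S]*?)</b></a>");
        if (objMatch.Success)
        {
            Console.WriteLine("Match: {0}", objMatch.Groups[1].Value);
        }
    }
}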
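
To see the difference on a page with more than one link, here is a minimal sketch that runs a greedy and a lazy pattern over the same input and collects every capture with Regex.Matches. The class name LinkTextDemo, the helper Capture, and the two-link sample string are made up for illustration; the greedy pattern is only an approximation of the attempt described in the comments above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkTextDemo
{
    // Same pattern as in the answer: lazy quantifiers stop at the first possible match.
    public const string LazyPattern = @"<a\s+href=[\s\S]*?><b>([\s\S]*?)</b></a>";

    // Greedy variant, for comparison only: [\s\S]+ runs on to the LAST "</b></a>".
    public const string GreedyPattern = @"\.html'><b>([\s\S]+)</b></a>";

    // Hypothetical helper: returns group 1 of every match of pattern in input.
    public static List<string> Capture(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Made-up page fragment with two links (not from the original question).
        string page = "<a href='one.html'><b>first</b></a> <a href='two.html'><b>second</b></a>";

        foreach (string s in Capture(page, GreedyPattern))
            Console.WriteLine("Greedy: {0}", s);
        foreach (string s in Capture(page, LazyPattern))
            Console.WriteLine("Lazy: {0}", s);
    }
}
```

With this sample input the greedy pattern yields a single capture that swallows everything from "first" through "second", including the markup in between, while the lazy pattern yields "first" and "second" as two separate captures.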


Cheers

Andi

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


