Hello all,
I have been struggling with a regular expression for a long time now. I have a very large text file (containing HTML code) and I need to pull pages out of it based on the following:


I need EVERYTHING between 2 points:
Start:
TEXT (up to 5 words, each separated by a single space) + SPACE + $ (literal dollar sign) + positive number

through the page break, represented in HTML:

So,

Regex Match Starts here:
Shoes and Shorts $ 2324 324

Middle:
(ALL CHARACTERS IN BETWEEN)

End:
""...-break-before:always'>



Thank you in advance. I'll trade a month's good karma for this (-:

Cheers,

Suraci


=====

Edit: There should be 3 matches (within a 10-100 page document)
Updated 6-Oct-10 18:00pm

1 solution

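I think this regex satisfies the start condition of your requirement.

(\w+\s{1}){1,5}\$[0-9 ]+
It gives 1 match on your sample line: 'Shoes and Shorts $ 2324 324 '. Note that it matches only the starting text; on its own it does not continue through the page break.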
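
To capture everything from the starting text through the page break, one option is to append a lazy wildcard and the end marker to the start pattern. Below is a minimal Python sketch of that idea, not a definitive solution. Several things in it are assumptions: the question does not name a language, the start pattern is a slightly tightened variant of the one above, the end marker uses only the visible tail of the tag from the question (-break-before:always'>), and the sample HTML and the name extract_pages are invented for illustration.

```python
import re

# Start marker (variant of the pattern above): 1 to 5 words, each followed by
# a single space, then a literal "$", an optional space, and digits that may
# be grouped by spaces (as in "2324 324").
START = r"\b(?:\w+ ){1,5}\$ ?\d[\d ]*"

# End marker: the question shows only the tail of the page-break tag, so only
# that visible tail is matched here. re.escape keeps it literal.
END = re.escape("-break-before:always'>")

# ".*?" is lazy, so each match stops at the FIRST end marker after its start.
# re.DOTALL lets "." cross line breaks, which a multi-line page needs.
PAGE = re.compile(START + r".*?" + END, re.DOTALL)


def extract_pages(text):
    """Return every span from a start marker through the next end marker."""
    return [m.group(0) for m in PAGE.finditer(text)]


# Hypothetical sample input; the real file's markup may differ.
sample = (
    "<p>intro</p>\n"
    "Shoes and Shorts $ 2324 324\n"
    "<p>first page body</p>\n"
    "<br style='page-break-before:always'>\n"
    "<p>skipped</p>\n"
    "Hats $ 15\n"
    "<p>second page body</p>\n"
    "<br style='page-break-before:always'>\n"
)

for page in extract_pages(sample):
    print(repr(page))
```

The lazy quantifier is the important design choice: a greedy ".*" would run from the first start marker to the last page break in the file and return one oversized match instead of one match per page. In .NET the same idea should carry over with RegexOptions.Singleline in place of re.DOTALL.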

For more information on regular expressions, see the link below.
http://www.codeproject.com/KB/dotnet/regextutorial.asp

Please vote and accept the answer if it helped.
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



