Hi,
I want to parse HTML tags and store their contents in a database, but I don't know how to do it.
Can anyone help?
For example, I want to fetch each news item's date, name, title, and content from news sites. How can this be done?
Thanks

1 solution

Please see:
http://stackoverflow.com/a/590789
HTML cannot be fully parsed with regular expressions, because parsing depends on matching each opening tag with its closing tag, and regular expressions cannot match arbitrarily nested pairs.

Regular expressions can only match regular languages, but HTML, with its arbitrarily nested tags, is not a regular language (it is closer to a context-free one). The only thing you can do with regexps on HTML is apply heuristics, and those will not work in every case: for any given regular expression, it should be possible to construct an HTML file that it matches wrongly.
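To make the nesting problem concrete, here is a small sketch in Python (chosen only for illustration; the sample markup and variable names are made up). It shows a lazy pattern cutting a nested element short and a greedy pattern running across two separate elements:

```python
import re

# Made-up sample: a <div> nested inside another <div>.
nested = "<div>outer <div>inner</div> tail</div>"

# The lazy pattern stops at the FIRST closing tag, so the captured
# text ends in the middle of the outer element.
lazy = re.search(r"<div>(.*?)</div>", nested).group(1)
print(repr(lazy))    # 'outer <div>inner'

# Made-up sample: two sibling <div> elements with a <p> between them.
siblings = "<div>first</div><p>between</p><div>second</div>"

# The greedy pattern runs to the LAST closing tag, so it swallows
# both elements and everything between them.
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)
print(repr(greedy))  # 'first</div><p>between</p><div>second'
```

Neither pattern is wrong as a regular expression; each simply has no way to count how many tags are still open.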

Use the Html Agility Pack to parse HTML.
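The Html Agility Pack is a .NET library, so the sketch below is not its API. It uses Python's standard html.parser and sqlite3 modules as a stand-in to show the general idea: parse with a real HTML parser, pull out the fields, then insert them into a table. The sample markup, the class names (news-date, news-title, news-content), the NewsParser class, and the table layout are all assumptions; a real news site will use its own structure.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup: real news sites use their own tags and class names.
SAMPLE = """
<div class="news">
  <span class="news-date">2024-01-15</span>
  <h2 class="news-title">Example <b>headline</b></h2>
  <p class="news-content">Body text of the story.</p>
</div>
"""

# Assumed mapping from CSS class to database column.
FIELDS = {"news-date": "date", "news-title": "title", "news-content": "content"}
# Tags that never have a closing tag; they must not affect the depth counter.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class NewsParser(HTMLParser):
    """Collects the text inside elements whose class is listed in FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self.field = None  # column currently being captured, if any
        self.depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.field is not None:
            self.depth += 1  # a child tag inside the captured element
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in FIELDS:
                self.field = FIELDS[cls]
                self.depth = 1
                self.record[self.field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.field is None:
            return
        self.depth -= 1
        if self.depth == 0:  # the captured element itself has closed
            self.field = None

    def handle_data(self, data):
        if self.field is not None:
            self.record[self.field] += data


parser = NewsParser()
parser.feed(SAMPLE)
parser.close()

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (date TEXT, title TEXT, content TEXT)")
conn.execute(
    "INSERT INTO news (date, title, content) VALUES (:date, :title, :content)",
    parser.record,
)
conn.commit()

rows = conn.execute("SELECT date, title, content FROM news").fetchall()
print(rows)
```

Because the parser tracks nesting depth, the nested <b> inside the title is handled correctly, which is exactly the case a regular expression gets wrong. In .NET the same steps would be done with the Html Agility Pack and a database client of your choice.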
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


