Hi,

I am writing a program (VC++) in which I need to remove unwanted HTML tags from an input HTML file.

I think I can use regular expressions to match these tags and then delete them.

I am using VC++ (2010) for the implementation. Please guide me with a sample program that I can use as a starting point.


Thanks.
Posted
Updated 20-Aug-13 4:57am
v2
Comments
[no name] 20-Aug-13 10:49am    
We do not write code to order here. You need to try this yourself and then ask a specific question if you have a problem.
Member 10220837 21-Aug-13 9:14am    
Thanks for your reply. I just wanted to get an idea of how I can start on this. An illustrative sample program would be a great start.
Sergey Alexandrovich Kryukov 20-Aug-13 12:46pm    
Do you mean all tags? Or only some tags are "unwanted"? Do you mean to extract pure text from HTML, make it unformatted text?
—SA
Member 10220837 21-Aug-13 9:10am    
Not all tags; I need to remove only specific tags.
Sergey Alexandrovich Kryukov 21-Aug-13 9:33am    
Then you probably need to find an HTML parser: parse, filter out the unwanted parts, and generate the HTML back.
Please see Solution 1.
—SA

1 solution

Quote:
I think I can use regular expressions to match these tags and then delete them.
Indeed it is a viable solution.
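
For simple, well-formed input, the regular-expression idea could be sketched roughly as follows. This is only a minimal illustration, not a full solution: the function names `stripTag` and `stripTags` are made up for this example, the tag names are assumed to be plain alphanumeric words such as "b" or "font", and only the tags themselves are deleted (the text between them is kept). It uses `std::regex` from the `<regex>` header, which should be available in VC++ 2010.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: delete every opening and closing tag called tagName,
// keeping the text between them. tagName is assumed to be plain alphanumerics.
std::string stripTag(const std::string& html, const std::string& tagName)
{
    // Matches <tag>, <tag attr="...">, <tag/> and </tag>, ignoring case.
    // The \b word boundary stops "b" from also matching "<body>" or "<br>".
    const std::regex pattern("</?" + tagName + "\\b[^>]*>", std::regex::icase);
    return std::regex_replace(html, pattern, std::string());
}

// Hypothetical helper: apply stripTag once for each unwanted tag name.
std::string stripTags(std::string html, const std::vector<std::string>& tagNames)
{
    for (std::size_t i = 0; i < tagNames.size(); ++i)
        html = stripTag(html, tagNames[i]);
    return html;
}
```

Calling `stripTag(text, "font")` would drop `<font ...>` and `</font>` but leave everything else alone. The pattern will misbehave on attribute values that contain `>`, and it does not treat comments or scripts specially, so it is best kept to input you control.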


You might search Google for "HTML parsing with C++".
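
Until you settle on a real parser library, the "parse, filter, write back" idea can be approximated without regular expressions by scanning for `<` and `>` and copying everything except the unwanted tags. The sketch below is an assumption-laden illustration, not a real HTML parser: `tagNameOf` and `filterTags` are hypothetical names, and a `>` inside an attribute value will confuse it.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: lower-cased name of a tag token such as "<B class=x>" or "</b>".
static std::string tagNameOf(const std::string& tag)
{
    std::size_t i = 1;                          // skip '<'
    if (i < tag.size() && tag[i] == '/') ++i;   // skip '/' of a closing tag
    std::string name;
    while (i < tag.size() && std::isalnum(static_cast<unsigned char>(tag[i]))) {
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(tag[i])));
        ++i;
    }
    return name;
}

// Copy html to the result, leaving out tags whose name is in 'unwanted'
// (names must be given in lower case). Text and all other tags are copied verbatim.
std::string filterTags(const std::string& html, const std::set<std::string>& unwanted)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < html.size()) {
        const std::size_t lt = html.find('<', pos);
        if (lt == std::string::npos) {          // no more tags: copy the rest
            out.append(html, pos, std::string::npos);
            break;
        }
        out.append(html, pos, lt - pos);        // text before the tag
        const std::size_t gt = html.find('>', lt);
        if (gt == std::string::npos) {          // unterminated '<': copy as text
            out.append(html, lt, std::string::npos);
            break;
        }
        const std::string tag = html.substr(lt, gt - lt + 1);
        if (unwanted.count(tagNameOf(tag)) == 0)
            out += tag;                         // keep wanted tags unchanged
        pos = gt + 1;
    }
    return out;
}
```

Because every token that is not on the unwanted list is copied unchanged, the output differs from the input only where a listed tag was dropped. A proper parser library is still the safer choice for arbitrary HTML from the web.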
 
Comments
Sergey Alexandrovich Kryukov 21-Aug-13 9:34am    
Agree, a 5.
—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


