Hi;

Does anyone know of any decent source code examples (in either VB or C#) that reformat HTML source and output a "cleaned" version of the HTML, correctly indented and with line breaks at the relevant tags?

I have found plenty of websites that will do the same, however I need to be able to integrate this functionality into my application.

I have tried implementing this myself, yet cannot get it to work correctly - to the point that my head is ready to explode!

Any pointers would be greatly appreciated.

Thanks in advance

Kind Regards

Dave
Comments
Yusuf 23-Feb-11 9:12am    
I moved your answer to a comment on SA's answer. Don't use the Answer button if it is not a solution. Use 'Add Comment' instead.

1 solution

You could split the problem into two broad cases, depending on whether the original file is well-formed XML or not. If it is (it does not even matter whether it complies with the XHTML schema), the problem is solved very easily: use the classes System.Xml.XmlTextReader/System.Xml.XmlTextWriter or System.Xml.XmlDocument. If it is not, this is a boring manual task with no 100% guarantee of the result. Search for something like this: http://en.lmgtfy.com/?q=html+tidy.
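
For the well-formed case, a minimal sketch in C# could look like the following. It is only an illustration, under some assumptions: the input is a well-formed fragment with a single root element and no DOCTYPE or HTML-only entities such as &nbsp;, and it uses XmlDocument together with XmlWriter.Create and XmlWriterSettings rather than XmlTextWriter directly. The class and method names (HtmlIndenter, Indent) are made up for the example.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper: the names are illustrative, not from any library.
static class HtmlIndenter
{
    // Re-indents well-formed XML/XHTML markup.
    // Throws XmlException if the input is not well-formed.
    public static string Indent(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup); // insignificant whitespace is dropped by default

        var settings = new XmlWriterSettings
        {
            Indent = true,             // one element per line, nested by depth
            IndentChars = "  ",        // two spaces per level (a free choice)
            OmitXmlDeclaration = true  // no <?xml ...?> header in the output
        };

        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }
}
```

Calling HtmlIndenter.Indent("<div><p>Hi</p><p>There</p></div>") should put each <p> element on its own indented line inside the <div>. Note that the writer does not re-indent elements with mixed content (text and child elements side by side), so inline markup inside a paragraph is left on one line. Catching XmlException around the call is one way to tell the two cases apart: if it throws, the input belongs to the "not well-formed" case.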

—SA
 
 
Comments
Yusuf 23-Feb-11 9:11am    
[MOVED FROM OP ANSWER]

Hi SaKryukov;

The app I'm developing is a WYSIWYG XML Comments editor for VS. The app creates an XMLDocument, then converts each relevant element value e.g.
etc into HTML and displays the lot in a WYSIWYG editor.

At the end of the day, I need to re-parse the generated HTML and convert it back to a neat XML value to update the relevant element.

I have looked at tidy and have had a stab at the 100% boring method. As you correctly state, sometimes I get the right result, yet there are sooooooo many catches that what starts as a relatively simple code block ends up as spaghetti code.

I appreciate you taking the time to respond, and if anyone else has any ideas it would be greatly appreciated.

Kind Regards

Dave
Sergey Alexandrovich Kryukov 23-Feb-11 15:11pm    
Wading into the muddy waters of non-well-formed text is no fun and is best avoided.
By the way, my respect: your work could be very interesting and useful. Any chance of seeing your results when you're ready?

Now, how about a different approach: a VS plug-in that forces well-formed input while typing. Just showing errors would be fine, as VS does for XML/XHTML editing. I would say it would be a nearly perfect result if you managed to fail the project compilation, and show the errors, when the user enters invalid XML comments. Is it possible?
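
The "just showing errors" part can be sketched with XmlReader, which reports the line and position of the first well-formedness error. This is only an illustration of the check itself, not of the VS plug-in or build integration; the class and method names (XmlCommentChecker, FindFirstError) are invented for the example, and it assumes the comment text has already been stripped of its leading /// markers.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: the names are illustrative, not from any library.
static class XmlCommentChecker
{
    // Returns null if the text is well-formed, otherwise a message
    // describing the first error and where it occurs.
    public static string FindFirstError(string commentXml)
    {
        var settings = new XmlReaderSettings
        {
            // XML comments usually have several top-level elements,
            // so accept a fragment instead of a full document.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(commentXml), settings))
            {
                while (reader.Read()) { } // reading to the end validates well-formedness
            }
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

An editor could call FindFirstError on each change and display the returned message; a mismatched tag pair such as "<summary>text</remarks>" yields a non-null result, while a correctly closed fragment yields null.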

--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


