I am using libxml2 in my VS2010 project to build a tree from HTML, find some nodes, modify them, and dump the tree back to HTML.
The main logic is:
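```cpp
#include <libxml/HTMLparser.h>
#include <libxml/HTMLtree.h>

// create a push parser for UTF-8 input
htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, XML_CHAR_ENCODING_UTF8);
// set parser options
htmlCtxtUseOptions(parser, HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
// parse HTML from pData; terminate = 1 marks this as the last chunk so the parser finishes the document
htmlParseChunk(parser, pData, dataLen, 1);
// get root node for generated tree
xmlDocPtr doc = parser->myDoc;
xmlNode* node = xmlDocGetRootElement(doc);
// make some changes in tree, e.g. change content for node with name 'title'
// ...
// serialize back to HTML; the result is stored in 'newHtml' (len bytes)
xmlChar* newHtml = NULL;
int len = 0;
htmlDocDumpMemory(doc, &newHtml, &len);
// after using newHtml, release everything (htmlFreeParserCtxt does not free the document)
xmlFree(newHtml);
xmlFreeDoc(doc);
htmlFreeParserCtxt(parser);
```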


When I use the HTML from http://www.youtube.com/watch?v=S77UrnEGs_g as input, I get one extra closing tag in the output.

I checked the above URL with http://validator.w3.org/ and got this error:
Line 562, Column 31: Unclosed element div.
```html
<div class="content">
```
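To see roughly what the validator is complaining about, one can count `<div ...>` start tags that have no matching `</div>`. The sketch below does this with a hypothetical helper `countUnclosed`; it is a plain string scan, not how the W3C validator works, and it is case-sensitive and ignores comments, scripts, and attribute values that contain `<`.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns how many <name ...> start tags in `html`
// are left without a matching </name>. Stray end tags never push the
// count below zero.
int countUnclosed(const std::string& html, const std::string& name) {
    int depth = 0;
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos) {
        const bool closing = pos + 1 < html.size() && html[pos + 1] == '/';
        const std::size_t start = pos + (closing ? 2 : 1);
        const std::size_t after = start + name.size();
        // The name must match exactly, so "<divider>" does not count as "<div".
        const bool matches =
            html.compare(start, name.size(), name) == 0 &&
            (after >= html.size() ||
             !std::isalnum(static_cast<unsigned char>(html[after])));
        if (matches) {
            if (closing) {
                if (depth > 0) --depth;
            } else {
                ++depth;
            }
        }
        pos = start;  // continue scanning after this '<'
    }
    return depth;
}
```

For the snippet above on its own, `countUnclosed("<div class=\"content\">", "div")` returns 1.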

My question is: can I configure libxml2 so that it does not automatically close unclosed tags?
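Why the extra closing tag shows up can be illustrated with a simplified model of a tree-building parser followed by a serializer: every start tag opens an element, an end tag closes the matching open element (and implicitly any elements opened inside it), stray end tags are dropped, and whatever is still open at the end of input is closed. Writing such a tree back out emits an end tag for every non-void element, so an unclosed `<div>` in the input gains a `</div>` in the output. The function `rebuildWithAutoClose` below is hypothetical and is not libxml2's implementation; libxml2 also applies HTML-specific implied-end-tag rules that this sketch does not model.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-cased element name at the start of a tag body such as `div class="x"`.
static std::string tagName(const std::string& body) {
    std::string name;
    for (char c : body) {
        unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u)) break;
        name += static_cast<char>(std::tolower(u));
    }
    return name;
}

// HTML void elements never get an end tag, so they are never pushed.
static bool isVoid(const std::string& name) {
    static const std::vector<std::string> kVoid = {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"};
    return std::find(kVoid.begin(), kVoid.end(), name) != kVoid.end();
}

// Hypothetical simplified model: parse into an implicit tree (a stack of
// open elements) and serialize it again in a single pass.
std::string rebuildWithAutoClose(const std::string& html) {
    std::string out;
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') { out += html[i++]; continue; }
        std::size_t end = html.find('>', i);
        if (end == std::string::npos) { out += html.substr(i); break; }  // unterminated tag: copied as-is
        const std::string tag = html.substr(i, end - i + 1);
        const std::string body = html.substr(i + 1, end - i - 1);
        i = end + 1;
        if (!body.empty() && body[0] == '/') {
            const std::string name = tagName(body.substr(1));
            if (std::find(open.rbegin(), open.rend(), name) == open.rend()) continue;  // stray end tag: dropped
            // Implicitly close everything opened inside the matching element.
            while (open.back() != name) { out += "</" + open.back() + ">"; open.pop_back(); }
            out += tag;
            open.pop_back();
        } else {
            out += tag;  // start tag, comment, or doctype copied verbatim
            const std::string name = tagName(body);
            const bool selfClosing = !body.empty() && body.back() == '/';
            if (!name.empty() && !isVoid(name) && !selfClosing) open.push_back(name);
        }
    }
    // End of input: every element that is still open gets an end tag.
    while (!open.empty()) { out += "</" + open.back() + ">"; open.pop_back(); }
    return out;
}
```

For example, `rebuildWithAutoClose("<div class=\"content\"><p>hi</p>")` returns the input followed by `</div>`, which is the kind of extra closing tag described above.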

1 solution

I found a non-validating parser, htmlcxx (http://htmlcxx.sourceforge.net/), which solved the problem for me.
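The general idea of a non-validating approach can be sketched by keeping the document as a flat sequence of raw tokens instead of a tree, so nothing is added when it is written back. This does not use or reproduce htmlcxx's API: `Token`, `tokenize`, `serialize`, `nameOf`, and `setTitle` are hypothetical names, and the tokenizer is deliberately naive (it does not handle `>` inside comments, scripts, or attribute values). `setTitle` mirrors the edit from the question, changing the text of the `title` element while leaving an unclosed `<div>` untouched.

```cpp
#include <cctype>
#include <string>
#include <vector>

struct Token {
    bool isTag;        // true for "<...>", false for text between tags
    std::string text;  // raw source text, kept verbatim
};

// Splits markup into raw tag and text tokens without checking nesting,
// so no tags are ever added or removed.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos) end = html.size() - 1;  // unterminated tag runs to end of input
            tokens.push_back({true, html.substr(i, end - i + 1)});
            i = end + 1;
        } else {
            std::size_t next = html.find('<', i);
            if (next == std::string::npos) next = html.size();
            tokens.push_back({false, html.substr(i, next - i)});
            i = next;
        }
    }
    return tokens;
}

// Writing back is plain concatenation of the original token text.
std::string serialize(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) out += t.text;
    return out;
}

// Lower-cased name of a tag token, with a leading '/' for end tags.
static std::string nameOf(const std::string& tag) {
    std::string name;
    for (std::size_t k = 1; k < tag.size(); ++k) {
        unsigned char c = static_cast<unsigned char>(tag[k]);
        if (std::isalnum(c) || (k == 1 && c == '/')) name += static_cast<char>(std::tolower(c));
        else break;
    }
    return name;
}

// Example edit: replace the text that follows the first <title> tag.
std::string setTitle(const std::string& html, const std::string& title) {
    std::vector<Token> tokens = tokenize(html);
    for (std::size_t k = 0; k + 1 < tokens.size(); ++k) {
        if (tokens[k].isTag && nameOf(tokens[k].text) == "title") {
            if (!tokens[k + 1].isTag) tokens[k + 1].text = title;
            else tokens.insert(tokens.begin() + k + 1, Token{false, title});  // empty <title></title>
            break;
        }
    }
    return serialize(tokens);
}
```

Because `serialize` only concatenates the original token text, `serialize(tokenize(html)) == html` holds for any input; that is the property that avoids the extra closing tag, at the cost of having no real tree to navigate.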
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


