I am using libxml2 in my VS2010 project to build a tree from HTML, find some nodes, modify them, and dump the tree back to HTML.
The main logic is:
// create a push parser for UTF-8 input
htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, XML_CHAR_ENCODING_UTF8);
// set parser options
// parse HTML from pData
htmlParseChunk(parser, pData, dataLen, 0);
// signal end of input so the parser can finish the document
htmlParseChunk(parser, NULL, 0, 1);
// get root node for generated tree
xmlNode* node = xmlDocGetRootElement(parser->myDoc);
// make some changes in tree, e.g. change content for node with name 'title'
// ...
// serialize the tree back to HTML; the result is stored in 'newHtml'
htmlDocDumpMemory(parser->myDoc, &newHtml, &len);

When I use the HTML from [^] as input data, I get one extra closing tag in the output.

I checked the above URL with [^] and got this error:
Line 562, Column 31: Unclosed element div.
<div class="content">
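To confirm that the difference between input and output really is the closing tag for the unclosed `div` the validator reports, a quick textual count can be enough. The sketch below is only an illustration: `count_occurrences` and `div_balance` are hypothetical helper names, and the matching is naive (case-sensitive, and it does not skip comments, scripts, or attribute values).

```c
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping, case-sensitive occurrences of `needle`
 * in `haystack`. An empty needle counts as zero matches. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t step = strlen(needle);
    if (step == 0)
        return 0;
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + step, needle))
        count++;
    return count;
}

/* Closing div tags minus opening div tags.
 * A negative result means at least one div is never closed. */
long div_balance(const char *html)
{
    return (long)count_occurrences(html, "</div>")
         - (long)count_occurrences(html, "<div");
}
```

If `div_balance` returns -1 for the original page and 0 for the dumped HTML, the parser has added exactly one closing `div` tag, which matches the validator error above.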

My question is: can I configure libxml2 so that it does not automatically close unclosed tags?
Posted 1-Apr-13 19:49pm

Solution 1

I found a non-validating parser: [^]. It solved the problem for me.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)