Posted 4 Oct 2013



HTML Parser C++ (Demo Project)

This is a sample project for "HTML Reader C++ Class Library"


This is a sample project built on this tiny HTML Parser library. Its main purpose is to show how that library is used, although I have also added some features to the library itself. The project supports UNICODE builds. The code wraps the HTML tags in a tree model and exposes a function to retrieve a specific HTML element.

An HTML element is an individual component of an HTML document or "web page", once this has been parsed into the Document Object Model.

In the HTML syntax, most elements are written with a start tag and an end tag, with the content in between. An HTML tag is composed of the name of the element, surrounded by angle brackets. An end tag also has a slash after the opening angle bracket, to distinguish it from the start tag.

<p>In the HTML syntax, most elements are written ...</p>
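To make the start tag / end tag distinction concrete, here is a minimal sketch in standard C++ that classifies a single tag. It is not taken from the library: `TagInfo`, `ParseTag`, and `ExampleParseTag` are hypothetical names used only for illustration, and the function ignores attributes, comments, and malformed input beyond the simplest checks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of the library described in this article.
struct TagInfo {
    std::string name;       // element name, lower-cased
    bool isEndTag = false;  // true for </p>, false for <p>
};

// Classifies one tag such as "<p>" or "</p>".
// Returns false if the text does not look like a simple tag.
bool ParseTag(const std::string& text, TagInfo& out) {
    if (text.size() < 3 || text.front() != '<' || text.back() != '>')
        return false;
    std::size_t pos = 1;
    bool isEnd = (text[pos] == '/');  // slash right after '<' marks an end tag
    if (isEnd) ++pos;
    std::string name;
    // Read the element name; stop at the first non-alphanumeric character,
    // so any attributes after the name are ignored.
    while (pos + 1 < text.size() &&
           std::isalnum(static_cast<unsigned char>(text[pos]))) {
        name += static_cast<char>(
            std::tolower(static_cast<unsigned char>(text[pos])));
        ++pos;
    }
    if (name.empty()) return false;
    out.name = name;
    out.isEndTag = isEnd;
    return true;
}

// Usage: the same element name comes back for both tags; only the flag differs.
void ExampleParseTag() {
    TagInfo info;
    assert(ParseTag("<p>", info) && info.name == "p" && !info.isEndTag);
    assert(ParseTag("</p>", info) && info.name == "p" && info.isEndTag);
}
```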

Any number of other tags may appear between the start and end tags. This project offers a way to search for a specific tag, optionally restricted to an attribute with a given value, and then extract the content of the matching element. It's a lightweight alternative to Microsoft's MSHTML parser (which I found to be full of leaks).
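The idea of searching a tag tree for an element with a given attribute value can be sketched in plain standard C++ as follows. This is only an illustration of the technique (a depth-first walk over a tree of nodes), not the library's actual data structure: `HtmlNode`, `FindElements`, and `ExampleFindElements` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical node type used only for this illustration.
struct HtmlNode {
    std::string tag;                                // element name, e.g. "div"
    std::map<std::string, std::string> attributes;  // attribute name -> value
    std::string text;                               // text directly inside the element
    std::vector<HtmlNode> children;                 // nested elements, in document order
};

// Depth-first search: collects every node whose tag matches and whose
// attribute `attrName` has exactly the value `attrValue`.
void FindElements(const HtmlNode& node, const std::string& tag,
                  const std::string& attrName, const std::string& attrValue,
                  std::vector<const HtmlNode*>& results) {
    if (node.tag == tag) {
        auto it = node.attributes.find(attrName);
        if (it != node.attributes.end() && it->second == attrValue)
            results.push_back(&node);
    }
    for (const HtmlNode& child : node.children)
        FindElements(child, tag, attrName, attrValue, results);
}

// Usage: build a two-level tree and look up <p id="intro">.
void ExampleFindElements() {
    HtmlNode para;
    para.tag = "p";
    para.attributes["id"] = "intro";
    para.text = "Hello";

    HtmlNode root;
    root.tag = "body";
    root.children.push_back(para);

    std::vector<const HtmlNode*> found;
    FindElements(root, "p", "id", "intro", found);
    assert(found.size() == 1 && found[0]->text == "Hello");
}
```

The result vector holds pointers into the tree, so the tree must outlive the results.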


Using the Code

Add the files from the AClass directory to your project.

Include the headers you need, for example:

#include "AClass/LiteHTMLReader.h"  
#include "AClass/HtmlElementCollection.h"

Instantiate the reader which will parse the HTML string.

CLiteHTMLReader theReader;
CHtmlElementCollection theElementCollectionHandler;

If you want only the tags that have a specific attribute value, use:

theElementCollectionHandler.InitWantedTag(_T("style"), _T("id"),_T("sss"));

Call the parser function. When it returns, theElementCollectionHandler is filled with the parsed structure.
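The general pattern of a reader that fills a handler object while it parses can be sketched in standard C++ as below. This is not the library's API: `ITagEvents`, `TagCollector`, `ScanTags`, and `ExampleScanTags` are hypothetical names, and the scanner is deliberately naive. It only shows how a handler ends up holding the parsed information once the parse call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback interface; the names are illustrative only.
struct ITagEvents {
    virtual ~ITagEvents() = default;
    virtual void OnStartTag(const std::string& name) = 0;
    virtual void OnEndTag(const std::string& name) = 0;
};

// A handler that records every tag name it is told about.
struct TagCollector : ITagEvents {
    std::vector<std::string> started;
    std::vector<std::string> ended;
    void OnStartTag(const std::string& name) override { started.push_back(name); }
    void OnEndTag(const std::string& name) override { ended.push_back(name); }
};

// Naive scanner: reports each <name ...> and </name> it sees to the handler.
// It does not handle comments, scripts, or '>' inside quoted attribute values.
void ScanTags(const std::string& html, ITagEvents& handler) {
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos) {
        std::size_t close = html.find('>', pos);
        if (close == std::string::npos) break;  // unterminated tag: stop
        std::string inner = html.substr(pos + 1, close - pos - 1);
        bool isEnd = !inner.empty() && inner[0] == '/';
        if (isEnd) inner.erase(0, 1);
        // The name runs up to the first whitespace or '/' (npos keeps it all).
        std::string name = inner.substr(0, inner.find_first_of(" \t\r\n/"));
        if (!name.empty()) {
            if (isEnd) handler.OnEndTag(name);
            else handler.OnStartTag(name);
        }
        pos = close + 1;
    }
}

// Usage: after ScanTags returns, the collector holds what was parsed.
void ExampleScanTags() {
    TagCollector collector;
    ScanTags("<p>Hi</p>", collector);
    assert(collector.started.size() == 1 && collector.started[0] == "p");
    assert(collector.ended.size() == 1 && collector.ended[0] == "p");
}
```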


Now retrieve the text of each matching element into a CString variable.

CString szTxt;  // receives the HTML of each element in turn
for (int i = 0; i < theElementCollectionHandler.GetNumElementsFiltered(); i++) {
    theElementCollectionHandler.GetOuterHtml(i, szTxt, 1);
}


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer
Romania
