
Retrieving the HTML source code

30 Nov 2004
An article on how to retrieve the full source code of a web page.

Introduction

An app I was writing needed to store the full HTML of a web page. I searched the web and the MSDN Library for a way to get the complete HTML from a CHtmlView. I found out how to get the <BODY></BODY> content, but not the complete <HTML></HTML> content. After lots of stumbling, I hit on the following very simple technique.

Examples of getting the outer HTML of the <BODY> tag abound. While exploring the MSHTML interfaces, I noticed the get_parentElement method of IHTMLElement, the interface that IHTMLDocument2::get_body hands back. I realized that the parent of <BODY> is <HTML>.

This function took care of my problem:

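bool CMyHtmlView::GetDocumentHTML(CString &str)
{
    // GetHtmlDocument returns an AddRef'ed IDispatch that we must release.
    LPDISPATCH lpDispatch = GetHtmlDocument();
    if (!lpDispatch)
        return false;

    IHTMLDocument2 *lpHtmlDocument = NULL;
    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2, (void**)&lpHtmlDocument);
    lpDispatch->Release();
    if (FAILED(hr) || !lpHtmlDocument)
        return false;

    // get_body gives the <BODY> element; its outer HTML covers only
    // <BODY> through </BODY>. I need everything from <HTML> through </HTML>.
    IHTMLElement *lpBodyElm = NULL;
    hr = lpHtmlDocument->get_body(&lpBodyElm);
    lpHtmlDocument->Release();
    if (FAILED(hr) || !lpBodyElm)
        return false;   // no body available, e.g. the page has not loaded yet

    // the parent of BODY is HTML
    IHTMLElement *lpParentElm = NULL;
    hr = lpBodyElm->get_parentElement(&lpParentElm);
    lpBodyElm->Release();
    if (FAILED(hr) || !lpParentElm)
        return false;

    BSTR bstr = NULL;
    hr = lpParentElm->get_outerHTML(&bstr);
    lpParentElm->Release();
    if (FAILED(hr))
        return false;

    str = bstr;             // CString makes its own copy of the text
    ::SysFreeString(bstr);  // the caller owns the returned BSTR and must free it
    return true;
}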
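
Why the extra parent step matters can be seen without MSHTML at all. The following is only a portable sketch, not how the browser control is implemented: Element, Add, OuterHTML, BuildPage and FullHtmlFromBody are hypothetical names made up for this illustration. It shows that serializing BODY alone drops the HEAD, while stepping up to BODY's parent and serializing from there yields the whole page.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM element; not part of MSHTML or MFC.
struct Element {
    std::string tag;                                 // e.g. "HTML", "BODY"
    std::string text;                                // text placed before any children
    Element *parent = nullptr;                       // non-owning back pointer
    std::vector<std::unique_ptr<Element>> children;  // owned child elements

    explicit Element(std::string t, std::string txt = "")
        : tag(std::move(t)), text(std::move(txt)) {}

    // Append a child, wire up its parent pointer, and return it.
    Element *Add(std::string t, std::string txt = "") {
        children.push_back(std::make_unique<Element>(std::move(t), std::move(txt)));
        children.back()->parent = this;
        return children.back().get();
    }

    // Serialize the element together with its own tags,
    // loosely analogous to what get_outerHTML returns.
    std::string OuterHTML() const {
        std::string out = "<" + tag + ">" + text;
        for (const auto &child : children)
            out += child->OuterHTML();
        return out + "</" + tag + ">";
    }
};

// Build a tiny page owned by `root` and return its BODY element.
Element *BuildPage(std::unique_ptr<Element> &root) {
    root = std::make_unique<Element>("HTML");
    root->Add("HEAD")->Add("TITLE", "Demo");
    return root->Add("BODY", "Hello");
}

// Same idea as the function above: start at BODY, step up once, serialize.
std::string FullHtmlFromBody(const Element *body) {
    assert(body != nullptr);
    const Element *top = body->parent ? body->parent : body;
    return top->OuterHTML();
}
```

Calling BuildPage and then FullHtmlFromBody on the returned element gives the markup from the opening HTML tag to the closing one, whereas calling OuterHTML on the BODY element directly gives only the body part.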

Points of Interest

There is bound to be a better way of doing this. If you know it, please share it with me.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

