
Retrieving the HTML source code

By Geno Carman, 30 Nov 2004
 

Introduction

An app I was writing needed to store the full HTML of a web page. I searched the web and the MSDN Library for a way to get the complete HTML from a CHtmlView. I found out how to get the <BODY></BODY> data, but not how to get the <HTML></HTML> data. After a lot of stumbling, I hit on the following very simple technique.

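Examples of getting the outer HTML of the <BODY> tag abound. While exploring the MSHTML interfaces, I noticed the get_parentElement method of IHTMLElement, the interface that IHTMLDocument2::get_body hands back. I realized that the parent of <BODY> is <HTML>, so the outer HTML of that parent is the whole document.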
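To make the parent relationship concrete, here is a small portable sketch. It is a toy model, not MSHTML: the Element struct, its outerHTML method, and makeDemoPage are hypothetical stand-ins that I made up for illustration. It only shows why serializing the parent of BODY yields the HEAD as well.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an HTML element; NOT the MSHTML IHTMLElement interface.
struct Element {
    std::string tag;
    std::string text;                                // text content, if any
    Element *parent = nullptr;                       // non-owning pointer to the parent
    std::vector<std::unique_ptr<Element>> children;  // owned child elements

    // Append a child element and wire up its parent pointer.
    Element *add(const std::string &childTag, const std::string &childText = "") {
        children.push_back(std::make_unique<Element>());
        Element *child = children.back().get();
        child->tag = childTag;
        child->text = childText;
        child->parent = this;
        return child;
    }

    // Serialize this element including its own start and end tags.
    std::string outerHTML() const {
        std::string out = "<" + tag + ">" + text;
        for (const auto &c : children)
            out += c->outerHTML();
        return out + "</" + tag + ">";
    }
};

// Build <HTML><HEAD><TITLE>Demo</TITLE></HEAD><BODY>Hello</BODY></HTML>.
std::unique_ptr<Element> makeDemoPage() {
    auto html = std::make_unique<Element>();
    html->tag = "HTML";
    html->add("HEAD")->add("TITLE", "Demo");
    html->add("BODY", "Hello");
    return html;
}
```

With a page from makeDemoPage(), the BODY element is children[1]. Its outerHTML() is just <BODY>Hello</BODY>, while body->parent->outerHTML() also contains the HEAD and TITLE.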

This function took care of my problem:

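bool CMyHtmlView::GetDocumentHTML(CString &str)
{
    // GetHtmlDocument() returns an AddRef'ed IDispatch, or NULL if no document is loaded.
    LPDISPATCH lpDispatch = GetHtmlDocument();
    if (!lpDispatch)
        return false;

    IHTMLDocument2 *lpHtmlDocument = NULL;
    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2, (void**)&lpHtmlDocument);
    lpDispatch->Release();
    if (FAILED(hr) || !lpHtmlDocument)
        return false;

    // get_body returns the <BODY> element; its outer HTML covers only <BODY>...</BODY>.
    // I need everything between <HTML> and </HTML>.
    IHTMLElement *lpBodyElm = NULL;
    hr = lpHtmlDocument->get_body(&lpBodyElm);
    lpHtmlDocument->Release();
    if (FAILED(hr) || !lpBodyElm)
        return false;

    // The parent of BODY is HTML.
    IHTMLElement *lpParentElm = NULL;
    hr = lpBodyElm->get_parentElement(&lpParentElm);
    lpBodyElm->Release();
    if (FAILED(hr) || !lpParentElm)
        return false;

    BSTR bstr = NULL;
    hr = lpParentElm->get_outerHTML(&bstr);
    lpParentElm->Release();
    if (FAILED(hr))
        return false;

    str = bstr;             // CString makes its own copy of the text
    ::SysFreeString(bstr);  // the caller owns the returned BSTR; safe on NULL

    return true;
}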

Points of Interest

There is bound to be a better way of doing this. If you know it, please share it with me.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Geno Carman
United States



Last Updated 1 Dec 2004
Article Copyright 2004 by Geno Carman
Everything else Copyright © CodeProject, 1999-2013