First Posted 30 Nov 2004

Retrieving the HTML source code

By Geno Carman | 30 Nov 2004 | Article
An article on how to retrieve the full source code of a web page.

Introduction

An app I was writing needed to store the full HTML of a web page. I looked all over the web and the MSDN library on how to get the complete HTML from a CHtmlView. I found out how to get the <BODY></BODY> data, but not how to get the <HTML></HTML> data. After lots of stumbling, I hit on the following very simple technique.

Examples of getting the outer HTML of the <BODY> tag abound. While exploring the MSHTML interfaces, I noticed the get_parentElement method of IHTMLElement. I realized that the parent of <BODY> is <HTML>, so asking that parent for its outer HTML returns the whole document.

This function took care of my problem:

bool CMyHtmlView::GetDocumentHTML(CString &str)
{
    // GetHtmlDocument returns NULL when no document is loaded.
    LPDISPATCH lpDispatch = GetHtmlDocument();
    if (!lpDispatch)
        return false;

    IHTMLDocument2 *lpHtmlDocument = NULL;
    HRESULT hr = lpDispatch->QueryInterface(IID_IHTMLDocument2, (void**)&lpHtmlDocument);
    lpDispatch->Release();
    if (FAILED(hr) || !lpHtmlDocument)
        return false;

    // get_body returns all between <BODY> and </BODY>.
    // I need all between <HTML> and </HTML>.
    IHTMLElement *lpBodyElm = NULL;
    hr = lpHtmlDocument->get_body(&lpBodyElm);
    lpHtmlDocument->Release();
    if (FAILED(hr) || !lpBodyElm)
        return false;   // checked at run time; ASSERT vanishes in release builds

    // the parent of BODY is HTML
    IHTMLElement *lpParentElm = NULL;
    hr = lpBodyElm->get_parentElement(&lpParentElm);
    lpBodyElm->Release();
    if (FAILED(hr) || !lpParentElm)
        return false;

    BSTR bstr = NULL;
    hr = lpParentElm->get_outerHTML(&bstr);
    lpParentElm->Release();
    if (FAILED(hr))
        return false;

    str = bstr;             // CString makes its own copy of the text
    ::SysFreeString(bstr);  // the caller owns the BSTR and must free it

    return true;
}
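
To illustrate why one step up from BODY is enough, here is a minimal, portable sketch of the same idea. It is a toy model, not MSHTML: the Element struct and the names Add, OuterHTML, BuildSampleDocument and DocumentHtmlViaBody are all hypothetical, invented only for this illustration. OuterHTML plays the role of get_outerHTML and the parent pointer plays the role of get_parentElement.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an HTML element. This is NOT the MSHTML IHTMLElement
// interface; every name here is hypothetical.
struct Element {
    std::string tag;                                  // e.g. "HTML", "BODY"
    std::string text;                                 // text directly inside the element
    Element *parent = nullptr;                        // plays the role of get_parentElement
    std::vector<std::unique_ptr<Element>> children;

    explicit Element(std::string t, std::string txt = "")
        : tag(std::move(t)), text(std::move(txt)) {}

    // Append a child element and return a non-owning pointer to it.
    Element *Add(std::string childTag, std::string childText = "") {
        children.push_back(std::make_unique<Element>(std::move(childTag),
                                                     std::move(childText)));
        children.back()->parent = this;
        return children.back().get();
    }

    // Plays the role of get_outerHTML: the element's own tags plus
    // everything nested inside them.
    std::string OuterHTML() const {
        std::string out = "<" + tag + ">" + text;
        for (const auto &child : children)
            out += child->OuterHTML();
        return out + "</" + tag + ">";
    }
};

// Builds <HTML><HEAD><TITLE>Hi</TITLE></HEAD><BODY><P>Hello</P></BODY></HTML>
// and returns the root element.
std::unique_ptr<Element> BuildSampleDocument() {
    auto html = std::make_unique<Element>("HTML");
    html->Add("HEAD")->Add("TITLE", "Hi");
    html->Add("BODY")->Add("P", "Hello");
    return html;
}

// Mirrors the approach in the article: start from BODY, step up to its
// parent, and serialize the parent instead.
std::string DocumentHtmlViaBody(const Element *body) {
    assert(body != nullptr && body->parent != nullptr);
    return body->parent->OuterHTML();
}
```

For the sample document, calling OuterHTML on the BODY element yields only <BODY><P>Hello</P></BODY>, while DocumentHtmlViaBody on the same element yields the full <HTML>...</HTML> text, HEAD included. That is the same difference the real function relies on.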

Points of Interest

There is bound to be a better way of doing this. If you know it, please share it with me.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Geno Carman

United States

Last Updated 1 Dec 2004
Article Copyright 2004 by Geno Carman