Introduction
First, let me explain why I called this article "3rd Way". I have already seen articles on CodeGuru that explain how to load and parse an HTML file from memory, so you may ask why I am writing another guide. Below I describe the advantages and disadvantages I found in those approaches. The first one, which is also shown in MSDN, is to load HTML code using
When I realized this, I went looking for another way, one that would give me the document immediately after I submit the code. And yes, I found it! You can look at the great article by Asher Kobin at CodeGuru. It uses a new interface called IMarkupServices. Thus I came to MSDN again and found another, third way to load and parse HTML. I was so happy that I decided to write my first article for CodeProject about it, which you are reading now :)
Code
For advanced programmers who don't want to read the whole article, here is a hint: the HTML code is loaded by calling write() and then close() on an IHTMLDocument2 interface. Now I'll explain how to do this from the beginning.
Headers and imports
I'll assume here that you have a standard MFC application (a dialog-based, SDI, or MDI application). First of all, you have to initialize COM, since we are going to use the MSHTML COM interfaces. This can be done in your application class's InitInstance(), with the matching CoUninitialize() call in ExitInstance():
BOOL CYourApp::InitInstance()
{
    CoInitialize(NULL);
    ...
}
int CYourApp::ExitInstance()
{
    ...
    CoUninitialize();
    return CWinApp::ExitInstance();
}
Now, in the file where you are going to use the MSHTML interfaces, include the following:
#include <comdef.h>
#include <mshtml.h>
#pragma warning(disable : 4146) // see Q231931 for an explanation
#import <mshtml.tlb> no_auto_exclude
Where do I get a document?
Now let's get a pointer to the IHTMLDocument2 interface:
MSHTML::IHTMLDocument2Ptr pDoc;
HRESULT hr = CoCreateInstance(CLSID_HTMLDocument, NULL, CLSCTX_INPROC_SERVER,
IID_IHTMLDocument2, (void**)&pDoc);
Validate that you have a valid pointer (not NULL) before you continue.
Converting your HTML code
I'll assume that you have all the HTML code you want to load in a variable called lpszHTMLCode. It has to be wrapped in a SAFEARRAY of VARIANTs before it can be passed to the document:
SAFEARRAY* psa = SafeArrayCreateVector(VT_VARIANT, 0, 1);
VARIANT *param;
bstr_t bsData = (LPCTSTR)lpszHTMLCode;
hr = SafeArrayAccessData(psa, (LPVOID*)&param);
param->vt = VT_BSTR;
// Give the array its own copy of the string: SafeArrayDestroy() frees it,
// and bsData frees its own BSTR when it goes out of scope.
param->bstrVal = bsData.copy();
// Unlock the array; a locked SAFEARRAY cannot be destroyed later.
hr = SafeArrayUnaccessData(psa);
Last jump
Now we are ready to pass our SAFEARRAY to the document:
hr = pDoc->write(psa); // write your buffer
hr = pDoc->close(); // close the document, "applying" your code
// Don't forget to free the SAFEARRAY!
SafeArrayDestroy(psa);
Of course, remember to check every step so that your program never crashes; I skipped the checks to keep the code simple. Now, after all this work, you have a pointer to the IHTMLDocument2 interface, which gives you a lot of features, such as getting a particular tag and searching, inserting, replacing, or deleting tags, just as you would in JavaScript. And remember, if you are using smart pointers (as I do here) you don't need to call Release(); the object is freed automatically.
"about:blank" bug workaround
Well, since we have no site "attached" to our document interface, all links (href, src) that are relative to the document will start with "about:blank" if you try to use
Of course, you should handle IMG, LINK and other tags the same way. The example project has also been updated with this fix; you can download it and see how I did it.
References
Asher Kobin's article about parsing with IMarkupServices (CodeGuru)
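As a footnote to the "about:blank" workaround: one simple way to clean up such a link after reading it is to strip the prefix from the returned string. The sketch below is only an illustration and is not necessarily the fix used in the example project. The helper name StripAboutBlank is hypothetical, and it assumes the unwanted prefix is exactly the text "about:blank" at the start of the string.

```cpp
#include <string>

// Hypothetical helper (not from the article's example project):
// removes a leading "about:blank" from a URL string and returns
// everything else unchanged.
std::wstring StripAboutBlank(const std::wstring& url)
{
    static const std::wstring prefix = L"about:blank";

    // compare() checks only the first prefix.size() characters of url;
    // if url is shorter than the prefix, the comparison is non-zero.
    if (url.compare(0, prefix.size(), prefix) == 0)
        return url.substr(prefix.size());

    return url;
}
```

With this helper, a link read back as "about:blankimages/logo.gif" becomes "images/logo.gif", while an absolute URL passes through untouched. If the string comes from MSHTML as a BSTR, it would first need to be copied into a std::wstring.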