While browsing a message board a few days back, I noticed that someone wanted a utility to download only selected files for offline browsing. It seemed like a useful tool to have, so I wrote one.
While IE does let you download files for offline browsing, you have no control over which files are downloaded. You may want just a few of the pages, not all of them. This utility gives you that control.
To run this application, you will need IE 4.01 or above, as it is based on the WebBrowser2 control provided by IE. What the app does is pretty simple. You can specify a list of file specifications (*.html, r*.htm, etc.) and then choose a start page to begin downloading from. You can also specify the number of levels you want to download. The default is 0, which is rather useless. For example, if you ask for 2 levels, all links that match the specifications you have entered will be followed down to 2 levels.
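To make the file specifications concrete, here is a minimal sketch of how a wildcard spec could be matched against a file name. It is not the code the application uses. The function names MatchesSpec and MatchesAnySpec are made up for illustration, and I am assuming that '*' matches any run of characters, that '?' matches exactly one character, and that matching is case-sensitive.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if `name` matches `spec`.
// Assumed rules: '*' matches any run of characters (including none),
// '?' matches exactly one character, everything else matches literally.
bool MatchesSpec(const std::string& spec, const std::string& name)
{
    std::size_t s = 0;                        // current position in spec
    std::size_t n = 0;                        // current position in name
    std::size_t starPos = std::string::npos;  // position of the last '*' seen
    std::size_t retry = 0;                    // where in name that '*' started

    while (n < name.size()) {
        if (s < spec.size() && spec[s] == '*') {
            starPos = s++;                    // remember the star, try matching nothing
            retry = n;
        } else if (s < spec.size() && (spec[s] == '?' || spec[s] == name[n])) {
            ++s;                              // single character matched
            ++n;
        } else if (starPos != std::string::npos) {
            s = starPos + 1;                  // backtrack: let the star absorb one more char
            n = ++retry;
        } else {
            return false;                     // mismatch and no star to fall back on
        }
    }
    while (s < spec.size() && spec[s] == '*')
        ++s;                                  // trailing stars match the empty string
    return s == spec.size();
}

// Hypothetical helper: true if `name` matches at least one spec in the list.
bool MatchesAnySpec(const std::vector<std::string>& specs, const std::string& name)
{
    for (const std::string& spec : specs) {
        if (MatchesSpec(spec, name))
            return true;
    }
    return false;
}
```

With these rules, "r*.htm" accepts "readme.htm" but rejects "index.htm", and "a*b?c.htm" needs exactly one character between the 'b' and the 'c'.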
The application itself is based on a CHTMLView, which creates an IE WebBrowserApp control and runs it. All I have really done is create a class, CHTTPLinkHolder, which takes an LPDISPATCH (an interface to a WebBrowserApp) and uses it to download pages into your local system cache.
The basic logic is as follows:
NavigateToPage( URL to navigate to )
    Store page into cache
    For each link which satisfies the file specs given
        NavigateToPage( current link )
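The recursion above can be sketched in portable C++ as follows. This is only an illustration of the traversal, not the code in CHTTPLinkHolder. The LinkMap type is a hypothetical stand-in for the browser (it maps a URL to the links found on that page), the `visited` set is my own addition to keep circular links from looping forever, and I am assuming that level 0 means "fetch the start page only".

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the browser: each URL maps to the links on that page.
using LinkMap = std::map<std::string, std::vector<std::string>>;

// Visits `url`, records it as cached, then follows matching links while
// `levelsLeft` is greater than zero. `visited` stops a page being fetched twice.
void NavigateToPage(const LinkMap& site,
                    const std::string& url,
                    int levelsLeft,
                    const std::function<bool(const std::string&)>& matchesSpec,
                    std::set<std::string>& visited,
                    std::vector<std::string>& cached)
{
    if (!visited.insert(url).second)
        return;                        // already downloaded earlier in the walk

    cached.push_back(url);             // stands in for "store page into cache"

    if (levelsLeft <= 0)
        return;                        // depth limit reached: do not follow links

    auto page = site.find(url);
    if (page == site.end())
        return;                        // no link information for this page

    for (const std::string& link : page->second) {
        if (matchesSpec(link))         // only follow links that satisfy a file spec
            NavigateToPage(site, link, levelsLeft - 1, matchesSpec, visited, cached);
    }
}
```

One consequence of this simple depth-first walk is that a page first reached at the depth limit is not expanded again if a shorter path to it turns up later. A breadth-first walk would avoid that, at the cost of a little more bookkeeping.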
This by itself will download the file, but to store it, a couple of calls to the WinInet library have to be made. These calls obtain a cache file name and store the page in that file. Because the file name is obtained through WinInet, any application that uses WinInet (including IE) can use this cache.
Also, the whole operation runs on a separate thread so that the main view can keep painting. As each page is retrieved, you get to see it in the main view window.
The methods which do the work are:
CHTTPLinkHolder::ReadPage(const CString& szUrl)
CHTTPLinkHolder::ReadPageIntoCache(const CString& szURL)
To use the application, first choose View | File Extensions... and add or delete items from the list until you have only the specs you need. You can use wildcards here, so "*.htm", "a*b?c.htm", etc. are valid specs. Next, choose View | Start... You will be asked for a starting page and the number of levels to download. Enter this information and click OK. Once the process is over, go to IE, make sure your Offline Browsing option is set, and open the same start page you entered in the application. You should be able to view the files you downloaded. The exceptions, of course, are dynamic content files, which may not be downloaded as expected.
So, happy offline browsing.