IntelliLink - An Alternative Windows Version to Online Link Managers






A look at the URLDownloadToFile function and architecture of IntelliLink
Introduction
Have you ever owned a website? Have you done sysadmin work for somebody else? Have you taken part in a link exchange? If so, you probably want to monitor the links to and from your site.
Background
The easiest way to check the contents of a web page is to download it to a local file and search through it. To accomplish this, we use the URLDownloadToFile function, which has the following syntax:
HRESULT URLDownloadToFile(
LPUNKNOWN pCaller,
LPCTSTR szURL,
LPCTSTR szFileName,
_Reserved_ DWORD dwReserved,
LPBINDSTATUSCALLBACK lpfnCB);
Parameters
- pCaller: A pointer to the controlling IUnknown interface of the calling ActiveX component, if the caller is an ActiveX component. If the calling application is not an ActiveX component, this value can be set to NULL. Otherwise, the caller is a COM object that is contained in another component, such as an ActiveX control in the context of an HTML page. This parameter represents the outermost IUnknown of the calling component. The function attempts the download in the context of the ActiveX client framework, and allows the caller container to receive callbacks on the progress of the download.
- szURL: A pointer to a string value that contains the URL to download. Cannot be set to NULL. If the URL is invalid, INET_E_DOWNLOAD_FAILURE is returned.
- szFileName: A pointer to a string value containing the name or full path of the file to create for the download. If szFileName includes a path, the target directory must already exist.
- dwReserved: Reserved. Must be set to 0.
- lpfnCB: A pointer to the IBindStatusCallback interface of the caller. By using IBindStatusCallback::OnProgress, a caller can receive download status. URLDownloadToFile calls the IBindStatusCallback::OnProgress and IBindStatusCallback::OnDataAvailable methods as data is received. The download operation can be cancelled by returning E_ABORT from any callback. This parameter can be set to NULL if status is not required.
Return Value
This function can return one of these values:
- S_OK: The download started successfully.
- E_OUTOFMEMORY: The buffer length is invalid, or there is insufficient memory to complete the operation.
- INET_E_DOWNLOAD_FAILURE: The specified resource or callback interface was invalid.
So, our implementation using the above function will be:
BOOL ProcessHTML(CString strFileName, CString strSourceURL,
    CString strTargetURL, CString strURLName)
{
    CString strURL;
    CString strFileLine;
    CString strLineMark;
    BOOL bRetVal = FALSE;
    try
    {
        // Scan the downloaded page line by line
        CStdioFile pInputFile(strFileName, CFile::modeRead | CFile::typeText);
        while (pInputFile.ReadString(strFileLine))
        {
            // Visit every href= attribute found on the current line
            int nIndex = strFileLine.Find(_T("href="), 0);
            while (nIndex >= 0)
            {
                // The URL is the text between the next pair of double quotes
                const int nFirst = strFileLine.Find(_T('\"'), nIndex);
                if (nFirst >= 0)
                {
                    const int nLast = strFileLine.Find(_T('\"'), nFirst + 1);
                    if (nLast >= 0)
                    {
                        strURL = strFileLine.Mid(nFirst + 1, nLast - nFirst - 1);
                        if (strURL.CompareNoCase(strTargetURL) == 0)
                        {
                            TRACE(_T("URL found - %s\n"), strTargetURL);
                            // The link text must follow on the same line as >URL Name<
                            strLineMark.Format(_T(">%s<"), strURLName);
                            if (strFileLine.Find(strLineMark, nLast + 1) >= 0)
                            {
                                TRACE(_T("Name found - %s\n"), strURLName);
                                bRetVal = TRUE;
                            }
                        }
                    }
                }
                // Move on to the next href= on this line, if any
                nIndex = (nFirst == -1) ? -1 : strFileLine.Find(_T("href="), nFirst + 1);
            }
        }
        pInputFile.Close();
    }
    catch (CFileException* pFileException)
    {
        TCHAR lpszError[MAX_STR_LENGTH] = { 0 };
        pFileException->GetErrorMessage(lpszError, MAX_STR_LENGTH);
        pFileException->Delete();
        OutputDebugString(lpszError);
        bRetVal = FALSE;
    }
    // The temporary file is no longer needed once it has been scanned
    VERIFY(DeleteFile(strFileName));
    return bRetVal;
}
BOOL CLinkData::IsValidLink()
{
    BOOL bRetVal = TRUE;
    TCHAR lpszTempPath[MAX_STR_LENGTH] = { 0 };
    TCHAR lpszTempFile[MAX_STR_LENGTH] = { 0 };
    const DWORD dwTempPath = GetTempPath(MAX_STR_LENGTH, lpszTempPath);
    lpszTempPath[dwTempPath] = '\0';
    // GetTempFileName creates an empty, uniquely named file in the temp folder
    if (GetTempFileName(lpszTempPath, _T("html"), 0, lpszTempFile) != 0)
    {
        TRACE(_T("URLDownloadToFile(%s)...\n"), GetSourceURL());
        if (URLDownloadToFile(NULL, GetSourceURL(), lpszTempFile, 0, NULL) == S_OK)
        {
            // ProcessHTML deletes the temporary file when it is done
            if (!ProcessHTML(lpszTempFile, GetSourceURL(), GetTargetURL(), GetURLName()))
            {
                TRACE(_T("ProcessHTML(%s) has failed\n"), lpszTempFile);
                bRetVal = FALSE;
            }
        }
        else
        {
            TRACE(_T("URLDownloadToFile has failed\n"));
            // Remove the temporary file, since ProcessHTML never ran to delete it
            DeleteFile(lpszTempFile);
            bRetVal = FALSE;
        }
    }
    else
    {
        TRACE(_T("GetTempFileName has failed\n"));
        bRetVal = FALSE;
    }
    return bRetVal;
}
The Architecture
What do Source URL, Target URL, and URL Name mean in the code above?
- Source URL = the web page to check
- Target URL = the link that should appear on that web page
- URL Name = the link text that should appear for that link
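To make the three terms concrete, here is a minimal sketch of the per-line check that ProcessHTML performs, rewritten with std::string so it can be tried outside MFC. The names lineContainsLink and equalsIgnoreCase are hypothetical and do not appear in the IntelliLink sources; the sketch only mirrors the matching rule shown above (a quoted href value equal to the Target URL, followed later on the same line by >URL Name<).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: ASCII case-insensitive equality,
// standing in for CString::CompareNoCase(...) == 0.
static bool equalsIgnoreCase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Hypothetical function: returns true when `line` contains href="targetUrl"
// and the text >urlName< appears later on the same line.
bool lineContainsLink(const std::string& line,
                      const std::string& targetUrl,
                      const std::string& urlName)
{
    const std::string marker = ">" + urlName + "<";
    std::size_t index = line.find("href=");
    while (index != std::string::npos)
    {
        // The URL is the text between the next pair of double quotes.
        const std::size_t first = line.find('"', index);
        if (first == std::string::npos)
            break;
        const std::size_t last = line.find('"', first + 1);
        if (last != std::string::npos)
        {
            const std::string url = line.substr(first + 1, last - first - 1);
            if (equalsIgnoreCase(url, targetUrl) &&
                line.find(marker, last + 1) != std::string::npos)
                return true;
        }
        // Continue with the next href= on this line, if any.
        index = line.find("href=", first + 1);
    }
    return false;
}
```

Like the original, this is a plain text search rather than an HTML parser: a link whose href value and link text sit on different lines, or whose href uses single quotes, would not be matched.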
Each URL definition is contained in a CLinkData class, with the following interface:
- DWORD GetLinkID(); - Gets ID of the current URL definition
- void SetLinkID(DWORD dwLinkID); - Sets ID for the current URL definition
- CString GetSourceURL(); - Gets Source URL for current URL definition
- void SetSourceURL(CString strSourceURL); - Sets Source URL for current URL definition
- CString GetTargetURL(); - Gets Target URL for current URL definition
- void SetTargetURL(CString strTargetURL); - Sets Target URL for current URL definition
- CString GetURLName(); - Gets URL Name for current URL definition
- void SetURLName(CString strURLName); - Sets URL Name for current URL definition
- int GetPageRank(); - currently not implemented
- void SetPageRank(int nPageRank); - currently not implemented
- BOOL GetStatus(); - Gets status for current URL definition
- void SetStatus(BOOL bStatus); - Sets status for current URL definition
Then, we define CLinkList as typedef CArray<CLinkData*> CLinkList;
This list is managed inside the CLinkSnapshot class, with the following interface:
- BOOL RemoveAll(); - Removes all URL definitions from list
- int GetSize(); - Gets the size of URL definition list
- CLinkData* GetAt(int nIndex); - Gets a URL definition from list
- BOOL Refresh(); - Updates the status for each URL definition from list
- CLinkData* SelectLink(DWORD dwLinkID); - Searches for a URL definition by its ID
- DWORD InsertLink(CString strSourceURL, CString strTargetURL, CString strURLName, int nPageRank, BOOL bStatus); - Inserts a URL definition into list
- BOOL DeleteLink(DWORD dwLinkID); - Removes a URL definition from list
- BOOL LoadConfig(); - Loads the URL definition list from XML file
- BOOL SaveConfig(); - Saves the URL definition list to XML file
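The insert/select/delete part of this interface can be sketched in standard C++ as follows. This is an illustration only, not the IntelliLink implementation: LinkData and LinkSnapshot are hypothetical stand-ins for CLinkData and CLinkSnapshot, the incrementing ID counter is an assumption (the article does not say how IDs are assigned), and std::unique_ptr is used in place of the raw pointers held by CArray<CLinkData*> so that removed entries free themselves.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CLinkData: plain fields instead of Get/Set pairs.
struct LinkData
{
    std::uint32_t linkId = 0;
    std::string sourceUrl;
    std::string targetUrl;
    std::string urlName;
    int pageRank = 0;
    bool status = false;
};

// Hypothetical stand-in for CLinkSnapshot (Refresh and XML load/save omitted).
class LinkSnapshot
{
public:
    // Assumption: IDs come from an incrementing counter starting at 1.
    std::uint32_t InsertLink(std::string sourceUrl, std::string targetUrl,
                             std::string urlName, int pageRank, bool status)
    {
        auto link = std::make_unique<LinkData>();
        link->linkId = m_nextId++;
        link->sourceUrl = std::move(sourceUrl);
        link->targetUrl = std::move(targetUrl);
        link->urlName = std::move(urlName);
        link->pageRank = pageRank;
        link->status = status;
        m_links.push_back(std::move(link));
        return m_links.back()->linkId;
    }

    // Returns the definition with the given ID, or nullptr if there is none.
    LinkData* SelectLink(std::uint32_t linkId) const
    {
        for (const auto& link : m_links)
        {
            if (link->linkId == linkId)
                return link.get();
        }
        return nullptr;
    }

    // Removes the definition with the given ID; returns false if not found.
    bool DeleteLink(std::uint32_t linkId)
    {
        const auto it = std::find_if(m_links.begin(), m_links.end(),
            [linkId](const std::unique_ptr<LinkData>& link)
            { return link->linkId == linkId; });
        if (it == m_links.end())
            return false;
        m_links.erase(it);
        return true;
    }

    std::size_t GetSize() const { return m_links.size(); }

private:
    std::vector<std::unique_ptr<LinkData>> m_links;
    std::uint32_t m_nextId = 1;
};
```

Looking entries up by ID rather than by array index keeps a reference to a definition stable when other definitions are deleted, which is presumably why the interface offers SelectLink alongside GetAt.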
The Good, the Bad, and the Ugly
The good thing is that I learned to use Windows ribbons. The bad thing is that I still don't know how to get a web page's PageRank value. The ugly thing is that the processing (i.e., checking link validity) should be done in a separate worker thread, but I am planning this change for the next release. Stay tuned!
Final Words
The IntelliLink application uses many components that have been published on CodeProject. Many thanks to:
- My CMFCListView form view (see source code)
- Lee Thomason for his TinyXML2 class
- PJ Naughter for his CTrayNotifyIcon class
- PJ Naughter for his CVersionInfo class
Further plans: I would like to add support for Google's PageRank as soon as possible.
History
- Version 1.04 (November 9th, 2014): Initial release
- Moved source code from CodeProject to GitLab (April 10th, 2020)
- Moved source code from GitLab to GitHub (February 23rd, 2022)
- Version 1.05 (April 28th, 2022): Added setup project
- Version 1.06 (May 23rd, 2022): Added program to Startup Apps
- Version 1.07 (August 20th, 2022): Updated font size of About dialog
- Version 1.08 (August 26th, 2022): Removed program from Startup Apps
- Version 1.09 (September 9th, 2022): Added Contributors hyperlink to AboutBox dialog
- Version 1.10 (January 23rd, 2023): Updated PJ Naughter's CVersionInfo library to the latest version available. Updated the code to use C++ uniform initialization for all variable declarations.
- Version 1.11 (January 24th, 2023): Updated PJ Naughter's CInstanceChecker library to the latest version available. Updated the code to use C++ uniform initialization for all variable declarations.
  - Replaced NULL throughout the codebase with nullptr. Replaced BOOL throughout the codebase with bool. This means that the minimum requirement for the application is now Microsoft Visual C++ 2010.
- Version 1.12 (May 27th, 2023): Updated About dialog with GPLv3 notice
- Version 1.13 (June 16th, 2023): Made the column widths of the interface persistent
- Version 1.14 (June 24th, 2023): Updated PJ Naughter's CTrayNotifyIcon library to the latest version available
- Version 1.15 (July 20th, 2023): Extended application's functionality with two new buttons: Website Review and Webmaster Tools
- Version 1.16 (August 20th, 2023):
  - Changed article's download link. Updated the About dialog (email & website)
  - Added social media links: Twitter, LinkedIn, Facebook, and Instagram
  - Added shortcuts to GitHub repository's Issues, Discussions, and Wiki
- Version 1.17 (October 29th, 2023): Updated PJ Naughter's CTrayNotifyIcon library to the latest version available. Fixed an issue where the CTrayNotifyIcon::OnTrayNotification callback method would not work correctly if the m_NotifyIconData.uTimeout member variable gets updated during runtime of client applications. This can occur when you call CTrayNotifyIcon::SetBalloonDetails. Thanks to Maisala Tuomo for reporting this bug.
- Version 1.18 (January 27th, 2024): Added ReleaseNotes.html and SoftwareContentRegister.html to GitHub repo
- Version 1.19 (February 21st, 2024): Switched MFC application's theme back to native Windows
- Version 1.20 (September 6th, 2024):
  - Replaced old XML library from CodeProject with Lee Thomason's TinyXML2 library.
  - Implemented User Manual option into Help menu.
  - Implemented Check for updates... option into Help menu.
- Replaced old