Filtering out HTML content
HTML Filter is a tool I wrote after getting fed up with pop-up windows of all kinds.
Using HTML filtering to close pop-up windows automatically is, of course, quite an effective application. But it is not the only one, which makes the technique all the more interesting.
I have spent a lot of time trying to close pop-ups automatically using "external tools" such as this one. External tools periodically check the open windows against a known dictionary of banned window names. This works fine, as long as you're happy having to add new entries every other day, since ad window names keep changing all the time.
So I had to find a more internal way of doing it. Having already tried to work with IE a couple of times, I found many limitations and oddities with its subscribed events, which for some reason sometimes don't fire at all. I thought I had to find something more radical, and less coupled to the browser I was using.
I finally came up with a proxy filter: a systray tool which, once configured, relays every HTTP packet back and forth, and therefore gets the unique opportunity of seeing the HTML content itself, including all those window.open (url, "xyz", ...) things.
Configuring the tool
Once installed, the tool starts listening on port 8010 by default. If that port is already in use, change it; that's what the dialog box is for. Of course, you must tell the browser that the proxy is listening there: open the Windows Control Panel, then double-click Internet Options. In the Connections tab, edit the proxy settings and click Advanced; in the HTTP row, type 127.0.0.1 in the "Proxy address to use" field and 8010 in the Port field. Press Apply, and you're done. You can go back to surfing the web as before, without any noticeable change (at least on the surface).
If you are using Netscape or Opera, change the proxy settings in a similar way. For Netscape, go to Edit / Preferences, then Advanced / Proxy, and edit the HTTP Proxy field.
The filter is activated automatically: all HTML content going through the proxy is filtered, and the rules are applied. The source code provided filters window.open statements, replacing them with a fake //ndow.open. The first two characters are overwritten with slashes, so the rest of that line becomes a JavaScript comment and the call never runs. It is up to you to add any other relevant rules to the CHtmlFilterRules class implementation. To disable filtering, right-click the systray icon and choose the corresponding option.
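The window.open rule overwrites characters in place rather than inserting or deleting any, so each chunk keeps exactly the same length. That matters because the proxy forwards the same byte count it received, and a Content-Length header sent by the server, if any, stays correct. Below is a minimal sketch of a table-driven variant of CHtmlFilterRules that keeps the same constraint. FilterRule and apply_rules are hypothetical names, not from the source. It uses std::search instead of strstr so that a zero byte in the buffer does not stop the scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rule: the first characters of every occurrence of `pattern`
// are overwritten with `overwrite`. Nothing is inserted or removed, so the
// buffer keeps its length.
struct FilterRule {
    std::string pattern;
    std::string overwrite;
};

// Applies every rule in place to buf[0..len) and returns how many matches
// were patched. std::search keeps scanning past zero bytes, unlike strstr().
std::size_t apply_rules(char* buf, std::size_t len, const std::vector<FilterRule>& rules) {
    std::size_t patched = 0;
    char* const end = buf + len;
    for (const FilterRule& rule : rules) {
        // Skip rules that would loop forever (empty pattern) or grow the buffer.
        if (rule.pattern.empty() || rule.overwrite.size() > rule.pattern.size())
            continue;
        char* pos = buf;
        while ((pos = std::search(pos, end, rule.pattern.begin(), rule.pattern.end())) != end) {
            std::copy(rule.overwrite.begin(), rule.overwrite.end(), pos);
            pos += rule.pattern.size();   // resume the search after this match
            ++patched;
        }
    }
    return patched;
}
```

With this shape, the article's rule is the single entry {"window.open", "//"}, and adding another rule means appending to the table rather than editing the scanning loop.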
I also wanted the tool not to slow down the browsing experience. This goal is achieved by using plain sockets instead of MFC wrappers such as CAsyncSocket (which in turn messes around a lot with the
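The tool acts as a proxy server. It basically implements a double-threaded socket line: one data thread relays the browser's request to the web server, and another relays the server's response back to the browser, each working on its own socket_pair. The code is based on Nish's POP proxy server. How things work is depicted below: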
How the HTML filter works
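Before the data threads can start, the proxy must connect to the web server the browser asked for. A browser configured to use a proxy puts the full URL in the request line (for example, GET http://www.example.com/index.html HTTP/1.1), so the target host and port can be read from there. The article does not show this step. The sketch below is a hypothetical, portable parser: ProxyTarget and parse_proxy_request_line are illustrative names. It only handles plain http:// URLs with an optional numeric port and assumes port 80 otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result: where the proxy should connect, and the path to forward.
struct ProxyTarget {
    std::string host;
    int port = 80;      // default HTTP port when the URL names none
    std::string path;   // e.g. "/index.html?x=1"
};

// Parses an absolute-form request line such as
// "GET http://www.example.com:8080/index.html HTTP/1.1".
// Only plain http:// URLs are accepted; IPv6 literals are not handled.
std::optional<ProxyTarget> parse_proxy_request_line(const std::string& line) {
    const std::size_t sp1 = line.find(' ');
    if (sp1 == std::string::npos) return std::nullopt;
    const std::size_t sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string::npos) return std::nullopt;
    std::string url = line.substr(sp1 + 1, sp2 - sp1 - 1);

    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
    url.erase(0, scheme.size());

    ProxyTarget t;
    const std::size_t slash = url.find('/');
    std::string authority = url.substr(0, slash);   // whole string if no '/'
    t.path = (slash == std::string::npos) ? "/" : url.substr(slash);

    const std::size_t colon = authority.find(':');
    if (colon != std::string::npos) {
        const std::string digits = authority.substr(colon + 1);
        if (digits.empty() || digits.size() > 5 ||
            digits.find_first_not_of("0123456789") != std::string::npos)
            return std::nullopt;
        t.port = std::stoi(digits);
        if (t.port == 0 || t.port > 65535) return std::nullopt;
        authority.erase(colon);
    }
    if (authority.empty()) return std::nullopt;
    t.host = authority;
    return t;
}
```

HTTPS requests arrive as CONNECT host:443 instead, which this sketch rejects. Filtering could not see inside that encrypted content anyway.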
The main class, CHttpProxyMT, is declared as follows, together with the socket_pair object that each data thread works on:
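class CHttpProxyMT
{
public:
    BOOL StartProxy(int port);
    void EnableFiltering(BOOL bEnable=TRUE);
    static DWORD ServerThread(void *arg);
    static DWORD ClientThread(DWORD arg);
    void StartClientThread(SOCKET sock);
    static void StartDataThread(void *parm);
    static DWORD DataThread(void *parm);
};

struct socket_pair
{
    socket_pair(SOCKET s1, SOCKET s2, BOOL bIsServerResponse)
        : srcsock(s1), dstsock(s2), m_bIsServerResponse(bIsServerResponse), n(0) {}
    SOCKET srcsock;            // socket the thread reads from
    SOCKET dstsock;            // socket the thread writes to
    BOOL m_bIsServerResponse;  // TRUE for the server-to-browser direction
    char buff[16384+1];        // +1 for the terminating zero added after recv()
    int n;                     // number of bytes currently in buff
};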
What's funny is that as soon as you start working with threads, everything suddenly gets messy. Every variable can potentially be accessed by several threads at the same time, which makes practically anything harder to code. I ended up associating a socket_pair instance with each thread and referring to this object in nearly every line of code, to make sure I was more or less thread-safe. But it's clumsy: what one needs at this particular moment is an easy framework for attaching variables and maps to the running thread. The need is all the more pressing because, under Win32, the thread callback is a static (read: global) function, used and reused by every thread.
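Since C++11, the thread_local storage class provides exactly this kind of per-thread variable. The Win32 equivalents are TlsAlloc/TlsGetValue and, with MSVC, __declspec(thread). The sketch below is not from the article: it declares a map thread_local inside an ordinary shared function, so every thread calling the same code gets its own copy. request_counts, handle_request and handle_request_on_new_thread are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <thread>

// One map per thread: each thread that calls request_counts() gets its own
// instance, even though the function itself is shared by all threads.
std::map<std::string, int>& request_counts() {
    thread_local std::map<std::string, int> counts;
    return counts;
}

// Hypothetical per-request work: counts the hosts seen by the calling thread only.
int handle_request(const std::string& host) {
    return ++request_counts()[host];
}

// Runs handle_request() once on a brand-new thread and returns its result.
int handle_request_on_new_thread(const std::string& host) {
    int result = 0;
    std::thread worker([&] { result = handle_request(host); });
    worker.join();
    return result;
}
```

The other standard approach is the one the article uses: pass a per-thread context object (here, socket_pair) as the thread argument.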
In the end, my code looks like this when it comes to sending server responses back to the client:
DWORD CHttpProxyMT::DataThread(void *parm)
{
    socket_pair* spair = (socket_pair*) parm;
    while( (spair->n=recv(spair->srcsock, spair->buff, 16384, 0))>0 )
    {
        spair->buff[spair->n] = 0;  // buff holds 16384+1 bytes
        if (g_bFilteringEnabled && spair->m_bIsServerResponse)
        {
            CHtmlFilterRules filter( spair->buff,spair->n );  // rules applied in place
        }
        send(spair->dstsock, spair->buff, spair->n, 0);
    }
    return 0;
}
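For experimenting outside Win32, here is a minimal portable sketch of the same relay loop. It uses POSIX sockets in place of Winsock and runs synchronously; in the proxy, one such loop per direction would run on its own thread. relay and Filter are hypothetical names: the filter callback plays the role of CHtmlFilterRules. Unlike the loop above, this version retries partial send() calls. Half-closing the destination with shutdown() at end-of-stream is an assumption about how EOF should be propagated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical filter hook: may modify buf[0..n) in place, length unchanged.
using Filter = std::function<void(char* buf, std::size_t n)>;

// Copies everything from src to dst until src reaches end-of-stream or fails.
// Returns the total number of bytes forwarded.
// Note: on POSIX, send() to a closed peer raises SIGPIPE unless it is ignored.
std::size_t relay(int src, int dst, const Filter& filter) {
    char buf[16384];
    std::size_t total = 0;
    ssize_t n;
    while ((n = recv(src, buf, sizeof buf, 0)) > 0) {
        if (filter) filter(buf, static_cast<std::size_t>(n));
        // send() may accept fewer bytes than requested: loop until all are out.
        for (ssize_t off = 0; off < n; ) {
            ssize_t sent = send(dst, buf + off, static_cast<std::size_t>(n - off), 0);
            if (sent <= 0) return total;
            off += sent;
            total += static_cast<std::size_t>(sent);
        }
    }
    shutdown(dst, SHUT_WR);   // tell the peer that no more data is coming
    return total;
}
```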
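CHtmlFilterRules::CHtmlFilterRules(char *buffer, int nLength)
{
    m_cpBuffer = buffer;
    m_nLength = nLength;
    if (!m_cpBuffer || !m_nLength) return;  // a constructor cannot return a value
    // search a zero-terminated copy, patch the caller's buffer in place
    char *buf = new char[m_nLength+1];
    if (!buf) return;  // only relevant where new returns NULL (e.g. VC6)
    memcpy(buf, m_cpBuffer, m_nLength);
    buf[m_nLength] = 0;  // strstr() needs a terminator
    char *szPattern = buf;
    char *szFirstByte = buf;
    while ( (szPattern=strstr(szPattern,"window.open"))!=NULL )
    {
        // turn "window.open" into "//ndow.open"
        m_cpBuffer[szPattern-szFirstByte+0] = '/';
        m_cpBuffer[szPattern-szFirstByte+1] = '/';
        szPattern += 11;  // skip this match: strlen("window.open")
    }
    delete [] buf;
}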
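One limitation of filtering each recv() buffer separately is that a "window.open" split across two reads is never seen whole, so it slips through. A common remedy, sketched below as an assumption rather than the author's code, is to hold back the last (pattern length - 1) bytes of each chunk and prepend them to the next one. StreamFilter is a hypothetical name. Call flush() when the source reaches end-of-stream so the held-back bytes are still forwarded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical streaming version of the window.open rule. Matches split
// across chunk boundaries are caught because the tail of each chunk is
// held back until the next chunk arrives.
class StreamFilter {
public:
    // Filters one incoming chunk; returns the bytes that are safe to forward now.
    std::string feed(const std::string& chunk) {
        pending_ += chunk;
        patch(pending_);
        const std::size_t keep = kPattern.size() - 1;   // too short to hold a full match
        if (pending_.size() <= keep) return std::string();
        std::string out = pending_.substr(0, pending_.size() - keep);
        pending_.erase(0, pending_.size() - keep);
        return out;
    }

    // Returns whatever is still held back (call at end-of-stream).
    std::string flush() {
        std::string out;
        out.swap(pending_);
        return out;
    }

private:
    // Same in-place rule as CHtmlFilterRules: "window.open" -> "//ndow.open".
    static void patch(std::string& s) {
        for (std::size_t pos = s.find(kPattern); pos != std::string::npos;
             pos = s.find(kPattern, pos + kPattern.size())) {
            s[pos] = '/';
            s[pos + 1] = '/';
        }
    }

    static const std::string kPattern;
    std::string pending_;
};

const std::string StreamFilter::kPattern = "window.open";
```

The total length is still unchanged, so the trade-off is only a few bytes of extra latency per chunk.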
Code listing: (both VC6 and VC7 workspaces provided)
- HtmlFilterRules.cpp : HTML filter
- HttpProxyMT.cpp : based on Nish's PopProxyMT multi-threaded POP proxy server
- htmlfilterdlg.cpp : port configuration, menu commands
- htmlfilter.cpp : Win app
- TrayNot.cpp : simple systray implementation