The idea for this utility came to me after I received hundreds of e-mails from people asking for help cleaning their systems of Win32.Redlof.A, an HTML-based script virus. They wrote after reading my article on the virus at NewOrder Security, titled "Paper on the Win32.Redlof.A virus" (or something close to that).
The virus code is written in VBScript and is appended to every infected file. Anti-virus programs can detect the virus, but I am NOT aware of any that actually REMOVES the offending code. So far, every victim has had to search each file by hand and delete the offending string. Miss even ONE file and you get infected again!
I set out to write a small utility that would find the infected files and remove the offending code, and so ScanR was born.
The utility searches TEXT (ASCII) files for occurrences of a specific string and replaces each one with another string. The replacement may also be NULL (empty), in which case the search string is simply deleted.
To remove the virus, then, I set the search string to the virus script and the replacement string to NULL. Thankfully this approach worked, and everybody lived happily ever after.
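To illustrate the search-and-replace idea, here is a minimal, portable C++ sketch. It assumes the whole file has already been read into a std::string. ReplaceAll is a hypothetical helper, not part of CCleanR, and it uses plain std::string::find rather than whatever search strategy the real engine uses. An empty replacement string plays the role of NULL and deletes every match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the CCleanR API): replaces every non-overlapping
// occurrence of `search` in `text` with `replacement`, scanning left to right.
// An empty `replacement` (the article's NULL string) deletes each match.
// Returns the number of replacements made.
std::size_t ReplaceAll(std::string& text, const std::string& search,
                       const std::string& replacement)
{
    if (search.empty())
        return 0;  // an empty pattern would match everywhere; do nothing

    std::string result;
    result.reserve(text.size());
    std::size_t count = 0;
    std::size_t pos = 0;
    for (std::size_t hit = text.find(search, pos); hit != std::string::npos;
         hit = text.find(search, pos)) {
        result.append(text, pos, hit - pos);  // copy the text before the match
        result += replacement;                // emit the replacement (may be empty)
        pos = hit + search.size();            // resume scanning after the match
        ++count;
    }
    result.append(text, pos, std::string::npos);  // copy the unmatched tail
    text.swap(result);
    return count;
}
```

Building the result in a separate buffer avoids shifting the rest of the string on every match, which repeated in-place erase() calls would do.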
This program is that very tool, with a few bugs removed (and a few more introduced) and some redundant code eliminated. That cleanup has REALLY boosted its performance.
Note: I have also written a MORE feature-rich implementation of this utility, called CleanR. It operates on a single file, which let me focus on text search-and-replace performance rather than file I/O performance. It should be available at CodeProject itself, if Chris lets it be so ;)
Although BOTH programs share the SAME search-replace engine code base, I made minor modifications to each to optimize the code where it was pinching most.
Though it is not essential, you can read my article describing the virus at NewOrder Security by clicking on the "..older posts" link (I have actually forgotten the direct link).
The paper should ALSO be available here.
This tool consists of two parts:
- The file list generation class, CGetFileList
- The string search and replace engine class, CCleanR
The CGetFileList class declares the following constructor and member functions:
    CGetFileList(const char* szStartingDir, const char* szDirWildCard,
                 const char* szFileWildCard, DWORD dwMaxFileSize);
    DWORD FindnRecurseDir(const char* szStartingDir,
                          const char* szDirWildCard,
                          const char* szFileWildCard);
    DWORD FindFileMatching(const char* szPathToFindFiles,
                           const char* szFileWildCard);
    BOOL ShouldIReadThisFile(WIN32_FIND_DATA *FileData,
                             BOOL bJudgeByExtension);
    BOOL ProcessThisFile(const char* szPathName, const char* szFileName);
As the constructor shows, the class takes four arguments:
const char* szStartingDir : Defines the directory/drive under which the program should start searching.
const char* szDirWildCard : Defines the directory wildcard.
Note: If szDirWildCard is NULL, the root (starting) directory itself is scanned as well. ('*' does NOT match an empty directory name, i.e. G:\*\*.ABC does not cover G:\*.ABC. In my program, however, searching for G:\*.ABC amounts to searching both G:\*.ABC AND G:\*\*.ABC. What do you think about that?)
const char* szFileWildCard : Defines the filename wildcard.
DWORD dwMaxFileSize : Defines the maximum size of a file that the program will process.
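The real program presumably leaves wildcard matching to the Win32 FindFirstFile/FindNextFile functions, which fill in the WIN32_FIND_DATA structure that ShouldIReadThisFile receives. Purely to illustrate the '*' and '?' semantics that szDirWildCard and szFileWildCard rely on, here is a portable, case-insensitive matcher. WildcardMatch is a hypothetical name, and unlike Windows it ignores legacy quirks such as 8.3 short-name matching.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive match of `name` against a DOS-style
// `pattern`, where '*' matches any run of characters (including none) and
// '?' matches exactly one. Greedy scan that backtracks only to the last '*'.
bool WildcardMatch(const std::string& pattern, const std::string& name)
{
    auto same = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };

    std::size_t p = 0, n = 0;
    std::size_t starP = std::string::npos;  // index of the last '*' in pattern
    std::size_t starN = 0;                  // position in name when it was seen

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;                    // first let '*' match nothing
            starN = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || same(pattern[p], name[n]))) {
            ++p;                            // literal or '?' consumes one char
            ++n;
        } else if (starP != std::string::npos) {
            p = starP + 1;                  // backtrack: '*' absorbs one more char
            n = ++starN;
        } else {
            return false;                   // mismatch with no '*' to fall back on
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;                                // trailing '*'s match the empty string
    return p == pattern.size();
}
```

Matching is applied to one path component at a time. A file pattern like *.htm is tested against each filename, and a directory pattern is tested against each directory name.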
This class iterates through the child directories of szStartingDir that match szDirWildCard and searches them for filenames matching szFileWildCard. For each candidate, it decides whether to process the file by calling
    BOOL ShouldIReadThisFile(WIN32_FIND_DATA *FileData, BOOL bJudgeByExtension);
This function checks whether the filename or extension is blacklisted. If bJudgeByExtension is TRUE, the file's extension is checked; otherwise, only the filename is compared.
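A minimal sketch of the blacklist test follows. It assumes the blacklist is a simple list of names or extensions compared case-insensitively. ShouldRead and its parameters are hypothetical stand-ins. The real ShouldIReadThisFile takes a WIN32_FIND_DATA, whose cFileName member holds the name, and it reads its lists from Settings.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns a lower-case copy of `s`; Windows filenames are case-insensitive.
std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical stand-in for ShouldIReadThisFile: returns false if the file is
// blacklisted. With judgeByExtension == true, only the extension (the text
// after the last '.') is compared against `blacklist`; otherwise the whole
// filename is compared.
bool ShouldRead(const std::string& fileName, bool judgeByExtension,
                const std::vector<std::string>& blacklist)
{
    std::string key = fileName;
    if (judgeByExtension) {
        std::size_t dot = fileName.rfind('.');
        key = (dot == std::string::npos) ? std::string() : fileName.substr(dot + 1);
    }
    key = ToLower(key);
    for (const std::string& entry : blacklist) {
        if (ToLower(entry) == key)
            return false;  // blacklisted: skip this file
    }
    return true;
}
```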
You can set custom blacklists by modifying the appropriate variables in the Settings.h file and recompiling the application.
If ShouldIReadThisFile returns TRUE, then
    BOOL ProcessThisFile(const char* szPathName, const char* szFileName);
is called, which uses CCleanR to do the dirty work of text search and replace.
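Putting the pieces together, the scan-filter-process loop can be sketched portably with C++17's std::filesystem instead of the Win32 find functions. WalkAndProcess is a hypothetical name. The accept predicate stands in for the wildcard and blacklist checks, and the process callback stands in for ProcessThisFile and CCleanR. The sketch does not reproduce the program's directory-wildcard handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch: recursively walks `startDir`. Every regular file that
// passes `accept` and is no larger than `maxFileSize` bytes is read into
// memory and handed to `process` with its path. Returns the number of files
// processed. Unreadable files are skipped, permission-denied directories are
// skipped via skip_permission_denied, and any other iteration error ends the
// walk early.
std::size_t WalkAndProcess(
    const fs::path& startDir, std::uintmax_t maxFileSize,
    const std::function<bool(const fs::path&)>& accept,
    const std::function<void(const fs::path&, const std::string&)>& process)
{
    std::size_t processed = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        startDir, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;

    for (; !ec && it != end; it.increment(ec)) {
        const fs::directory_entry& entry = *it;
        std::error_code fileEc;
        if (!entry.is_regular_file(fileEc) || fileEc)
            continue;                         // skip directories and other non-files
        const std::uintmax_t size = entry.file_size(fileEc);
        if (fileEc || size > maxFileSize)
            continue;                         // too large (dwMaxFileSize analogue)
        if (!accept(entry.path()))
            continue;                         // wildcard or blacklist rejected it

        std::ifstream in(entry.path(), std::ios::binary);
        if (!in)
            continue;                         // could not open: skip it
        const std::string text((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
        process(entry.path(), text);          // e.g. search and replace
        ++processed;
    }
    return processed;
}
```

Skipping unreadable entries, rather than throwing, matters when sweeping a whole drive, where a few locked or protected files are normal.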
The CCleanR class is declared as:
    BOOL SetReplacementString(const char *szReplacementString);
    BOOL SetSearchString(const char *szSearchString);
    BOOL SetFileName(const char *szFileName);
    CCleanR(CCleanRboolSet boolSet, LPCTSTR szOutputFileName);
    BOOL IsCharBelongingToSet(char cCharToTest, char *szSetValues);
I think the names are descriptive enough to tell you what they do. For a detailed discussion, please refer to my article on the CCleanR class, which I will be submitting to CodeProject soon.
The source code is also well commented in case you want to know more.
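Judging only by its name and signature, IsCharBelongingToSet tests whether a character occurs in a set given as a C string. Under that assumption, a hypothetical re-implementation could look like the following. It takes const char* rather than the original char*, and IsCharInSet is an illustrative name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical re-implementation, guessed from the name only: returns true if
// cCharToTest appears in the NUL-terminated set szSetValues. The terminating
// '\0' is never treated as a member (std::strchr would otherwise find it).
bool IsCharInSet(char cCharToTest, const char* szSetValues)
{
    if (szSetValues == nullptr || cCharToTest == '\0')
        return false;
    return std::strchr(szSetValues, cCharToTest) != nullptr;
}
```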
Using the code
It's recommended that you first check out the CCleanR engine on which this program is based, and its accompanying demonstration program. It's here at CodeProject itself!
You should have realized by now that the application is built on two standalone classes that can be used in any application with little or no modification.
However, as with any code, these two classes may contain things that could have been avoided, added or removed, or that are just plain complicated. Please review my code and send in your constructive criticism, bug reports and suggested improvements.
To make it easier for you, I have packaged a sample MFC application with source code implementing all the code we just discussed.
In text files that contain the supplied search string, the matching string is replaced and the resulting file is copied into this program's directory. You can instead choose to overwrite the original file with the modified one by setting bOverWriteFile of theR, defined in GenFileList.cpp, to TRUE.
If the original file is left untouched, all modifications are reflected in the copy of the file present in this program's directory.
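The two output modes can be sketched with std::filesystem as follows. WriteResult and its parameters are hypothetical. In the real program the choice is controlled by bOverWriteFile. One caveat applies to this sketch: in copy mode, two infected files with the same name in different directories would overwrite each other's copies, so a complete tool would need a naming scheme for that case.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>

namespace fs = std::filesystem;

// Hypothetical sketch: writes `newText` either back over `original`
// (overwrite == true) or to a file with the same name inside `outputDir`
// (overwrite == false, leaving the original untouched).
// Returns the path written, or an empty path on failure.
fs::path WriteResult(const fs::path& original, const std::string& newText,
                     bool overwrite, const fs::path& outputDir)
{
    fs::path target = overwrite ? original : outputDir / original.filename();
    std::ofstream out(target, std::ios::binary | std::ios::trunc);
    if (!out)
        return fs::path();  // could not open the target for writing
    out << newText;
    out.flush();            // surface write errors before reporting success
    if (!out)
        return fs::path();
    return target;
}
```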
The names of these files are also logged in a text file named InfectedFiles.log. You can change this name by editing Settings.h and recompiling the program. Filenames that could NOT be processed are logged in IgnoredFiles.txt along with the reason they were ignored.
Comments have been added generously wherever applicable. If they are not enough, I will be happy to update this article with more discussion of the code.
- 14th June 2003 - Replaced the edit box showing the list of matching files with an edit box where you can enter the replacement string (to be put in place of the matching string)
- 21st June 2003 - Added more search options which had not been implemented previously.
There are still a LOT of features of the CCleanR class not implemented here. Check out my dedicated article on the CCleanR engine and its abilities. It's here at CodeProject!