Searching on Text Files






This program searches for words in text files.
Introduction
This project is a program that searches the contents of text files. Suppose you have a set of text files stored somewhere on your hard disk. You want to find some of them, but you don't remember their names. You do remember what they contain, so you have some keywords to search for. This is similar to the search function in Windows.
Background
Some of the requirements are:
- Create the FileList: Create a text file named FileList to store the paths of all the text files. Each line of this file is one file path, and each line has an ID that identifies the path. ID numbers start at 0.
- Indexing: Scan all the text files and store each word in a Binary Search Tree so that it can be found quickly. Every node in the tree contains a word, a list of ID numbers, and left and right pointers.
- Display: Output only a small portion of each text file that contains the keywords, together with the ID, so you know which file was matched.
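The FileList convention above (one path per line, where the ID is the 0-based line number) can be sketched in standard C++ as follows. This is only an illustration: the helper names `LoadFileList` and `PathForId` and the use of `std::vector` are my assumptions, not part of the original program, which uses MFC classes.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a FileList stream in which every line is one
// file path. The ID of a path is its 0-based position in the returned vector.
std::vector<std::string> LoadFileList(std::istream& in)
{
    std::vector<std::string> paths;
    std::string line;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings when the stream is read in binary mode
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        paths.push_back(line);
    }
    return paths;
}

// Hypothetical helper: maps an ID back to its file path.
const std::string& PathForId(const std::vector<std::string>& paths, int id)
{
    assert(id >= 0 && static_cast<std::size_t>(id) < paths.size());
    return paths[static_cast<std::size_t>(id)];
}
```

Any `std::istream` works here, so a `std::ifstream` opened on FileList.txt or a `std::istringstream` holding a few test paths can be passed in.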
Using the Code
To create the FileList, I use the CStdioFile class:
// Create the FileList
CStdioFile file;
if (file.Open("FileList.txt", CFile::modeCreate | CFile::modeReadWrite))
{
    CFileFind Finder; // Enumerates the matching files
    BOOL bWorking = Finder.FindFile(m_PATH + "\\*.txt"); // Only find text files
    while (bWorking)
    {
        bWorking = Finder.FindNextFile();
        if (!Finder.IsDirectory())
        {
            file.WriteString(Finder.GetFilePath()); // Write the file path
            file.WriteString("\n");
        }
    }
    Finder.Close();
    file.Close();
}
For searching, I use a Binary Search Tree (BST) to store the words. First, I scan the directory that holds the text files to create the FileList. Then I open every text file listed in the FileList and scan it for words. Every word is stored in the BST. A word can appear in many files and therefore have many IDs, so I use a linear linked list to store the ID numbers.
// Search word
ListID* CTinyGoogleDlg::SearchWord(string key)
{
    if (!head)
    {
        MessageBox("Something's wrong!"); // The tree has not been built
        return NULL;
    }
    // Walk down the tree until the word is found or we fall off a leaf
    tree* current = head;
    while (current)
    {
        // strcmp expects C strings, so pass key.c_str()
        int cmp = strcmp(current->word, key.c_str());
        if (cmp == 0)
            break;
        current = (cmp < 0) ? current->right : current->left;
    }
    // Return the list of IDs, or NULL if the word is not in the tree
    return current ? current->IDs : NULL;
}
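The indexing step that fills the tree is not listed in this article, so here is a minimal sketch of how it could look in standard C++. The field names (`word`, `IDs`, `left`, `right`, `ID`, `next`) follow the code above. Everything else is an assumption for illustration: the function names `AddID`, `InsertWord`, `IndexStream` and `FindWord`, the use of `std::string` for the word, and the rule that a word is any run of letters and digits.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Assumed node layouts, modelled on the fields used in the article.
struct ListID
{
    int ID;
    ListID* next;
};

struct tree
{
    std::string word; // assumption: std::string instead of a C string
    ListID* IDs;
    tree* left;
    tree* right;
};

// Append id to the node's list unless it is already present.
// Files are indexed in increasing ID order, so the list stays sorted.
void AddID(tree* node, int id)
{
    assert(node != nullptr);
    ListID** link = &node->IDs;
    while (*link)
    {
        if ((*link)->ID == id)
            return;
        link = &(*link)->next;
    }
    *link = new ListID{id, nullptr};
}

// Insert word into the BST (creating a node if needed) and record the file ID.
void InsertWord(tree*& root, const std::string& word, int id)
{
    tree** link = &root;
    while (*link)
    {
        int cmp = word.compare((*link)->word);
        if (cmp == 0)
            break;
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    if (!*link)
        *link = new tree{word, nullptr, nullptr, nullptr};
    AddID(*link, id);
}

// Split the stream into runs of letters and digits and index each one.
void IndexStream(tree*& root, std::istream& in, int id)
{
    std::string word;
    char ch;
    while (in.get(ch))
    {
        if (std::isalnum(static_cast<unsigned char>(ch)))
            word += ch;
        else if (!word.empty())
        {
            InsertWord(root, word, id);
            word.clear();
        }
    }
    if (!word.empty())
        InsertWord(root, word, id);
}

// Lookup with the same walk as SearchWord; returns nullptr if not found.
const ListID* FindWord(const tree* root, const std::string& word)
{
    while (root)
    {
        int cmp = word.compare(root->word);
        if (cmp == 0)
            return root->IDs;
        root = (cmp < 0) ? root->left : root->right;
    }
    return nullptr;
}
```

Calling `IndexStream` once per file, in the order the files appear in the FileList, keeps each ID list in ascending order. Freeing the nodes is left out to keep the sketch short.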
Next, the user is asked to enter keywords. The program searches the binary tree to see whether each keyword exists. If it does, the ID is used to open the matching text file, and some lines of that file are printed in the result.
// Display results
int CTinyGoogleDlg::Display(ListID *curr)
{
    CStdioFile file;
    CString sText;
    m_RESULT = "";
    if (!curr)
    {
        MessageBox("NOT FOUND!");
        return 0;
    }
    if (file.Open("FileList.TXT", CFile::modeRead))
    {
        int count = -1; // ID of the last FileList line read
        while (curr)
        {
            // Find the file path by reading FileList up to line number ID.
            // The file is only read forward, so the IDs must be in ascending order.
            CString path;
            do
            {
                if (!file.ReadString(path))
                    break; // Ran out of lines in FileList
                count += 1;
            } while (count < curr->ID);
            CString DocID;
            DocID.Format("%d", curr->ID);
            m_RESULT = m_RESULT + "\r\nDocID:" + DocID + "\r\n";
            // Open the file and display its first lines (at most 16)
            CStdioFile read;
            if (read.Open(path, CFile::modeRead))
            {
                for (short nLineCount = 0; nLineCount < 16 && read.ReadString(sText); nLineCount++)
                    m_RESULT = m_RESULT + sText + "\r\n";
                read.Close();
            }
            curr = curr->next;
        }
        // Show the collected lines in the edit control
        GetDlgItem(IDC_EDIT_RESULT)->SetWindowText(m_RESULT);
        file.Close();
    }
    return 0;
}
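The preview step (show only the first few lines of each matching file) can also be sketched without MFC. The helper name `PreviewLines` and its parameters are assumptions for illustration. It reads at most `maxLines` lines and stops early when the file is shorter than that.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns up to maxLines lines from the stream,
// each terminated with "\r\n" so the text displays correctly in an edit control.
std::string PreviewLines(std::istream& in, int maxLines)
{
    assert(maxLines >= 0);
    std::string result;
    std::string line;
    // std::getline fails at end of file, which ends the loop early
    for (int i = 0; i < maxLines && std::getline(in, line); ++i)
    {
        result += line;
        result += "\r\n";
    }
    return result;
}
```

Passing 16 as `maxLines` gives the same preview length as the Display function above.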
Points of Interest
At first, I had some trouble working out how to find the file paths. It isn't very difficult, but at my level it wasn't easy either. I found some approaches on the Internet, and CodeProject helped me a lot. Now I am sharing my little program with others.
History
The first version of this program was written as a Win32 console app. This version is an MFC app.