Hi, I am not good with coding. In VB.NET, I need to search one directory (c:\K_txt) of almost 3,000 .txt files for either a word or an exact phrase that the user enters, and then load a listbox with the names of the .txt files that contain the word or phrase. Then, when a file in the listbox is clicked, it will be loaded into a text file with each instance of the word or phrase highlighted. Thank you.

What I have tried:

I have tried nothing, except trying to find code that does this, as it seems like a routine that will likely have been written many, many times. I am collecting snippets of this or that aspect so that I might be able to fit the pieces together, but since there are so many files to go through (3,000), I know that any routine that I kludge together will be slow and inefficient. I have two books on programming in VB, but nothing to help with this. Thank you.
Updated 11-Dec-18 1:44am

Quote:
I need to search one directory (c:\K_txt) of almost 3,000 .txt files
Searching this on the fly is a bad idea: your users would have to wait for a full scan of the directory each time they change the query. A better approach is to read your files once and break their contents into tokens (words, in English). This tells your algorithm which file contains which words. You can extend this to whole sentences, split on periods, let's say.

You can then search for the words in your own data structure: a hash table, tree, trie, you pick. Your users can quickly check which words appear in which files, because your application only has to consult that data structure instead of traversing the file system again.

The file system will be traversed only once. Your structure will hold the data in an ordered, search-friendly way.
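
To make the idea concrete, here is a minimal sketch of such a structure, written in Python only for brevity (the question is about VB.NET, where a Dictionary(Of String, HashSet(Of String)) could play the same role). The names `tokenize`, `build_index` and `files_containing` are made up for this illustration, and the word pattern is an assumption about what counts as a word.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]

def build_index(directory):
    """Read every .txt file once and map each word to the files containing it."""
    index = defaultdict(set)
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in tokenize(text):
            index[word].add(path.name)
    return index

def files_containing(index, word):
    """Look a word up in the index; the file system is not touched here."""
    return sorted(index.get(word.lower(), set()))
```

You would call `build_index(r"c:\K_txt")` once at startup and then `files_containing(index, "file")` for each query, so only the first call pays the cost of reading the 3,000 files.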

Quote:
either a word or exact phrase that the user enters,
Exactly my point. What happens when the user wants to search for "file" but enters "fole"? Your algorithm would scan the whole directory for "fole", and then scan it all over again for "file" once the typo is corrected. That is not a good approach, and you need an alternative. One such approach is MapReduce: you read the files one by one, counting the words that exist and their number of occurrences. You can then feed this result into your own structure and query that, which is a much better approach than the one you are considering.

See the following links and learn from them:
mapreduce - Hadoop searching words from one file in another file - Stack Overflow
algorithms - Hadoop MapReduce Word Counting Example - Computer Science Stack Exchange (you can call it word finding)
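
For 3,000 text files you probably do not need Hadoop itself; the map and reduce steps can be imitated in a few lines. The sketch below is plain Python, not the Hadoop API, and the function names and the two sample documents are invented for illustration.

```python
import re
from collections import Counter
from functools import reduce

def map_words(text):
    """Map step: count the words of a single document."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def reduce_counts(total, partial):
    """Reduce step: merge one document's counts into the running total."""
    total.update(partial)
    return total

def word_count(documents):
    """Run the map step over every document, then reduce the partial counts."""
    return reduce(reduce_counts, map(map_words, documents.values()), Counter())

# Hypothetical sample data standing in for the contents of the .txt files.
docs = {
    "a.txt": "the file is in the folder",
    "b.txt": "open the file",
}
totals = word_count(docs)
print(totals["the"], totals["file"])  # prints: 3 2
```

A word the user mistyped, such as "fole", simply has a count of zero here, so you can tell the user straight away without touching the directory.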
Quote:
then load up a listbox with the names of those txt files that contain the word or phrase
Your structure will return everything the user needs: it will know which files contain "file" and will return them in a list, or in whatever form you have specified.
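
The exact phrase case can reuse the same structure: first narrow down to the files that contain every word of the phrase, then check only those files for the phrase itself. This is a rough Python sketch under the assumption that the file contents fit in memory as a dictionary of name to text; `build_index` and `search` are hypothetical names, and you could equally re-read just the candidate files from disk.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = {}
    for name, text in documents.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(name)
    return index

def search(index, documents, query):
    """Return the sorted names of documents containing the word or exact phrase."""
    words = tokenize(query)
    if not words:
        return []
    # Candidates are the files that contain every word of the query.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Confirm the words appear next to each other, in order, in each candidate.
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b",
        re.IGNORECASE,
    )
    return sorted(name for name in candidates if pattern.search(documents[name]))

# Hypothetical sample data standing in for the contents of the .txt files.
docs = {
    "a.txt": "Open the file menu",
    "b.txt": "The menu lists each file",
    "c.txt": "No match here",
}
index = build_index(docs)
print(search(index, docs, "file"))       # prints: ['a.txt', 'b.txt']
print(search(index, docs, "file menu"))  # prints: ['a.txt']
```

The returned list is what you would put into the listbox.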
Quote:
a text file with each instance of the word or phrase highlighted
That depends on the app framework, and I will leave you with that here. :-)
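
Whatever the framework, the highlighting step usually comes down to knowing where each match starts and how long it is. The sketch below (Python, with a made-up `find_matches` helper and sample text) only computes those positions; applying the highlight is then up to the control you use. In Windows Forms, for instance, a RichTextBox accepts a start and a length through its Select method.

```python
import re

def find_matches(text, query):
    """Return (start, length) for every case-insensitive occurrence of query."""
    if not query:
        return []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [(m.start(), len(m.group())) for m in pattern.finditer(text)]

# Hypothetical sample text standing in for the clicked file's contents.
text = "Open the file. The File menu lists each file."
print(find_matches(text, "file"))  # prints: [(9, 4), (19, 4), (40, 4)]
```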

Good luck.
   
You could adapt the technique used in this CodeProject article: File Searcher in C#
or this one: WinSearchFile: how to search files on your PC

Or the techniques discussed here: .net - c# Fastest string search in all files - Stack Overflow
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



