Search engine for finding keywords in documents

Question

I have a task on my hands.

I have to create a simple search engine that goes through a group of text documents and records, for each word in the document collection, all the documents that contain that word.

The search engine must accept a search query (a set of keywords) and identify each document that contains all or some of the keywords.

It should then print the document names in descending order of the number of keywords found, so the document that contains all the keywords appears at the top of the list.

I'm struggling with the pseudocode, let alone the program for it.

Posted 15-Apr-15 11:03am

Member 11610671

Updated 15-Apr-15 14:00pm

Sergey Alexandrovich Kryukov

v2

Comments

Nelek 15-Apr-15 17:15pm

Don't think we can read minds or do astral projections to see your monitor. If you need help, the least you could do is add some relevant code to your question or explain your problem in a way that the users of CP can understand. Otherwise, nobody will be able to help you.

You just gave a list of requirements and said you are stuck with the pseudocode. OK, perfect... where? Why? What have you tried?

Sascha Lefèvre 15-Apr-15 18:13pm

If your task is not to develop this yourself but simply to get a working solution, then go for Lucene.

Member 11610671 16-Apr-15 7:09am

Apologies. To be honest, I'm pretty weak at programming and I'm not too sure what to do. To sum it up: the user enters a search query of one or more words and finds out whether those keywords exist in the document or documents.

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

BacchusBeale · Answer 1 · 2015-04-15T15:06:00

For example, the pseudocode might be:

> define a class Result with variables int count and String filename
> make an ArrayList (or other collection) to add Results to
> get the list of file names from the directory
> get the list of keywords from the user
> for each file in file names do:
>> count = 0
>> for each keyword do:
>>> search the file for the keyword
>>> if found: count++
>> end
>> if count > 0: add Result(count, filename) to the list
> end

> sort the list by count, descending
> print the list
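The pseudocode above could be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the names `KeywordSearch`, `Result` and `rank` are made up for illustration, the documents are passed in as an in-memory map of file name to text instead of being read from a directory, and matching is a plain case-insensitive substring test, which can report partial-word hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout; a sketch of the pseudocode, not a finished search engine.
final class KeywordSearch {

    // One entry per document that matched at least one keyword.
    static final class Result {
        final String filename;
        final int count; // number of distinct keywords found in the document

        Result(String filename, int count) {
            this.filename = filename;
            this.count = count;
        }
    }

    // documents: file name -> full text (assumed to be loaded already).
    static List<Result> rank(Map<String, String> documents, List<String> keywords) {
        List<Result> results = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            String text = doc.getValue().toLowerCase(Locale.ROOT);
            int count = 0; // reset for every document
            for (String keyword : keywords) {
                // Plain substring test: simple, but "the" also matches inside "other".
                if (text.contains(keyword.toLowerCase(Locale.ROOT))) {
                    count++;
                }
            }
            if (count > 0) {
                results.add(new Result(doc.getKey(), count));
            }
        }
        // Most keywords first; List.sort is stable, so ties keep their input order.
        results.sort(Comparator.comparingInt((Result r) -> r.count).reversed());
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "The quick brown fox");
        docs.put("b.txt", "A lazy dog");
        docs.put("c.txt", "The lazy brown dog");
        for (Result r : rank(docs, Arrays.asList("lazy", "brown"))) {
            System.out.println(r.filename + " (" + r.count + ")");
        }
    }
}
```

This rescans every document for every query. If the collection is large, or the assignment really requires recording which documents contain each word, building a map from word to document names once and answering queries from that map would be the usual alternative.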

PIEBALDconsult · Answer 2 · 2015-04-17T08:10:00

I don't know Java, but in C# I'd read the whole file with System.IO.File.ReadAllText(String) and then use a regular expression.
I definitely would not use IndexOf, because that will lead to false positives: a search for "the", for instance, would also hit "other".

For example:

C#

System.Text.RegularExpressions.Regex reg = 
  new System.Text.RegularExpressions.Regex
  ( @"(?i)\b(a|the|this)\b" ) ; // Create the expression from the provided terms; the group makes \b apply to every term

// args [ 0 ] holds the text to search
System.Text.RegularExpressions.MatchCollection mat = reg.Matches ( args [ 0 ] ) ;

System.Console.WriteLine ( mat.Count ) ; // Number of whole-word matches
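Since the question seems to be about Java, here is a rough Java counterpart of the same whole-word idea, using `java.util.regex`. It is a sketch rather than a definitive implementation: `WholeWordMatcher`, `compile` and `countMatches` are illustrative names, and it assumes a non-empty keyword list whose entries start and end with letters or digits, so that `\b` behaves as a word boundary around each one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; assumes a non-empty list of plain-word keywords.
final class WholeWordMatcher {

    // Builds \b(?:k1|k2|...)\b so the word boundaries apply to every keyword.
    static Pattern compile(List<String> keywords) {
        String alternation = keywords.stream()
                .map(Pattern::quote) // treat each keyword as literal text
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Counts every whole-word occurrence of any keyword in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern pattern = compile(Arrays.asList("a", "the", "this"));
        // "theory", "about" and "other" contain keywords only as fragments, so they are skipped.
        System.out.println(countMatches(pattern, "This is a theory about the other thing"));
    }
}
```

To count how many distinct keywords a document contains, rather than total occurrences, one option would be to compile one pattern per keyword and call `find()` once on each.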
