Suppose we have a text document and we input a string. What sort of algorithms and parsing techniques should we use to find the text in the document that is relevant to the input string?
Comments
Guirec 14-Feb-13 2:08am    
What do you mean by "relevant"?

If you mean an exact match, you can just do: text.Contains(searchedThing);

1 solution

There are a number of possible answers, most of them beyond the scope of a single question here.

1. As Guirec says, if you're looking for a word/phrase in the text, use String.Contains(), or whatever a particular language offers for finding substrings.

<edit> That only tells you whether the text contains the substring. To locate it, you'll need something more like String.IndexOf(), which returns the position of the first occurrence of the substring in the larger string (or -1 if it is not found).


2. If you need to match patterns such as numbers or words, look up regular expressions. Wikipedia has a reasonable description.
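To make points 1 and 2 concrete, here is a minimal sketch in Python rather than .NET. It assumes Python's `in` operator and `str.find()` as rough equivalents of String.Contains() and String.IndexOf(), and uses the standard `re` module for the regular expressions. The function names and the two patterns are made up for illustration; adjust the patterns to whatever "number" and "word" mean for your text.

```python
import re


def contains(text, needle):
    """Rough analogue of String.Contains(): is needle anywhere in text?"""
    return needle in text


def find_all(text, needle):
    """Return the start index of every occurrence of needle in text."""
    positions = []
    start = text.find(needle)  # like IndexOf(): -1 when not found
    while start != -1:
        positions.append(start)
        start = text.find(needle, start + 1)
    return positions


def find_numbers(text):
    """Match integers and simple decimals such as 66 or 12.50."""
    return re.findall(r"\d+(?:\.\d+)?", text)


def find_words(text):
    """Match runs of ASCII letters (a deliberately crude idea of a word)."""
    return re.findall(r"[A-Za-z]+", text)
```

For example, `find_all("the cat sat; the cat slept", "cat")` returns one index per occurrence, and `find_numbers("Order 66 cost 12.50 dollars")` picks out the two numbers as strings.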

If your needs are more complex than that, and you want to parse text written in some computer language, you'll need to get into some serious reading. Parsing languages is a heavy task. The standard text on that is the Dragon Book ("Compilers: Principles, Techniques, and Tools" by Aho, Lam, Sethi and Ullman) - a google search will direct you to it - but there are many alternatives.
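To give a feel for what parsing a computer language involves, here is a toy sketch: a tokenizer plus a recursive-descent parser that evaluates arithmetic expressions. The grammar (integers, + - * / and parentheses) and all the names are invented for this example. It is one of the simplest parsing techniques, not the approach any particular compiler takes, and real languages need far more than this.

```python
import re

# A token is either a run of digits or a single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+)|(\S))")


def tokenize(src):
    tokens = []
    for number, op in TOKEN.findall(src):
        if number:
            tokens.append(("NUM", int(number)))
        elif op in "+-*/()":
            tokens.append(("OP", op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUM | '(' expr ')'
        kind, val = self.take()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "("):
            value = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(src):
    parser = Parser(tokenize(src))
    result = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return result
```

Each grammar rule becomes one method, which is why operator precedence falls out of the call structure: `evaluate("2 + 3 * (4 - 1)")` multiplies before it adds.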

If you're trying to parse natural-language text, you're attempting a task that stumps even the best computer scientists. It is not solvable by a simple algorithm; it needs advanced Artificial Intelligence approaches that I don't even begin to understand, and probably a hell of a lot of computing power. IBM's Watson managed a passable attempt, but most of us don't have that kind of budget.
 
Comments
Arjun Abco 15-Feb-13 0:13am    
I am actually looking for effective algorithms that can extract the keywords from a string through parsing; as a next step, I am trying to make the compiler map all the docs related to those keywords. More or less like an IDF algo... I am a beginner in these things, so I really don't have an idea of how to pull it off.
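Assuming "IDF algo" here means TF-IDF (term frequency times inverse document frequency), the idea can be sketched in a few lines of Python. Everything below is illustrative: the function names are invented, the tokenizer is a crude lowercase-letters regex, and the weighting is the plain tf * log(N / df) form with no smoothing or stop-word handling. Treat it as a starting point, not a finished search engine.

```python
import math
import re
from collections import Counter


def words(text):
    """Crude tokenizer: lowercase runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [words(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty docs
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Highest-scoring terms of one document; ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def rank_documents(query, docs):
    """Document indices ordered by summed TF-IDF of the query terms."""
    scores = tf_idf(docs)
    terms = words(query)
    return sorted(
        range(len(docs)),
        key=lambda i: -sum(scores[i].get(term, 0.0) for term in terms),
    )
```

Terms that appear in every document get a score of zero, because log(N / N) is 0; that is what pushes common words like "the" down and lets the distinctive words surface as keywords.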

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


