See more: Algorithms Parsing
Suppose we have a text document and we input a string. What algorithms and parsing techniques should we use to find the text in the document that is relevant to the input string?
Posted 13-Feb-13 20:56pm
Comments
Guirec Le Bars at 14-Feb-13 2:08am
   
What do you mean by "relevant"?
 
If you mean an exact match, you can just do: text.Contains(searchedThing);

1 solution


Solution 1

There are several possible answers, most of them beyond the scope of a forum answer.
 
1. As Guirec says, if you're looking for a word or phrase in the text, use String.Contains(), or whatever your language offers for finding substrings.
 
That only tells you whether the text contains the substring. To locate it, use something like String.IndexOf(), which returns the position of the substring within the larger string.

 
2. If you need to match things like numbers, words, etc., look up regular expressions. Wikipedia has a reasonable description.
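
As a rough illustration of options 1 and 2, here is a minimal Python sketch. Python is used only as an example language: its `in` operator and `str.find()` play the roles of String.Contains() and String.IndexOf(). The sample text and the helper name `find_all` are made up for this sketch.

```python
import re

text = "The quick brown fox jumps over the lazy dog. The fox is 3 years old."

# Option 1: plain substring search.
# `in` only answers "is it there?", like String.Contains().
has_fox = "fox" in text

# str.find() returns the index of the first match (or -1), like String.IndexOf().
first_fox = text.find("fox")


def find_all(haystack: str, needle: str) -> list[int]:
    """Return the start index of every occurrence of needle (overlaps included)."""
    if not needle:
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume one character further on so overlapping matches are found too.
        start = haystack.find(needle, start + 1)
    return positions


# Option 2: regular expressions for patterns such as numbers or whole words.
numbers = re.findall(r"\d+", text)
whole_word_the = [
    m.start() for m in re.finditer(r"\bthe\b", text, flags=re.IGNORECASE)
]

print(has_fox, first_fox, find_all(text, "fox"), numbers, whole_word_the)
```

The `\b` word boundaries keep the pattern from matching "the" inside longer words such as "other", which a plain substring search cannot do.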
 
If your needs are more complex than that and you want to parse text written in a computer language, you'll need to do some serious reading. Parsing languages is a heavy task. The standard text on the subject is the Dragon Book (Compilers: Principles, Techniques, and Tools), which a Google search will direct you to, but there are many alternatives.
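
To give a feel for what parsing a computer language involves, here is a toy recursive-descent parser in Python for arithmetic expressions. It is only a sketch under stated assumptions: the grammar (integers, + - * /, parentheses) and the names `tokenize`, `Parser` and `evaluate` are invented for this example, and a real language needs far more than this.

```python
import re

TOKEN_PATTERN = r"\d+|[-+*/()]"


def tokenize(source: str) -> list[str]:
    """Split the input into integer and operator tokens."""
    # Anything that is neither a token nor whitespace is an error.
    leftover = re.sub(TOKEN_PATTERN + r"|\s+", "", source)
    if leftover:
        raise ValueError(f"unexpected characters: {leftover!r}")
    return re.findall(TOKEN_PATTERN, source)


class Parser:
    """Hypothetical toy grammar:
    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := INTEGER | '(' expr ')'
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise ValueError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        raise ValueError(f"unexpected token {token!r}")


def evaluate(source: str):
    """Parse and evaluate an expression, rejecting trailing tokens."""
    parser = Parser(tokenize(source))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Each grammar rule becomes one method, and operator precedence falls out of which rule calls which: `expr` calls `term`, so multiplication binds tighter than addition.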
 
If you're trying to parse natural language text, you're attempting a task that stumps even the best computer scientists. It is not solvable by any simple algorithm; it needs advanced artificial intelligence approaches that I don't even begin to understand, and probably a hell of a lot of computing power. IBM's Watson managed a passable attempt, but most of us don't have that kind of budget.
Comments
Arjun Abco at 15-Feb-13 0:13am
   
I am actually looking for effective algorithms that can extract the keywords from a string through parsing. As a next step, I am trying to make the compiler map all the docs related to those keywords, more or less like an IDF algorithm. I am a beginner at these things, so I really don't have an idea how to pull it off.
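
One common starting point for this kind of keyword-to-document matching is TF-IDF scoring. The Python sketch below is a minimal illustration, not a definitive implementation: the stop-word list, the sample documents, and the names `extract_keywords` and `rank` are all assumptions made up for the example, and it uses the plain log(N / df) form of IDF.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "with",
              "how", "do", "is", "for", "on"}


def extract_keywords(text: str) -> list[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def rank(query: str, documents: dict[str, str]) -> list[str]:
    """Return names of documents matching the query, best TF-IDF score first."""
    tokenized = {name: extract_keywords(body) for name, body in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    keywords = extract_keywords(query)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for word in keywords:
            if counts[word]:
                tf = counts[word] / len(tokens)            # term frequency
                idf = math.log(n_docs / doc_freq[word])    # rarer words weigh more
                score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample corpus.
documents = {
    "parsing.txt": "Parsing turns source text into a syntax tree using a grammar.",
    "search.txt": "Substring search finds a pattern inside a larger text.",
    "cooking.txt": "Boil the pasta and season the sauce with basil.",
}

print(extract_keywords("how to do parsing with a grammar"))
print(rank("text search", documents))
```

A word that appears in every document gets an IDF of zero here, so it contributes nothing to the ranking; that is the property that makes IDF useful for separating keywords from filler.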

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 14 Feb 2013
Copyright © CodeProject, 1999-2014