Suppose we have a text document and an input string. What sort of algorithms and parsing techniques should we use to find the text in the document that is relevant to the input string?
Posted 13-Feb-13 19:56pm
Guirec Le Bars 14-Feb-13 2:08am
What do you mean by "relevant"?

If you mean that the text contains the exact string, you can just do: text.Contains(searchedThing);

1 solution


Solution 1

There are a number of possible answers, most of them beyond the scope of a single answer here.

1. As Guirec says, if you're looking for a word or phrase in the text, use String.Contains(), or whatever your particular language offers for finding substrings.

<edit> That will only tell you whether the text contains the substring. To locate it, you'll need something like String.IndexOf(), which returns the position of the substring within the larger string.
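
As a rough illustration of point 1, here is a minimal sketch in Python rather than .NET. The `in` operator and `str.find()` are used as Python analogues of String.Contains() and String.IndexOf(); the sample text and the helper name `find_all` are made up for the example.

```python
text = "The quick brown fox jumps over the lazy dog."
needle = "brown fox"

# Membership test: the Python analogue of String.Contains().
found = needle in text

# str.find() is the analogue of String.IndexOf(): it returns the index
# of the first match, or -1 if the substring is absent.
position = text.find(needle)


def find_all(haystack, needle):
    """Return the start index of every (possibly overlapping) occurrence.

    Hypothetical helper for illustration; not part of any library.
    """
    if not needle:
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too.
        start = haystack.find(needle, start + 1)
    return positions


print(found, position, find_all("abababa", "aba"))
```

Note that these are exact, case-sensitive matches; lowercasing both strings first is one simple way to make the search case-insensitive.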

2. If you need to match patterns such as numbers or words, look up regular expressions. Wikipedia has a reasonable description.
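
To give a feel for point 2, the sketch below uses Python's standard `re` module to pull numbers, words, and a date-like pattern out of a string. The sample text and the three patterns are assumptions chosen for the example; real input usually needs more careful patterns.

```python
import re

text = "Order 66 shipped 3 items on 2013-02-14; order 7 is pending."

# \d+ matches a run of digits; \b anchors the match at word boundaries
# so digits embedded in a longer token are not picked up.
numbers = re.findall(r"\b\d+\b", text)

# A crude definition of "word": one or more ASCII letters.
words = re.findall(r"[A-Za-z]+", text)

# finditer() yields match objects, which also carry the position of
# each match, much like IndexOf() does for plain substrings.
date_matches = list(re.finditer(r"\d{4}-\d{2}-\d{2}", text))

print(numbers)
print(words)
for match in date_matches:
    print(match.group(), "at index", match.start())
```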

If your needs are more complex than that, and you want to parse text written in a computer language, you'll need to do some serious reading. Parsing languages is a heavy task. The standard text on the subject is the Dragon Book ("Compilers: Principles, Techniques, and Tools"; a Google search will direct you to it), but there are many alternatives.

If you're trying to parse natural language text, you're attempting a task that stumps even the best computer scientists. It is not solvable by a straightforward algorithm; it needs advanced Artificial Intelligence approaches that I don't even begin to understand, and probably a hell of a lot of computing power. IBM's Watson managed a passable attempt, but most of us don't have that kind of budget.
Arjun Abco 15-Feb-13 0:13am
I am actually looking for effective algorithms that can extract the keywords from a string through parsing. As a next step, I am trying to make the compiler map all the docs related to those keywords, more or less like an IDF algorithm. I am a beginner in these things, so I really don't have an idea of how to pull it off.
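
For reference, one common reading of "IDF" here is TF-IDF (term frequency times inverse document frequency). The sketch below is a minimal version under that assumption, using only the Python standard library. The function names, the letters-only tokenizer, and the sample documents are all hypothetical, and a real system would add stop-word handling, stemming, and smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_scores(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            # tf = share of the document; idf = log(N / df), which is 0
            # for terms that occur in every document.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    # Highest score first; ties are broken alphabetically.
    return [t for t, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def rank_documents(query, documents):
    """Return document indices ordered by summed TF-IDF of the query terms."""
    scores = tf_idf_scores(documents)
    terms = tokenize(query)
    totals = [sum(s.get(t, 0.0) for t in terms) for s in scores]
    return sorted(range(len(documents)), key=lambda i: totals[i], reverse=True)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell sharply",
]
print(top_keywords(tf_idf_scores(docs)[2]))
print(rank_documents("stock market", docs))
```

Terms that appear in every document (such as "the" in the sample) score zero, which is what pushes the distinctive words toward the top of each document's keyword list.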

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 14 Feb 2013
Copyright © CodeProject, 1999-2017
All Rights Reserved. Terms of Service