Hi,
A reliable, professional-grade solution requires a lot of programming and is not a trivial task. One good example can be found online in my free Semantic Analyzer, which extracts words and sentences from arbitrary text (multilingual, by the way) and then applies a concordance calculator to compute how often each word occurs:
Semantic Analyzer
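At its core, the concordance (word-frequency) step mentioned above counts how often each word occurs. Below is a minimal sketch in Python, used here only for illustration. It assumes the words have already been extracted into a list. The function name word_frequencies is hypothetical and is not part of the Semantic Analyzer:

```python
from collections import Counter


def word_frequencies(words):
    """Count how often each word occurs, ignoring letter case.

    Returns (word, count) pairs sorted by descending count, then alphabetically.
    Hypothetical helper for illustration only.
    """
    # casefold() is Unicode-aware lower-casing, so it also handles non-English
    # words, e.g. "Straße" and "STRASSE" both become "strasse".
    counts = Counter(w.casefold() for w in words if w)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = ["The", "cat", "saw", "the", "other", "cat"]
    for word, count in word_frequencies(sample):
        print(word, count)
```

Case-insensitive counting is a design choice. Drop the casefold() call if "The" and "the" should be counted separately.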
In general, you first need a string containing the plain text of interest (no formatting, markup, etc.). Then remove all special characters (such as ",", ":", ";", etc.), or replace them with spaces so that adjacent words are not glued together, using either String.Replace() or a regular expression. Finally, apply String.Split() with " " as the separator. This gives you an array of strings containing the words of the text. A real-world solution needs much more string processing, for example collapsing runs of consecutive blank spaces into a single space so that Split() does not return empty entries. As mentioned above, a complete production-grade solution goes far beyond the scope of a single article, and it is also subject/domain-specific. You should probably start with a simple prototype and then refine it to fit your particular case. For your immediate needs, you can use my free online Semantic Analyzer, which provides reasonable accuracy.
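The steps above can be sketched in Python. Here str.replace() and str.split() stand in for the .NET String.Replace() and String.Split(). This is a simplified sketch, not the Semantic Analyzer's implementation. The function name extract_words and the SPECIAL_CHARS set are hypothetical, and the character set is an assumption you should adapt to your own text:

```python
import re

# Hypothetical set of "special characters"; adapt it to your domain.
# The apostrophe is deliberately left out so that contractions like "don't" survive.
SPECIAL_CHARS = ',.:;!?"()[]{}'


def extract_words(text):
    # Step 1: replace each special character with a space (the Python
    # counterpart of String.Replace()). Using a space instead of ""
    # keeps "end.Next" from turning into "endNext".
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, " ")
    # Step 2: collapse any run of whitespace (spaces, tabs, newlines) into a
    # single space and trim both ends, so splitting yields no empty entries.
    text = re.sub(r"\s+", " ", text).strip()
    # Step 3: split on the single-space separator, like String.Split(' ').
    # The guard is needed because "".split(" ") returns [""], not [].
    return text.split(" ") if text else []


if __name__ == "__main__":
    print(extract_words('Hello, world!  Hello again:\tworld.'))
```

Python's text.split() with no argument already splits on runs of whitespace and drops empty entries. In .NET, String.Split() with StringSplitOptions.RemoveEmptyEntries has a similar effect. The explicit regex step is kept here only to mirror the steps described above.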
Kind regards,
AB