A reliable, professional-grade solution requires a lot of programming and is not a trivial task. One good example you can find online is my free Semantic Analyzer, which extracts words and sentences from arbitrary text (it is, by the way, multilingual) and then applies a concordance calculator to compute the frequency of word occurrences.
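The word-frequency (concordance) part can be sketched in a few lines. This is a minimal illustration of the idea, not the analyzer's actual implementation; the function name `word_frequencies` is just an assumption for the example:

```python
import re
from collections import Counter

def word_frequencies(text: str) -> Counter:
    # Lowercase the text and pull out word tokens; \w+ also matches
    # non-ASCII letters, which helps with multilingual input
    words = re.findall(r"\w+", text.lower())
    # Counter tallies how many times each word occurs
    return Counter(words)

freqs = word_frequencies("The cat sat. The cat ran.")
print(freqs["cat"])  # 2
```

A production analyzer would add stop-word filtering, stemming/lemmatization, and language detection on top of this.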
In general, you first must obtain a string containing the plain text of interest (no formatting, etc.), then remove all special characters (such as ",", ":", ";") using either a string-replacement method or a regular expression, and then split the string using the " " separator. The result is an array of strings containing the words in the text. In a real-world solution you must do much more string processing, e.g., collapsing runs of consecutive blank spaces into a single space. As mentioned above, an entire production-grade solution goes far beyond the scope of a single article and is also subject/domain-specific. You should probably start with a simple prototype and then tailor it to fit your particular case. For your immediate needs, you can use my free online Semantic Analyzer, which provides reasonable accuracy.
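The clean-then-split pipeline described above can be sketched as follows. This is a simplified sketch of the general approach, not production code; the helper name `extract_words` is my own choice for the example:

```python
import re

def extract_words(text: str) -> list[str]:
    # Remove special characters (",", ":", ";", etc.) by replacing
    # anything that is not a word character or whitespace with a space
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of consecutive blank spaces into a single space
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Split on the single " " separator to get the array of words
    return cleaned.split(" ")

words = extract_words("Hello, world: this is;  a test.")
print(words)  # ['Hello', 'world', 'this', 'is', 'a', 'test']
```

Note that the whitespace-collapsing step is what makes the plain `" "` split safe; without it, consecutive separators would produce empty strings in the result.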