I am trying to write a program (in C and C++) that takes an English article as input, analyzes its content, and finds other articles with similar content.
I have already built an R-tree for indexing, but I don't know how to extract the content from an article.
How can I get feature values from the text of an article?
What I have tried so far is to build a dictionary (nouns only) from the input text, then search the other articles for ones that contain the same words, and output those articles' file names.
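A minimal sketch of this kind of word-overlap matching might look like the following. It is an illustration under assumptions, not the actual program: the function names (buildDictionary, sharedWordCount, findSimilar) and the minShared threshold are hypothetical, the articles are assumed to be already loaded as (file name, text) pairs, and noun filtering is skipped because it would need a part-of-speech tagger or a noun list.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Split text into a set of lowercase alphabetic words.
// Assumption: every word is kept; the nouns-only filter is omitted here.
std::set<std::string> buildDictionary(const std::string& text) {
    std::set<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.insert(current);
    }
    return words;
}

// Count the words that appear in both dictionaries.
std::size_t sharedWordCount(const std::set<std::string>& a,
                            const std::set<std::string>& b) {
    std::size_t count = 0;
    for (const std::string& word : a) {
        if (b.count(word) != 0) {
            ++count;
        }
    }
    return count;
}

// Return the file names of articles that share at least minShared words
// with the query text. Each article is a (file name, text) pair.
std::vector<std::string> findSimilar(
    const std::string& query,
    const std::vector<std::pair<std::string, std::string>>& articles,
    std::size_t minShared) {
    const std::set<std::string> queryWords = buildDictionary(query);
    std::vector<std::string> matches;
    for (const auto& article : articles) {
        if (sharedWordCount(queryWords, buildDictionary(article.second)) >= minShared) {
            matches.push_back(article.first);
        }
    }
    return matches;
}
```

Calling findSimilar with the input text, the list of stored articles, and a threshold returns the matching file names in the order the articles were given. The score here is a raw count of shared words, so long articles tend to match more easily than short ones.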
However, I don't think this method works very well, so I would like to try a more advanced one.