Hi everyone. I am a beginner in R.
I have the following problem: I have many PDF files containing technical reports (written in Portuguese) by many different authors, all in natural language. How can I develop an intelligent system that, given a small set of keywords, identifies the author(s) whose work most closely matches those keywords?
For example, I know that to read the text into R and start processing it I can use the following code (where "./XXX.pdf" stands for the URL or local path of my PDF file):
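library(pdftools)  # pdf_text() is provided by the pdftools package
text <- pdf_text("./XXX.pdf")  # returns one character string per page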
I know that I will need to apply NLP (Natural Language Processing) from here, but what is the best way to do this? Will I need to use an ontology?
Then, once the text has been processed and structured, how can I build the system that maps a small set of input keywords to the author(s) whose work matches them most closely?
Thanks for any help
What I have tried:
I tried reading the natural-language text from a PDF report and the result looks OK, but I don't know how to proceed from there.
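To make the question more concrete, here is a minimal sketch of the kind of keyword-to-author matching I have in mind, using TF-IDF weights and cosine similarity in base R. Everything in it is an assumption for illustration: the toy report texts, the author labels, and the helper names `tokenize` and `rank_authors` are made up, and I assume the author of each report is already known (for example from the file name or metadata). In practice each text would come from `pdf_text()` with the pages of one report pasted together.

```r
# Toy corpus (hypothetical): one row per report, author assumed known.
reports <- data.frame(
  author = c("Author A", "Author B", "Author C"),
  text = c(
    "projeto de pontes e estruturas de concreto armado",
    "controle de qualidade do solo e drenagem de barragens",
    "controle de falhas em motores e turbinas de usinas"
  ),
  stringsAsFactors = FALSE
)

# Lowercase, split on non-letters, and drop tokens of 1-2 characters.
# The length filter is only a crude stand-in for a Portuguese stop-word list.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^[:alpha:]]+")[[1]]
  tokens[nchar(tokens) > 2]
}

docs  <- lapply(reports$text, tokenize)
vocab <- sort(unique(unlist(docs)))

# Term-frequency matrix: one row per report, one column per vocabulary term.
tf <- t(vapply(
  docs,
  function(d) tabulate(match(d, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency: terms found in fewer reports weigh more.
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Rank authors by cosine similarity between the keywords and each report.
rank_authors <- function(keywords) {
  q_tokens <- tokenize(paste(keywords, collapse = " "))
  # Keywords missing from the vocabulary give NA in match() and are ignored.
  q <- tabulate(match(q_tokens, vocab), nbins = length(vocab)) * idf
  scores <- as.vector(tfidf %*% q) /
    (sqrt(rowSums(tfidf^2)) * sqrt(sum(q^2)))
  scores[is.nan(scores)] <- 0  # no known keyword: similarity is zero
  result <- data.frame(author = reports$author, score = scores)
  result[order(-result$score), ]
}

print(rank_authors(c("turbinas", "motores")))
```

With the toy data, only the third report contains the two keywords, so "Author C" is ranked first. Is this kind of approach a reasonable starting point for real reports, or would I need something more elaborate (stemming, a proper stop-word list, an ontology)?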