You may consider using Text Mining (TM) to detect the similarity of two documents. I cannot go into the details here, but briefly: TM involves, among other things, removing unnecessary or meaningless tokens such as punctuation, stop words, and trivial words, and looking out for synonyms. The aim is to transform free-form, unstructured text into structured data from which a computer can learn patterns using appropriate AI techniques. It is a precursor to data mining. All of this is groundwork that comes before you write any code.
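To give a rough feel for the idea, here is a minimal Python sketch that strips punctuation and stop words, turns each document into word counts, and compares the counts with cosine similarity. The stop-word list, the function names `tokenize` and `cosine_similarity`, and the choice of cosine similarity are all my own assumptions for illustration; it does not handle synonyms, and a real project would likely use a proper text mining library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to", "it"}


def tokenize(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the two word-count vectors (0.0 = nothing shared)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both documents contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity(doc1, doc2)` gives a score close to 1 for documents that use the same words in similar proportions and 0 for documents with no words in common. Word order and meaning are ignored, which is exactly why the fuller TM techniques are worth studying.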
To begin with, you should sign up for some AI modules, especially Text Mining, at your college to prepare yourself for such a project.
For your reference:
Text Mining: The state of the art and the challenges