Hi, I want to compare two PDF files in Java and highlight the differences, but I am unsure which steps to follow. I see two options:
1) Convert each PDF to HTML, then compare the two HTML files and highlight the differences.
2) Convert each PDF to XML, compare the two XML files, then convert the XML to XSL or create an XSL stylesheet.


Updated 13-Jun-19 1:24am
Comments
Gerry Schmitz 9-Jun-19 19:09pm
   
How will those "differences" be reported? Does a missing period count as a difference? "Warning! File A has an extra period!" ... like that?
Richard MacCutchan 10-Jun-19 5:05am
   
Your first problem is how to do the actual conversion of PDF to HTML, XML etc.

1 solution


Solution 1

Possible approaches in Java:
- Compare the binary files in Java (this only tells you whether the two files are byte-for-byte identical):
https://dzone.com/articles/comparing-files-in-java
- (more challenging) Use iText for Java to extract the text content from the PDFs and compare the text instead of the binary files
- (still more challenging) Do a structural, page-by-page comparison of the PDFs using iText
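
As a rough illustration of the first two approaches, here is a minimal sketch that uses only the standard library. The class name PdfCompareSketch and both method names are made up for this example. The text extraction step itself (for instance with iText) is not shown; differingLines simply assumes the text of each PDF has already been extracted into a list of lines. It also compares lines by position only, so a single inserted line makes every following line count as different; a proper diff algorithm would be needed for anything more precise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative, not from any library.
class PdfCompareSketch {

    // Approach 1: byte-for-byte comparison.
    // Only answers "identical or not"; it cannot say what changed.
    static boolean sameBytes(Path first, Path second) throws IOException {
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }

    // Approach 2, after text extraction (extraction not shown here):
    // positional line comparison. Returns the 1-based numbers of lines
    // that differ; a line present in only one input counts as different.
    static List<Integer> differingLines(List<String> left, List<String> right) {
        List<Integer> result = new ArrayList<>();
        int max = Math.max(left.size(), right.size());
        for (int i = 0; i < max; i++) {
            String l = i < left.size() ? left.get(i) : null;
            String r = i < right.size() ? right.get(i) : null;
            if (l == null || !l.equals(r)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PdfCompareSketch <first.pdf> <second.pdf>");
            return;
        }
        Path first = Paths.get(args[0]);
        Path second = Paths.get(args[1]);
        System.out.println(sameBytes(first, second) ? "identical" : "different");
    }
}
```

Note that two PDFs with the same visible content can still differ at the byte level (for example because of embedded metadata), which is why the text-based approaches are usually more useful for highlighting differences.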

Cheers,
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



