See more: General software
Hi All,
 
I have about 116,222 PDF files, and I need to find out which of them are corrupted. Can anyone tell me whether there is any software (free or paid) that can identify the corrupted files, or conversely the intact ones? I searched a lot but could not find any; all the results point to repair software.
 
Any suggestion would be very helpful.
Posted 21-Aug-11 21:44pm
Comments
Richard MacCutchan at 22-Aug-11 3:49am
   
Chances are that the only way to do this is to open every file with a PDF reader, or write your own application to analyse them.
arindamrudra at 22-Aug-11 3:53am
   
But the number of files is very high; that is the issue.
Richard MacCutchan at 22-Aug-11 4:14am
   
If these files already exist on your disk then there is nothing you can do without reading each individual file to check it. How else could you tell if it was corrupt?
arindamrudra at 22-Aug-11 4:29am
   
Yes, all the files are on my disk. Please have a look at OriginalGriff's solution (a very good tip) and the 2nd and 3rd links from walterhevedeich, which are also of high quality. I am trying to follow these approaches.
Richard MacCutchan at 22-Aug-11 4:37am
   
Well, one thing you may notice from all these links and suggestions is that you will have to read every file; there is no possible way to avoid this.
arindamrudra at 22-Aug-11 4:42am
   
Yes, that is correct. But if I implement the "SHA hash value" check, it will be very easy. I will create a service that goes through the files one by one and checks each of them with .NET. I think that will take the least time. But there is a delivery deadline, which is why I am looking for an easy way.

Solution 1

The problem is in deciding if the file is "corrupted".
 
If you don't have a SHA hash value for each file, or something similar, then the only way to tell whether a file is corrupted is to try to read it as a PDF file. If you can't, then it is either corrupt or uses a later version of the PDF specification than your reader software supports.
 
If you can read them, then they probably aren't corrupt, so you could ignore them. To be certain, I suspect you would need a human to read them and confirm they look as they should.
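As a rough illustration of "try to read it as a PDF file", here is a minimal Python sketch of a cheap first-pass filter. It is only a heuristic, not a real parse: it checks that each file has the `%PDF-` header near the start and the `%%EOF` marker near the end, which catches empty and truncated files but not damage in the middle. The function names and the 1024-byte window are assumptions made for this example; files that pass would still need to go through an actual PDF reader or library.

```python
import os

HEADER = b"%PDF-"
EOF_MARKER = b"%%EOF"
WINDOW = 1024  # assumption: number of bytes inspected at each end of the file


def looks_like_pdf(path):
    """Return True if the file has a PDF header and an end-of-file marker."""
    try:
        with open(path, "rb") as f:
            head = f.read(WINDOW)
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - WINDOW))
            tail = f.read()
    except OSError:
        # Unreadable files are reported as suspect rather than skipped.
        return False
    return HEADER in head and EOF_MARKER in tail


def find_suspect_pdfs(root):
    """Walk root recursively and return .pdf paths that fail the check."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                if not looks_like_pdf(path):
                    suspects.append(path)
    return sorted(suspects)
```

Calling `find_suspect_pdfs` on the top-level folder returns the list of files worth a closer look. Each file is touched only at its two ends, so even a very large collection can be screened fairly quickly.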
 
I would process them through a reader and then record a SHA hash for each one, so that any later change can be detected immediately.
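The hashing step could look something like the following Python sketch. It assumes SHA-256 (the post only says "SHA"), and the function names and the in-memory dictionary used as a manifest are choices made for this example. Note that a hash only detects changes made after it was recorded, so the manifest should be built from files already known to be good.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each .pdf path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def changed_files(root, manifest):
    """Return relative paths that are missing or whose hash differs."""
    changed = []
    for rel_path, expected in manifest.items():
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path) or sha256_of(path) != expected:
            changed.append(rel_path)
    return sorted(changed)
```

In practice the manifest would be saved somewhere safe (for example as a JSON file) and `changed_files` run against it on later checks.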
Comments
arindamrudra at 22-Aug-11 3:55am
   
Thanks, very good tip. I am going to search for "SHA hash value for each file".

Solution 2

Comments
arindamrudra at 22-Aug-11 4:17am
   
I had already gone through your first link before your post, but the third link seems very good. The second approach may fail because of the number of files; the system may hang.

Solution 5

Hi,
Anyone still seeking a solution to arindamrudra's problem should take a look at a small, free, open-source program called 'Recursive finder of corrupted PDF files' (download link: http://sourceforge.net/projects/corruptedpdfinder/). It does just that: it recursively finds corrupted or password-protected PDF files within a folder that the user selects.
 
Good luck.
CSilva.
Comments
arindamrudra at 26-May-14 7:47am
   
Nice one...
William van Velde at 23-Jun-14 7:55am
   
Perfect, this is the tool I needed.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
