Hi All,

I have about 116,222 .pdf files and need to find out which of them are corrupted. Can anyone please tell me whether there is any software (free or paid) that can pick out the corrupted files, or conversely the good ones? I googled a lot but could not find any; all the results are for repair software.

Any suggestion would be very helpful.
Posted 21-Aug-11 21:44pm
Richard MacCutchan 22-Aug-11 3:49am
Chances are that the only way to do this is to open every file with a PDF reader, or write your own application to analyse them.
arindamrudra 22-Aug-11 3:53am
But the number of files is very high; that is the issue.
Richard MacCutchan 22-Aug-11 4:14am
If these files already exist on your disk then there is nothing you can do without reading each individual file to check it. How else could you tell if it was corrupt?
arindamrudra 22-Aug-11 4:29am
Yes, all the files are on my disk. Can you please have a look at OriginalGriff's solution (very good tip) and the 2nd and 3rd links from walterhevedeich, which are also of high quality? I am trying to follow these approaches.
Richard MacCutchan 22-Aug-11 4:37am
Well one thing you may notice from all these links and suggestions is that you will have to read every file; there is no possible way to avoid this.
arindamrudra 22-Aug-11 4:42am
Yes, that is correct. But if I implement the "SHA hash value" check, it will be very easy. I will create a service that goes through the files one after another and checks them in .NET. I think that will take the least time. But there is a delivery deadline, which is why I am searching for an easy way.

Solution 1

The problem is in deciding if the file is "corrupted".

If you don't have a SHA hash value for each file, or something similar, then the only way you can tell whether a file is corrupted is to try to read it as a PDF file. If you can't, then it is either corrupt or uses a later version of the PDF specification than your reader software supports.

If you can read them, then they probably aren't corrupt, so you could ignore them. To be certain, I suspect you would need a human to read them and make sure they look as they should.
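
As a rough illustration of the "try to read it" step, here is a minimal Python sketch of a cheap first pass. It is not a real PDF reader: it only checks for the `%PDF-` header near the start of the file and the `%%EOF` marker near the end, which catches many truncated or overwritten files but says nothing about the objects in between. The function names and the 1024-byte windows are assumptions made for this example; files that pass would still need a proper reader or PDF library to be sure.

```python
import os
from pathlib import Path


def looks_like_pdf(path, window=1024):
    """Heuristic only: '%PDF-' near the start and '%%EOF' near the end.

    The window size is an assumption for this sketch, not a rule
    taken from the thread.
    """
    try:
        with open(path, "rb") as f:
            head = f.read(window)
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - window))
            tail = f.read()
    except OSError:
        # Unreadable files are treated as suspect.
        return False
    return b"%PDF-" in head and b"%%EOF" in tail


def find_suspect_pdfs(root):
    """Walk root recursively and return the .pdf files that fail the check."""
    return sorted(
        p for p in Path(root).rglob("*.pdf")
        if p.is_file() and not looks_like_pdf(p)
    )
```

Calling `find_suspect_pdfs("some/folder")` (a hypothetical path) returns the files worth a closer look. Each file is still opened once, so this does not avoid reading every file; it only keeps the amount read per file small.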

I would process them through a reader once and then record an SHA hash for each file, so that any later change can be detected immediately.
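
The hashing step could look something like the following Python sketch. It assumes SHA-256 (the thread only says "SHA hash") and stores the hashes in an in-memory dictionary; the function names are made up for this example. Note that a hash only tells you a file has changed since the manifest was built, so it helps with future checks, not with deciding whether a file is corrupt today.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each .pdf path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*.pdf"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """Return names from the manifest whose file is now missing or different."""
    current = build_manifest(root)
    return sorted(name for name, h in manifest.items() if current.get(name) != h)
```

In practice the manifest would be saved somewhere (a text file or a database table, for instance) right after the files have been verified, and `changed_files` would be run against it on later checks.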
arindamrudra 22-Aug-11 3:55am
Thanks, very good tip. I am going to search for "SHA hash value for each file".

Solution 2

arindamrudra 22-Aug-11 4:17am
I had already gone through your first link before your post, but the third link seems very good. The second link may fail because of the number of files; the system may hang.

Solution 5

Anyone still seeking a solution to arindamrudra's problem should take a look at this small, free, open-source program called 'Recursive finder of corrupted PDF files' (download link: [^]), which does just that: it recursively finds corrupted or password-protected PDF files within a folder of the user's choice.

Good luck.
arindamrudra 26-May-14 7:47am
Nice one...
William van Velde 23-Jun-14 7:55am
Perfect, this is the tool I needed.

Solution 6

Kornfeld Eliyahu Peter 12-Jan-15 4:56am
Have you noticed how old this post is? And it has already been answered!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 12 Jan 2015