Guys,

Does anyone know of a good open source or third-party .NET library for redacting sensitive information from PDFs?

I Googled for one, but nothing I found was of use. Most libraries have the following limitations:
1) They can't redact a PDF using a regex.
2) After redaction, converting the redacted PDF to text still shows the sensitive information, which defeats the purpose.
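
The two requirements above can be sketched as an acceptance check. This is a minimal illustration in plain Python, not a PDF library: the SSN-style pattern is a hypothetical example, and the sample strings stand in for whatever a PDF-to-text conversion would return.

```python
import re

# Hypothetical pattern for illustration: identifiers shaped like 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_sensitive(text, pattern=SSN_PATTERN):
    """Return every substring of `text` that matches the sensitive-data pattern."""
    return pattern.findall(text)

def redaction_leaks(extracted_text, pattern=SSN_PATTERN):
    """True if text extracted from a supposedly redacted PDF still matches."""
    return bool(find_sensitive(extracted_text, pattern))

# Stand-ins for the output of a PDF-to-text conversion (assumed, not real output).
before = "Employee: J. Doe, SSN 123-45-6789, Dept 4"
after_good = "Employee: J. Doe, SSN , Dept 4"  # text was actually removed
after_bad = before  # a box was drawn, but the text layer is unchanged
```

Running `redaction_leaks` on the text extracted from each redacted file gives a quick pass/fail test for any candidate library.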


Thanks in advance :)
Updated 1-May-13 4:35am
Comments
Kenneth Haugland 1-May-13 10:35am    
Have you looked at iText?
ganesh.rit 1-May-13 11:00am    
No... I am looking into how we can use iTextSharp for PDF redaction. If you have any ideas, please let me know.
Sergey Alexandrovich Kryukov 1-May-13 10:36am    
I think by "redact" you mean "edit". Please check an English dictionary.
The question is really vague.
—SA
CHill60 1-May-13 10:40am    
Sergey Alexandrovich Kryukov 1-May-13 11:02am    
I know, I just thought it should be "edit", but Richard explained another possible meaning below.
In some languages a word like "redact" is used instead of "edit" (when there is no word "edit"), so I assumed it was the influence of OP's native language.
—SA

This third-party tool definitely supports regex, although it's not cheap: http://www.gnostice.com/nl_article.asp?id=238&t=PDF_Text_Redaction_Using_PDFOne_NET
 
 
Comments
ganesh.rit 1-May-13 11:04am    
Hi Chill60,

Thanks for your response.
I already tried PDFOne, but it's no use. First, it doesn't succeed in redacting every time. And even when it does succeed, converting the redacted PDF to text still shows the redacted text, which defeats the purpose.

Thanks,
Ganesh.
CHill60 1-May-13 11:12am    
Have you tried their support center about the redaction not working every time?
On the redacted text still showing: a lot will depend on how you are converting the PDF to text. Again, that's something for the Gnostice support guys.
Sergey Alexandrovich Kryukov 1-May-13 11:05am    
My 5.
—SA
CHill60 1-May-13 11:06am    
Thank you!
sangitasojitra 26-Mar-14 11:34am    
Ganesh,
I am also looking for a good redaction tool that removes sensitive information from a PDF so that it is no longer present after the PDF is converted to a text document.
Do you know of any tool that provides this?
Your help is much appreciated.
Thank you.
Sangita.
The only way I see this working is removing the text entirely and replacing it with black bar images.

As you've already found out, just drawing a black box over the text doesn't work, because the text is still there, unprotected. All anyone has to do to get it is convert the PDF to text, or open the PDF in a PDF editor and remove the black boxes.

I don't know of any library that does this for you, and I don't have any code samples for doing it with any library. I think you're in uncharted territory here using free libraries.
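
To illustrate the difference between covering text and removing it, here is a toy model in plain Python. It is not a PDF library and not any product's API: `Page`, `TextRun`, `Rect`, `cover_only` and `redact` are hypothetical names, and a real PDF content stream is far more involved. The point is only that text extraction reads the text layer and ignores whatever is drawn on top.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other):
        # Two axis-aligned rectangles overlap if they overlap on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

@dataclass
class TextRun:
    text: str
    box: Rect

@dataclass
class Page:
    runs: list = field(default_factory=list)      # the text layer
    overlays: list = field(default_factory=list)  # filled boxes drawn on top

    def extract_text(self):
        # Extraction looks only at the text layer, never at the overlays.
        return " ".join(run.text for run in self.runs)

def cover_only(page, region):
    """Cosmetic 'redaction': draw a black box, leave the text layer alone."""
    page.overlays.append(region)

def redact(page, region):
    """Remove every text run that touches the region, then draw the box."""
    page.runs = [r for r in page.runs if not r.box.intersects(region)]
    page.overlays.append(region)

def sample_page():
    return Page(runs=[TextRun("Name:", Rect(0, 0, 40, 10)),
                      TextRun("123-45-6789", Rect(50, 0, 120, 10))])

secret_area = Rect(45, 0, 125, 10)
```

After `cover_only`, `extract_text()` still returns the number; after `redact`, only `Name:` remains. A library that passes the convert-to-text check has to do something equivalent to the second function.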
 
 
Using iTextSharp, I am able to redact an area of a raster-image PDF when I know the page and the area to redact.
 
 
Comments
Member 10070308 23-Jan-14 7:00am    
Can you please share the code? I want exactly the same thing.
ganesh.rit 24-Jan-14 0:30am    
I drew a black rectangle over the raster-image PDF on the page and area where I had to apply redaction. You can find code on the internet for drawing a rectangle using iTextSharp.
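
One caution, in line with the earlier answer: if the rectangle is only drawn on top of the page, the scanned image underneath is still stored in the file. A minimal sketch, in plain Python rather than iTextSharp, of overwriting the pixels themselves; the grayscale grid and the function name `black_out` are assumptions made for illustration.

```python
def black_out(pixels, left, top, right, bottom):
    """Return a copy of a grayscale pixel grid with a region set to 0 (black).

    Bounds are half-open: columns left..right-1 and rows top..bottom-1.
    Coordinates outside the grid are clipped.
    """
    result = [row[:] for row in pixels]  # copy, so the input is not modified
    for y in range(max(top, 0), min(bottom, len(result))):
        row = result[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = 0
    return result

page = [[255] * 6 for _ in range(4)]    # hypothetical 6x4 all-white page image
redacted = black_out(page, 1, 1, 4, 3)  # black out columns 1-3 of rows 1-2
```

The redacted grid would then replace the original page image in the output file, so that no copy of the original pixels is left behind.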

Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


