Hi,

For the past day I have been exploring a couple of third-party components for working with PDFs from C#: Aspose.PDF for .NET and iTextSharp. Here is what I need them for:

I have some PDFs that contain sensitive information in the form of text, such as a person's name, city, etc.
Each PDF needs to be duplicated, and while the copy is created the sensitive text must be found and replaced with dummy text.
The replacement is essential so that the original information cannot be traced by any fraudulent means.
The replaced text also needs to be redacted.

The search needs to support regular expressions, because the text to be masked can appear in several variations.

Could you please advise how this can be done using iTextSharp?

Thanks in advance.

What I have tried:

I explored various options in iTextSharp and succeeded in duplicating the PDF file, but I have not yet been able to search and replace text.
Updated 8-Sep-16 3:54am

There's a little discussion of this in the Stack Overflow thread "replace string in PDF document (ITextSharp or PdfSharp)"; the code shown there may or may not work.

My approach would be different, and it depends on how many document formats you have. Note: under no circumstances should you redact text just by drawing or stamping a black box over it, because the PDF document itself still holds the data and a binary inspection could reveal the details.

I would parse all the text from the document (see Chris Schiffhauer's article "Read Text from a PDF in C# with iTextSharp") and build the redacted document from scratch. OK, it's easy for me to say that; how practical it is depends on how complicated your documents are.
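
The masking step that sits between extracting the text and rebuilding the document does not depend on any PDF library, so it can be sketched on its own. The snippet below is only an illustration, written in Python for brevity (the same idea maps onto Regex.Replace in C#). The patterns, the dummy values and the `mask_text` name are all hypothetical, and the sample string stands in for text already extracted from a page.

```python
import re

# Hypothetical patterns: each regex covers the variations of one sensitive
# value and maps it to the dummy text that should replace it.
PATTERNS = {
    r"\bJ(?:ohn|\.)\s+Smith\b": "PERSON_1",
    r"\bNew\s+York(?:\s+City)?\b": "CITY_1",
}

def mask_text(text, patterns=PATTERNS):
    """Apply every pattern and return (masked_text, number_of_replacements)."""
    hits = 0
    for pattern, dummy in patterns.items():
        # subn returns the new string plus how many matches were replaced.
        text, n = re.subn(pattern, dummy, text, flags=re.IGNORECASE)
        hits += n
    return text, hits

# Stand-in for the text extracted from one page of the original PDF.
page_text = "John Smith lives in New York City. J. Smith was born in new york."
masked, count = mask_text(page_text)
print(masked)
print(count)
```

Only the masked string would then be written into the new document, so the original values never reach the output file. Returning the replacement count makes it easy to flag pages where nothing matched and a pattern may need widening.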
 
Here's another approach you can follow if you don't need searchable text in the resulting PDF file:
1- Parse the text from the original PDF file and record the rectangles where the text you wish to redact is located.
2- Convert the PDF pages to raster images.
3- Draw redaction rectangles on the raster images using the rectangle information obtained in step 1.
4- Save the resulting images as new PDF pages that do not contain any of the original text.

Because every output page is just an image, the resulting file contains none of the original text.
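
The fiddly part of steps 1 and 3 is that text rectangles are normally reported in PDF points with the origin at the bottom-left of the page, while raster images use pixels with the origin at the top-left. Below is a minimal sketch of that conversion and of the fill itself, in plain Python for illustration. It assumes an unrotated page whose origin is at (0, 0); the function names and the sample rectangle are hypothetical, and a list of lists stands in for the image a real rendering library would produce.

```python
import math

def pdf_rect_to_pixels(rect, page_height_pt, dpi):
    """Map (x0, y0, x1, y1) in PDF points (origin bottom-left) to
    (left, top, right, bottom) in pixels (origin top-left)."""
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # there are 72 PDF points per inch
    # Round outwards so the box never leaves a sliver of text visible.
    left = math.floor(x0 * scale)
    right = math.ceil(x1 * scale)
    top = math.floor((page_height_pt - y1) * scale)
    bottom = math.ceil((page_height_pt - y0) * scale)
    return left, top, right, bottom

def redact(image, box, fill=0):
    """Overwrite every pixel inside box (right and bottom exclusive)."""
    left, top, right, bottom = box
    for row in image[max(top, 0):bottom]:
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill

# Toy stand-in for a rendered page: 1 x 1 inch at 72 DPI, all white (255).
page_w_pt, page_h_pt, dpi = 72, 72, 72
image = [[255] * page_w_pt for _ in range(page_h_pt)]

# Hypothetical text rectangle recorded in step 1, in PDF points.
box = pdf_rect_to_pixels((10, 50, 30, 60), page_h_pt, dpi)
redact(image, box)
print(box)
```

Unlike a black box stamped onto the original PDF, the fill here overwrites the pixels themselves, so nothing remains underneath once the image is saved as a new page.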
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


