How do I duplicate a PDF with some text replacement and redaction?

Question

Hi,

For the past day I have been exploring a couple of third-party components for working with PDFs in C#: Aspose.PDF for .NET and iTextSharp. Here are the details of what I am evaluating them for:

I have some PDFs that contain sensitive information as text, such as a person's name, city, etc.
These PDFs need to be duplicated, and while creating the copy, the sensitive text must be found and replaced with dummy text.
The replacement is essential so that the original information cannot be traced by any fraudulent means.
The replaced text also needs to be redacted.

The text search should support RegEx, as the text that needs to be masked can vary.

Could you please help me with how this can be done using iTextSharp?

Thanks in advance.

What I have tried:

I explored various options in iTextSharp and succeeded in duplicating the PDF file, but I have not yet been able to search and replace text.

Posted 1-Sep-16 20:18pm

kapil koli

Updated 8-Sep-16 3:54am

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Garth J Lancaster · Answer 1 · 2016-09-02T16:07:00

There's a short discussion of this in the Stack Overflow thread "replace string in PDF document (ITextSharp or PdfSharp)". The code shown there may or may not work.

My approach would be different, and it depends on how many document formats you have. Note: under no circumstances redact text by merely drawing or stamping a black box over it, because the PDF document itself still holds the data, and a binary inspection could reveal the details.

I would parse all the text from the document (see Chris Schiffhauer's article "Read Text from a PDF in C# with iTextSharp") and build the redacted document from scratch. Admittedly, that is easy for me to say; how practical it is depends on how complicated your documents are.
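
As a rough illustration of the find-and-replace step on text that has already been extracted, here is a minimal Python sketch. The patterns, the dummy values and the names `SENSITIVE_PATTERNS`, `mask_text` and `find_leaks` are all hypothetical, and the PDF extraction and rebuilding steps are deliberately left out. In C# the same logic should map onto `System.Text.RegularExpressions.Regex`.

```python
import re

# Hypothetical patterns and dummy replacements; adapt them to the real documents.
SENSITIVE_PATTERNS = [
    (re.compile(r"\bJohn\s+(?:Q\.\s+)?Smith\b", re.IGNORECASE), "PERSON_A"),
    (re.compile(r"\bSpringfield\b", re.IGNORECASE), "CITY_A"),
]


def mask_text(text):
    """Replace every sensitive match with its dummy value.

    Returns (masked_text, number_of_replacements).
    """
    total = 0
    for pattern, dummy in SENSITIVE_PATTERNS:
        text, count = pattern.subn(dummy, text)
        total += count
    return text, total


def find_leaks(text):
    """Return any sensitive matches still present in the text."""
    leaks = []
    for pattern, _ in SENSITIVE_PATTERNS:
        leaks.extend(match.group(0) for match in pattern.finditer(text))
    return leaks


if __name__ == "__main__":
    page_text = "Applicant: john  smith, resident of Springfield. Signed: John Q. Smith"
    masked, count = mask_text(page_text)
    print(masked)
    print(count, find_leaks(masked))
```

Running `find_leaks` on the text extracted from the finished PDF, and not only on the masked string, is a cheap way to confirm that nothing sensitive survived the rebuild.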

LEADTOOLS Support · Answer 2 · 2016-09-08T03:54:00

Here’s another approach you can follow if you don’t need searchable text in the resulting PDF file:
1- Parse the text from the original PDF file and keep a record of the rectangles where the text you wish to redact is located.
2- Convert the PDF pages to raster images.
3- Draw redaction rectangles on the raster images using the rectangle information obtained in step 1.
4- Save the resulting images as new PDF pages that do not contain any of the original text.

This way, the resulting file is guaranteed to contain none of the original text data, because its pages are images only.
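
Step 3 needs the rectangles from step 1 converted from PDF coordinates to pixel coordinates. The Python sketch below shows one way to do that and to black out the pixels. It assumes an unrotated page whose MediaBox starts at (0, 0), rectangles given as (x0, y0, x1, y1) in PDF points, and a grayscale raster held as a list of rows. The names `pdf_rect_to_pixels` and `redact` are hypothetical, and rasterizing the page and writing the images back to a PDF are not shown.

```python
import math


def pdf_rect_to_pixels(rect, page_height_pt, dpi):
    """Convert a PDF rectangle to a pixel box.

    rect is (x0, y0, x1, y1) in PDF points with the origin at the bottom-left.
    Returns (left, top, right, bottom) in pixels with the origin at the top-left,
    rounded outward so the box fully covers the text.
    """
    scale = dpi / 72.0  # default PDF user space is 72 units per inch
    x0, y0, x1, y1 = rect
    left = math.floor(min(x0, x1) * scale)
    right = math.ceil(max(x0, x1) * scale)
    # Flip the vertical axis: PDF y grows upward, raster y grows downward.
    top = math.floor((page_height_pt - max(y0, y1)) * scale)
    bottom = math.ceil((page_height_pt - min(y0, y1)) * scale)
    return left, top, right, bottom


def redact(pixels, box, fill=0):
    """Overwrite every pixel inside box, clipped to the image bounds.

    pixels is a list of rows of grayscale values and is modified in place.
    """
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            pixels[y][x] = fill


if __name__ == "__main__":
    # A US Letter page (612 x 792 points) rendered at 144 DPI.
    box = pdf_rect_to_pixels((72, 700, 172, 720), page_height_pt=792, dpi=144)
    print(box)
```

Because the pixels are overwritten rather than covered by a separate layer, the redacted region cannot be recovered from the output image.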
