How to extract images, text and font details from a PDF file in C#

Question


Hi,

I am developing a PDF comparison tool in C# that will compare two PDF files.
For this I need to extract the PDF content, such as images, text, font sizes, bookmarks, etc.

Any ideas on how to do this in C#?

Thanks in advance,
Kane

Posted 22-Feb-13 2:02am

kanekhan

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

David_Wimbley · Answer 1 · 2013-02-22T05:59:00

To extract text and images from a PDF, I would suggest using either PDFsharp or iTextSharp.

Download the iTextSharp DLLs:
http://sourceforge.net/projects/itextsharp/

Documentation for the iTextSharp API:
http://www.afterlogic.com/mailbee-net/docs-itextsharp/

Get the text from all pages with iTextSharp

C#

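using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfText
{
    public static string GetTextFromAllPages(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            StringWriter output = new StringWriter();
            for (int i = 1; i <= reader.NumberOfPages; i++) // iTextSharp page numbers are 1-based
                output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));
            return output.ToString();
        }
        finally
        {
            reader.Close(); // release the underlying file handle
        }
    }
}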
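Since the question is about comparing two PDFs, one natural next step is to compare the extracted text page by page rather than as one large string. The following is a minimal sketch, not part of the original answer. It assumes each document's text has already been collected into a `string[]` with one entry per page (for example by calling `PdfTextExtractor.GetTextFromPage` once per page, as in the loop above). The names `PdfTextDiff`, `DifferingPages` and `Normalize` are hypothetical, and only the .NET base class library is used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: assumes the text of each page was already extracted.
public static class PdfTextDiff
{
    // Compares two documents page by page. Each array element holds the text of one page.
    // Returns the 1-based numbers of pages whose text differs, including pages
    // that exist in only one of the two documents.
    public static List<int> DifferingPages(string[] pagesA, string[] pagesB)
    {
        var result = new List<int>();
        int max = Math.Max(pagesA.Length, pagesB.Length);
        for (int i = 0; i < max; i++)
        {
            // A missing page is represented as null, so it never equals an existing page.
            string a = i < pagesA.Length ? Normalize(pagesA[i]) : null;
            string b = i < pagesB.Length ? Normalize(pagesB[i]) : null;
            if (!string.Equals(a, b, StringComparison.Ordinal))
                result.Add(i + 1); // report 1-based page numbers, like iTextSharp
        }
        return result;
    }

    // Collapses runs of whitespace so that spacing or line-wrapping differences
    // introduced by the extractor are not reported as changes.
    private static string Normalize(string text)
    {
        if (text == null) return string.Empty;
        return string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

Whitespace is normalized because text extraction can change line breaks and spacing without the visible content changing; drop the `Normalize` call if layout differences matter for your comparison.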

How to extract images from a PDF and save them to a file:

http://kishor-naik-dotnet.blogspot.com/2011/01/cnet-extract-image-from-pdf-file.html
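For the image side of the comparison, a simple way to detect added or removed images, once they have been extracted from each file (for example with the approach in the linked article), is to fingerprint the raw bytes. This is a minimal sketch, not part of the original answer. It assumes the images are already available as `byte[]` values, and `ImageCompare`, `Fingerprint` and `UnmatchedImages` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper: assumes images were already extracted as raw byte arrays.
public static class ImageCompare
{
    // Returns the SHA-256 digest of an image's raw bytes as a 64-character hex string.
    public static string Fingerprint(byte[] imageBytes)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(imageBytes)).Replace("-", "");
        }
    }

    // Returns the fingerprints of images that appear in only one of the two documents.
    // Byte-identical images match regardless of their order in the document.
    public static List<string> UnmatchedImages(IEnumerable<byte[]> imagesA, IEnumerable<byte[]> imagesB)
    {
        var setA = new HashSet<string>();
        foreach (var img in imagesA) setA.Add(Fingerprint(img));
        var setB = new HashSet<string>();
        foreach (var img in imagesB) setB.Add(Fingerprint(img));
        setA.SymmetricExceptWith(setB); // keep fingerprints present in exactly one set
        return new List<string>(setA);
    }
}
```

This only detects byte-identical matches: an image that was re-encoded or resized is reported as different. Because sets are used, an image that appears twice in one document is counted only once.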