See more: C# Visual-Studio

I am developing a tool in C# that compares two PDF files.
For this I need to extract the PDF content, such as images, text, font sizes, bookmarks, etc.

Any idea how to do this in C#?

Thanks in advance,
Posted 22-Feb-13 2:02am

1 solution


Solution 1

To extract text or images from a PDF, I would suggest using either PDFsharp or iTextSharp.

Download iTextSharp DLLs[^]

Documentation for the iTextSharp API[^]

Get text from all pages in iTextSharp:
// Requires: using System.IO; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public static string GetTextFromAllPages(string pdfPath)
{
    PdfReader reader = new PdfReader(pdfPath);
    StringWriter output = new StringWriter();
    // iTextSharp page numbers start at 1, not 0.
    for (int i = 1; i <= reader.NumberOfPages; i++)
        output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));
    reader.Close();
    return output.ToString();
}
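
Since the goal in the question is to compare two PDFs, here is a minimal sketch of how the text returned by GetTextFromAllPages for each file might be compared. The class name PdfTextDiff and the method DiffLines are made up for illustration and are not part of iTextSharp. It is a plain position-by-position line comparison, not a real diff algorithm, so an inserted line will make every following line show up as changed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp: compares two extracted texts line by line.
public static class PdfTextDiff
{
    // Returns one message for every line position where the two texts differ.
    public static List<string> DiffLines(string textA, string textB)
    {
        string[] a = SplitLines(textA);
        string[] b = SplitLines(textB);
        var differences = new List<string>();
        int max = Math.Max(a.Length, b.Length);

        for (int i = 0; i < max; i++)
        {
            // null marks a position that exists in only one of the texts.
            string left = i < a.Length ? a[i] : null;
            string right = i < b.Length ? b[i] : null;

            if (left != right)
            {
                differences.Add(string.Format("Line {0}: {1} | {2}",
                    i + 1, left ?? "(no line)", right ?? "(no line)"));
            }
        }
        return differences;
    }

    // Normalizes Windows line endings so "\r\n" and "\n" are treated the same.
    private static string[] SplitLines(string text)
    {
        return text.Replace("\r\n", "\n").Split('\n');
    }
}
```

It could be called as PdfTextDiff.DiffLines(GetTextFromAllPages("old.pdf"), GetTextFromAllPages("new.pdf")), where the two file names are placeholders. An empty result means the extracted text is identical, which says nothing about images, fonts or bookmarks.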

How to extract images from PDF and save to file[^]
kanekhan 27-Feb-13 1:04am
Hi David,

Thanks for the reply. The above code looks fine, but I also need to get the font properties of the extracted PDF text, such as font size, font style and font colour.

Could you please tell me how to do that using iTextSharp or any other approach in C#?

Thanks in advance,

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
