Hi,

I am developing a tool in C# that compares two PDF files.
To do this, I need to extract the PDF content: images, text, font sizes, bookmarks, and so on.

Any ideas on how to do this in C#?

Thanks in advance,
Kane

1 solution

To extract text or images from a PDF, I would suggest using either PDFsharp or iTextSharp.

Download the iTextSharp DLLs:
http://sourceforge.net/projects/itextsharp/

Documentation for the iTextSharp API:
http://www.afterlogic.com/mailbee-net/docs-itextsharp/

Get the text from all pages with iTextSharp:
C#
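using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static string GetTextFromAllPages(string pdfPath)
{
    PdfReader reader = new PdfReader(pdfPath);
    try
    {
        StringWriter output = new StringWriter();
        // Page numbers in iTextSharp start at 1.
        for (int i = 1; i <= reader.NumberOfPages; i++)
            output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

        return output.ToString();
    }
    finally
    {
        reader.Close(); // release the underlying file
    }
}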
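
Since the goal in the question is comparison, the strings returned by GetTextFromAllPages for the two files could then be compared line by line. The following is only a minimal sketch of that idea: the class and method names (PdfTextDiff, FindDifferingLines) are hypothetical, it uses no PDF library at all, and it does a naive positional comparison rather than a real diff, so one inserted line makes every following line count as changed.

```csharp
using System;
using System.Collections.Generic;

public static class PdfTextDiff
{
    // Hypothetical helper: returns the 1-based numbers of the lines
    // that differ between two extracted texts (positional comparison only).
    public static List<int> FindDifferingLines(string first, string second)
    {
        string[] separators = new[] { "\r\n", "\n" };
        string[] a = first.Split(separators, StringSplitOptions.None);
        string[] b = second.Split(separators, StringSplitOptions.None);

        var differences = new List<int>();
        int max = Math.Max(a.Length, b.Length);
        for (int i = 0; i < max; i++)
        {
            // A line missing from the shorter text is treated as null,
            // so it never equals a real line in the longer text.
            string left = i < a.Length ? a[i] : null;
            string right = i < b.Length ? b[i] : null;
            if (!string.Equals(left, right, StringComparison.Ordinal))
                differences.Add(i + 1);
        }
        return differences;
    }
}
```

It would be called as PdfTextDiff.FindDifferingLines(GetTextFromAllPages(pathA), GetTextFromAllPages(pathB)), where pathA and pathB stand for the two files being compared.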


How to extract images from a PDF and save them to a file:

http://kishor-naik-dotnet.blogspot.com/2011/01/cnet-extract-image-from-pdf-file.html
Comments
kanekhan 27-Feb-13 1:04am    
Hi David,

Thanks for the reply. The above code looks fine, but I also need the font properties of the extracted PDF text, such as font size, font style, and font colour.

Could you please tell me how to do that using iTextSharp, or any other way in C#?

Thanks in advance,
Kane
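
One possible direction for the font question, offered as a sketch rather than a confirmed answer: iTextSharp 5.x passes a TextRenderInfo object to a custom extraction strategy for each chunk of text, and that object exposes the font and the ascent and descent lines. The class name FontInfoExtractionStrategy below is hypothetical, and the code assumes the iTextSharp 5.x parser namespace. The height it reports is the distance between the ascent and descent lines, which only approximates the nominal font size. Style (bold, italic) usually has to be inferred from the font name, and colour is not covered here.

```csharp
using System.Text;
using iTextSharp.text.pdf.parser;

// Hypothetical strategy (assumes iTextSharp 5.x): records each text chunk
// together with its font name and an approximate glyph height.
public class FontInfoExtractionStrategy : ITextExtractionStrategy
{
    private readonly StringBuilder result = new StringBuilder();

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Font name as stored in the PDF; subset fonts may carry
        // a prefix such as "ABCDEF+".
        string fontName = renderInfo.GetFont().PostscriptFontName;

        // Approximate height: vertical distance between the ascent
        // and descent lines of this chunk.
        float top = renderInfo.GetAscentLine().GetStartPoint()[Vector.I2];
        float bottom = renderInfo.GetDescentLine().GetStartPoint()[Vector.I2];
        float height = top - bottom;

        // One tab-separated line per chunk: text, font name, height.
        result.AppendFormat("{0}\t{1}\t{2:0.##}\n",
            renderInfo.GetText(), fontName, height);
    }

    public string GetResultantText()
    {
        return result.ToString();
    }
}
```

It plugs into the same call as in the solution above: PdfTextExtractor.GetTextFromPage(reader, i, new FontInfoExtractionStrategy()).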

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)