See more: C# Visual-Studio
Hi,
 
I am developing a tool in C# that compares two PDF files.
For this I need to extract the PDF content, such as images, text, font sizes, bookmarks, etc.
 
Any idea how to do this in C#?
 
Thanks In Advance,
Kane
Posted 22-Feb-13 2:02am

1 solution


Solution 1

To extract text and images from a PDF, I would suggest using either PDFsharp or iTextSharp.
 
Download the iTextSharp DLLs:
http://sourceforge.net/projects/itextsharp/
 
Documentation for the iTextSharp API:
http://www.afterlogic.com/mailbee-net/docs-itextsharp/
 
Get the text from all pages with iTextSharp:
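using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static string GetTextFromAllPages(string pdfPath)
{
    PdfReader reader = new PdfReader(pdfPath);
    try
    {
        StringWriter output = new StringWriter();

        // iTextSharp page numbers start at 1.
        for (int i = 1; i <= reader.NumberOfPages; i++)
            output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

        return output.ToString();
    }
    finally
    {
        reader.Close(); // release the file even if extraction throws
    }
}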
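
Since the goal is to compare two PDFs, the extracted text can then be compared line by line. The following is only a rough sketch of that step: the `TextDiff` class and `DifferingLines` method are made-up names, and the comparison is purely positional (line 1 against line 1, and so on), not a real diff algorithm.

```csharp
using System;
using System.Collections.Generic;

public static class TextDiff
{
    // Returns the 1-based numbers of the lines that differ between two texts.
    public static List<int> DifferingLines(string left, string right)
    {
        string[] separators = { "\r\n", "\n" };
        string[] a = left.Split(separators, StringSplitOptions.None);
        string[] b = right.Split(separators, StringSplitOptions.None);

        List<int> result = new List<int>();
        int max = Math.Max(a.Length, b.Length);
        for (int i = 0; i < max; i++)
        {
            // A line that exists in only one of the texts counts as a difference.
            string x = i < a.Length ? a[i] : null;
            string y = i < b.Length ? b[i] : null;
            if (!string.Equals(x, y, StringComparison.Ordinal))
                result.Add(i + 1);
        }
        return result;
    }
}
```

It could be called as TextDiff.DifferingLines(GetTextFromAllPages("old.pdf"), GetTextFromAllPages("new.pdf")), where the two file names are placeholders. Text that merely reflows between the two files will show up as many differing lines, so a proper diff library may be a better fit for real use.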
 
How to extract images from PDF and save to file
 
http://kishor-naik-dotnet.blogspot.com/2011/01/cnet-extract-image-from-pdf-file.html
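
For reference, one way to do this with the iTextSharp 5.x parser classes is a render listener that saves each image as it is encountered. This is a minimal sketch, not necessarily the approach taken in the linked post: `ImageSaver`, `ImageExtractor` and the "imageN" file naming are assumptions made for illustration, and the output folder is assumed to exist already.

```csharp
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Saves every image the content parser reports; the text callbacks are left empty.
public class ImageSaver : IRenderListener
{
    private readonly string outputDir;
    private int count;

    public ImageSaver(string outputDir) { this.outputDir = outputDir; }

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return; // nothing to save for this callback

        count++;
        // GetFileType() suggests an extension such as "jpg" or "png".
        string path = Path.Combine(outputDir, "image" + count + "." + image.GetFileType());
        File.WriteAllBytes(path, image.GetImageAsBytes());
    }
}

public static class ImageExtractor
{
    public static void ExtractImages(string pdfPath, string outputDir)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            ImageSaver saver = new ImageSaver(outputDir);
            for (int page = 1; page <= reader.NumberOfPages; page++)
                parser.ProcessContent(page, saver);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Images stored with an encoding that iTextSharp cannot decode may cause GetImage() to throw, so a try/catch around that call is worth considering.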
Comments
kanekhan at 27-Feb-13 1:04am
   
Hi David,
 
Thanks for the reply. The above code looks fine, but I also need to get the font properties of the extracted PDF text, such as font size, font style, and font colour.
 
Could you please explain how to do that using iTextSharp, or any other way in C#?
 
Thanks in advance,
Kane
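
For the font properties asked about here, one possible approach with iTextSharp 5.x is a custom render listener that inspects each chunk of text as it is drawn. The sketch below is untested against any particular PDF; `FontInfoListener` and `Runs` are made-up names. It records the font name and an approximate text height, taken as the distance between the ascent and descent lines, which is close to the font size but not guaranteed to equal it.

```csharp
using System.Collections.Generic;
using System.Globalization;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Collects "text | font name | height" for every text chunk on the processed pages.
public class FontInfoListener : IRenderListener
{
    public readonly List<string> Runs = new List<string>();

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Font name as stored in the PDF; the style often appears in the
        // name itself (for example a "-Bold" or "-Italic" suffix).
        string fontName = renderInfo.GetFont().PostscriptFontName;

        // Approximate height: vertical distance between ascent and descent lines.
        float top = renderInfo.GetAscentLine().GetStartPoint()[Vector.I2];
        float bottom = renderInfo.GetDescentLine().GetStartPoint()[Vector.I2];
        float height = top - bottom;

        Runs.Add(string.Format(CultureInfo.InvariantCulture,
            "{0} | {1} | {2:0.##}", renderInfo.GetText(), fontName, height));
    }
}
```

The listener would be passed to PdfReaderContentParser.ProcessContent(pageNumber, listener) for each page, after which Runs holds one entry per text chunk. For colour, later 5.x releases appear to expose the fill colour on TextRenderInfo, but check whether the version you downloaded has it.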

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 22 Feb 2013