See more: C# Visual-Studio
Hi All,
 
I am developing a tool that reads PDF content for comparison. I am using iTextSharp to read the content.
Along with the content, I also need to fetch the alignment properties of the PDF, such as the alignment of lines, titles, headers, footers, and images, for comparison.
I also need to get the spacing between two lines.
 
Please give me some ideas or techniques to do that.
 
Thanks in advance,
Kane
Posted 3-Mar-13 21:04pm
Comments
Sandeep Mewara at 4-Mar-13 2:28am
   
Tried anything so far?
kanekhan at 4-Mar-13 3:44am
   
I am using iTextSharp. This is the code I have written so far:
 
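using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile()
{
    // StringBuilder avoids re-allocating the whole string for every page.
    var text = new StringBuilder();
    PdfReader reader = null;
    try
    {
        reader = new PdfReader(@"\\FilePath");

        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // A fresh strategy per page keeps text from earlier pages out of the result.
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

            // GetTextFromPage already returns a .NET (Unicode) string,
            // so the Encoding.Default/UTF8 round trip is unnecessary and can corrupt characters.
            text.Append(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
        }
    }
    finally
    {
        // Release the file even on failure; exceptions propagate instead of being swallowed.
        if (reader != null)
        {
            reader.Close();
        }
    }
    return text.ToString();
}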
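
The code above returns plain text only, so alignment and line spacing are lost. One possible direction, offered as a rough sketch rather than a confirmed solution: collect the position of each text chunk and derive layout from the coordinates. In iTextSharp 5.x this position is, as far as I know, available in a custom ITextExtractionStrategy through TextRenderInfo.GetBaseline() in the RenderText callback. The sketch below deliberately leaves iTextSharp out and only shows the geometry step. TextChunk, TextLine, LayoutAnalyzer, the tolerance values, and the sample coordinates are all hypothetical names and numbers invented for illustration, and the alignment rule is a per-line heuristic that cannot tell a justified line from a left-aligned line that happens to fill the full width.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one text chunk with the x-range and y of its baseline,
// in PDF user-space units (origin bottom-left, y grows upward).
public sealed class TextChunk
{
    public string Text { get; }
    public float StartX { get; }
    public float EndX { get; }
    public float BaselineY { get; }

    public TextChunk(string text, float startX, float endX, float baselineY)
    {
        Text = text; StartX = startX; EndX = endX; BaselineY = baselineY;
    }
}

public enum Alignment { Left, Right, Centered, FullWidth, Unknown }

// Hypothetical type: chunks that share (approximately) the same baseline.
public sealed class TextLine
{
    public List<TextChunk> Chunks { get; } = new List<TextChunk>();
    public float BaselineY { get; set; }
    public float Left => Chunks.Min(c => c.StartX);
    public float Right => Chunks.Max(c => c.EndX);
    public string Text => string.Join(" ", Chunks.OrderBy(c => c.StartX).Select(c => c.Text));
}

public static class LayoutAnalyzer
{
    // Groups chunks into lines, top of page first. Chunks whose baseline lies
    // within yTolerance of the current line's baseline join that line.
    public static List<TextLine> GroupIntoLines(IEnumerable<TextChunk> chunks, float yTolerance = 1f)
    {
        var lines = new List<TextLine>();
        foreach (var chunk in chunks.OrderByDescending(c => c.BaselineY))
        {
            TextLine last = lines.Count > 0 ? lines[lines.Count - 1] : null;
            if (last == null || Math.Abs(last.BaselineY - chunk.BaselineY) > yTolerance)
            {
                last = new TextLine { BaselineY = chunk.BaselineY };
                lines.Add(last);
            }
            last.Chunks.Add(chunk);
        }
        return lines;
    }

    // Baseline-to-baseline distance between each pair of consecutive lines.
    public static List<float> LineSpacings(IList<TextLine> lines)
    {
        var spacings = new List<float>();
        for (int i = 1; i < lines.Count; i++)
        {
            spacings.Add(lines[i - 1].BaselineY - lines[i].BaselineY);
        }
        return spacings;
    }

    // Heuristic: compares the gaps between the line and the content area's edges.
    public static Alignment Classify(TextLine line, float contentLeft, float contentRight, float tolerance = 2f)
    {
        float leftGap = line.Left - contentLeft;
        float rightGap = contentRight - line.Right;
        bool atLeft = Math.Abs(leftGap) <= tolerance;
        bool atRight = Math.Abs(rightGap) <= tolerance;

        if (atLeft && atRight) return Alignment.FullWidth;
        if (atLeft) return Alignment.Left;
        if (atRight) return Alignment.Right;
        if (Math.Abs(leftGap - rightGap) <= tolerance) return Alignment.Centered;
        return Alignment.Unknown;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up sample coordinates; the content area is assumed to span x = 50..550.
        var chunks = new[]
        {
            new TextChunk("Title", 250f, 350f, 700f),
            new TextChunk("world", 105f, 150f, 650f),
            new TextChunk("Hello", 50f, 100f, 650f),
            new TextChunk("Page 1", 500f, 550f, 630f),
        };

        List<TextLine> lines = LayoutAnalyzer.GroupIntoLines(chunks);
        foreach (TextLine line in lines)
        {
            Console.WriteLine(line.Text + " : " + LayoutAnalyzer.Classify(line, 50f, 550f));
        }
        Console.WriteLine(string.Join(", ", LayoutAnalyzer.LineSpacings(lines)));
    }
}
```

For a comparison tool, running the same analysis on both PDFs and comparing the resulting alignment labels and spacing values line by line, with a tolerance, is likely more robust than comparing raw coordinates. Headers and footers could be approximated as lines whose baseline falls in an assumed top or bottom band of the page; images would need separate handling, since they do not appear in the text chunks.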

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last updated 4 Mar 2013