Posted 20 May 2006

Extract Text from PDF in C# (100% .NET)

A simple class to extract plain text from PDF documents with iTextSharp


This is a 100% .NET solution to extract text from PDF documents.


Dan Letecky posted nice code showing how to extract text from PDF documents in C# based on PDFBox. Although his solution works well, it has a drawback: the required additional libraries total almost 16 MB. With iTextSharp, the required additional libraries come to only 2.3 MB.

Using the Code

To use this solution in your projects, do the following:

  • Add references to itextsharp.dll and SharpZiplib.dll
  • Add the PDFParser.cs class to your project

Then you can use the newly added class in the following way:

// create an instance of the pdfparser class
PDFParser pdfParser = new PDFParser();
// extract the text
String result = pdfParser.ExtractText(pdfFile);

I also created a small console application that uses the class and shows the progress of the conversion. Keep in mind that if you extract text from large PDF files, holding all of the resulting text in memory is not the best solution; in such cases, write the extracted text to the output file after parsing each page.
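The advice about large files can be sketched as follows. This is only a minimal sketch and not part of PDFParser: `PageTextWriter`, `WritePages`, and the `extractPage` callback are hypothetical names, and the callback stands in for whatever per-page extraction you use.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the article's PDFParser class.
public static class PageTextWriter
{
    // Writes each page's text to 'output' as soon as it is extracted,
    // so only one page of text is held in memory at a time.
    // 'extractPage' is an assumed callback that returns the text
    // of the given 1-based page number.
    public static void WritePages(int pageCount, Func<int, string> extractPage, TextWriter output)
    {
        for (int page = 1; page <= pageCount; page++)
        {
            output.WriteLine(extractPage(page));
            output.Flush(); // hand the page to the underlying writer before moving on
        }
    }
}
```

With iTextSharp, `pageCount` would typically come from `PdfReader.NumberOfPages`, the callback could wrap `PdfReader.GetPageContent(page)` followed by the text extraction step, and `output` could be a `StreamWriter` opened on the target file.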

How Does It Work?

My code is based on the algorithm in the C program ExtractPDFText. I use iTextSharp's PdfReader class to obtain the decompressed content stream of every page, and then a simple function, ExtractTextFromPDFBytes, to extract the text from that content.
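To illustrate the general idea, here is a simplified sketch of scanning a page's content stream for text. It is not the article's ExtractTextFromPDFBytes: the class and method names are hypothetical, and it assumes the bytes are already decompressed and that the text appears as literal strings in parentheses between the BT and ET operators. Hex strings, text positioning, and font encodings are ignored.

```csharp
using System;
using System.Text;

// Simplified, hypothetical sketch; not the article's ExtractTextFromPDFBytes.
public static class ContentStreamSketch
{
    // Collects the literal strings "( ... )" found between the BT and ET
    // operators of an already decompressed page content stream.
    public static string ExtractLiteralText(byte[] content)
    {
        var text = new StringBuilder();
        bool inTextObject = false; // true between BT and ET
        int depth = 0;             // parenthesis nesting depth inside a string

        for (int i = 0; i < content.Length; i++)
        {
            char c = (char)content[i];

            if (depth == 0)
            {
                if (IsOperator(content, i, 'B', 'T')) { inTextObject = true; i++; }
                else if (IsOperator(content, i, 'E', 'T')) { inTextObject = false; text.Append('\n'); i++; }
                else if (c == '(' && inTextObject) { depth = 1; }
                continue;
            }

            // Inside a literal string.
            if (c == '\\' && i + 1 < content.Length)
            {
                // The escaped byte is copied without its backslash;
                // control and octal escapes are not decoded in this sketch.
                text.Append((char)content[++i]);
            }
            else if (c == '(') { depth++; text.Append(c); }
            else if (c == ')') { depth--; if (depth > 0) text.Append(c); }
            else { text.Append(c); }
        }
        return text.ToString();
    }

    // True when the two-letter operator at index i stands alone,
    // delimited by whitespace or by the ends of the stream.
    private static bool IsOperator(byte[] data, int i, char first, char second)
    {
        if (i + 1 >= data.Length || data[i] != first || data[i + 1] != second) return false;
        bool startOk = i == 0 || IsWhite(data[i - 1]);
        bool endOk = i + 2 == data.Length || IsWhite(data[i + 2]);
        return startOk && endOk;
    }

    private static bool IsWhite(byte b)
    {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }
}
```

Each text object ends with a newline in the result, which is a crude substitute for real layout handling. The bytes passed in could be the array returned by `PdfReader.GetPageContent` for a page.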

Further Improvements

Although the code worked well for me, I could not find in Adobe's PDF Reference how to parse special characters. If you know how to do this, please post it and I will update the class.
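One piece of this that the PDF Reference does describe is the set of backslash escapes allowed inside literal strings: `\n`, `\r`, `\t`, `\b`, `\f`, `\(`, `\)`, `\\`, and `\ddd` with one to three octal digits. The sketch below decodes those escapes. It may cover only part of what "special characters" means here, since characters that depend on a font's encoding are not handled. `PdfStringEscapes` and `Decode` are hypothetical names, not part of PDFParser, and each decoded byte is simply treated as a character code.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the article's PDFParser class.
public static class PdfStringEscapes
{
    // Decodes the backslash escapes in the body of a PDF literal string
    // (the text between the outer parentheses).
    public static string Decode(string raw)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c != '\\' || i + 1 >= raw.Length) { sb.Append(c); continue; }

            char e = raw[++i];
            switch (e)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case '\r': // backslash before an end of line: line continuation
                    if (i + 1 < raw.Length && raw[i + 1] == '\n') i++;
                    break;
                case '\n': // line continuation, nothing is emitted
                    break;
                default:
                    if (e >= '0' && e <= '7')
                    {
                        // Octal character code with up to three digits.
                        int code = e - '0';
                        int digits = 1;
                        while (digits < 3 && i + 1 < raw.Length && raw[i + 1] >= '0' && raw[i + 1] <= '7')
                        {
                            code = code * 8 + (raw[++i] - '0');
                            digits++;
                        }
                        sb.Append((char)(code & 0xFF));
                    }
                    else
                    {
                        // \( \) \\ and unknown escapes: drop the backslash.
                        sb.Append(e);
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

A decoder like this would run on each string collected from the content stream, before the result is appended to the output text.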


History

  • 20th May, 2006: Initial post


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Web Developer
Romania

Article Copyright 2006 by Zollor