Convert PDF file content into string using C#

18 May 2012, CPOL
Convert PDF content into text using C#, for beginners.


Hello friends, this is my first article. It shows how to read the content of a PDF file and convert it into a string using C#.


This was assigned to me as a task. I searched Google for a solution and ended up with some very simple code. I'm sure it will be helpful for beginners.

Using the code

The following steps will guide you to read content from a PDF file:

  1. To start, download itextsharp-all-5.2.1, which can be downloaded from here.
  2. Extract the whole archive (including the archives inside the itextsharp-all-5.2.1 folder) to a local directory.
  3. That completes the initial setup. Hurrah!

    Now open Microsoft Visual Studio. I used Microsoft Visual C# 2010 Express.

  4. New Project --> Windows Forms Application --> give the project a name (I named mine PDF_To_Text).
  5. Add itextsharp.dll as a reference.
  6. Select the Project menu --> Add Reference --> select the Browse tab --> select itextsharp.dll from the local directory.

  7. Place a RichTextBox control on the form and keep its default name, richTextBox1.
  8. Now paste the following code into Form1.cs.

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Keep the namespace that Visual Studio generated for your own project.
    namespace WindowsFormsApplication1
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
                // Verbatim string (@"...") so the backslash is not read as an escape character.
                ExtractTextFromPDFPage(@"c:\sample.pdf", 1);
            }

            public void ExtractTextFromPDFPage(string pdfFile, int pageNumber)
            {
                PdfReader reader = new PdfReader(pdfFile);
                try
                {
                    // Extract the text of the requested page and show it in the RichTextBox.
                    richTextBox1.Text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);
                }
                finally
                {
                    // Always release the file, even if extraction fails.
                    reader.Close();
                }
            }
        }
    }

    Look how simple it is!

  9. Now build the solution using Ctrl+Shift+B, or by selecting Build Solution from the Build menu.
  10. Once the build succeeds, run the application by pressing F5.
  11. You will find that the file content has been converted into text and is displayed in the RichTextBox control.

That's it, you have successfully converted a PDF file into text.


Here c:\sample.pdf is where I kept my PDF file, so update the path to point to your own file. The second parameter is the number of the page you want to convert; page numbers start at 1.
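Since both arguments are easy to get wrong, it may help to check them before calling ExtractTextFromPDFPage. The following is only a minimal sketch: PdfArguments and Check are hypothetical names that are not part of iTextSharp or of the code above, and the page count is assumed to be supplied by the caller (for example from the NumberOfPages property of an open PdfReader).

```csharp
using System;
using System.IO;

public static class PdfArguments
{
    // Hypothetical helper, not part of iTextSharp: validates the file path
    // and the page number before they are used for extraction.
    public static void Check(string pdfFile, int pageNumber, int pageCount)
    {
        if (!File.Exists(pdfFile))
            throw new FileNotFoundException("PDF file not found.", pdfFile);

        // Pages are numbered from 1 up to and including pageCount.
        if (pageNumber < 1 || pageNumber > pageCount)
            throw new ArgumentOutOfRangeException("pageNumber");
    }
}
```

For example, PdfArguments.Check(@"c:\sample.pdf", 1, 3) returns quietly when the file exists, because page 1 lies within a three-page document, while a page number of 0 or 4 would throw.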


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer
India
There are only 10 types of people in this programming world:
those who know binary and those who don't.


Comments and Discussions

Question: more pages
saurabh49parikh, 27-Aug-12 9:03

Answer: Re: more pages
rk_prabakar, 15-Oct-12 19:46
Sorry for the late response.
Try the following code. Call the function once for each page in the file:

    public Form1()
    {
        InitializeComponent();
        // Count must hold the number of pages in the PDF; pages are numbered from 1.
        for (int i = 1; i <= Count; i++)
            ExtractTextFromPDFPage(@"c:\sample.pdf", i);
    }

Then append each page's content to the RichTextBox control:

    public void ExtractTextFromPDFPage(string pdfFile, int pageNumber)
    {
        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            // Append the page text to the RichTextBox, or to any other control you want.
            richTextBox1.Text += PdfTextExtractor.GetTextFromPage(reader, pageNumber);
        }
        finally
        {
            reader.Close();
        }
    }

I hope this works. I haven't tried it on my machine, so treat it as an idea and not as tested code. Simple, isn't it?
Thanks and Regards,
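
The idea in this answer could also be sketched with a single reader that stays open for the whole loop, so the file is not reopened for every page. This is an untested sketch and not the author's code: PdfText and ExtractAllPages are hypothetical names, and it assumes the iTextSharp 5.x API used in the article, where PdfReader exposes a NumberOfPages property.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfText
{
    // Hypothetical helper: returns the text of every page as one string.
    public static string ExtractAllPages(string pdfFile)
    {
        PdfReader reader = new PdfReader(pdfFile);
        try
        {
            StringBuilder text = new StringBuilder();

            // Page numbers start at 1 and run up to and including NumberOfPages.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }

            return text.ToString();
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
    }
}
```

In the form, this could be used as richTextBox1.Text = PdfText.ExtractAllPages(@"c:\sample.pdf");. Collecting the pages in a StringBuilder avoids repeatedly re-assigning the control's Text property inside the loop.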



Last Updated 18 May 2012
Article Copyright 2012 by ♥…ЯҠ…♥