
Splitting and Merging PDF Files in C# Using iTextSharp

John Atten, 10 Dec 2013
Splitting and merging PDF files in C# using the iTextSharp library.

I recently posted about using PdfBox.net to manipulate Pdf documents in your C# application. This time, I take a quick look at iTextSharp, another library for working with Pdf documents from within the .NET framework. 

What is iTextSharp?

iTextSharp is a direct .NET port of the open source iText Java library for PDF generation and manipulation. As the project’s summary page on SourceForge states, iText “. . . can be used to create PDF Documents from scratch, to convert XML to PDF . . . to fill out interactive PDF forms, to stamp new content on existing PDF documents, to split and merge existing PDF documents, and much more.”

iTextSharp presents a formidable set of tools for developers who need to create and/or manipulate Pdf files. This does come with a cost, however. The Pdf file format itself is complex; therefore, programming libraries which seek to provide a flexible interface for working with Pdf files become complex by default. iText is no exception.

I noted in my previous post on PdfBox that PdfBox was a little easier for me to get up and running with, at least for rather basic tasks such as splitting and merging existing Pdf files. I also noted that iText looked to be a little more complex, and I was correct. However, iTextSharp does not suffer some of the performance drawbacks inherent to PdfBox, at least on the .NET platform.

Superior Performance vs. PdfBox

As I observed in my previous post, PdfBox.net is NOT a direct port of the PdfBox Java library, but is instead the Java library running within .NET using IKVM. While I found it very cool to be able to run Java code in a .NET context, there was a serious performance hit. It was most noticeable the first time the PdfBox library was called, when the massive IKVM library spun up what amounts to a .NET implementation of the Java Virtual Machine, within which the Java code of the PdfBox library is then executed.

Needless to say, iTextSharp does not suffer this limitation. The library itself is relatively lightweight and fast.

Extracting and Merging Pages from an Existing Pdf File

One of the most common tasks we need to do is extract pages from one Pdf into a new file. We’ll take a look at some relatively basic sample code which does just that, and get a feel for using the iTextSharp programming model.

In the following code sample, the primary iTextSharp classes we will be using are the PdfReader, Document, PdfCopy, and PdfImportedPage classes. 

My simplified understanding of how this works is as follows: The PdfReader instance contains the content of the source PDF file. The Document instance is created with the size and rotation of the source page and serves as the container for the output. The PdfCopy instance, once initialized with the Document and a new output FileStream, copies pages imported from the PdfReader (each represented by a PdfImportedPage) into that container. When the Document is closed, the result is written to the FileStream and saved to disk at the location specified by the destination file name.

You can download the iTextSharp source code and binaries as a single package from the Files page at the iTextSharp project site. Just click on the “Download itextsharp-all-5.4.0.zip” link. Extract the files from the .zip archive, and stash them somewhere convenient. Next, set a reference in your project to itextsharp.dll. You will need to browse to the folder where you stashed the extracted contents of the iTextSharp download.

NOTE: The complete example code for this post is available at my Github Repo.

I went ahead and created a project named iTextTools, with a class file named PdfExtractorUtility. Add the following using statements at the top of the file:

Set up references and Using Statements to use iTextSharp

using iTextSharp.text;
using iTextSharp.text.pdf;
using System;
// CLASS DEPENDS ON iTextSharp: http://sourceforge.net/projects/itextsharp/

namespace iTextTools
{
    public class PdfExtractorUtility
    {

    }
}

 

First, I’ll add a simple method to extract a single page from an existing PDF file and save to a new file:

Extract Single Page from Existing PDF to a new File:

public void ExtractPage(string sourcePdfPath, string outputPdfPath, 
    int pageNumber, string password = "")
{
    PdfReader reader = null;
    Document document = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file
        // (the password parameter is not used in this simple example):
        reader = new PdfReader(sourcePdfPath);

        // Capture the correct size and orientation for the page:
        document = new Document(reader.GetPageSizeWithRotation(pageNumber));

        // Initialize an instance of the PdfCopy class with the output 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(document, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

        document.Open();

        // Extract the desired page number:
        importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
        pdfCopyProvider.AddPage(importedPage);
        document.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Rethrow with "throw;" so the original stack trace is preserved:
        throw;
    }
}

 

As you can see, simply pass in the path to the source document, the page number to be extracted, and an output file path, and you’re done.
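Building on the single-page case, splitting a whole file into one file per page could be sketched as follows. This is only a minimal sketch, assuming the same iTextSharp 5.x classes used above; the class name PdfSplitSketch, the method name SplitIntoPages, and the page-N.pdf naming scheme are hypothetical and not part of the original sample code.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace iTextTools
{
    public static class PdfSplitSketch
    {
        // Hypothetical helper (not from the original sample code):
        // writes each page of the source file to its own single-page Pdf
        // named page-1.pdf, page-2.pdf, ... inside outputDirectory,
        // and returns the number of files written.
        public static int SplitIntoPages(string sourcePdfPath, string outputDirectory)
        {
            var reader = new PdfReader(sourcePdfPath);
            try
            {
                int pageCount = reader.NumberOfPages;
                for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
                {
                    string outputPath = Path.Combine(
                        outputDirectory, "page-" + pageNumber + ".pdf");

                    // One Document/PdfCopy pair per output file, as in ExtractPage:
                    var document = new Document(reader.GetPageSizeWithRotation(pageNumber));
                    var copy = new PdfCopy(document,
                        new FileStream(outputPath, FileMode.Create));

                    document.Open();
                    copy.AddPage(copy.GetImportedPage(reader, pageNumber));
                    document.Close();
                }
                return pageCount;
            }
            finally
            {
                // Release the source file even if one of the copies fails:
                reader.Close();
            }
        }
    }
}
```

The single PdfReader is reused for every output file, so the source document is only parsed once.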

If we want to be able to extract a range of contiguous pages, we might add another method defining a start and end point:

Extract a Range of Pages from Existing PDF to a new File:

public void ExtractPages(string sourcePdfPath, string outputPdfPath, 
    int startPage, int endPage)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);

        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first page:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));

        // Initialize an instance of the PdfCopy class with the output 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument, 
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

        sourceDocument.Open();

        // Walk the specified range and add the page copies to the output file:
        for (int i = startPage; i <= endPage; i++)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }
        sourceDocument.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Rethrow with "throw;" so the original stack trace is preserved:
        throw;
    }
}
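The range method above does not check startPage and endPage before it creates the output file, so a bad range only fails partway through. A guard along the following lines might be one way to catch that earlier. This is a sketch only: PageRangeSketch and EnsureValidRange are hypothetical names, not part of the original code or of iTextSharp.

```csharp
using System;

namespace iTextTools
{
    public static class PageRangeSketch
    {
        // Hypothetical helper: throws if the 1-based, inclusive page range
        // does not fit inside a document that has pageCount pages.
        public static void EnsureValidRange(int startPage, int endPage, int pageCount)
        {
            if (startPage < 1)
            {
                throw new ArgumentOutOfRangeException(
                    "startPage", "Page numbers are 1-based.");
            }
            if (endPage < startPage)
            {
                throw new ArgumentOutOfRangeException(
                    "endPage", "endPage must not precede startPage.");
            }
            if (endPage > pageCount)
            {
                throw new ArgumentOutOfRangeException(
                    "endPage", "The document has only " + pageCount + " pages.");
            }
        }
    }
}
```

Inside ExtractPages, this could be called right after the PdfReader is created, passing reader.NumberOfPages as pageCount.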

 

What if we want non-contiguous pages from the source document? Well, we might overload the above method with one which accepts an array of ints representing the desired pages:

Extract multiple non-contiguous pages from Existing PDF to a new File:

public void ExtractPages(string sourcePdfPath, 
    string outputPdfPath, int[] extractThesePages)
{
    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage = null;

    try
    {
        // Initialize a new PdfReader instance with the 
        // contents of the source Pdf file:
        reader = new PdfReader(sourcePdfPath);

        // For simplicity, I am assuming all the pages share the same size
        // and rotation as the first page:
        sourceDocument = new Document(reader.GetPageSizeWithRotation(extractThesePages[0]));

        // Initialize an instance of the PdfCopy class with the output 
        // document and an output file stream:
        pdfCopyProvider = new PdfCopy(sourceDocument,
            new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

        sourceDocument.Open();

        // Walk the array and add the page copies to the output file:
        foreach (int pageNumber in extractThesePages)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber);
            pdfCopyProvider.AddPage(importedPage);
        }
        sourceDocument.Close();
        reader.Close();
    }
    catch (Exception)
    {
        // Rethrow with "throw;" so the original stack trace is preserved:
        throw;
    }
}
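The methods above all extract pages from a single source. Merging several files into one follows the same pattern, with one PdfReader per source feeding a single PdfCopy. The following is a minimal sketch under that assumption, using only the iTextSharp 5.x classes already shown; the names PdfMergeSketch and MergeFiles are hypothetical and do not come from the original sample code.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace iTextTools
{
    public static class PdfMergeSketch
    {
        // Hypothetical helper (not from the original sample code):
        // appends every page of each source file, in order, to one output file.
        public static void MergeFiles(IEnumerable<string> sourcePdfPaths, string outputPdfPath)
        {
            using (var output = new FileStream(outputPdfPath, FileMode.Create))
            {
                var document = new Document();
                var copy = new PdfCopy(document, output);
                document.Open();

                foreach (string path in sourcePdfPaths)
                {
                    var reader = new PdfReader(path);
                    try
                    {
                        // Page numbers in iTextSharp are 1-based:
                        for (int i = 1; i <= reader.NumberOfPages; i++)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, i));
                        }
                    }
                    finally
                    {
                        reader.Close();
                    }
                }

                // Closing the document finishes writing the merged Pdf:
                document.Close();
            }
        }
    }
}
```

A call such as PdfMergeSketch.MergeFiles(new[] { "first.pdf", "second.pdf" }, "merged.pdf") would then produce one file containing the pages of both sources in order (the file names here are placeholders).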

 

Scratching the Surface

Obviously, the examples above are a simplistic first exploration of what appears to be a powerful library. What I notice about iText in general is that, unlike some APIs, the path to achieving your desired result is often not intuitive. I believe this has much to do with the nature of the PDF file format, and possibly with the structure of the lower-level libraries upon which iTextSharp is built.

That said, there is without a doubt much to be learned by exploring the iTextSharp source code. Additionally, there are a number of resources to assist the developer in using this library:

Additional Resources for iTextSharp

Lastly, there is a book authored by one of the primary contributors to the iText project, Bruno Lowagie:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

John Atten
Software Developer XIV Solutions
United States
My name is John Atten, and my username on many of my online accounts is xivSolutions. I am fascinated by all things technology and software development. I work mostly with C#, Java, and SQL Server 2012, and I am learning ASP.NET MVC and HTML5/CSS/JavaScript. I am always looking for new information, and value your feedback (especially where I got something wrong!)
Last Updated 10 Dec 2013
Article Copyright 2013 by John Atten