C# - Generate and Deliver PDF Files On-Demand from a Template Using iTextSharp

John Atten

Dec 3, 2013

CPOL

Often we need to provide a mechanism for users of our site to download content in the form of PDF files. While this in itself is not technically challenging, I recently had a need to generate customized PDFs on a per-user basis. The application in question provides a way for attendees of particular trainings to access their individualized certificates of completion through a link which is emailed to them.

It was decided that persisting the many individual certificates which might accumulate on the server over many trainings (we do a LOT of trainings each year) made no sense whatsoever.

12/9/2013 - NOTE: Several people have commented on the fact that there is not a source code download available with this article. Ordinarily, I try to provide a link to a functional code example on my GitHub repo. In this case, the code discussed in the article is a small piece of a functional site example, and it was not practical at the time of writing to build out an example project without losing the important bits amongst the models, controllers, views, etc. necessary to properly demo the code. I may do a walk-through as a separate series in the future. Meanwhile, if you have questions about implementing the code below, please feel free to contact me in the comments, or at the email address on my blog: http://typecastexception.com.

Considering the application database already contains all the information required to generate an attendee-specific certificate from a template when one is needed, we decided that certificates would be created on a "just-in-time" basis, populated with the training and attendee specifics, and streamed to the user as a download, without ever being persisted to the server.

In this article, we will examine the specifics of this "Just-In-Time" PDF generation.

The Heart of the Matter – iTextSharp

As we discussed in a previous post, Splitting and Merging Pdf Files in C# Using iTextSharp, iTextSharp is a port of the Java-based iText library for working with and manipulating PDF files programmatically. We will need to pull iTextSharp into our project in order to do what we need to do. You can get the iTextSharp binaries as described in the previous post, or you can take the easier approach and use NuGet.

Get iTextSharp Using NuGet

Note that there are two versions of the iTextSharp library available via NuGet: the standard iTextSharp library, and the iTextSharp LGPL/MPL version. The licensing of the standard iTextSharp library is more restrictive, having moved to the GNU Affero General Public License (AGPL). In this project I used the LGPL/MPL version.

You can quickly install iTextSharp from the NuGet Package Manager Console by running the following command:

Install iTextSharp via NuGet Package Manager Console:

PM> Install-Package iTextSharp-LGPL

Basic Application Structure

For our purposes, we are intending to merge our data into a pre-existing PDF form template, and then flatten the PDF document, including the populated form fields. We will do this entirely in memory, and then stream the result back to the caller.

We will first set up a core class, PdfMergeStreamer, which will take in a file path (which points to the PDF Form template), an IEnumerable<IPdfMergeData>, and a System.IO.MemoryStream. IPdfMergeData is an interface which represents an object containing a Dictionary that maps the PDF Form field names in our template (the Dictionary keys) to the text to be placed in each field (the Dictionary values).

Our PdfMergeStreamer class will basically consume a list of one or more merge data items, populate and flatten the template for each, and add each to a single output document (think Word Merge, but with PDF). Of course, there's more to it than that, and we'll walk through it in a moment.

The basic idea was to create a generalized class which can be used to populate any PDF form template, so long as it is provided with the correct values for each field in the template.

The PdfMergeStreamer Class

PdfMergeStreamer Class:

using iTextSharp.text;
using iTextSharp.text.pdf;
using System.Collections.Generic;
using TrainingDepartmentAzure.Models;
  
namespace TrainingDepartmentAzure
{
    public class PdfMergeStreamer
    {      
        public void fillPDF(string templatePath, IEnumerable<IPdfMergeData> mergeDataItems,
            System.IO.MemoryStream outputStream)
        {
            // Aggregate successive pages here:
            var pagesAll = new List<byte[]>();
  
            // Hold individual pages here:
            byte[] pageBytes = null;
  
            foreach (var mergeItem in mergeDataItems)
            {
                // Read the form template for each item to be output:
                var templateReader = new iTextSharp.text.pdf.PdfReader(templatePath);
                using (var tempStream = new System.IO.MemoryStream())
                {
                    PdfStamper stamper = new PdfStamper(templateReader, tempStream);
                    stamper.FormFlattening = true;
                    AcroFields fields = stamper.AcroFields;
                    stamper.Writer.CloseStream = false;
  
                    // Grab a reference to the Dictionary in the current merge item:
                    var fieldVals = mergeItem.MergeFieldValues;
  
                    // Walk the Dictionary keys, find the matching AcroField, 
                    // and set the value:
                    foreach (string name in fieldVals.Keys)
                    {
                        fields.SetField(name, fieldVals[name]);
                    }
  
                    // If we had not set the CloseStream property to false, 
                    // this line would also kill our memory stream:
                    stamper.Close();
  
                    // Reset the stream position to the beginning before reading:
                    tempStream.Position = 0;
  
                    // Grab the byte array from the temp stream . . .
                    pageBytes = tempStream.ToArray();
  
                    // And add it to our array of all the pages:
                    pagesAll.Add(pageBytes);
                }
            }
  
            // Create a document container to assemble our pieces in:
            Document mainDocument = new Document(PageSize.A4);
  
            // Copy the contents of our document to our output stream:
            var pdfCopier = new PdfSmartCopy(mainDocument, outputStream);
  
            // Once again, don't close the stream when we close the document:
            pdfCopier.CloseStream = false;
  
            mainDocument.Open();
            foreach (var pageByteArray in pagesAll)
            {
                // Copy page 1 of each filled form into the document. Only the
                // first page is imported, so a multi-page template would lose pages:
                mainDocument.NewPage();
                pdfCopier.AddPage(pdfCopier.GetImportedPage(new PdfReader(pageByteArray), 1));
            }
            pdfCopier.Close();
  
            // Set stream position to the beginning before returning:
            outputStream.Position = 0;
        }
    }
}

As we can see, this results in a rather monolithic chunk of code, and could probably be refactored. For now, though, we will leave it as is.
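For readers who do want to break the method up, here is a minimal sketch of one possible first step: pulling the per-item "fill and flatten" work into its own method. The class name TemplateFiller, the method name FillTemplate, and the unmatched parameter are hypothetical and not part of the original application. The sketch relies on AcroFields.SetField returning false when the template has no field with the given name, and on MemoryStream.ToArray still working after the stamper has closed the stream.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of the original application.
public static class TemplateFiller
{
    // Fills and flattens one copy of the template and returns the PDF bytes.
    // Keys with no matching form field are added to 'unmatched' so the
    // caller can log or report them.
    public static byte[] FillTemplate(PdfReader templateReader,
        IDictionary<string, string> fieldValues, ICollection<string> unmatched)
    {
        using (var buffer = new MemoryStream())
        {
            var stamper = new PdfStamper(templateReader, buffer);
            stamper.FormFlattening = true;

            foreach (var pair in fieldValues)
            {
                // SetField returns false when no field has this name.
                if (!stamper.AcroFields.SetField(pair.Key, pair.Value))
                {
                    unmatched.Add(pair.Key);
                }
            }

            // Closing the stamper writes the PDF and closes 'buffer';
            // ToArray() is still valid on a closed MemoryStream.
            stamper.Close();
            return buffer.ToArray();
        }
    }
}
```

With something like this in place, the loop in fillPDF would shrink to creating a PdfReader for the template and adding the returned bytes to pagesAll for each merge item.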

Now let's take a look at what we are passing in. The templatePath is, obviously, a path to a file on the local file system. outputStream is passed in by the caller, which uses it once this method returns to consume the resulting PDF file (in my case, streaming it to the end user as a file download). Which leaves our IEnumerable<IPdfMergeData>.

In my case I decided I wanted to be able to create different merge templates, which required different sets of merge fields, all of which would need to be mapped according to the specific data required for the merge. While we could have simply passed in a Dictionary<string, string> straight away, I decided that creating a specific interface would make the intent clearer, and also force me to write a concrete implementation for each mapping.

The interface itself is straightforward:

The IPdfMergeData Interface:

public interface IPdfMergeData
{
    IDictionary<string, string> MergeFieldValues { get; }
}
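As a quick illustration of how little an implementation needs, here is a minimal sketch of a dictionary-backed version. SimpleMergeData is a hypothetical class that is not part of the original application; the interface is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the interface above, repeated so the sketch stands alone.
public interface IPdfMergeData
{
    IDictionary<string, string> MergeFieldValues { get; }
}

// Hypothetical implementation: wraps a ready-made set of field values.
public class SimpleMergeData : IPdfMergeData
{
    private readonly IDictionary<string, string> _values;

    public SimpleMergeData(IDictionary<string, string> values)
    {
        if (values == null) throw new ArgumentNullException("values");

        // Copy the input so later changes by the caller do not leak in.
        _values = new Dictionary<string, string>(values);
    }

    // Keys are PDF form field names; values are the text to place in them.
    public IDictionary<string, string> MergeFieldValues
    {
        get { return _values; }
    }
}
```

A caller could then build a list such as new List<IPdfMergeData> { new SimpleMergeData(values) } and hand it to fillPDF. The template-specific class shown later in the article does the same job, but builds its dictionary from a domain object.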

Creating a PDF Form Template

NOTE - A commenter was concerned that you might need to have Scribus, LibreOffice, or whatever program was used to create the template installed on the remote server when using the code in this article to generate PDFs from the template. Let me be clear that you do not. You need one of these programs only to create the template itself, which is then saved on the server and consumed by the code in this article.

There are a number of options for creating a PDF Form template, the first that comes to mind being (obviously) Adobe Acrobat. However, Acrobat is a little spendy, so if you don't have access to it, you might avail yourself of the open-source program Scribus.

Scribus is not a PDF creation program per se; in fact, it is a page layout application. However, it presents an Export to PDF option, and allows us to place named form fields on our page, along with text, images, and other content.

There is a small learning curve to using Scribus to produce a workable PDF form, and I will discuss that in another post. However, here are four suggestions to consider when using Scribus to make your form:

Place form fields on their own layer in the Scribus Document. This keeps the fields separated from other content, and makes editing much easier.
When you export, use the PDF 1.4 (Acrobat 5) setting. There seem to be issues with the other PDF settings.
Be careful with font choices when using Scribus. While most fonts will appear correctly in the Scribus document itself, when exporting to PDF in the older format(s) currently supported by Scribus, they do not always display correctly on the output PDF document.
Make sure to name your fields carefully when adding them to the PDF Form. The field names must match the keys you provide in your concrete implementation of IPdfMergeData.

Obviously, using Acrobat is preferred for creating a template for our purpose. However, I had to use Scribus to create a template for my own application, and after a little trial and error, it worked just fine.
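Because mismatched field names are the easiest mistake to make here, a small check can help during development. The following is a minimal sketch, with a hypothetical MergeFieldCheck helper that is not part of the original application, which reports any dictionary keys that have no matching field name in the template. It uses an exact, case-sensitive comparison.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical development-time helper, not part of the original application.
public static class MergeFieldCheck
{
    // Returns the merge keys that do not match any template field name.
    public static IList<string> FindUnmatchedKeys(
        IEnumerable<string> templateFieldNames,
        IDictionary<string, string> mergeValues)
    {
        var known = new HashSet<string>(templateFieldNames);
        return mergeValues.Keys.Where(key => !known.Contains(key)).ToList();
    }
}
```

The template field names could be read with iTextSharp from the keys of reader.AcroFields.Fields on a PdfReader opened against the template, casting the keys to string where the library version requires it.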

A Real-World Example

Ok, now that we have the basic pieces in place, let's walk through a simplified example of how I implemented the rest of this. This will help you see how you might create your own implementation to meet your own domain-specific requirements.

In my ASP.NET MVC application, one set of my (simplified) requirements is:

Training attendees should be able to download a personalized certificate via a link received by email
System Users should be able to create and download a batch of certificates for all attendees for a particular training (sometimes the training sponsor wants the whole thing printed and mailed to them)
System Users should be able to download individual attendee certificates (sometimes attendees don't have email, need us to send a certificate directly to a third party, or have trouble downloading from the link and need the certificate sent as an actual attachment).
Certificate PDF files should not be persisted on the server, but instead will be streamed directly to the client browser after generation.

We won't cover all the details of building out this application here – we will focus on the implementation of the certificate generation on demand. Let's start with the methods needed on AttendeeCertificatesController.

The Attendee Certificates Controller – Where It All Begins

Among the other methods on our controller, two are of specific interest to us here: DownloadCertificate and DownloadBatch. These will return a File object in the response body representing an individual certificate or a batch of certificates, respectively, which will be downloaded by the client's browser.

What is important to note here is that all of the processing happens in memory, and the resulting certificate is not persisted locally on the server.

Note that for simplicity, I have not implemented much exception handling here, or async processing (which might be in order, since processing a batch of a few hundred certificates could take some time). We'll look at DownloadCertificate first.

The DownloadCertificate Method of AttendeeCertificatesController

[AllowAnonymous]
public ActionResult DownloadCertificate(int trainingId, string attendeeGuid = null)
{
    string validationMessage = "";
    Guid guid;
    try
    {
        if (Guid.TryParse(attendeeGuid, out guid))
        {
            var Db = new TrainingDbAzureEntities();
            var table = Db.Attendees;
            var attendee = table.First(a => a.AttendeeGuid == guid);
            trainingId = attendee.TrainingId;
  
            string templatePath = @AppDomain.CurrentDomain.BaseDirectory 
            + @"PdfTemplates\CertificateTemplateForPdf.pdf";
            var streamer = new AttendeeCertificateStreamer();
            var pdfMemoryStream = streamer.GetPdfStream(attendee, templatePath);
  
            string contentType = "application/pdf";
            var cd = new System.Net.Mime.ContentDisposition();
            cd.Inline = false;
            cd.FileName = this.getPdfFileName(attendee.FullName, attendee.AttendeeGuid);
            Response.AppendHeader("Content-Disposition", cd.ToString());
            return File(pdfMemoryStream.ToArray(), contentType);
        }
        else
        {
            validationMessage = "The certificate you have requested does not exist.";
            return Index(trainingId, validationMessage);
        }
    }
    catch (Exception)
    {
        return null;
    }
}

In the above, notice we have defined a helper method, getPdfFileName which takes some attendee-specific input and creates a suitable name for the file prior to download. The idea was to create a file name with a user-friendly component, but which would be suitably unique. The method is used by both DownloadCertificate and DownloadBatch, and is as follows:

The getPdfFileName Method:

string getPdfFileName(string friendlyName, Guid guid)
{
    var rgx = new System.Text.RegularExpressions.Regex("[^a-zA-Z0-9 -]");
    string result = rgx.Replace(friendlyName, "");
    // Use the sanitized name, not the raw input, when building the file name:
    return string.Format("{0}-{1}.pdf", result, guid.ToString());
}
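To make the effect of the regular expression concrete, here is a small worked example of the same idea as a standalone static method. The class name PdfFileNames and the sample input are hypothetical. Anything that is not a letter, digit, space, or hyphen is stripped from the friendly name before the Guid is appended.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical standalone version of the helper, for illustration only.
public static class PdfFileNames
{
    // Matches every character that is NOT a letter, digit, space, or hyphen.
    private static readonly Regex Disallowed = new Regex("[^a-zA-Z0-9 -]");

    public static string Build(string friendlyName, Guid guid)
    {
        string cleaned = Disallowed.Replace(friendlyName, "");
        return string.Format("{0}-{1}.pdf", cleaned, guid);
    }
}
```

For example, PdfFileNames.Build("Jane O'Brien, Jr.", new Guid("00000000-0000-0000-0000-000000000001")) returns "Jane OBrien Jr-00000000-0000-0000-0000-000000000001.pdf". The apostrophe, comma, and period are removed, while the spaces are kept.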

We route requests to DownloadCertificate by placing the following custom route definition in our RouteConfig.RegisterRoutes method:

Route Definition for DownloadCertificate:

routes.MapRoute(
    name: "AttendeeCertificateDownload",
    url: "AttendeeCertificates/Download/{trainingId}/{attendeeGuid}",
    defaults: new { controller = "AttendeeCertificates", action = "DownloadCertificate" }
);
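Given this route, the link emailed to an attendee has the shape AttendeeCertificates/Download/{trainingId}/{attendeeGuid}. As a minimal sketch, such a link could be assembled like this; the CertificateLinks class and the base URL are hypothetical placeholders, not part of the original application.

```csharp
using System;

// Hypothetical helper, not part of the original application.
public static class CertificateLinks
{
    // Builds an absolute URL that matches the AttendeeCertificateDownload route.
    public static string BuildDownloadUrl(string baseUrl, int trainingId, Guid attendeeGuid)
    {
        return string.Format("{0}/AttendeeCertificates/Download/{1}/{2}",
            baseUrl.TrimEnd('/'), trainingId, attendeeGuid);
    }
}
```

Inside an MVC controller or view, generating the link from the named route with Url.RouteUrl would probably be the more robust choice, since the URL would then follow any later change to the route pattern.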

When a request is routed to the DownloadCertificate method above, we can see that the attendeeGuid route parameter is used to look up the specific attendee record in the database (the Guid in the database is not used as the primary key, hence the slightly more clumsy lookup). Next, we grab the path to our PDF form template, and then initialize an instance of a new class, AttendeeCertificateStreamer. What, you say? We haven't discussed this one yet.

The Attendee Certificate Streamer Class

The AttendeeCertificateStreamer provides a layer of abstraction between our controller and the PdfMergeStreamer class we examined previously, and handles the domain-specific implementation details related to creating certificates of attendance before calling into the more general PdfMergeStreamer. As we can see in the following code, AttendeeCertificateStreamer accepts attendee data from our controller and maps it into a form useable by the first class we examined, PdfMergeStreamer.

The AttendeeCertificateStreamer Class:

public class AttendeeCertificateStreamer
{
    public System.IO.MemoryStream GetPdfStream(IEnumerable<Attendee> attendees, 
        string templatePath)
    {
        var util = new PdfMergeStreamer();
        var pdfMemoryStream = new System.IO.MemoryStream();
  
        IEnumerable<IPdfMergeData> mergeData = this.getAttendeeMergeData(attendees);
        util.fillPDF(templatePath, mergeData, pdfMemoryStream);
        return pdfMemoryStream;
    }
  
  
    public System.IO.MemoryStream GetPdfStream(Attendee attendee, 
        string templatePath)
    {
        var attendees = new List<Attendee>();
        attendees.Add(attendee);
        return this.GetPdfStream(attendees, templatePath);
    }
  
  
    IEnumerable<IPdfMergeData> getAttendeeMergeData(IEnumerable<Attendee> attendees)
    {
        var output = new List<IPdfMergeData>();
        foreach (var attendee in attendees)
        {
            output.Add(new AttendeeCertificateMergeData(attendee));
        }
        return output;
    }
}

In the above, we pass our template path and a single instance of Attendee to the second of the two GetPdfStream methods. The single instance is added to a list and passed to getAttendeeMergeData, which performs the mapping we need for our template. As we can see, the getAttendeeMergeData method uses each instance of Attendee to initialize a new instance of AttendeeCertificateMergeData.

Implementing IPdfMergeData: Template-Specific Mapping

Remember our interface, IPdfMergeData? AttendeeCertificateMergeData is the concrete implementation we will use specifically for mapping to our PDF Form template.

Template-Specific Implementation of IPdfMergeData:

public class AttendeeCertificateMergeData : IPdfMergeData
{
    Attendee _attendee;
    
    public AttendeeCertificateMergeData(Attendee attendee)
    {
        _attendee = attendee;
    }

    public IDictionary<string, string> MergeFieldValues
    {
        get { return this.getMergeDictionary(); }
    }

    IDictionary<string, string> getMergeDictionary()
    {
        var output = new Dictionary<string, string>();
        var training = _attendee.Training;
        output.Add("FullName", _attendee.FullName);
        output.Add("CourseTitle", training.CourseTitle);
        var dyl = string.Format("{0}, {1} in {2}", 
            training.TrainingPeriod, training.Year, training.Location);
        output.Add("DatesYearLocation", dyl);
        output.Add("EndDate", training.EndDate.ToShortDateString());
        output.Add("CEHours", training.CEHours.ToString());
        return output;
    }
}

As we can see in the above, I am able to simply pass an instance of Attendee in to the constructor, and I am ready to go. The private method getMergeDictionary is called when the MergeFieldValues property is accessed, and returns a Dictionary containing key-value pairs for each of my template fields. Clearly, there is not a one-to-one mapping between properties of Attendee and fields in my template form. In some cases, the form template requires concatenations, and/or other manipulations of attendee data to make it suitable for presentation. Obviously, each dictionary key must be carefully mapped by name to the corresponding Form field in the PDF Form template.

Application Flow

If we look through the code for AttendeeCertificatesController, AttendeeCertificateStreamer, and PdfMergeStreamer, we find we have a basic flow like the following.

Simplified Application Flow:

An HTTP request is routed to our controller, where the incoming route parameters are used to retrieve an instance of a specific attendee. From there, the attendee is passed, along with a path pointing to a locally persisted PDF form template file, to an instance of AttendeeCertificateStreamer. AttendeeCertificateStreamer accepts the attendee instance and creates an instance of AttendeeCertificateMergeData, our concrete implementation of IPdfMergeData. This, along with the template path and a newly created System.IO.MemoryStream, is then passed to PdfMergeStreamer. PdfMergeStreamer processes the data and writes the merged PDF file to the MemoryStream, which AttendeeCertificateStreamer returns to our controller. The controller sets the content disposition, file name, and content type and returns a FileContentResult, which writes the PDF bytes to the HTTP response body to be downloaded by the client's browser.

The DownloadBatch Method of AttendeeCertificatesController

In addition to attendees being able to download certificates individually, system users need to be able to create and download whole batches of certificates for all attendees. Sometimes the training sponsor wants these printed and mailed. In reality, the core class, PdfMergeStreamer, is set up to handle batches from the start; in fact, that is the only way it works. We have thus far examined what is actually the special case: a single certificate (in other words, a "batch" of one). Nonetheless, making a single certificate available to a specific attendee through a link is a different scenario than allowing internal system users to access certificate data in bulk, and hence requires a slightly modified controller method.

The DownloadBatch Method on AttendeeCertificatesController:

[Authorize]
public ActionResult DownloadBatch(Training training, IEnumerable<Attendee> attendees)
{
    string validationMessage = "";
    try
    {
        if (attendees.Count() > 0)
        {
            string templatePath = @AppDomain.CurrentDomain.BaseDirectory 
                + @"PdfTemplates\CertificateTemplateForPdf.pdf";
  
            var streamer = new AttendeeCertificateStreamer();

            // This line, and the method signature, are the only two things that
            // are different between this method and DownloadCertificate. Refactor?
            var pdfMemoryStream = streamer.GetPdfStream(attendees, templatePath);
  
            string contentType = "application/pdf";
            var cd = new System.Net.Mime.ContentDisposition();
            cd.Inline = false;
            cd.FileName = this.getPdfFileName(training.FullName, training.TrainingGuid);
            Response.AppendHeader("Content-Disposition", cd.ToString());
            return File(pdfMemoryStream.ToArray(), contentType);
        }
        else
        {
            validationMessage = "You must select at least one attendee.";
            return Index(training.TrainingId, validationMessage);
        }
    }
    catch (Exception)
    {
        throw;
    }
}

As we can see above, there is not much difference between this and the previously discussed DownloadCertificate method. In fact, these two methods are probably candidates for refactoring, but we won't worry about that right now.

The main differences are that this method accepts an IEnumerable<Attendee> as an argument, and is flagged with an [Authorize] attribute, meaning only authorized system users may access it. Requests are routed to the DownloadBatch method by adding the following route definition to our RouteConfig.RegisterRoutes method:

routes.MapRoute(
    name: "DownloadBatch",
    url: "AttendeeCertificates/DownloadBatch",
    defaults: new { controller = "AttendeeCertificates", action = "DownloadBatch" }
);

Wrapping Up

Obviously, the examples in this article are tuned to the specific needs of my own application (though I simplified things as much as possible). However, the first class we examined, PdfMergeStreamer, in conjunction with the interface IPdfMergeData, should get you started. Feel free to email me or comment with questions, or to point out errors. Feedback is welcome and appreciated.