Often we need to provide a mechanism for users of our site to download content in the form of PDF files. While this in itself is not technically challenging, I recently had a need to generate customized PDFs on a per-user basis. The application in question provides a way for attendees of particular trainings to access their individualized certificates of completion through a link which is emailed to them.
It was decided that persisting the many individual certificates which might accumulate on the server over many trainings (we do a LOT of trainings each year) made no sense whatsoever.
12/9/2013 - NOTE: Several people have commented on the fact that there is not a source code download available with this article. Ordinarily, I try to provide a link to a functional code example on my GitHub repo. In this case, the code discussed in the article is a small piece of a functional site example, and it was not practical at the time of writing to build out an example project without losing the important bits amongst the models, controllers, views, etc. necessary to properly demo the code. I may do a walk-through as a separate series in the future. Meanwhile, if you have questions about implementing the code below, please feel free to contact me in the comments, or at the email address on my blog: http://typecastexception.com.
Considering the application database already contains all the information required to generate an attendee-specific certificate from a template when one is needed, we decided that certificates would be created on a "just-in-time" basis, populated with the training and attendee specifics, and streamed to the user as a download, without ever being persisted to the server.
In this article, we will examine the specifics of this "Just-In-Time" PDF generation.
As we discussed in Splitting and Merging Pdf Files in C# Using iTextSharp, iTextSharp is a port of the Java-based iText library for working with and manipulating PDF files programmatically. We will need to pull iTextSharp into our project in order to do what we need to do. You can get iTextSharp binaries as described in the previous post, or you can take the easier approach and use NuGet.
Note that there are two versions of the iTextSharp library available via Nuget – the standard iTextSharp library, and the iTextSharp LGPL/MPL version. The licensing of the standard iTextSharp library is more restrictive, having moved to the GNU Affero General Public License (AGPL). In this project I used the LGPL/MPL version.
You can quickly install iTextSharp via the Nuget Package Manager Console by doing:
Install iTextSharp via Nuget Package Manager Console:
PM> Install-Package iTextSharp-LGPL
For our purposes, we are intending to merge our data into a pre-existing PDF form template, and then flatten the PDF document, including the populated form fields. We will do this entirely in memory, and then stream the result back to the caller.
We will first set up a core class, PdfMergeStreamer, which will take in a file path (which points to the PDF Form template), an IEnumerable<IPdfMergeData>, and a System.IO.MemoryStream. IPdfMergeData is an interface which will represent an object containing a Dictionary mapping text values (the Dictionary values) to the Pdf Form fields in our template by name (the Dictionary keys).
Our PdfMergeStreamer class will basically consume a list of one or more merge data items, populate and flatten the template for each, and add each to a single output document (think Word Merge, but with PDF). Of course, there's more to it than that, and we'll walk through it in a moment.
The basic idea was to create a generalized class which can be used to populate any PDF form template, so long as it is provided with the correct values for each field in the template.
PdfMergeStreamer Class:
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.Collections.Generic;
using TrainingDepartmentAzure.Models;
namespace TrainingDepartmentAzure
{
    public class PdfMergeStreamer
    {
        public void fillPDF(string templatePath, IEnumerable<IPdfMergeData> mergeDataItems,
            System.IO.MemoryStream outputStream)
        {
            // Holds one filled-in, flattened copy of the template per merge item.
            var pagesAll = new List<byte[]>();
            byte[] pageBytes = null;
            foreach (var mergeItem in mergeDataItems)
            {
                // Read a fresh copy of the template for each merge item.
                var templateReader = new iTextSharp.text.pdf.PdfReader(templatePath);
                using (var tempStream = new System.IO.MemoryStream())
                {
                    PdfStamper stamper = new PdfStamper(templateReader, tempStream);
                    // Flatten the form so the populated fields become fixed page content.
                    stamper.FormFlattening = true;
                    AcroFields fields = stamper.AcroFields;
                    // Keep tempStream open after the stamper is closed so we can read it.
                    stamper.Writer.CloseStream = false;
                    // Populate each form field by name.
                    var fieldVals = mergeItem.MergeFieldValues;
                    foreach (string name in fieldVals.Keys)
                    {
                        fields.SetField(name, fieldVals[name]);
                    }
                    stamper.Close();
                    tempStream.Position = 0;
                    pageBytes = tempStream.ToArray();
                    pagesAll.Add(pageBytes);
                }
            }
            // Copy the first page of each filled-in document into one output document.
            Document mainDocument = new Document(PageSize.A4);
            var pdfCopier = new PdfSmartCopy(mainDocument, outputStream);
            // Leave outputStream open so the caller can consume it.
            pdfCopier.CloseStream = false;
            mainDocument.Open();
            foreach (var pageByteArray in pagesAll)
            {
                mainDocument.NewPage();
                pdfCopier.AddPage(pdfCopier.GetImportedPage(new PdfReader(pageByteArray), 1));
            }
            pdfCopier.Close();
            // Rewind so the caller reads from the start of the stream.
            outputStream.Position = 0;
        }
    }
}
As we can see, this results in a rather monolithic chunk of code, and could probably be refactored. For now, though, we will leave it as is.
Now let's take a look at what we are passing in. The templatePath is, obviously, a path to a file on the local file system. outputStream is passed in by the caller, and will be used once this method returns to consume or otherwise use the resulting PDF file (in my case, streaming it to the end user as a file download). Which leaves our IEnumerable<IPdfMergeData>.
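To make that hand-off concrete, here is a minimal sketch of the same pattern with no PDF library involved. StreamHandOff, Produce, and ReadAll are hypothetical names I made up for illustration (Produce simply plays the role of fillPDF). The point is only to show what rewinding the caller's stream does, and that ToArray returns the whole buffer regardless of Position.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the hand-off between fillPDF and its caller.
public static class StreamHandOff
{
    // Writes some bytes into the caller-supplied stream, then rewinds it,
    // much as fillPDF does with outputStream before returning.
    public static void Produce(MemoryStream outputStream, string payload)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        outputStream.Write(bytes, 0, bytes.Length);
        outputStream.Position = 0; // a stream-based reader now starts at the beginning
    }

    // Reads from the stream's current Position to the end.
    // The reader is deliberately not disposed, so the stream stays open.
    public static string ReadAll(MemoryStream stream)
    {
        var reader = new StreamReader(stream, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

A caller that only wants the bytes (as our controller does) can call ToArray at any time; the rewind matters when the stream is handed to something that reads from the current position.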
In my case I decided I wanted to be able to create different merge templates, which required different sets of merge fields, all of which would need to be mapped according to the specific data required for the merge. While we could have simply passed in a Dictionary<string, string> straight away, I decided that creating a specific interface would make the intent more clear, and also force me to write a concrete implementation for each mapping.
The interface itself is straightforward:
The IPdfMergeData Interface:
public interface IPdfMergeData
{
    IDictionary<string, string> MergeFieldValues { get; }
}
NOTE - A commenter expressed concern, born of some confusion, that you might need to have Scribus, LibreOffice, or whatever program was used to create the template installed on the remote server when using the code in this article to generate PDFs from the template. Let me be clear that you do not. You need one of these programs only to create the template itself, which is then saved on the server and consumed by the code in this article.
There are a number of options for creating a PDF Form template, the first of which that comes to mind being (obviously) Adobe Acrobat. However, Acrobat is a little spendy, so if you don't have access to that, you might avail yourself of the Open-Source program, Scribus.
Scribus is not a PDF creation program per se; in fact, it is a page layout application. However, it presents an Export to PDF option, and allows us to place named form fields on our page, along with text, images, and other content.
There is a small learning curve to using Scribus to produce a workable PDF form, and I will discuss that in another post. However, here are four suggestions to consider when using Scribus to make your form:
- Place form fields on their own layer in the Scribus Document. This keeps the fields separated from other content, and makes editing much easier.
- When you export, use the PDF 1.4 (Acrobat 5) setting. There seem to be issues with the other PDF settings.
- Be careful with font choices when using Scribus. While most fonts will appear correctly in the Scribus document itself, when exporting to PDF in the older format(s) currently supported by Scribus, they do not always display correctly on the output PDF document.
- Make sure to name your fields carefully when adding them to the PDF Form. The field names must match the keys you provide in your concrete implementation of IPdfMergeData.
Obviously, using Acrobat is preferred for creating a template for our purpose. However, I had to use Scribus to create a template for my own application, and after a little trial and error, it worked just fine.
Ok, now that we have the basic pieces in place, let's walk through a simplified example of how I implemented the rest of this. This will help you see how you might create your own implementation to meet your own domain-specific requirements.
In my ASP.NET MVC application, one set of my (simplified) requirements are:
- Training attendees should be able to download a personalized certificate via a link received by email
- System Users should be able to create and download a batch of certificates for all attendees for a particular training (sometimes the training sponsor wants the whole thing printed and mailed to them)
- System Users should be able to download individual attendee certificates (sometimes attendees don't have email, need us to send a certificate directly to a third party, or have trouble downloading from the link and need the certificate sent as an actual attachment).
- Certificate PDF files should not be persisted on the server, but instead will be streamed directly to the client browser after generation.
We won't cover all the details of building out this application here; we will focus on the implementation of the certificate generation on demand. Let's start with the methods needed on AttendeeCertificatesController.
Among the other methods on our controller, two are of specific interest to us here: DownloadCertificate and DownloadBatch. These will return a File object in the response body representing an individual certificate or a batch of certificates, respectively, which will be downloaded by the client's browser.
What is important to note here is that all of the processing happens in memory, and the resulting certificate is not persisted locally on the server.
Note that for simplicity, I have not implemented much exception handling here, or async processing (which might be in order, since processing a batch of a few hundred certificates could take some time). We'll look at DownloadCertificate first.
The DownloadCertificate Method of AttendeeCertificatesController
[AllowAnonymous]
public ActionResult DownloadCertificate(int trainingId, string attendeeGuid = null)
{
    string validationMessage = "";
    Guid guid;
    try
    {
        // Only proceed if the route parameter is a well-formed Guid.
        if (Guid.TryParse(attendeeGuid, out guid))
        {
            var Db = new TrainingDbAzureEntities();
            var table = Db.Attendees;
            // First() throws if no attendee matches; the catch below covers that case.
            var attendee = table.First(a => a.AttendeeGuid == guid);
            trainingId = attendee.TrainingId;
            string templatePath = AppDomain.CurrentDomain.BaseDirectory
                + @"PdfTemplates\CertificateTemplateForPdf.pdf";
            // Build the certificate entirely in memory.
            var streamer = new AttendeeCertificateStreamer();
            var pdfMemoryStream = streamer.GetPdfStream(attendee, templatePath);
            string contentType = "application/pdf";
            // Inline = false marks the response as an attachment to be downloaded.
            var cd = new System.Net.Mime.ContentDisposition();
            cd.Inline = false;
            cd.FileName = this.getPdfFileName(attendee.FullName, attendee.AttendeeGuid);
            Response.AppendHeader("Content-Disposition", cd.ToString());
            return File(pdfMemoryStream.ToArray(), contentType);
        }
        else
        {
            validationMessage = "The certificate you have requested does not exist.";
            return Index(trainingId, validationMessage);
        }
    }
    catch (Exception)
    {
        // Simplified for this article: errors are swallowed rather than reported.
        return null;
    }
}
In the above, notice we have defined a helper method, getPdfFileName, which takes some attendee-specific input and creates a suitable name for the file prior to download. The idea was to create a file name with a user-friendly component, but which would be suitably unique. The method is used by both DownloadCertificate and DownloadBatch, and is as follows:
The getPdfFileName Method:
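string getPdfFileName(string friendlyName, Guid guid)
{
    // Strip everything except letters, digits, spaces, and hyphens.
    var rgx = new System.Text.RegularExpressions.Regex("[^a-zA-Z0-9 -]");
    string result = rgx.Replace(friendlyName, "");
    // Use the sanitized name, with the Guid appended for uniqueness.
    return string.Format("{0}-{1}.pdf", result, guid.ToString());
}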
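To see what the character filter does in isolation, here is a small stand-alone sketch of the same idea. CertificateFileNames and Build are hypothetical names used only for this illustration; the regular expression and the name-plus-Guid format are the ones from the helper above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-alone version of the file name helper, for illustration only.
public static class CertificateFileNames
{
    // Matches any character that is NOT a letter, digit, space, or hyphen.
    static readonly Regex Disallowed = new Regex("[^a-zA-Z0-9 -]");

    public static string Build(string friendlyName, Guid guid)
    {
        // Remove the disallowed characters from the friendly part of the name.
        string cleaned = Disallowed.Replace(friendlyName, "");

        // Append the Guid so the file name is unique as well as readable.
        return string.Format("{0}-{1}.pdf", cleaned, guid);
    }
}
```

For example, a name such as "Mary O'Brien, Ph.D." loses its apostrophe, comma, and periods and becomes "Mary OBrien PhD" before the Guid is appended. Note that accented letters are removed as well, which may or may not be what you want for your own attendees.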
We route requests to DownloadCertificate by placing the following custom route definition in our RouteConfig.RegisterRoutes method:
Route Definition for DownloadCertificate:
routes.MapRoute(
    name: "AttendeeCertificateDownload",
    url: "AttendeeCertificates/Download/{trainingId}/{attendeeGuid}",
    defaults: new { controller = "AttendeeCertificates", action = "DownloadCertificate" }
);
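Since attendees reach this route through an emailed link, it may help to see what such a link looks like. The sketch below builds one by hand from the URL pattern above. CertificateLinks, Build, and the example.org address are assumptions for illustration only; in a real MVC application you would more likely let the framework's URL helpers generate the link from the route table.

```csharp
using System;

// Hypothetical helper that builds a download link matching the route pattern
// "AttendeeCertificates/Download/{trainingId}/{attendeeGuid}".
public static class CertificateLinks
{
    // siteRoot should end with a trailing slash (for example "https://example.org/"),
    // otherwise its last path segment is replaced when the paths are combined.
    public static string Build(Uri siteRoot, int trainingId, Guid attendeeGuid)
    {
        string relativePath = string.Format(
            "AttendeeCertificates/Download/{0}/{1}", trainingId, attendeeGuid);
        return new Uri(siteRoot, relativePath).AbsoluteUri;
    }
}
```

The Guid, rather than a sequential id, is what makes the link impractical to guess, which matters because the action is marked [AllowAnonymous].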
When a request is routed to the DownloadCertificate method above, we can see that the attendeeGuid route parameter is used to look up the specific attendee record in the database (the Guid in the database is not used as the primary key, hence the slightly more clumsy lookup).
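The parse-then-look-up step can be sketched on its own, using an in-memory list in place of the database. AttendeeRecord and AttendeeLookup are hypothetical types invented for this example (the real Attendee entity has more properties). The sketch also uses FirstOrDefault where the controller uses First, so that an unknown Guid yields null instead of an exception; that is a variation on the controller code, not a description of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Attendee entity.
public class AttendeeRecord
{
    public Guid AttendeeGuid { get; set; }
    public string FullName { get; set; }
}

public static class AttendeeLookup
{
    // Returns the matching attendee, or null if the text is not a valid Guid
    // or no attendee carries that Guid.
    public static AttendeeRecord FindByGuid(
        IEnumerable<AttendeeRecord> attendees, string attendeeGuid)
    {
        Guid guid;
        if (!Guid.TryParse(attendeeGuid, out guid))
        {
            return null; // null, empty, or malformed input
        }

        // FirstOrDefault returns null when nothing matches, where First would throw.
        return attendees.FirstOrDefault(a => a.AttendeeGuid == guid);
    }
}
```

With a helper shaped like this, a controller could show the "certificate does not exist" message for both malformed and unknown Guids.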
Next, we grab the path to our PDF form template, and then initialize an instance of a new class, AttendeeCertificateStreamer. What, you say? We haven't discussed this one yet.
The AttendeeCertificateStreamer provides a layer of abstraction between our controller and the PdfMergeStreamer class we examined previously, and handles the domain-specific implementation details related to creating certificates of attendance before calling into the more general PdfMergeStreamer.
As we can see in the following code, AttendeeCertificateStreamer accepts attendee data from our controller and maps it into a form useable by the first class we examined, PdfMergeStreamer.
The AttendeeCertificateStreamer Class:
public class AttendeeCertificateStreamer
{
    // Batch version: one certificate page per attendee.
    public System.IO.MemoryStream GetPdfStream(IEnumerable<Attendee> attendees,
        string templatePath)
    {
        var util = new PdfMergeStreamer();
        var pdfMemoryStream = new System.IO.MemoryStream();
        IEnumerable<IPdfMergeData> mergeData = this.getAttendeeMergeData(attendees);
        util.fillPDF(templatePath, mergeData, pdfMemoryStream);
        return pdfMemoryStream;
    }
    // Single-attendee version: wraps the attendee in a list (a "batch" of one).
    public System.IO.MemoryStream GetPdfStream(Attendee attendee,
        string templatePath)
    {
        var attendees = new List<Attendee>();
        attendees.Add(attendee);
        return this.GetPdfStream(attendees, templatePath);
    }
    // Maps each attendee to the template-specific merge data.
    IEnumerable<IPdfMergeData> getAttendeeMergeData(IEnumerable<Attendee> attendees)
    {
        var output = new List<IPdfMergeData>();
        foreach (var attendee in attendees)
        {
            output.Add(new AttendeeCertificateMergeData(attendee));
        }
        return output;
    }
}
In the above, we pass our template path and a single instance of Attendee to the second of the two GetPdfStream methods. The single instance is added to a list and handed to the first GetPdfStream overload, which calls getAttendeeMergeData to perform the mapping we need for our template. As we can see, the getAttendeeMergeData method uses each instance of Attendee to initialize a new instance of AttendeeCertificateMergeData.
Remember our interface, IPdfMergeData? AttendeeCertificateMergeData is the concrete implementation we will use specifically for mapping to our PDF Form template.
Template-Specific Implementation of IPdfMergeData:
public class AttendeeCertificateMergeData : IPdfMergeData
{
    Attendee _attendee;
    public AttendeeCertificateMergeData(Attendee attendee)
    {
        _attendee = attendee;
    }
    public IDictionary<string, string> MergeFieldValues
    {
        get { return this.getMergeDictionary(); }
    }
    IDictionary<string, string> getMergeDictionary()
    {
        var output = new Dictionary<string, string>();
        var training = _attendee.Training;
        output.Add("FullName", _attendee.FullName);
        output.Add("CourseTitle", training.CourseTitle);
        var dyl = string.Format("{0}, {1} in {2}",
            training.TrainingPeriod, training.Year, training.Location);
        output.Add("DatesYearLocation", dyl);
        output.Add("EndDate", training.EndDate.ToShortDateString());
        output.Add("CEHours", training.CEHours.ToString());
        return output;
    }
}
As we can see in the above, I am able to simply pass an instance of Attendee in to the constructor, and I am ready to go. The private method getMergeDictionary is called when the MergeFieldValues property is accessed, and returns a Dictionary containing key-value pairs for each of my template fields.
Clearly, there is not a one-to-one mapping between properties of Attendee and fields in my template form. In some cases, the form template requires concatenations, and/or other manipulations of attendee data to make it suitable for presentation.
Obviously, each dictionary key must be carefully mapped by name to the corresponding Form field in the PDF Form template.
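Because a mistyped key is easy to miss, a small check that compares the dictionary keys against the template's field names can be handy during development. The sketch below is one way to do that. MergeFieldChecker and its two methods are hypothetical and not part of the application code; the list of template field names is supplied by hand here, and the comparison assumes field names are case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical development-time helper, not part of the article's application.
public static class MergeFieldChecker
{
    // Template fields that have no matching key in the merge dictionary.
    public static List<string> MissingKeys(
        IEnumerable<string> templateFieldNames,
        IDictionary<string, string> mergeFieldValues)
    {
        return templateFieldNames
            .Where(name => !mergeFieldValues.ContainsKey(name))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    // Dictionary keys that do not correspond to any template field.
    public static List<string> UnknownKeys(
        IEnumerable<string> templateFieldNames,
        IDictionary<string, string> mergeFieldValues)
    {
        var known = new HashSet<string>(templateFieldNames, StringComparer.Ordinal);
        return mergeFieldValues.Keys
            .Where(key => !known.Contains(key))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }
}
```

If the template has a field named CourseTitle and the dictionary contains Coursetitle, the first method reports CourseTitle as missing and the second reports Coursetitle as unknown, which usually points straight at the typo.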
If we look through the code for AttendeeCertificatesController, AttendeeCertificateStreamer, and PdfMergeStreamer, we find we have a basic flow like the following.
Simplified Application Flow:
An HTTP request is routed to our controller, where the incoming route parameters are used to retrieve an instance of a specific attendee. From there, the attendee is passed, along with a path pointing to a locally persisted PDF form template file, to an instance of AttendeeCertificateStreamer.
AttendeeCertificateStreamer accepts the attendee instance and creates an instance of AttendeeCertificateMergeData, our concrete implementation of IPdfMergeData. This, along with the template path and a newly-created System.IO.MemoryStream, is then passed to PdfMergeStreamer.
PdfMergeStreamer processes the data, and adds the merged PDF file to the MemoryStream, which is returned by AttendeeCertificateStreamer back to our controller. The controller sets the content disposition, file name, and content type and returns a FileContentResult, which adds our file stream to the HTTP response body, and is ultimately downloaded by the client's browser.
It was recognized that in addition to attendees being able to download certificates individually, it would also be necessary for system users to be able to create and download whole batches of certificates for all attendees. Sometimes the training sponsor wants these printed and mailed.
In reality, the core class, PdfMergeStreamer, is set up to handle batches from the start; in fact, that is the only way it works. We have thus far examined what is actually the special case: a single certificate (in other words, a "batch" of one).
Nonetheless, making a single certificate available to a specific attendee through a link is a different scenario than allowing internal system users to access certificate data in bulk, and hence requires a slightly modified controller method.
The Download Batch Method on Attendee Certificates Controller:
[Authorize]
public ActionResult DownloadBatch(Training training, IEnumerable<Attendee> attendees)
{
    string validationMessage = "";
    // The original try/catch only rethrew the exception, so it is omitted here.
    if (attendees.Any())
    {
        string templatePath = AppDomain.CurrentDomain.BaseDirectory
            + @"PdfTemplates\CertificateTemplateForPdf.pdf";
        // Build the whole batch in memory as a single PDF document.
        var streamer = new AttendeeCertificateStreamer();
        var pdfMemoryStream = streamer.GetPdfStream(attendees, templatePath);
        string contentType = "application/pdf";
        // Inline = false marks the response as an attachment to be downloaded.
        var cd = new System.Net.Mime.ContentDisposition();
        cd.Inline = false;
        cd.FileName = this.getPdfFileName(training.FullName, training.TrainingGuid);
        Response.AppendHeader("Content-Disposition", cd.ToString());
        return File(pdfMemoryStream.ToArray(), contentType);
    }
    else
    {
        validationMessage = "You must select at least one attendee.";
        return Index(training.TrainingId, validationMessage);
    }
}
As we can see above, there is not much difference between this and the previously discussed DownloadCertificate method. In fact, these two methods are probably candidates for refactoring, but we won't worry about that right now.
Primarily, this method accepts an IEnumerable<Attendee> as an argument, and is flagged with an [Authorize] attribute, meaning only authorized system users may access this method.
Requests are routed to the DownloadBatch method by adding the following route definition to the RouteConfig.RegisterRoutes method:
routes.MapRoute(
    name: "DownloadBatch",
    url: "AttendeeCertificates/DownloadBatch",
    defaults: new { controller = "AttendeeCertificates", action = "DownloadBatch" }
);
Obviously, the examples in this article are tuned to the specific needs of my own application (though I simplified things as much as possible). However, the first class we examined, PdfMergeStreamer, in conjunction with the interface IPdfMergeData, should get you started. Feel free to email me or comment with questions, or to point out errors. Feedback is welcome and appreciated.