In this article, we will review how PDF optimization can help reduce file sizes for more efficient file sharing and faster downloads, while maintaining the fidelity of the original document.
Overview: what is PDF optimization and why do you need it?
What is one complaint that many PDF users share? Their PDF files are large. That makes them difficult to share over a network or offer as downloadable content, and they take up a lot of storage space. One question we get asked frequently is, “How do I reduce the size of my PDF?”
That’s where Datalogics PDF OPTIMIZER can help. Its main goals are to reduce file size, improve compatibility, and tailor PDF files to the needs of each specific target audience. As a result, optimized PDFs are more convenient to work with over networks, which makes file sharing more efficient and lowers long-term storage requirements.
Click here for a downloadable white paper which can provide additional insight into PDF optimization.
How does Datalogics PDF optimization work? To answer that question, let’s first go over the most common reasons PDF files are larger than they need to be.
Why do some PDF files take more space than others?
Some PDF files that don’t seem to have much content still take up a huge amount of space. Why is that? In most cases, the cause falls into one of two general categories: images and fonts. (There are really more than two causes, but grouping them this way is convenient.) Images can be uncompressed or poorly compressed, and their pixel density can be far higher than the purpose of the document requires, such as print-resolution images in a document meant only for on-screen reading. There can also be multiple copies of the same image sitting in a document, taking up space. Similarly, embedded fonts can be subset incorrectly or not at all, and a document can even contain multiple copies of the exact same font. Let’s talk about each of these problems separately, along with the solution we have for you.
Image downsampling is not a new problem, and solutions for image downsampling and compression are widely available. The real challenge is extracting images from the PDF document with their colors intact, and then inserting the downsampled and compressed images back into the document while preserving the correct color space and transparency. The Adobe PDF Library handles this difficult task with the Adobe Color Engine working under the hood.
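To make the “easy part” concrete, here is a minimal sketch in Java (one of the languages the Adobe PDF Library offers an interface for, although this code uses only the JDK and not that library). It picks an integer downsampling factor from a source and target resolution, then averages each block of 8-bit grayscale samples into one sample, which is a simple box filter. The class and method names are hypothetical, and the sketch deliberately leaves out color spaces, transparency, and re-embedding the image, which are exactly the hard parts described above.

```java
import java.util.Arrays;

/** Minimal sketch of image downsampling by an integer factor (box filter). */
final class Downsampler {

    /** Smallest integer factor that brings sourcePpi down to at most targetPpi. */
    static int factorFor(int sourcePpi, int targetPpi) {
        if (sourcePpi <= targetPpi) {
            return 1; // already at or below the target resolution: leave it alone
        }
        // Round up so the result never exceeds the target resolution.
        return (sourcePpi + targetPpi - 1) / targetPpi;
    }

    /**
     * Averages each factor x factor block of 8-bit grayscale samples into one sample.
     * Partial blocks at the right and bottom edges are averaged over the pixels they contain.
     */
    static int[][] downsample(int[][] gray, int factor) {
        int height = gray.length;
        int width = gray[0].length;
        int outH = (height + factor - 1) / factor;
        int outW = (width + factor - 1) / factor;
        int[][] result = new int[outH][outW];
        for (int oy = 0; oy < outH; oy++) {
            for (int ox = 0; ox < outW; ox++) {
                int sum = 0;
                int count = 0;
                for (int y = oy * factor; y < Math.min(height, (oy + 1) * factor); y++) {
                    for (int x = ox * factor; x < Math.min(width, (ox + 1) * factor); x++) {
                        sum += gray[y][x];
                        count++;
                    }
                }
                result[oy][ox] = Math.round((float) sum / count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int factor = factorFor(600, 150); // 4: a 600 ppi scan reduced for a 150 ppi target
        int[][] image = {
            {0, 0, 255, 255},
            {0, 0, 255, 255},
            {100, 100, 50, 50},
            {100, 100, 50, 50},
        };
        System.out.println(factor + " " + Arrays.deepToString(downsample(image, 2)));
    }
}
```

Each output pixel stands in for factor × factor input pixels, so the raw sample count shrinks by roughly the square of the factor before any compression is applied.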
The image below is an original test image from the Altona Test Suite version 1.2. The Altona Test Suite provides insight into color consistency, duotone images, spot colors, and overprint.
The image below has been optimized using the Adobe PDF Library’s optimization tool. As you can see, the color space and transparency are identical to those of the input image above.
The image below shows the same input PDF optimized by a different optimization tool. While the file size has been reduced, the colors look heavily oversaturated. On top of that, the spot colors are not represented correctly.
Optimized with PDF OPTIMIZER
Optimized with “Other Tool”
File size reduction
Here’s an example of how optimization can help reduce the overall file size using the Altona Test Suite sample file:
As PDF files are created, a number of fonts get embedded in them. This is a standard step in the creation process, and it ensures that PDF files retain their visual appearance on platforms that might be missing those fonts. This is a great feature, but unfortunately, fonts can take up a lot of space, even more so when the creating tool embeds multiple copies of the exact same font. This happens for a number of reasons, one of which is merging PDF documents. How do we address that? To start, we can use the PDF optimization tool to subset a fully embedded font: we go over each glyph and retain only the ones the document actually needs. This is a complicated task, and not one that most tools out there complete correctly. Third-party tools commonly get it wrong, with results ranging from words with missing characters to the whole document looking like it’s in a different language.
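The first half of subsetting is deciding which glyphs to keep. The Java sketch below (hypothetical class and method names, JDK only, not the Adobe PDF Library API) models a font as a character-to-glyph map plus per-glyph sizes, collects the glyphs the document’s text uses, and always keeps glyph 0 (.notdef), which TrueType and OpenType fonts expect to be present. A real subsetter must also keep any glyphs that composite glyphs reference and rebuild the font tables afterward; details like these are what make subsetting hard to get right.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal sketch of choosing which glyphs a font subset must keep. */
final class FontSubsetter {

    /**
     * Returns the glyph IDs used by the given text, plus glyph 0 (.notdef).
     * Characters the font cannot map are skipped here; a real tool would report them.
     */
    static Set<Integer> glyphsToKeep(String text, Map<Integer, Integer> cmap) {
        Set<Integer> keep = new TreeSet<>();
        keep.add(0); // .notdef is always retained
        text.codePoints().forEach(cp -> {
            Integer gid = cmap.get(cp);
            if (gid != null) {
                keep.add(gid);
            }
        });
        return keep;
    }

    /** Total size in bytes of the glyph outlines that survive subsetting. */
    static int subsetSize(Set<Integer> keep, int[] glyphSizes) {
        int total = 0;
        for (int gid : keep) {
            total += glyphSizes[gid];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical five-glyph font: .notdef, 'a', 'b', 'c', 'd'.
        Map<Integer, Integer> cmap = Map.of((int) 'a', 1, (int) 'b', 2, (int) 'c', 3, (int) 'd', 4);
        int[] glyphSizes = {20, 100, 120, 110, 90}; // 440 bytes in total
        Set<Integer> keep = glyphsToKeep("cab", cmap);
        System.out.println(keep + " -> " + subsetSize(keep, glyphSizes) + " of 440 bytes");
    }
}
```

In this toy font, the text “cab” never uses “d”, so its outline is dropped; real fonts with hundreds or thousands of glyphs are where the savings become substantial.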
Object cleanup and stream compression
At the beginning of the article, I mentioned that a PDF file can contain duplicate font and image streams. Datalogics PDF OPTIMIZER goes over those streams and removes any duplicates. It also compresses all uncompressed streams.
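The internals of the Datalogics implementation aren’t described here, but both ideas can be sketched with the JDK alone. The hypothetical class below finds duplicate streams by hashing their bytes with SHA-256 (and confirming each match byte for byte), and compresses a stream with java.util.zip.Deflater, which produces the zlib format that PDF’s /FlateDecode filter decodes. In a real file, the compressed bytes would be written back with /Filter /FlateDecode in the stream dictionary, and references to a duplicate object would be redirected to the kept copy; neither step is shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.Deflater;

/** Minimal sketch of duplicate-stream detection and Flate compression. */
final class StreamCleanup {

    /**
     * For each stream, returns the index of the first stream with identical bytes.
     * A stream that maps to its own index is kept; any other stream is a duplicate.
     */
    static int[] dedupe(List<byte[]> streams) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256"); // required on every Java platform
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        Map<String, Integer> firstByDigest = new HashMap<>();
        int[] canonical = new int[streams.size()];
        for (int i = 0; i < streams.size(); i++) {
            String key = Base64.getEncoder().encodeToString(sha256.digest(streams.get(i)));
            Integer first = firstByDigest.putIfAbsent(key, i);
            // Confirm byte-for-byte equality instead of trusting the hash alone.
            boolean duplicate = first != null && Arrays.equals(streams.get(first), streams.get(i));
            canonical[i] = duplicate ? first : i;
        }
        return canonical;
    }

    /** Compresses data in zlib format, the encoding PDF's /FlateDecode filter expects. */
    static byte[] flate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] font = "stand-in for an embedded font program".getBytes(StandardCharsets.US_ASCII);
        byte[] image = "stand-in for image samples".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> streams = List.of(font, image, font.clone());
        System.out.println(Arrays.toString(dedupe(streams))); // [0, 1, 0]

        byte[] content = "0 0 m 100 100 l S\n".repeat(200).getBytes(StandardCharsets.US_ASCII);
        System.out.println(content.length + " bytes -> " + flate(content).length + " bytes");
    }
}
```

Hashing first keeps the comparison cheap: only streams with matching digests need the full byte comparison.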
PDF optimization is a fairly involved process. While it offers a lot of knobs and levers for fine-tuning, it also comes with a sensible set of defaults, which makes it easy to get started right out of the box.
Datalogics offers two options for PDF optimization: PDF OPTIMIZER is available as an API or as a command-line utility, both built on the Adobe PDF Library. The Adobe PDF Library SDK is a low-level PDF library that contains a powerful set of native C/C++ APIs, with interfaces for .NET and Java.
Click here to download a free evaluation of the PDF OPTIMIZER API or command-line utility.
The PDF OPTIMIZER utility is a command-line application available on Windows and Linux. For the purposes of this example, we will provide instructions on how to optimize a PDF document on Windows.
PDF OPTIMIZER ships with two optimization profiles: standard and mobile. The standard profile is a general-purpose profile that provides a good balance between file size and image fidelity. The mobile profile is intended for cases where file size is the priority, such as mobile applications. You can also create your own profiles or tweak the existing ones.
PDF OPTIMIZER is easy to use right out of the box. To optimize a PDF document, we need to provide an input file, an output file, and an optimization profile. In this case, we will use the standard profile. A typical optimization command line looks like this:
pdfoptimizer input.pdf output.pdf standard.json
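If you have a folder of documents, you can run the same command once per file. The following Java sketch launches the utility for each PDF in a directory with ProcessBuilder. It assumes the pdfoptimizer executable is on the PATH and passes only the input, output, and profile arguments shown above; the class name, default folders, and “-optimized” output naming scheme are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;

/** Minimal sketch: run "pdfoptimizer <input> <output> <profile>" for every PDF in a folder. */
final class BatchOptimize {

    /** Hypothetical naming scheme: Report.pdf becomes Report-optimized.pdf inside outDir. */
    static Path outputFor(Path input, Path outDir) {
        String name = input.getFileName().toString();
        String stem = name.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? name.substring(0, name.length() - 4)
                : name;
        return outDir.resolve(stem + "-optimized.pdf");
    }

    /** Argument list in the same order as the command line shown above. */
    static List<String> command(Path input, Path output, Path profile) {
        return List.of("pdfoptimizer", input.toString(), output.toString(), profile.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path inDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path outDir = Paths.get(args.length > 1 ? args[1] : "optimized");
        Path profile = Paths.get(args.length > 2 ? args[2] : "standard.json");
        Files.createDirectories(outDir);
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(inDir, "*.pdf")) {
            for (Path input : pdfs) {
                Process process = new ProcessBuilder(command(input, outputFor(input, outDir), profile))
                        .inheritIO() // show the utility's own output in this console
                        .start();
                System.out.println(input.getFileName() + ": exit code " + process.waitFor());
            }
        }
    }
}
```

Writing results to a separate folder keeps the originals untouched, so you can compare sizes before deciding to replace anything.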
The Adobe PDF Library is available for multiple platforms and languages. Below, we demonstrate how to optimize a PDF document using the .NET language interface. We will also provide links to the C++ and Java versions of the code.
PDF OPTIMIZER is a class in the Adobe PDF Library: Datalogics.PDFL.PDFOptimizer. For the purposes of this tutorial, though, we will import the entire Datalogics.PDFL namespace (using Datalogics.PDFL;), since we will also be using other classes from this library.
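To begin, let’s open a PDF file named “sample.pdf”. This snippet assumes the Adobe PDF Library has already been initialized in your application, and the output file name is a placeholder:
String sInput = "sample.pdf";
String sOutput = "sample-optimized.pdf"; // placeholder output path
Document doc = new Document(sInput);
Then we can use PDFOptimizer to reduce the file size: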
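using (PDFOptimizer opt = new PDFOptimizer())
{
    // Record the input file size so it can be compared after optimization.
    long beforeLength = new System.IO.FileInfo(sInput).Length;
    // Turn on linearization and confirm that the option changed.
    bool linearizeBefore = opt.GetOption(OptimizerOption.Linearize);
    opt.SetOption(OptimizerOption.Linearize, true);
    bool linearizeAfter = opt.GetOption(OptimizerOption.Linearize);
    if (linearizeBefore != linearizeAfter)
        Console.WriteLine("Successfully set PDF Option Linearize to ON.");
    else
        Console.WriteLine("Failed to set PDF Option Linearize to ON!");
    // Optimize the document with the current options and save it to sOutput.
    opt.Optimize(doc, sOutput);
    // Report the output size as a percentage of the input size.
    long afterLength = new System.IO.FileInfo(sOutput).Length;
    Console.Write("Optimized file ");
    Console.Write(afterLength * 100.0 / beforeLength);
    Console.WriteLine("% the size of the original.");
}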
The actual optimization is performed with one line of code: opt.Optimize(doc, sOutput). This line optimizes the PDF document with the selected options and saves it to the output location. The default set of options works well for a wide range of documents. If you need to tweak specific settings, you can set them with the optimizer’s option setter, which takes an (OptimizerOption option, bool value) pair. In the sample above, we set the linearization option to true as an example of how to use the PDF OPTIMIZER options.
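To confirm that linearization actually took effect in the output, you can look at the start of the file. The PDF specification requires the linearization parameter dictionary of a linearized file to sit entirely within its first 1024 bytes, so the hypothetical Java helper below simply checks that window for the /Linearized key. Treat it as a quick heuristic rather than a validator: it does not check the hint tables, and a file that was linearized and later modified with an incremental update can still carry the key.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Heuristic check for a linearized ("Fast Web View") PDF. */
final class LinearizationCheck {

    /** The linearization parameter dictionary must lie within the first 1024 bytes. */
    static final int HEADER_WINDOW = 1024;

    /** True if the /Linearized key appears within the first 1024 bytes of the data. */
    static boolean looksLinearized(byte[] head) {
        int length = Math.min(head.length, HEADER_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break the search.
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        return text.contains("/Linearized");
    }

    /** Reads at most the first 1024 bytes of the file and applies the check above. */
    static boolean looksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(HEADER_WINDOW));
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get(args.length > 0 ? args[0] : "output.pdf"); // placeholder path
        System.out.println(pdf + (looksLinearized(pdf) ? " appears" : " does not appear") + " to be linearized.");
    }
}
```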
For the purposes of this tutorial, we record the file size before optimization starts and compare it with the file size after optimization has been performed. The values are stored in “beforeLength” and “afterLength”, and the output size is printed as a percentage of the original.
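The same before/after comparison is easy to reproduce outside the sample, for example to check results from the command-line utility. This hypothetical Java helper reads both file sizes with java.nio.file.Files.size and reports the output as a percentage of the input, the same calculation the .NET code prints, along with the percentage saved. The default file names are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Minimal sketch: compare an optimized PDF's size with the original's. */
final class SizeReport {

    /** Output size as a percentage of the input size. */
    static double percentOfOriginal(long beforeBytes, long afterBytes) {
        if (beforeBytes <= 0) {
            throw new IllegalArgumentException("input size must be positive");
        }
        return afterBytes * 100.0 / beforeBytes;
    }

    /** Percentage of the original size that optimization removed. */
    static double percentSaved(long beforeBytes, long afterBytes) {
        return 100.0 - percentOfOriginal(beforeBytes, afterBytes);
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "input.pdf");   // placeholder
        Path output = Paths.get(args.length > 1 ? args[1] : "output.pdf"); // placeholder
        long before = Files.size(input);
        long after = Files.size(output);
        System.out.printf("Optimized file is %.1f%% the size of the original (%.1f%% smaller).%n",
                percentOfOriginal(before, after), percentSaved(before, after));
    }
}
```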
Here is the original sample file that was used:
Here is the optimized output file:
Benefits of using Datalogics PDF OPTIMIZER
PDF OPTIMIZER from Datalogics is built on the Adobe PDF Library, which is the same core technology that Adobe uses in Adobe Acrobat, Adobe Illustrator, Adobe Photoshop and Adobe InDesign. As a result, you can be assured of accurate color and fonts in the optimized PDF. With PDF OPTIMIZER from Datalogics, you can:
- Ensure maximum compatibility with your users
- Reduce file size for fast delivery
- Optimize fonts and colors while maintaining the integrity of the original document
Datalogics fully stands behind its products. PDF OPTIMIZER is an enterprise-level tool with enterprise-level support – evaluators and customers receive immediate access to a dedicated technical support representative to address any questions or concerns. With PDF OPTIMIZER, you get functionality that you can rely on to achieve the optimal balance between document quality and reduced file sizes.