Compressing PDF Documents for Faster Display and Easier Storage

Accusoft, Joseph Argento, 6 Jun 2014
This whitepaper describes a proven method for conveniently compressing PDF documents to reduce their storage footprint and accelerate their transmission and display speed.

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

The PDF format is the go-to medium for document exchange around the world, whether on a PC, phone, or tablet. But if the author of a PDF isn’t careful when creating it, the file size can quickly balloon. Many PDFs are far bigger than they need to be, which causes real problems for the users, companies, and websites that must store, transmit, and share them.

Bloated PDF documents take a long time to display, especially when viewed online or through a network, and they take a lot of processing power, as well. At a time when most content creators want their documents to be mobile-friendly, bloated PDFs can take an unacceptably long time to download to mobile devices, and can tax mobile devices’ processing power when displayed. Corporate repositories overflow their available storage space when thousands or millions of stored PDF files are bigger than they need to be.

All of these problems caused by oversize PDF documents can be effectively addressed through technology that compresses PDF documents for optimal size.

This whitepaper describes a proven method for conveniently compressing PDF documents to reduce their storage footprint and accelerate their transmission and display speed.

Compression is the Key

The majority of PDF bloat is due to embedded images in the PDF document. Many PDFs are nothing more than pages and pages of uncompressed scanned images, which can take up huge amounts of space. Even when they have been compressed, the images often still take up far more space than they should because non-optimal compression has been applied to them.

Besides compressing images, the method described in this whitepaper takes additional steps to make documents even smaller: removing embedded thumbnails, unneeded fonts, and metadata, and down-sampling images.
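
To see why down-sampling matters, a rough back-of-the-envelope calculation helps. The sketch below estimates the uncompressed size of one scanned page from its physical dimensions, resolution, and bit depth. The page size (US Letter), the resolutions, and the helper name `UncompressedBytes` are illustrative assumptions, not values or APIs taken from PDF Xpress.

```csharp
using System;

static class ScanSizeEstimate
{
    // Hypothetical helper: uncompressed raster size in bytes for a page
    // scanned at the given resolution and bit depth.
    public static long UncompressedBytes(double widthInches, double heightInches,
                                         int dpi, int bitsPerPixel)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        // Round each row up to a whole number of bytes.
        long bytesPerRow = (widthPx * bitsPerPixel + 7) / 8;
        return bytesPerRow * heightPx;
    }

    static void Main()
    {
        // 24-bit color, US Letter page (8.5 x 11 inches).
        long at300 = UncompressedBytes(8.5, 11.0, 300, 24);
        long at150 = UncompressedBytes(8.5, 11.0, 150, 24);
        Console.WriteLine("300 dpi: " + at300 + " bytes");
        Console.WriteLine("150 dpi: " + at150 + " bytes");
        Console.WriteLine("Ratio: " + ((double)at300 / at150));
    }
}
```

Under these assumptions, a 300 dpi color page holds about 25.2 million bytes of raw pixel data, and halving the resolution to 150 dpi cuts that to about 6.3 million bytes, a fourfold reduction before any image compression is applied.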

While any method for compressing PDF documents must first address file size, convenience is a factor, as well. For example, if you have a repository of millions of scanned medical authorization forms from the last decade, rescanning the actual hardcopy with better compression is simply impractical. Any workable solution must not only reduce file sizes and preserve document quality, but also be convenient to apply to a large number of documents in a single operation.

Deficiencies in Some Solutions

Although there are many PDF compression applications available in today’s marketplace, document quality is not their strong suit. Many are written using open-source PDF libraries that in turn use open-source compressors, which can typically complete the task, but often at the cost of quality and accuracy. The resulting PDF document is smaller, but not as small as it could be, and the appearance of the document may have been significantly degraded in the process. When evaluating options for PDF compression, it’s important to consider both compression ratio and the quality of compressed documents.

Another common weakness of PDF compression software shows up with color images that use a color space other than DeviceRGB. Some compression applications can process only DeviceRGB; they typically convert images in other color spaces to DeviceRGB before compressing them, with unpredictable results. PDF producers intend the color in a document to look a certain way, and radically changing the intended color space for no good reason is tantamount to sacrilege in the publishing industry.

Other common failings in some PDF compression tools include corrupted output with visible errors, an inability to compress secure PDFs, and compression that fails on certain types of documents in ways that actually make the file size larger, not smaller.
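
Whatever tool is used, one simple safeguard against the last of these failings is to compare file sizes after compression and keep the original when the output is not actually smaller. The following is a minimal sketch using only the standard .NET file APIs; the class and method names are hypothetical and are not part of any PDF library.

```csharp
using System;
using System.IO;

static class CompressionGuard
{
    // True when the candidate file exists, is non-empty, and is strictly
    // smaller than the original file.
    public static bool IsWorthKeeping(string originalPath, string candidatePath)
    {
        var original = new FileInfo(originalPath);
        var candidate = new FileInfo(candidatePath);
        if (!candidate.Exists || candidate.Length == 0)
        {
            return false;
        }
        return candidate.Length < original.Length;
    }

    // Percentage of the original size that was saved (negative if the file grew).
    public static double PercentSaved(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        }
        return 100.0 * (originalBytes - compressedBytes) / originalBytes;
    }
}
```

A size check like this only catches output that grew or came out empty. It says nothing about visual quality or corruption, which still has to be verified by inspecting the compressed documents.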

Example

To show what can be achieved with effective compression, the code sample below calls PDF Xpress, a software development kit for adding PDF functions to applications, including creation, modification and compression. PDF Xpress handles color spaces other than DeviceRGB properly, compresses secure documents, and avoids other problems common in PDF compression applications.

PDF Xpress enables you to customize compression to suit your needs. For example, the intuitive API allows you to choose whether you want to target JPEG or JPEG2000 compression for grayscale and color images. The toolkit applies JBIG2 compression for monochrome images, and empowers you to control how aggressively the images will be compressed. Without any customization, PDF Xpress automatically selects and applies a good compression ratio that yields visually lossless results in most cases. It can optionally apply lossless compression.

The following intuitive C# code opens a PDF document in PDF Xpress, compresses it, and saves it as a new, smaller PDF file:

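    using Accusoft.PdfXpressSdk;

    // Create and initialize the PDF Xpress component.
    using (PdfXpress pdf = new PdfXpress())
    {
        pdf.Initialize();

        // Open the source document.
        using (Document doc = new Document(pdf, "document.pdf"))
        {
            Accusoft.PdfXpressSdk.SaveOptions saveOptions = new Accusoft.PdfXpressSdk.SaveOptions();
            saveOptions.Compress = true;      // compress the document on save
            saveOptions.Linearized = true;    // linearize the output for faster display over a network
            saveOptions.Overwrite = true;     // allow replacing an existing output file
            saveOptions.Filename = "compressed.pdf";

            // Write the compressed copy to compressed.pdf.
            doc.Save(saveOptions);
        }
    }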
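
Because convenience across a large number of documents is part of the requirement, the single-document listing above would normally be wrapped in a loop. The sketch below shows one way that might look. It deliberately takes the per-file compression step as a delegate so that it stays self-contained; `BatchCompressor`, `Run`, and `compressOne` are hypothetical names, and the delegate is assumed to wrap the `Document` and `SaveOptions` calls shown above with `saveOptions.Filename` set to the output path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchCompressor
{
    // Calls compressOne(inputPath, outputPath) for every *.pdf file in inputDir
    // and returns the file name plus the sizes before and after for each file.
    public static List<(string Name, long Before, long After)> Run(
        string inputDir, string outputDir, Action<string, string> compressOne)
    {
        Directory.CreateDirectory(outputDir);
        var results = new List<(string Name, long Before, long After)>();

        foreach (string inputPath in Directory.EnumerateFiles(inputDir, "*.pdf"))
        {
            // Write each result to a separate folder so the originals are untouched.
            string outputPath = Path.Combine(outputDir, Path.GetFileName(inputPath));
            compressOne(inputPath, outputPath);

            results.Add((Path.GetFileName(inputPath),
                         new FileInfo(inputPath).Length,
                         new FileInfo(outputPath).Length));
        }
        return results;
    }
}
```

Writing to a separate output folder keeps the originals available in case a compressed file turns out to be unusable, and the returned sizes make it easy to report how much space the batch actually saved.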

Summary

PDF document compression is a popular feature in a PDF workflow, but it is often misunderstood and sometimes even grossly mishandled. In order to ensure both effective compression and preservation of the document’s quality, it’s important to select capable PDF compression tools and to apply them with settings customized to meet the requirements of your content management goals.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Authors

Accusoft

United States
Accusoft provides a full spectrum of document, content and imaging solutions as fully supported, enterprise-grade, best-in-class client-server applications, mobile apps, cloud services, and software development kits (SDKs). Products include the PDF Xpress SDK for empowering applications to create, modify and compress PDF and PDF/A files, and the Prizm Content Connect HTML5 document viewer that displays almost any document or image file type through any HTML5 browser, with no additional software required on the client. For more information, please visit www.accusoft.com.

Joseph Argento

United States
Joseph Argento is the Technical Lead for PDF Xpress and ImagXpress at Accusoft. He joined the company in 2007 as a Support Engineer. Joseph contributes to the Native product team as a Software Engineer III and has an MS degree in Electrical Engineering.

Comments and Discussions

 
Question: Benchmark (fredatcodeproject, 13-Jun-14 0:28)
Great article, but based on a commercial SDK :(
It would have been better if you had added a benchmark showing how much space this compression solution saves and how much faster a few large PDFs load.

Question: Great Joseph! (Volynsky Alex, 6-Jun-14 8:03)

Last Updated 6 Jun 2014
Article Copyright 2014 by Accusoft, Joseph Argento