TIFF and PDF are the two most popular formats for document imaging (documents that come from a scanner). The primary reasons are that both can store multiple pages in one file, and both allow each page to have its own size and compression.
However, there are differences, and knowing them will help you choose between the two formats. Here’s a brief overview:
A TIFF file is a container of compressed images and metadata. Each individual frame of the TIFF can be compressed with one of a few different compression methods depending on the nature of the image in the frame.
The compression formats available include JPEG, LZW (also used by GIF), Flate (the Deflate method also used by PNG), and a few that are tuned for 1 bpp (black-and-white) images: Group 3, Group 4, and a Huffman-based run-length encoding method. There is also the option of no compression when encoding and decoding speed matter most. In addition, individual frames can be stored as strips or tiles so that parts of an image can be accessed without decoding the entire page.
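To make the container idea concrete, here is a minimal sketch that counts the frames in a classic TIFF by walking its chain of image file directories (IFDs), one per frame. It uses only the .NET base class library, not DotImage; the class and method names (TiffFrameCounter, CountFrames) are hypothetical, and the sketch assumes a seekable stream, ignores BigTIFF, and does not follow SubIFDs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of DotImage.
public static class TiffFrameCounter
{
    // Counts the top-level IFDs (one per frame) in a classic TIFF stream.
    public static int CountFrames(Stream stream)
    {
        var buf = new byte[8];
        Fill(stream, buf, 8);

        // Bytes 0-1 give the byte order: "II" is little-endian, "MM" is big-endian.
        bool little;
        if (buf[0] == 0x49 && buf[1] == 0x49) little = true;
        else if (buf[0] == 0x4D && buf[1] == 0x4D) little = false;
        else throw new InvalidDataException("Missing TIFF byte-order mark.");

        // Bytes 2-3 hold the magic number 42; BigTIFF (43) is not handled here.
        if (U16(buf, 2, little) != 42)
            throw new InvalidDataException("Not a classic TIFF.");

        // Bytes 4-7 hold the offset of the first IFD.
        uint offset = U32(buf, 4, little);
        var visited = new HashSet<uint>();   // guards against a looping chain
        int frames = 0;
        while (offset != 0)
        {
            if (!visited.Add(offset))
                throw new InvalidDataException("IFD chain loops back on itself.");
            stream.Seek(offset, SeekOrigin.Begin);
            Fill(stream, buf, 2);
            int entries = U16(buf, 0, little);
            // Each directory entry is 12 bytes; skip them to reach the next-IFD offset.
            stream.Seek(entries * 12L, SeekOrigin.Current);
            Fill(stream, buf, 4);
            offset = U32(buf, 0, little);
            frames++;
        }
        return frames;
    }

    // Reads exactly count bytes into buf or throws if the stream ends early.
    private static void Fill(Stream s, byte[] buf, int count)
    {
        int done = 0;
        while (done < count)
        {
            int n = s.Read(buf, done, count - done);
            if (n == 0) throw new EndOfStreamException("Truncated TIFF.");
            done += n;
        }
    }

    private static int U16(byte[] b, int i, bool little)
    {
        return little ? b[i] | (b[i + 1] << 8) : (b[i] << 8) | b[i + 1];
    }

    private static uint U32(byte[] b, int i, bool little)
    {
        return little
            ? (uint)(b[i] | (b[i + 1] << 8) | (b[i + 2] << 16)) | ((uint)b[i + 3] << 24)
            : ((uint)b[i] << 24) | (uint)((b[i + 1] << 16) | (b[i + 2] << 8) | b[i + 3]);
    }
}
```

Because only the directories are read, the page count is available without decompressing any image data, which is part of what makes the format convenient for multipage documents.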
Because of ambiguity in the original specification, a number of dialects proliferated, especially around what is called “Old-style” JPEG (since replaced by “New-style”). The standard codec that comes with Windows XP and Vista doesn’t even try to read “Old-style” because it is so hard to pin down exactly what that is. Support for the various old-style variants is usually found only in third-party toolkits (for example, the DotImage .NET TIFF SDK).
PDF was originally developed by Adobe as a portable format for exchanging documents (archival use came later, with the PDF/A subset), but it is often used to hold images because it can contain multiple pages. PDF is a well-specified format, and Acrobat Reader or an equivalent viewer is installed on nearly every machine and many mobile devices. As with TIFF, each individual page can be compressed based on its content.
Like TIFF, PDF can store an image as JPEG, Deflate (the method also used by PNG), or Group 4. In addition, it supports two more advanced codecs. The first, JPEG 2000, is meant for photographs and generally loses less quality than JPEG at the same file size. The second, JBIG2, is tuned for 1-bit black-and-white images.
Because PDF is designed to contain more than just images, it is well suited to storing pages that have had optical character recognition (OCR) performed on them. The recognized text can be placed behind the pixels that represent it, at the matching position on the page. The result is called a searchable PDF: search engines will index it, and some viewers will let you select and search for text within it. You can create searchable PDFs with the DotImage .NET OCR and Searchable PDF SDK.
When to Use TIFF over PDF
- You don’t OCR the images, or you do but want to keep the results separate.
- Storage size is an important factor.
- Everyone who needs to see the image has access to viewing software (if you need to, you can create custom viewers with the DotImage .NET TIFF Viewing SDK).
- You need to maintain compatibility with the WANG annotation format.
- You have very large pages that need to be accessed in pieces (TIFF supports a tiled mode that allows very fast access to parts of an image – DotImage’s TIFF SDK supports creating and using this style directly).
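As a rough illustration of why tiled storage helps with very large pages, the sketch below works out which tiles a viewer would need to decode to show one rectangular region. It follows the tile numbering used in the TIFF 6.0 specification (left to right, then top to bottom); the class and method names are hypothetical, and this is plain arithmetic rather than DotImage API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of DotImage.
public static class TiffTileMath
{
    // Returns the indices of the tiles that overlap the region (x, y, width, height).
    // Tiles are numbered left to right, then top to bottom, starting at 0.
    public static List<int> TilesCoveringRegion(
        int imageWidth, int imageLength, int tileWidth, int tileLength,
        int x, int y, int width, int height)
    {
        if (tileWidth <= 0 || tileLength <= 0)
            throw new ArgumentOutOfRangeException("tileWidth", "Tile dimensions must be positive.");
        if (x < 0 || y < 0 || width <= 0 || height <= 0 ||
            x + width > imageWidth || y + height > imageLength)
            throw new ArgumentOutOfRangeException("x", "Region must lie inside the image.");

        // Number of tiles per row, rounding up so a partial tile at the edge counts.
        int tilesAcross = (imageWidth + tileWidth - 1) / tileWidth;

        int firstCol = x / tileWidth;
        int lastCol = (x + width - 1) / tileWidth;
        int firstRow = y / tileLength;
        int lastRow = (y + height - 1) / tileLength;

        var tiles = new List<int>();
        for (int row = firstRow; row <= lastRow; row++)
            for (int col = firstCol; col <= lastCol; col++)
                tiles.Add(row * tilesAcross + col);
        return tiles;
    }
}
```

For a 1000 x 800 page stored as 256 x 256 tiles, a 300 x 200 region starting at (300, 100) touches only tiles 1, 2, 5, and 6, so four of the page's sixteen tiles need to be decoded.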
When to Use PDF over TIFF
- You OCR the images and want to keep the results with the image (that is, you create a searchable PDF).
- Your organization mandates an archival format like PDF/A for compliance purposes.
- You want users to be able to view the documents without installing special viewers, and they likely have a PDF viewer, like Acrobat Reader, already.
- You annotate the images and want users with Acrobat Reader to be able to see the annotations, so you will store them in the PDF as PDF-compliant annotations (easy to do with DotImage’s .NET PDF SDK).
How to Convert Between TIFF and PDF
Converting between TIFF and PDF using .NET usually requires a third-party .NET PDF/TIFF SDK, like DotImage. Here’s sample code for going between the two formats:
Converting Multiframe (Multipage) Formats
Converting between multiframe formats in DotImage is simple. All multiframed image encoders inherit from a base class called MultiFramedImageEncoder, which has a Save method. In addition, there is a class called FileSystemImageSource, which can read multiframed formats from files and provides each page as an image to any class that needs them. One of the overloads of MultiFramedImageEncoder.Save() takes any ImageSource as a parameter and saves each page into a Stream in a memory efficient way.
Using these classes, we can write this C# function:
private void Convert(string inFile, string outFile, MultiFramedImageEncoder encoder)
{
    // File.Create truncates an existing output file; File.OpenWrite would leave stale bytes past the new end.
    using (ImageSource src = new FileSystemImageSource(inFile, true))
    using (Stream s = File.Create(outFile))
        encoder.Save(s, src, null);
}
In VB.NET, the function is:
Private Sub Convert(ByVal inFile As String, ByVal outFile As String, _
        ByVal encoder As MultiFramedImageEncoder)
    ' File.Create truncates an existing output file before writing.
    Using src As ImageSource = New FileSystemImageSource(inFile, True)
        Using s As Stream = File.Create(outFile)
            encoder.Save(s, src, Nothing)
        End Using
    End Using
End Sub
To convert a TIFF to a PDF, call Convert() like this:
Convert(tiffFileName, pdfFileName, new PdfEncoder());
To convert a PDF to a TIFF, call it like this:
Convert(pdfFileName, tiffFileName, new TiffEncoder());
In order to read a PDF, you will need the DotImage PDFReader Add-on SDK, and will need to call this code one time in your application before any PDFs are read:
This registers the add-on’s PdfDecoder with the system so that PDFs can be detected and read.
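Format detection of this kind generally rests on file signatures. The sketch below is not DotImage's implementation; it is a minimal, library-free illustration of how the first few bytes distinguish the two formats, which can be handy for deciding which encoder to pass to Convert(). The names DocumentKind and DocumentSniffer are hypothetical, and the check is deliberately strict: it expects the PDF marker at offset 0 and does not recognize BigTIFF.

```csharp
using System;
using System.IO;

// Hypothetical types for illustration only; not part of DotImage.
public enum DocumentKind { Unknown, Tiff, Pdf }

public static class DocumentSniffer
{
    // Inspects up to the first five bytes of the stream and reports the likely format.
    public static DocumentKind Detect(Stream stream)
    {
        var head = new byte[5];
        int read = 0;
        while (read < head.Length)
        {
            int n = stream.Read(head, read, head.Length - read);
            if (n == 0) break;          // stream shorter than five bytes
            read += n;
        }

        // PDF files begin with the ASCII text "%PDF-".
        if (read >= 5 && head[0] == 0x25 && head[1] == 0x50 && head[2] == 0x44 &&
            head[3] == 0x46 && head[4] == 0x2D)
            return DocumentKind.Pdf;

        // Classic TIFF begins with "II" + 42 (little-endian) or "MM" + 42 (big-endian).
        if (read >= 4)
        {
            bool little = head[0] == 0x49 && head[1] == 0x49 && head[2] == 0x2A && head[3] == 0x00;
            bool big = head[0] == 0x4D && head[1] == 0x4D && head[2] == 0x00 && head[3] == 0x2A;
            if (little || big) return DocumentKind.Tiff;
        }

        return DocumentKind.Unknown;
    }
}
```

A caller could open the input with File.OpenRead, pass the stream to Detect, and then choose a PdfEncoder for TIFF input or a TiffEncoder for PDF input before calling Convert().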
By default, DotImage will pick the best compression method based on the content of the image. However, if you want more control, each encoder class offers many hooks for you to change or refine the compression as the images are saved.
To convert TIFF to PDF, you need DotImage Document Imaging, and to convert PDF to TIFF, you need to add the PDFReader Add-on to DotImage Document Imaging. Both products are available with a runtime royalty-free license for desktop deployments.