Convert PDF pages to image files using the Solid Framework

Greg Greenaae

11 Sep 2009

Using the Solid Framework SDK to convert PDF pages to image files.

Download source code - 7.7 KB

Introduction

This article walks through converting the pages of a PDF file to image files using the Solid Framework .NET SDK.

Background

This is the first of what I hope will be many articles on using the Solid Framework .NET SDK to work with PDF documents. In this article, I will show you how to generate images from pages in a PDF document. We use these images in the PDF Navigator, both as thumbnails in the bookmark viewer and to render the current page in the main page viewer. In later articles, I’ll explain editing, creation, conversion, and other features that the Solid Framework provides for manipulating PDF files.

Using the code

The source download contains both Visual Studio 2005 and Visual Studio 2008 solutions that build a command-line application. The application converts PDF pages to images and stores them in a specified output directory. The project uses a CodeProject C# library by Richard Lopes to parse the command-line parameters. You will need to download the Solid Framework .NET SDK and extract the DLL into the project directory. Be sure to update the Solid Framework reference in the project.

The following code in program.cs takes care of parsing out the parameters we need from the user to call DoConversion():

using System;
using System.Text;
using CommandLine.Utility;

namespace PDFtoImage
{
    enum ImageType {TIFF, BMP, JPG, PNG };

    partial class Program
    {
        static int Main(string[] args)
        {
            // What we need as commandline parameters:
            // 1 - PDF file to convert                  -f
            // 2 - Password if needed                   -p
            // 3 - Output folder for images             -o
            // 4 - DPI for the image size               -d
            // 5 - Page range of pages to convert       -r
            // 6 - Output image type: TIFF|BMP|JPG|PNG  -t

            string pdfFile = null;
            string password = null;
            string outputfolder = null;
            string pagerange = null;

            ImageType imagetype = ImageType.PNG;
            int dpi = 96;

            Arguments CommandLine = new Arguments(args);

            if (CommandLine["f"] == null)
            {
                ShowUsage();
                return -1;
            }
            else
                pdfFile = CommandLine["f"];

            if (CommandLine["p"] != null)
                password = CommandLine["p"];

            if (CommandLine["o"] == null)
            {
                ShowUsage();
                return -2;
            }
            else
                outputfolder = CommandLine["o"];

            // Note: We default to 96 dpi if the parameter was not provided.
            if (CommandLine["d"] != null)
                dpi = Convert.ToInt32(CommandLine["d"]);

            if (CommandLine["t"] != null)
            {
                switch (CommandLine["t"].ToUpper())
                {
                    case "TIF":
                    case "TIFF":
                        imagetype = ImageType.TIFF;
                        break;
                    case "BMP":
                        imagetype = ImageType.BMP;
                        break;
                    case "JPEG":
                    case "JPG":
                        imagetype = ImageType.JPG;
                        break;
                    case "PNG":
                    default:
                        imagetype = ImageType.PNG;
                        break;
                }
            }

            if (CommandLine["r"] != null)
            {
                pagerange = CommandLine["r"];
            }

            DoConversion(pdfFile, password, outputfolder, dpi, 
                         pagerange, imagetype);

            return 0;
        }

        static void ShowUsage()
        {
            Console.WriteLine("PDFtoImage.exe -f:(Path to pdf file) " + 
                              "-p:(password) -o:(path to image folder) -d:(integer dpi)"); 
            Console.WriteLine("-r:(Range i.e. 1,2-3,7,8-10) -t:TIFF|BMP|JPG|PNG");
            Console.WriteLine("");
            Console.WriteLine("-p, -d, -t and -r are optional. Defaults to 96 dpi and PNG");
        }
    }
}

Let's say we have a PDF file at c:\mypdfs\pdftest.pdf that is encrypted with a user password of "mypassword", and we want to make JPEG images of pages 1-5, 7, 8, with a DPI of 127, and put these images in c:\myimages. The command line would look like this:

PDFtoImage.exe -f:c:\mypdfs\pdftest.pdf -p:mypassword -o:c:\myimages -d:127 -t:JPG -r:1-5,7,8

Note: -p, -d, -t, and -r are optional. No password is used if -p is missing. DPI will default to 96, and image type will default to PNG. If -r is missing, all pages will be used to make images.
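
The Arguments class itself is not shown in this article, so the following is only a minimal sketch of how switches of the form -name:value could be split into a lookup table, and of how the DPI default could be applied without throwing on bad input (Convert.ToInt32 throws a FormatException when the text is not a number). ArgSketch, Parse, and GetDpi are hypothetical names for illustration; they are not part of CommandLine.Utility or the Solid Framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of CommandLine.Utility.
public static class ArgSketch
{
    // Splits switches of the form -name:value into a name -> value map.
    // A switch without a colon is stored with an empty value.
    public static Dictionary<string, string> Parse(string[] args)
    {
        Dictionary<string, string> result =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (string arg in args)
        {
            // Ignore anything that is not a switch.
            if (string.IsNullOrEmpty(arg) || arg[0] != '-')
                continue;

            // Split on the first colon only, so values such as c:\myimages survive.
            int colon = arg.IndexOf(':');
            string name = colon < 0 ? arg.Substring(1) : arg.Substring(1, colon - 1);
            string value = colon < 0 ? string.Empty : arg.Substring(colon + 1);
            result[name] = value;
        }
        return result;
    }

    // Returns the value of -d as a positive integer, or the fallback
    // when the switch is missing or is not a valid positive number.
    public static int GetDpi(Dictionary<string, string> options, int fallback)
    {
        string text;
        int dpi;
        if (options.TryGetValue("d", out text) && int.TryParse(text, out dpi) && dpi > 0)
            return dpi;
        return fallback;
    }
}
```

With the example command line above, ArgSketch.Parse(args)["f"] would hold c:\mypdfs\pdftest.pdf, and ArgSketch.GetDpi(options, 96) would return 127. Leaving out -d, or passing something like -d:abc, would fall back to 96.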

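The DoConversion function in pdftoimage.cs is the meat of the project. The excerpts below use types from the System.IO, System.Collections.Generic, and System.Drawing namespaces, so pdftoimage.cs must import those namespaces.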

static private void DoConversion(string file, string password, 
       string folder, int dpi, string pagerange, ImageType iType)
{
    System.Drawing.Imaging.ImageFormat format;
    string extension = null;

    // Setup the license
    SolidFramework.License.ActivateDeveloperLicense();

    // Set the output image type
    switch (iType)
    {
        case ImageType.BMP:
            format = System.Drawing.Imaging.ImageFormat.Bmp;
            extension = "bmp";
            break;
        case ImageType.JPG:
            format = System.Drawing.Imaging.ImageFormat.Jpeg;
            extension = "jpg";
            break;
        case ImageType.PNG:
            format = System.Drawing.Imaging.ImageFormat.Png;
            extension = "png";
            break;
        case ImageType.TIFF:
            format = System.Drawing.Imaging.ImageFormat.Tiff;
            extension = "tif";
            break;
        default:
            throw new ArgumentException("DoConversion: ImageType not known");
    }

    // Load up the document
    SolidFramework.Pdf.PdfDocument doc =
        new SolidFramework.Pdf.PdfDocument(file, password);
    doc.Open();

    // Setup the outputfolder
    if (!Directory.Exists(folder))
    {
        Directory.CreateDirectory(folder);
    }

    // Set up the base file name: output folder plus the PDF name without its extension.
    string filename = Path.Combine(folder,
        Path.GetFileNameWithoutExtension(file));

    // Get our pages.
    List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
        new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
    SolidFramework.Pdf.Catalog catalog =
        (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
    SolidFramework.Pdf.Plumbing.PdfPages pages =
        (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
    ProcessPages(ref pages, ref Pages);

    // Check for page ranges
    PageRange ranges = null;
    bool bHaveRanges = false;
    if (!string.IsNullOrEmpty(pagerange))
    {
        bHaveRanges = PageRange.TryParse(pagerange, out ranges);
    }

    if (bHaveRanges)
    {
        int[] pageArray = ranges.ToArray();
        foreach (int number in pageArray)
        {
            CreateImageFromPage(Pages[number], dpi, filename, number,
                extension, format);
            Console.WriteLine(string.Format("Processed page {0} of {1}", number,
                Pages.Count));
        }
    }
    else
    {
        // For each page, save off a file.
        int pageIndex = 0;
        foreach (SolidFramework.Pdf.Plumbing.PdfPage page in Pages)
        {
            // Update the page number.
            pageIndex++;

            CreateImageFromPage(page, dpi, filename, pageIndex, extension, format);
            Console.WriteLine(string.Format("Processed page {0} of {1}", pageIndex,
                Pages.Count));
        }
    }
}

First, we set up the trial license for Solid Framework with the call to ActivateDeveloperLicense(). We then set format and extension to the image format and file extension that match the requested image type.

After this, we create the PdfDocument object, passing it the file name to open and the password to use if the PDF file is encrypted (secured).

Once the document is open, we check to see if the output folder exists, and if it does not, we create it.
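
As a rough sketch of the folder and file name handling, the helper below creates the output folder if needed and builds the per-page image path in the same name-page.extension pattern the project uses. OutputPathSketch and BuildImagePath are hypothetical names introduced for illustration; the article's code does this inline rather than in a helper.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; the article's code builds the path inline.
public static class OutputPathSketch
{
    // Ensures the folder exists and returns "<folder>/<pdf name>-<page>.<extension>".
    public static string BuildImagePath(string folder, string pdfFile,
        int pageNumber, string extension)
    {
        // CreateDirectory does nothing if the folder already exists,
        // so no separate Directory.Exists check is required.
        Directory.CreateDirectory(folder);

        // Strip the directory and ".pdf" from the source file name.
        string baseName = Path.GetFileNameWithoutExtension(pdfFile);

        // Pass the base name as a format argument so that any braces in it
        // are not interpreted as format items.
        string fileName = string.Format("{0}-{1}.{2}", baseName, pageNumber, extension);

        // Path.Combine inserts the directory separator only when it is needed.
        return Path.Combine(folder, fileName);
    }
}
```

For a source file named pdftest.pdf, page 3, and the jpg extension, the file name part of the returned path is pdftest-3.jpg.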

Now, we use Solid Framework to get the Pages dictionary and walk it to find all the PdfPage objects:

// Get our pages.
List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
    new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
SolidFramework.Pdf.Catalog catalog =
    (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
SolidFramework.Pdf.Plumbing.PdfPages pages =
    (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
ProcessPages(ref pages, ref Pages);

Next, we determine whether a page range was supplied. If the range argument is null or empty, we process all the pages in the document. Otherwise, we use the PageRange object in the Solid Framework to get an integer array of page indices.
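
To make the range expansion concrete, here is a minimal sketch of how text such as 1-5,7,8 could be turned into a list of page numbers. It is a hypothetical stand-in written for this explanation, not the Solid Framework PageRange class, and the names PageRangeSketch and TryExpand are assumptions. It treats page numbers as 1-based, the way a user would type them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not the Solid Framework PageRange class.
public static class PageRangeSketch
{
    // Expands text such as "1-5,7,8" into 1-based page numbers.
    // Returns false (with an empty list) on malformed input or on
    // numbers outside 1..pageCount.
    public static bool TryExpand(string text, int pageCount, out List<int> pages)
    {
        pages = new List<int>();
        if (string.IsNullOrEmpty(text))
            return false;

        List<int> expanded = new List<int>();
        foreach (string part in text.Split(','))
        {
            // Each part is either a single number ("7") or a range ("1-5").
            string[] bounds = part.Split('-');
            int first;
            int last;
            if (bounds.Length > 2 || !int.TryParse(bounds[0], out first))
                return false;

            if (bounds.Length == 2)
            {
                if (!int.TryParse(bounds[1], out last))
                    return false;
            }
            else
            {
                last = first;
            }

            // Reject reversed ranges and pages that do not exist.
            if (first < 1 || last < first || last > pageCount)
                return false;

            for (int n = first; n <= last; n++)
                expanded.Add(n);
        }

        pages = expanded;
        return true;
    }
}
```

For the input 1-5,7,8 and a 10-page document, the sketch yields 1, 2, 3, 4, 5, 7, 8. When 1-based page numbers like these are used to index a zero-based List, the index is the page number minus one.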

For each page we need to process, we then call CreateImageFromPage() to create the image.

private static void CreateImageFromPage(SolidFramework.Pdf.Plumbing.PdfPage page,
    int dpi, string filename, int pageIndex, string extension,
    System.Drawing.Imaging.ImageFormat format)
{
    // Render the page to a bitmap at the requested DPI. The using block
    // disposes the bitmap even if saving throws.
    using (Bitmap bm = page.DrawBitmap(dpi))
    {
        // Build the output path. The base name is passed as a format argument
        // so that braces in it are not treated as format items.
        string filepath = string.Format("{0}-{1}.{2}", filename, pageIndex, extension);

        // If the file exists already, delete it, i.e. overwrite it.
        if (File.Exists(filepath))
            File.Delete(filepath);

        // Save the file.
        bm.Save(filepath, format);
    }
}

We ask the page object for a Bitmap by calling its DrawBitmap method with the specified DPI, save the bitmap to the output file, and dispose of it once the page has been processed.
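
To get a feel for how the DPI value affects the size of the output, recall that PDF page dimensions are expressed in points, and that by default one point is 1/72 of an inch. The sketch below shows the resulting arithmetic. It assumes the rendered bitmap is simply the page size scaled by dpi / 72 and rounded to the nearest pixel; the exact rounding used by DrawBitmap is not documented here, and DpiSketch and PointsToPixels are hypothetical names.

```csharp
using System;

// Illustrative arithmetic only; not a Solid Framework API.
public static class DpiSketch
{
    // By default, one PDF point is 1/72 of an inch.
    public const double PointsPerInch = 72.0;

    // Converts a length in points to pixels at the given DPI,
    // rounded to the nearest whole pixel.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points * dpi / PointsPerInch,
            MidpointRounding.AwayFromZero);
    }
}
```

Under that assumption, a US Letter page (612 x 792 points) rendered at the default 96 DPI comes out at 816 x 1056 pixels, since 612 * 96 / 72 = 816 and 792 * 96 / 72 = 1056.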

Points of interest

CommandLine.Utility takes the pain out of command line parsing. The PageRange object takes care of the page range parsing for us. Once we have the Pages list, we just request each Page object to give us a bitmap in the DPI we require.

The images will have a small watermark at the bottom which will go away with the purchase of a license.

History

September 11, 2009 - v1.0: Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.