Convert PDF pages to image files using the Solid Framework

By Greg Greenaae, 11 Sep 2009
 

Introduction

This article walks through converting the pages of a PDF file to image files using the Solid Framework .NET SDK. The conversion itself takes only a few calls into the SDK.

Background

This is the first of what I hope will be many articles on using the Solid Framework .NET SDK for your PDF document needs. In this article, I will show you how to generate images from the pages of a PDF document. We use these images to generate thumbnails for a bookmark viewer and to render the current page in the main page viewer of the PDF Navigator. In later articles, I’ll explain editing, creation, conversion, and other features that the Solid Framework gives you for manipulating PDF files.

Using the code

The source download contains both Visual Studio 2005 and Visual Studio 2008 solutions to build a command line application to convert PDF pages to images that are stored in a specified output directory. The project makes use of a CodeProject C# library by Richard Lopes to parse the command line parameters. You will need to download the Solid Framework .NET SDK and extract the DLL into the project directory. Be sure to update the Solid Framework reference in the project.

The following code in program.cs takes care of parsing out the parameters we need from the user to call DoConversion():

using System;
using System.Text;
using CommandLine.Utility;

namespace PDFtoImage
{
    enum ImageType {TIFF, BMP, JPG, PNG };

    partial class Program
    {
        static int Main(string[] args)
        {
            // What we need as commandline parameters:
            // 1 - PDF file to convert                  -f
            // 2 - Password if needed                   -p
            // 3 - Output folder for images             -o
            // 4 - DPI for the image size               -d
            // 5 - Page range of pages to convert       -r
            // 6 - Output image type: TIFF|BMP|JPG|PNG  -t

            string pdfFile = null;
            string password = null;
            string outputfolder = null;
            string pagerange = null;

            ImageType imagetype = ImageType.PNG;
            int dpi = 96;

            Arguments CommandLine = new Arguments(args);

            if (CommandLine["f"] == null)
            {
                ShowUsage();
                return -1;
            }
            else
                pdfFile = CommandLine["f"];

            if (CommandLine["p"] != null)
                password = CommandLine["p"];

            if (CommandLine["o"] == null)
            {
                ShowUsage();
                return -2;
            }
            else
                outputfolder = CommandLine["o"];

            // Note: We default to 96 dpi if the parameter was not provided.
            if (CommandLine["d"] != null)
                dpi = Convert.ToInt32(CommandLine["d"]);

            if (CommandLine["t"] != null)
            {
                switch (CommandLine["t"].ToUpper())
                {
                    case "TIF":
                    case "TIFF":
                        imagetype = ImageType.TIFF;
                        break;
                    case "BMP":
                        imagetype = ImageType.BMP;
                        break;
                    case "JPEG":
                    case "JPG":
                        imagetype = ImageType.JPG;
                        break;
                    case "PNG":
                    default:
                        imagetype = ImageType.PNG;
                        break;
                }
            }

            if (CommandLine["r"] != null)
            {
                pagerange = CommandLine["r"];
            }

            DoConversion(pdfFile, password, outputfolder, dpi, 
                         pagerange, imagetype);

            return 0;
        }

        static void ShowUsage()
        {
            Console.WriteLine("PDFtoImage.exe -f:(Path to pdf file) " + 
                              "-p:(password) -o:(path to image folder) -d:(integer dpi)"); 
            Console.WriteLine("-r:(Range i.e. 1,2-3,7,8-10) -t:TIFF|BMP|JPG|PNG");
            Console.WriteLine("");
            Console.WriteLine("-p, -d, -t and -r are optional. Defaults to 96 dpi and PNG");
        }
    }
}

Let's say we have a PDF file at c:\mypdfs\pdftest.pdf that is encrypted with a user password of "mypassword", and we want to make JPEG images of pages 1-5, 7, 8, with a DPI of 127, and put these images in c:\myimages. The command line would look like this:

PDFtoImage.exe -f:c:\mypdfs\pdftest.pdf -p:mypassword -o:c:\myimages -d:127 -t:JPG -r:1-5,7,8

Note: -p, -d, -t, and -r are optional. No password is used if -p is missing. DPI will default to 96, and image type will default to PNG. If -r is missing, all pages will be used to make images.
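The listing above passes the -d value straight to Convert.ToInt32, which throws if the user types something that is not a number. As a minimal sketch of a more forgiving alternative, the helper below falls back to the 96 dpi default when the option is absent and reports failure for bad input. The names DpiOption and TryGetDpi are hypothetical and are not part of the article's source download.

```csharp
using System;
using System.Globalization;

static class DpiOption
{
    // Default used when -d is absent, matching the article's 96 dpi default.
    public const int DefaultDpi = 96;

    // Returns true with a usable DPI when the argument is missing (default)
    // or a positive integer; returns false for anything else.
    public static bool TryGetDpi(string rawValue, out int dpi)
    {
        dpi = DefaultDpi;
        if (rawValue == null)
            return true; // option not supplied: keep the default

        int parsed;
        if (int.TryParse(rawValue, NumberStyles.Integer,
                         CultureInfo.InvariantCulture, out parsed) && parsed > 0)
        {
            dpi = parsed;
            return true;
        }
        return false; // e.g. "abc", "0" or "-72"
    }
}
```

In Main, a false return could be handled the same way as a missing -f or -o: call ShowUsage() and return a non-zero exit code.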

The DoConversion function in pdftoimage.cs is the meat of the project.

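// Needs: using System; using System.Collections.Generic; using System.IO;
// (CreateImageFromPage further below also needs using System.Drawing;)
private static void DoConversion(string file, string password,
       string folder, int dpi, string pagerange, ImageType iType)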
{
    System.Drawing.Imaging.ImageFormat format;
    string extension = null;

    // Setup the license
    SolidFramework.License.ActivateDeveloperLicense();

    // Set the output image type
    switch (iType)
    {
        case ImageType.BMP:
            format = System.Drawing.Imaging.ImageFormat.Bmp;
            extension = "bmp";
            break;
        case ImageType.JPG:
            format = System.Drawing.Imaging.ImageFormat.Jpeg;
            extension = "jpg";
            break;
        case ImageType.PNG:
            format = System.Drawing.Imaging.ImageFormat.Png;
            extension = "png";
            break;
        case ImageType.TIFF:
            format = System.Drawing.Imaging.ImageFormat.Tiff;
            extension = "tif";
            break;
        default:
            throw new ArgumentException("DoConversion: ImageType not known");
    }

    // Load up the document
    SolidFramework.Pdf.PdfDocument doc =
        new SolidFramework.Pdf.PdfDocument(file, password);
    doc.Open();

    // Setup the outputfolder
    if (!Directory.Exists(folder))
    {
        Directory.CreateDirectory(folder);
    }

    // Setup the file string.
    string filename = folder + Path.DirectorySeparatorChar +
        Path.GetFileNameWithoutExtension(file);

    // Get our pages.
    List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
        new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
    SolidFramework.Pdf.Catalog catalog =
        (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
    SolidFramework.Pdf.Plumbing.PdfPages pages =
        (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
    ProcessPages(ref pages, ref Pages);

    // Check for page ranges
    PageRange ranges = null;
    bool bHaveRanges = false;
    if (!string.IsNullOrEmpty(pagerange))
    {
        bHaveRanges = PageRange.TryParse(pagerange, out ranges);
    }

    if (bHaveRanges)
    {
        int[] pageArray = ranges.ToArray();
        foreach (int number in pageArray)
        {
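            // Pages is a zero-based list. If PageRange yields 1-based page
            // numbers (as typed by the user), index with Pages[number - 1].
            CreateImageFromPage(Pages[number], dpi, filename, number,
                extension, format);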
            Console.WriteLine(string.Format("Processed page {0} of {1}", number,
                Pages.Count));
        }
    }
    else
    {
        // For each page, save off a file.
        int pageIndex = 0;
        foreach (SolidFramework.Pdf.Plumbing.PdfPage page in Pages)
        {
            // Update the page number.
            pageIndex++;

            CreateImageFromPage(page, dpi, filename, pageIndex, extension, format);
            Console.WriteLine(string.Format("Processed page {0} of {1}", pageIndex,
                Pages.Count));
        }
    }
}

First, we set up the trial license for Solid Framework with the call to ActivateDeveloperLicense(). We then set format and extension to the image format and the file extension of the output files.

After this, we create the PdfDocument object, passing it the file name and the password (needed only if the PDF file is encrypted), and call Open().

Once the document is open, we check to see if the output folder exists, and if it does not, we create it.
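The folder check and the file name setup in DoConversion could also be written as in the sketch below. It relies on two documented .NET behaviors: Directory.CreateDirectory does nothing when the folder already exists, and Path.Combine inserts the directory separator for you. The class and method names (OutputPaths, EnsureFolder, ForPage) are made up for illustration and are not in the source download.

```csharp
using System;
using System.IO;

static class OutputPaths
{
    // Directory.CreateDirectory is a no-op if the folder already exists,
    // so a separate Directory.Exists check is not required.
    public static void EnsureFolder(string folder)
    {
        Directory.CreateDirectory(folder);
    }

    // Builds "<folder><separator><pdf name without extension>-<page>.<extension>".
    public static string ForPage(string folder, string pdfFile,
                                 int pageNumber, string extension)
    {
        string baseName = Path.GetFileNameWithoutExtension(pdfFile);
        string fileName = string.Format("{0}-{1}.{2}",
                                        baseName, pageNumber, extension);
        return Path.Combine(folder, fileName);
    }
}
```

On Windows, ForPage(@"c:\myimages", @"c:\mypdfs\pdftest.pdf", 3, "jpg") would give c:\myimages\pdftest-3.jpg, which is the naming pattern the article's code produces.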

Now, we use Solid Framework to get the Pages dictionary and walk it to find all the PdfPage objects:

// Get our pages.
List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
    new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
SolidFramework.Pdf.Catalog catalog =
    (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
SolidFramework.Pdf.Plumbing.PdfPages pages =
    (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
ProcessPages(ref pages, ref Pages);

We now determine whether a page range was given. If the range argument is null or empty, we process every page in the document. Otherwise, we use the PageRange object in the Solid Framework to obtain an integer array of page indices. If TryParse does not succeed, the code also falls back to processing all pages.
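To make the range syntax concrete, here is a small stand-in parser for strings such as 1-5,7,8. It is only an illustration of the idea, written under the assumption that page numbers are 1-based as the user types them. It is not the Solid Framework PageRange class, whose exact behavior is not described in this article, and the name SimplePageRange is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class SimplePageRange
{
    // Parses text such as "1-5,7,8" into 1-based page numbers, in the
    // order given. Returns false (with an empty list) for malformed input,
    // a page number below 1, or a reversed range such as "5-1".
    public static bool TryParse(string text, out List<int> pages)
    {
        pages = new List<int>();
        if (string.IsNullOrEmpty(text))
            return false;

        List<int> result = new List<int>();
        foreach (string part in text.Split(','))
        {
            string[] bounds = part.Split('-');
            int first = 0;
            int last = 0;

            if (bounds.Length == 1)
            {
                // Single page, e.g. "7".
                if (!int.TryParse(bounds[0], out first))
                    return false;
                last = first;
            }
            else if (bounds.Length == 2)
            {
                // Inclusive range, e.g. "1-5".
                if (!int.TryParse(bounds[0], out first) ||
                    !int.TryParse(bounds[1], out last))
                    return false;
            }
            else
            {
                return false; // e.g. "1-2-3"
            }

            if (first < 1 || last < first)
                return false;

            for (int page = first; page <= last; page++)
                result.Add(page);
        }

        pages = result;
        return true;
    }
}
```

With 1-based numbers like these, a zero-based list of pages has to be indexed with the page number minus one, and each number should be checked against the page count before use.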

For each page we need to process, we then call CreateImageFromPage() to create the image.

private static void CreateImageFromPage(SolidFramework.Pdf.Plumbing.PdfPage page,
    int dpi, string filename, int pageIndex, string extension,
    System.Drawing.Imaging.ImageFormat format)
{
    // Create a bitmap from the page at the requested dpi. The using block
    // disposes of the bitmap even if saving throws.
    using (Bitmap bm = page.DrawBitmap(dpi))
    {
        // Set up the file path. The name is passed as an argument so that
        // braces in the PDF file name cannot break the format string.
        string filepath = string.Format("{0}-{1}.{2}",
                                        filename, pageIndex, extension);

        // If the file exists already, delete it, i.e. overwrite it.
        if (File.Exists(filepath))
            File.Delete(filepath);

        // Save the file.
        bm.Save(filepath, format);
    }
}

We ask the page object for a Bitmap by calling its DrawBitmap method with the requested DPI, save the Bitmap to the output file in the chosen format, and then dispose of it. This is repeated for each processed page.
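The DPI value decides how large the resulting image is. PDF page dimensions are measured in points, with 72 points per inch by default, so the pixel size can be estimated as points / 72 * dpi. The sketch below shows that arithmetic only. It makes no claim about how DrawBitmap rounds internally, and the names PageSize and PointsToPixels are hypothetical.

```csharp
using System;

static class PageSize
{
    // Default PDF user space: 72 units (points) per inch.
    const double PointsPerInch = 72.0;

    // Estimates the pixel length of a page side given in points.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi);
    }
}
```

For example, a US Letter page is 612 x 792 points, so at the default 96 dpi the estimate comes to 816 x 1056 pixels. Thumbnails for a bookmark viewer would use a much lower DPI than the main page view.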

Points of interest

CommandLine.Utility takes the pain out of command line parsing. The PageRange object takes care of the page range parsing for us. Once we have the Pages list, we just request each Page object to give us a bitmap in the DPI we require.

The images will have a small watermark at the bottom which will go away with the purchase of a license.

History

  • September 11, 2009 - v1.0: Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Greg Greenaae
United States

Article Copyright 2009 by Greg Greenaae