
Introduction
This article walks through converting the pages of a PDF file to image files. The Solid Framework .NET SDK makes this fast and easy.
Background
This is the first of what I hope will be many articles on using the Solid Framework .NET SDK for all your PDF document needs. In this article, I will show you how to generate images from pages in a PDF document. We use these images to generate thumbnail images for a bookmark viewer, and also to show the current page in the main page viewer of the PDF Navigator. In later articles, I’ll explain editing, creation, conversion, and other features that the Solid Framework gives you for manipulating PDF files.
Using the code
The source download contains both Visual Studio 2005 and Visual Studio 2008 solutions to build a command line application to convert PDF pages to images that are stored in a specified output directory. The project makes use of a CodeProject C# library by Richard Lopes to parse the command line parameters. You will need to download the Solid Framework .NET SDK and extract the DLL into the project directory. Be sure to update the Solid Framework reference in the project.
The following code in program.cs takes care of parsing out the parameters we need from the user to call DoConversion():
using System;
using System.Text;
using CommandLine.Utility;

namespace PDFtoImage
{
    enum ImageType { TIFF, BMP, JPG, PNG };

    partial class Program
    {
        static int Main(string[] args)
        {
            string pdfFile = null;
            string password = null;
            string outputfolder = null;
            string pagerange = null;
            ImageType imagetype = ImageType.PNG;
            int dpi = 96;

            Arguments CommandLine = new Arguments(args);

            // -f (input PDF) is required.
            if (CommandLine["f"] == null)
            {
                ShowUsage();
                return -1;
            }
            else
                pdfFile = CommandLine["f"];

            // -p (password) is optional.
            if (CommandLine["p"] != null)
                password = CommandLine["p"];

            // -o (output folder) is required.
            if (CommandLine["o"] == null)
            {
                ShowUsage();
                return -2;
            }
            else
                outputfolder = CommandLine["o"];

            // -d (DPI) is optional and defaults to 96.
            if (CommandLine["d"] != null)
                dpi = Convert.ToInt32(CommandLine["d"]);

            // -t (image type) is optional; unknown values fall back to PNG.
            if (CommandLine["t"] != null)
            {
                switch (CommandLine["t"].ToUpper())
                {
                    case "TIF":
                    case "TIFF":
                        imagetype = ImageType.TIFF;
                        break;
                    case "BMP":
                        imagetype = ImageType.BMP;
                        break;
                    case "JPEG":
                    case "JPG":
                        imagetype = ImageType.JPG;
                        break;
                    case "PNG":
                    default:
                        imagetype = ImageType.PNG;
                        break;
                }
            }

            // -r (page range) is optional; all pages are converted when it is missing.
            if (CommandLine["r"] != null)
            {
                pagerange = CommandLine["r"];
            }

            DoConversion(pdfFile, password, outputfolder, dpi,
                pagerange, imagetype);
            return 0;
        }

        static void ShowUsage()
        {
            Console.WriteLine("PDFtoImage.exe -f:(Path to pdf file) " +
                "-p:(password) -o:(path to image folder) -d:(integer dpi)");
            Console.WriteLine("-r:(Range i.e. 1,2-3,7,8-10) -t:TIFF|BMP|JPG|PNG");
            Console.WriteLine("");
            Console.WriteLine("-p, -d, -t and -r are optional. Defaults to 96 dpi and PNG");
        }
    }
}
Let's say we have a PDF file at c:\mypdfs\pdftest.pdf that is encrypted with a user password of "mypassword", and we want to make JPEG images of pages 1-5, 7, 8, with a DPI of 127, and put these images in c:\myimages. The command line would look like this:
PDFtoImage.exe -f:c:\mypdfs\pdftest.pdf -p:mypassword -o:c:\myimages -d:127 -t:JPG -r:1-5,7,8
Note: -p, -d, -t, and -r are optional. No password is used if -p is missing. DPI will default to 96, and image type will default to PNG. If -r is missing, all pages will be used to make images.
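One detail worth noting about the -d option: Convert.ToInt32 throws a FormatException when the value is not a number, so a mistyped DPI would crash the tool rather than fall back to the default. The sketch below shows one possible way to guard against that. The DpiOption class and its Parse method are hypothetical names introduced for illustration and are not part of the download.

```csharp
using System;

public static class DpiOption
{
    // Hypothetical helper (not in the original project): returns the parsed DPI,
    // or the fallback when the value is missing, non-numeric, or not positive.
    public static int Parse(string value, int fallback)
    {
        int dpi;
        if (int.TryParse(value, out dpi) && dpi > 0)
            return dpi;
        return fallback;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("127", 96));  // valid value is used as-is
        Console.WriteLine(Parse("abc", 96));  // invalid value falls back to the default
        Console.WriteLine(Parse(null, 96));   // missing value falls back to the default
    }
}
```

In Main(), this could replace the Convert.ToInt32 call, for example dpi = DpiOption.Parse(CommandLine["d"], 96).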
The DoConversion function in pdftoimage.cs is the meat of the project.
// Requires: using System; using System.IO; using System.Collections.Generic;
static private void DoConversion(string file, string password,
    string folder, int dpi, string pagerange, ImageType iType)
{
    System.Drawing.Imaging.ImageFormat format;
    string extension = null;

    SolidFramework.License.ActivateDeveloperLicense();

    // Map the requested image type to a GDI+ format and a file extension.
    switch (iType)
    {
        case ImageType.BMP:
            format = System.Drawing.Imaging.ImageFormat.Bmp;
            extension = "bmp";
            break;
        case ImageType.JPG:
            format = System.Drawing.Imaging.ImageFormat.Jpeg;
            extension = "jpg";
            break;
        case ImageType.PNG:
            format = System.Drawing.Imaging.ImageFormat.Png;
            extension = "png";
            break;
        case ImageType.TIFF:
            format = System.Drawing.Imaging.ImageFormat.Tiff;
            extension = "tif";
            break;
        default:
            throw new ArgumentException("DoConversion: ImageType not known");
    }

    // Open the (possibly password-protected) document.
    SolidFramework.Pdf.PdfDocument doc =
        new SolidFramework.Pdf.PdfDocument(file, password);
    doc.Open();

    if (!Directory.Exists(folder))
    {
        Directory.CreateDirectory(folder);
    }

    // Base name for the output images: <folder>\<pdf name without extension>
    string filename = folder + Path.DirectorySeparatorChar +
        Path.GetFileNameWithoutExtension(file);

    // Collect every PdfPage object in the document.
    List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
        new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
    SolidFramework.Pdf.Catalog catalog =
        (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
    SolidFramework.Pdf.Plumbing.PdfPages pages =
        (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
    ProcessPages(ref pages, ref Pages);

    PageRange ranges = null;
    bool bHaveRanges = false;
    if (!string.IsNullOrEmpty(pagerange))
    {
        bHaveRanges = PageRange.TryParse(pagerange, out ranges);
    }

    if (bHaveRanges)
    {
        // Convert only the requested pages.
        // Note: this assumes the values from ranges.ToArray() are valid indices into Pages.
        int[] pageArray = ranges.ToArray();
        foreach (int number in pageArray)
        {
            CreateImageFromPage(Pages[number], dpi, filename, number,
                extension, format);
            Console.WriteLine(string.Format("Processed page {0} of {1}", number,
                Pages.Count));
        }
    }
    else
    {
        // No range given: convert every page, numbering the files from 1.
        int pageIndex = 0;
        foreach (SolidFramework.Pdf.Plumbing.PdfPage page in Pages)
        {
            pageIndex++;
            CreateImageFromPage(page, dpi, filename, pageIndex, extension, format);
            Console.WriteLine(string.Format("Processed page {0} of {1}", pageIndex,
                Pages.Count));
        }
    }
}
First, we set up the trial license for Solid Framework with the call to ActivateDeveloperLicense(). We then set format and extension to the image format and file extension used for the output files.
After this, we create the PdfDocument object and hand it the filename to open, along with the password if the PDF file is encrypted (secured).
Once the document is open, we check to see if the output folder exists, and if it does not, we create it.
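As a side note on the folder handling and output naming, here is a minimal sketch of the same idea using only System.IO. It relies on two facts about the .NET base library: Directory.CreateDirectory does nothing if the folder already exists, and Path.Combine inserts the directory separator for you. The OutputPathSketch class and BuildImagePath method are hypothetical names for illustration, and the "name-page.extension" pattern mirrors what the article's code produces.

```csharp
using System;
using System.IO;

public static class OutputPathSketch
{
    // Hypothetical helper: builds "<folder>/<pdf name>-<page>.<extension>".
    public static string BuildImagePath(string folder, string pdfFile,
        int pageNumber, string extension)
    {
        string baseName = Path.GetFileNameWithoutExtension(pdfFile);
        string fileName = string.Format("{0}-{1}.{2}", baseName, pageNumber, extension);
        return Path.Combine(folder, fileName);
    }

    public static void Main()
    {
        // A scratch folder under the temp directory, used only for this demo.
        string folder = Path.Combine(Path.GetTempPath(), "pdftoimage-sketch");

        // Safe to call even when the folder already exists.
        Directory.CreateDirectory(folder);

        Console.WriteLine(BuildImagePath(folder, "pdftest.pdf", 3, "jpg"));
    }
}
```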
Now, we use Solid Framework to get the Pages dictionary and walk it to find all the PdfPage objects:
List<SolidFramework.Pdf.Plumbing.PdfPage> Pages =
    new List<SolidFramework.Pdf.Plumbing.PdfPage>(doc.Catalog.Pages.PageCount);
SolidFramework.Pdf.Catalog catalog =
    (SolidFramework.Pdf.Catalog)SolidFramework.Pdf.Catalog.Create(doc);
SolidFramework.Pdf.Plumbing.PdfPages pages =
    (SolidFramework.Pdf.Plumbing.PdfPages)catalog.Pages;
ProcessPages(ref pages, ref Pages);
We now figure out whether we have a range of pages. If the range argument is null or empty, we process all the pages in the document. Otherwise, we use the PageRange object in the Solid Framework to get an integer array of page indices.
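To make the range syntax concrete, here is a minimal, self-contained sketch of how a string such as "1-5,7,8" can be expanded into individual page numbers. This is not Solid Framework's PageRange implementation; PageRangeSketch and TryExpand are hypothetical names, and the rules for what counts as invalid input (non-numeric tokens, numbers below 1, descending ranges) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeSketch
{
    // Hypothetical helper: expands "1-3,7" into 1, 2, 3, 7.
    // Returns false if any token is malformed, below 1, or a descending range.
    public static bool TryExpand(string text, out List<int> pages)
    {
        pages = new List<int>();
        if (string.IsNullOrEmpty(text))
            return false;

        foreach (string token in text.Split(','))
        {
            // Each token is either a single page ("7") or a range ("1-5").
            string[] bounds = token.Trim().Split('-');
            if (bounds.Length > 2)
                return false;

            int first, last;
            if (!int.TryParse(bounds[0], out first))
                return false;
            if (bounds.Length == 2)
            {
                if (!int.TryParse(bounds[1], out last))
                    return false;
            }
            else
            {
                last = first;
            }

            if (first < 1 || last < first)
                return false;

            for (int page = first; page <= last; page++)
                pages.Add(page);
        }
        return true;
    }

    public static void Main()
    {
        List<int> pages;
        if (TryExpand("1-5,7,8", out pages))
        {
            foreach (int page in pages)
                Console.WriteLine(page);
        }
    }
}
```

Page numbers typed by a user normally start at 1, while a List<T> is indexed from 0. The article does not say which convention PageRange.ToArray() follows, so it is worth checking before using its values as list indices.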
For each page we need to process, we then call CreateImageFromPage() to finally create the images.
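// Requires: using System.Drawing; using System.IO;
private static void CreateImageFromPage(SolidFramework.Pdf.Plumbing.PdfPage page,
    int dpi, string filename, int pageIndex, string extension,
    System.Drawing.Imaging.ImageFormat format)
{
    // The using block disposes the bitmap even if Save throws.
    using (Bitmap bm = page.DrawBitmap(dpi))
    {
        // Pass filename as an argument so braces in the path cannot break the format string.
        string filepath = string.Format("{0}-{1}.{2}",
            filename, pageIndex, extension);
        if (File.Exists(filepath))
            File.Delete(filepath);
        bm.Save(filepath, format);
    }
}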
We ask the PdfPage object for a Bitmap image by calling its DrawBitmap method with the specified DPI. We then ask the Bitmap object to save itself to the file, and clean up after each processed page.
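To get a feel for how the DPI setting affects image size, recall that PDF page dimensions are expressed in points, and one point is 1/72 inch by default. The sketch below estimates the pixel dimensions for a given DPI. It assumes the rendered bitmap scales linearly with DPI and that sizes are rounded to the nearest pixel; the article does not state exactly how DrawBitmap rounds, so treat the numbers as estimates. DpiSketch and PointsToPixels are hypothetical names.

```csharp
using System;

public static class DpiSketch
{
    // Converts a length in PDF points (1/72 inch) to pixels at the given DPI.
    // Rounding to the nearest pixel is an assumption made for this sketch.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / 72.0 * dpi);
    }

    public static void Main()
    {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        // At the default 96 DPI this prints "816 x 1056".
        Console.WriteLine("{0} x {1}",
            PointsToPixels(612, 96), PointsToPixels(792, 96));
    }
}
```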
Points of interest
CommandLine.Utility takes the pain out of command line parsing. The PageRange object takes care of the page range parsing for us. Once we have the Pages list, we just ask each PdfPage object to give us a bitmap at the DPI we require.
The images will have a small watermark at the bottom which will go away with the purchase of a license.
History
- September 11, 2009 - v1.0: Initial release.