Introduction
This article presents VB.NET code to create thumbnail images from a directory of Adobe Acrobat PDF documents. When looking for a document, it is often much easier to find what you want visually, for example by seeing its cover. The application was written for a website I was developing that needed to display links to PDF documents. Instead of just showing a little PDF icon next to each document, we wanted to display the front page of the actual document. As shown below, this improves the look of the listings and also helps users find a document more quickly if they recognise it.
Note: please ignore the strange text; lorem ipsum is simply dummy text for this example.

Background
The web site was a Content Management System (CMS), so new PDF documents were uploaded to the site by the users. We scheduled this application as a batch service to run every 5 minutes and check for new files. In the backend system the documents have metadata stored in a SQL Server 2000 database. We would write a flag to say the thumbnail had been created, and when we generated the HTML content for the page request in ASP/ASP.NET we would return the appropriate thumbnail.
Using the Acrobat SDK also meant we could programmatically read the PDF metadata and retrieve the number of pages in the document, which could then be displayed as well. Although the end users could have entered that information, this meant less work for them and a better overall impression of the web site. Another advantage was that many users relied on the number of pages, rather than the more technical KB/MB value, to judge how large a document was.

Approach
To generate the thumbnail image for each document I used the Adobe Acrobat 5.0 SDK and the Microsoft .NET 1.1 Framework. Note: do not confuse the thumbnails that are part of a PDF document with the .png files this application generates. The Acrobat SDK combined with the full version of Adobe Acrobat (sadly the free reader does not expose the COM interfaces) exposes a COM library of objects that can be used to manipulate and access PDF information. Using these COM objects via COM Interop, we can load the PDF document, get the first page and render that page to the clipboard. Then, using the .NET Framework, we can copy this to a bitmap, scale and combine that image, and save the result as a .gif or .png file. At first I just saved the scaled-down image, but then decided to “fancy” up the thumbnail with a drop-shadow and folded corner.
To achieve this effect I used Macromedia Fireworks MX to create a transparent .gif, called pdftemplate_portrait.gif, in which the main body of the page template is transparent. By making the bottom-left pixel transparent too, we can easily set the transparent colour for a bitmap in .NET. I keep the top-right of the image white where the corner folds over. That means I can combine the images simply by drawing the transparent template directly over the PDF image to achieve the final look.
Pre-requisites
You need the full version of Adobe Acrobat (the free reader does not expose the COM interfaces), which exposes a COM library of objects to manipulate and access PDF information. You also need the Adobe Acrobat 5.0 SDK, which is a free download from the Adobe Solutions Network website (note: the site requires registration). The latest SDK for Acrobat 6.0 requires paid membership, so we will use the previous SDK version.
To quickly see if you have the full version of Adobe Acrobat installed, use regedit.exe and look under HKEY_CLASSES_ROOT for an entry called AcroExch.PDDoc.
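The same registry check can be done from code before attempting to create the COM object. The following is a minimal sketch, not part of the original application; the module and function names are hypothetical, and it only tests whether the AcroExch.PDDoc entry exists under HKEY_CLASSES_ROOT.

```vbnet
Imports System
Imports Microsoft.Win32

Module AcrobatCheck
    ' Hypothetical helper: returns True when the AcroExch.PDDoc entry exists
    ' under HKEY_CLASSES_ROOT, which suggests the full version of Acrobat
    ' is installed.
    Function IsAcrobatComAvailable() As Boolean
        Dim key As RegistryKey = Registry.ClassesRoot.OpenSubKey("AcroExch.PDDoc")
        If key Is Nothing Then
            Return False
        End If
        key.Close()
        Return True
    End Function

    Sub Main()
        Console.WriteLine("AcroExch.PDDoc registered: {0}", IsAcrobatComAvailable())
    End Sub
End Module
```

A batch job could call this once at start-up and exit with a clear message instead of failing later inside CreateObject.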
You'll also need the .NET 1.1 Framework and some PDF files to test the solution. The code was written in VB.NET using the .NET 1.1 Framework and Visual Studio.NET 2003 on Windows XP, but there is no reason it wouldn't work on Windows NT/2000 or .NET 1.0.
Using the code
The code is quite simple, with a try/catch over the main body. It is purposely in one large block so it's easy to see what is happening and to step through and examine with the debugger. Initially we create an instance of the AcroExch.PDDoc object. Pass the filename of the PDF document to be opened to its Open method.
' Create the document (Can only create the AcroExch.PDDoc object using
' late-binding)
pdfDoc = CreateObject("AcroExch.PDDoc")
' Open the document
ret = pdfDoc.Open(inputFile)
If ret = False Then
    Throw New FileNotFoundException
End If
' Get the number of pages
pageCount = pdfDoc.GetNumPages()
Set a reference to the first page of the document as pdfPage; AcquirePage takes a zero-based page index. Finally we render the PDF page to the clipboard at full size. We could have Acrobat scale the image down for us by a percentage, but we get better visual results using the .NET scaling of the Bitmap class. It would have been more efficient to render directly to an off-screen bitmap, which would also have avoided overwriting whatever was previously on the clipboard, but I found the clipboard method the most stable way to get a rendered bitmap of the page using Acrobat. Note: the object returned by GetSize exposes x and y properties, whereas the rectangle passed to CopyToClipboard must have Left, Right, Top and Bottom set, so the page dimensions are copied into a separate AcroExch.Rect object.
' Get the first page
pdfPage = pdfDoc.AcquirePage(0)
' Get the size of the page
' This is a really strange bug/documentation problem
' The PDFRect you get back from GetSize has properties
' x and y, but the PDFRect you have to supply CopyToClipboard
' has left, right, top, bottom
pdfRectTemp = pdfPage.GetSize
' Create PDFRect to hold dimensions of the page
pdfRect = CreateObject("AcroExch.Rect")
pdfRect.Left = 0
pdfRect.right = pdfRectTemp.x
pdfRect.Top = 0
pdfRect.bottom = pdfRectTemp.y
' Render to clipboard, scaled by 100 percent (ie. original size)
' Even though we want a smaller image, better for us to scale in .NET
' than Acrobat as it would greek out small text
' see http://www.adobe.com/support/techdocs/1dd72.htm
Call pdfPage.CopyToClipboard(pdfRect, 0, 0, 100)
Dim clipboardData As IDataObject = Clipboard.GetDataObject()
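Because the clipboard is shared with every other application, it may be worth confirming that a bitmap is actually present before reading it. The sketch below is an illustration only; GetClipboardBitmap is a hypothetical helper, not part of the original code. It uses IDataObject.GetDataPresent for the check, and the entry point is marked STAThread because the Clipboard class needs a single-threaded apartment.

```vbnet
Imports System
Imports System.Drawing
Imports System.Windows.Forms

Module ClipboardCheck
    ' Hypothetical helper: returns the bitmap currently on the clipboard,
    ' or Nothing when the clipboard holds no bitmap data.
    Function GetClipboardBitmap() As Bitmap
        Dim data As IDataObject = Clipboard.GetDataObject()
        If data Is Nothing OrElse Not data.GetDataPresent(DataFormats.Bitmap) Then
            Return Nothing
        End If
        Return DirectCast(data.GetData(DataFormats.Bitmap), Bitmap)
    End Function

    ' Clipboard access requires a single-threaded apartment
    <STAThread()> _
    Sub Main()
        Dim bmp As Bitmap = GetClipboardBitmap()
        If bmp Is Nothing Then
            Console.WriteLine("No bitmap on the clipboard")
        Else
            Console.WriteLine("Bitmap size: {0} x {1}", bmp.Width, bmp.Height)
        End If
    End Sub
End Module
```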
Grab the rendered page bitmap from the clipboard and, based on the page dimensions, choose the portrait or landscape template and thumbnail size.
Dim pdfBitmap As Bitmap = clipboardData.GetData(DataFormats.Bitmap)
' Size of generated thumbnail in pixels
Dim thumbnailWidth As Integer = 38
Dim thumbnailHeight As Integer = 52
Dim templateFile As String
' Switch between portrait and landscape
If (pdfRectTemp.x < pdfRectTemp.y) Then
    templateFile = templatePortraitFile
Else
    templateFile = templateLandscapeFile
    ' Swap width and height (little trick not using third temp variable)
    thumbnailWidth = thumbnailWidth Xor thumbnailHeight
    thumbnailHeight = thumbnailWidth Xor thumbnailHeight
    thumbnailWidth = thumbnailWidth Xor thumbnailHeight
End If
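The orientation logic above can also be expressed as a small function that returns the thumbnail size, which avoids the in-place swap. This is a sketch under assumptions: GetThumbnailSize is a hypothetical name, the 38 x 52 portrait size is taken from the code above, and the page dimensions in Main are example values only.

```vbnet
Imports System
Imports System.Drawing

Module ThumbnailSizing
    ' Hypothetical helper: returns the thumbnail size for a page, swapping
    ' the portrait dimensions when the page is not taller than it is wide.
    Function GetThumbnailSize(ByVal pageWidth As Double, ByVal pageHeight As Double) As Size
        Dim portrait As New Size(38, 52)
        If pageWidth < pageHeight Then
            Return portrait
        End If
        Return New Size(portrait.Height, portrait.Width)
    End Function

    Sub Main()
        ' Example page sizes (assumed values for illustration)
        Console.WriteLine(GetThumbnailSize(595, 842))   ' portrait page
        Console.WriteLine(GetThumbnailSize(842, 595))   ' landscape page
    End Sub
End Module
```

As in the original code, a square page falls into the landscape branch.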
Load the template file as a Bitmap. Render the PDF page bitmap to a small image using GetThumbnailImage. Next create a blank bitmap with room for the template border. Set the transparent colour of the template with MakeTransparent.
Using the new blank bitmap, draw the rendered PDF page image to it and then draw the template, with its transparency, directly over the top. Because the main area of the template is transparent, the page image still shows through. Finally, save the composited image as a .png or .gif file, although .png does look better.
' Load the template graphic
Dim templateBitmap As Bitmap = New Bitmap(templateFile)
Dim templateImage As Image = Image.FromFile(templateFile)
' Render to small image using the bitmap class
Dim pdfImage As Image = pdfBitmap.GetThumbnailImage(thumbnailWidth, _
                                                    thumbnailHeight, _
                                                    Nothing, Nothing)
' Create new blank bitmap (+ 7 for template border)
Dim thumbnailBitmap As Bitmap = New Bitmap(thumbnailWidth + 7, _
                                           thumbnailHeight + 7, _
                                           Imaging.PixelFormat.Format32bppArgb)
' To overlay the template on the image, we need to set the transparency
' http://www.sellsbrothers.com/writing/default.aspx?
' content=dotnetimagerecoloring.htm
templateBitmap.MakeTransparent()
Dim thumbnailGraphics As Graphics = Graphics.FromImage(thumbnailBitmap)
' Draw rendered pdf image to new blank bitmap
thumbnailGraphics.DrawImage(pdfImage, 2, 2, thumbnailWidth, thumbnailHeight)
' Draw template outline over the bitmap (pdf will show through the
' transparent area)
thumbnailGraphics.DrawImage(templateImage, 0, 0)
' Save as .png file
thumbnailBitmap.Save(outputFile, Imaging.ImageFormat.Png)
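If you want explicit control over how the page is scaled, one alternative to GetThumbnailImage is to draw the source image onto a new bitmap with a chosen interpolation mode. This is a sketch of that alternative, not the method used in the article; ScaleImage is a hypothetical helper, and the blank bitmap in Main merely stands in for the rendered PDF page.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Drawing2D

Module HighQualityScale
    ' Hypothetical helper: scales an image to the given size using
    ' bicubic interpolation instead of GetThumbnailImage.
    Function ScaleImage(ByVal source As Image, ByVal width As Integer, _
                        ByVal height As Integer) As Bitmap
        Dim result As New Bitmap(width, height, Imaging.PixelFormat.Format32bppArgb)
        Dim g As Graphics = Graphics.FromImage(result)
        Try
            g.InterpolationMode = InterpolationMode.HighQualityBicubic
            g.DrawImage(source, 0, 0, width, height)
        Finally
            g.Dispose()
        End Try
        Return result
    End Function

    Sub Main()
        ' Stand-in for the rendered page: a blank 612 x 792 bitmap
        Dim page As New Bitmap(612, 792)
        Dim thumb As Bitmap = ScaleImage(page, 38, 52)
        Console.WriteLine("{0} x {1}", thumb.Width, thumb.Height)
        thumb.Dispose()
        page.Dispose()
    End Sub
End Module
```

Whether this looks better than GetThumbnailImage at such a small size is something to judge by eye on your own documents.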
Write some feedback to the console as we work through each of the files. Then actively release the references to the COM objects, as Acrobat isn't the application best suited to opening and closing multiple PDF documents without falling over. Luckily the code doesn't cause Acrobat to display any UI that might cause the process to hang waiting for user interaction.
Console.WriteLine("Generated thumbnail... {0}", outputFile)
thumbnailGraphics.Dispose()
pdfDoc.Close()
Marshal.ReleaseComObject(pdfPage)
Marshal.ReleaseComObject(pdfRect)
Marshal.ReleaseComObject(pdfDoc)
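Since the main body sits inside a try/catch, one way to make sure the COM references are released even when a document fails to open is to move the release calls into a Finally block. The following is a minimal sketch under that assumption; ReleaseCom is a hypothetical helper, and the body of the Try is elided rather than reproduced.

```vbnet
Imports System
Imports System.Runtime.InteropServices
Imports Microsoft.VisualBasic

Module ComCleanup
    ' Hypothetical helper: releases a COM reference if one was acquired.
    Sub ReleaseCom(ByVal comObject As Object)
        If Not comObject Is Nothing AndAlso Marshal.IsComObject(comObject) Then
            Marshal.ReleaseComObject(comObject)
        End If
    End Sub

    Sub Main()
        Dim pdfDoc As Object = Nothing
        Try
            ' Late-bound creation, as in the article; this fails when the
            ' full version of Acrobat is not installed
            pdfDoc = CreateObject("AcroExch.PDDoc")
            ' ... open the document, generate the thumbnail and close
            ' the document here ...
        Catch ex As Exception
            Console.WriteLine("Failed: {0}", ex.Message)
        Finally
            ' Runs whether or not an exception was thrown
            ReleaseCom(pdfDoc)
        End Try
    End Sub
End Module
```

The same helper could be called for the page and rectangle objects, which keeps the cleanup safe when only some of them were created.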