Introduction
This article discusses how to create a .NET PDF Viewer control that is not dependent on Acrobat software being installed.
Fundamental Concepts
Viewing a PDF document involves the following basic steps:
- Get a page count of the PDF document that needs to be viewed to define your page number boundaries (iTextSharp or PDFLibNET)
- Convert the PDF document (specific page on demand) to a raster image format (GhostScript API or PDFLibNET)
- (Deprecated) Extract only the current frame to be viewed from the raster image (FreeImage.Net)
- Convert the current frame to be viewed into a System.Drawing.Image
- Display the current frame in a PictureBox control
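The page count from the first step is what keeps navigation inside the document. A minimal sketch of that boundary check, assuming 1-based page numbers; the helper name ClampPageNumber is hypothetical and is not part of the project's source:

```vbnet
Imports System

Module PageBounds
    ' Hypothetical helper: keeps a requested 1-based page number inside 1..pageCount.
    Public Function ClampPageNumber(ByVal requested As Integer, ByVal pageCount As Integer) As Integer
        If pageCount < 1 Then
            Throw New ArgumentOutOfRangeException("pageCount", "The document has no pages.")
        End If
        Return Math.Max(1, Math.Min(requested, pageCount))
    End Function

    Sub Main()
        Console.WriteLine(ClampPageNumber(0, 12))   ' prints 1
        Console.WriteLine(ClampPageNumber(5, 12))   ' prints 5
        Console.WriteLine(ClampPageNumber(99, 12))  ' prints 12
    End Sub
End Module
```

In a viewer, a check like this would run before each "next page" or "go to page" request, so the renderer is never asked for a page that does not exist.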
Several utility classes, some written for this project and some adapted from other authors, expose the functionality needed from the various helper libraries.
- GhostScriptLib.vb (contains methods to convert PDF to TIFF for Viewing and Printing)
- AFPDFLibUtil.vb (contains methods to convert PDF to System.Drawing.Image for viewing and printing, as well as methods to create a bookmark TreeView)
- iTextSharpUtil.vb (contains methods for getting the PDF page count, converting images to searchable PDF, and extracting PDF bookmarks into TreeNodes)
- PrinterUtil.vb (contains methods for sending images to printers)
- ImageUtil.vb (contains methods for image manipulation such as resize, rotation, conversion, etc.)
- TesseractOCR.vb (contains methods for Optical Character Recognition from images)
- PDFViewer.vb (contains the Viewer user control)
I was tempted to move every function over to PDFLibNet (XPDF), which is faster, but after a lot of testing I decided to use both Ghostscript and PDFLibNet. Ghostscript is used for printing, "PDF to image" conversion, and as a secondary renderer in case of XPDF incompatibility. PDFLibNet is used for quick PDF-to-screen rendering, searching, and bookmarks.
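The "secondary renderer" arrangement can be sketched as a try-then-fall-back helper. This is only an illustration of the pattern, not the project's actual code: RenderWithFallback, FailingRenderer, and BlankRenderer are hypothetical names, and the two stand-in renderers take the place of the PDFLibNet and Ghostscript paths.

```vbnet
Imports System
Imports System.Drawing

Module RenderFallback
    ' Hypothetical helper: tries the primary renderer first and uses the
    ' secondary one if the primary throws or returns Nothing.
    Public Function RenderWithFallback(ByVal pageNumber As Integer, _
                                       ByVal primary As Func(Of Integer, Image), _
                                       ByVal secondary As Func(Of Integer, Image)) As Image
        Dim result As Image = Nothing
        Try
            result = primary(pageNumber)
        Catch ex As Exception
            ' Swallowed on purpose in this sketch; real code would likely log the failure.
            result = Nothing
        End Try
        If result Is Nothing Then
            result = secondary(pageNumber)
        End If
        Return result
    End Function

    ' Stand-in for a renderer that cannot handle the document.
    Private Function FailingRenderer(ByVal pageNumber As Integer) As Image
        Throw New InvalidOperationException("primary renderer failed")
    End Function

    ' Stand-in for a renderer that succeeds.
    Private Function BlankRenderer(ByVal pageNumber As Integer) As Image
        Return New Bitmap(8, 8)
    End Function

    Sub Main()
        Dim page As Image = RenderWithFallback(1, AddressOf FailingRenderer, AddressOf BlankRenderer)
        Console.WriteLine(page.Width)  ' prints 8, the width produced by the secondary renderer
    End Sub
End Module
```

Passing the renderers as delegates keeps the fallback logic independent of either library, so the order can be swapped without touching the helper.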
Using the Code
This project consists of 7 DLLs that must all be in the same directory:
- FreeImage.dll
- FreeImageNET.dll
- gsdll32.dll
- itextsharp.dll
- PDFLibNET.dll
- tessnet2_32.dll
- PDFView.dll
Due to file size restrictions, I could not include the Ghostscript 8.64 DLL (gsdll32.dll) in the source code. Please download the Win32 Ghostscript 8.64 package from sourceforge.net and place the file "gsdll32.dll" into the \PDFView\lib directory where the other DLLs already exist.
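Because a missing DLL (gsdll32.dll in particular) only shows up when a feature is first used, it can help to check for all of them at startup. A minimal sketch, assuming the DLLs sit next to the executable; the FindMissingDlls helper is hypothetical, and the file names come from the list above:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module DependencyCheck
    ' File names taken from the DLL list in this article.
    Private ReadOnly RequiredDlls As String() = { _
        "FreeImage.dll", "FreeImageNET.dll", "gsdll32.dll", "itextsharp.dll", _
        "PDFLibNET.dll", "tessnet2_32.dll", "PDFView.dll"}

    ' Hypothetical helper: returns the required DLLs that are not present in a folder.
    Public Function FindMissingDlls(ByVal folder As String) As List(Of String)
        Dim missing As New List(Of String)
        For Each dllName As String In RequiredDlls
            If Not File.Exists(Path.Combine(folder, dllName)) Then
                missing.Add(dllName)
            End If
        Next
        Return missing
    End Function

    Sub Main()
        Dim missing As List(Of String) = FindMissingDlls(AppDomain.CurrentDomain.BaseDirectory)
        If missing.Count > 0 Then
            Console.WriteLine("Missing: " & String.Join(", ", missing.ToArray()))
        End If
    End Sub
End Module
```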
To place a PDF viewer control on a form:
Dim PDFFileName As String = "MyPDF.pdf"
Dim PDFViewer As New PDFView.PDFViewer
' Point the viewer at the file declared above
PDFViewer.FileName = PDFFileName
PDFViewer.Dock = DockStyle.Fill
Me.Controls.Add(PDFViewer)
The essential part of this solution is extracting the current frame to be viewed from a multi-frame (or single-frame) image. At first I used System.Drawing to implement it, but I found this to be slower than C++ solutions that use DIBs (Device Independent Bitmaps) to perform graphic conversions.
Public Shared Function GetFrameFromTiff _
    (ByVal Filename As String, ByVal FrameNumber As Integer) As Image
    ' The stream must stay open while the source bitmap is in use
    Using fs As FileStream = File.Open(Filename, FileMode.Open, FileAccess.Read)
        Using bm As System.Drawing.Bitmap = _
            CType(System.Drawing.Bitmap.FromStream(fs), System.Drawing.Bitmap)
            ' Select the requested page of the multi-frame image
            bm.SelectActiveFrame(FrameDimension.Page, FrameNumber)
            ' Copy the frame so the result does not depend on the stream
            Dim temp As New System.Drawing.Bitmap(bm.Width, bm.Height)
            Using g As Graphics = Graphics.FromImage(temp)
                g.InterpolationMode = InterpolationMode.NearestNeighbor
                g.DrawImage(bm, 0, 0, bm.Width, bm.Height)
            End Using
            Return temp
        End Using
    End Using
End Function
I then tried FreeImage with a .NET wrapper, which gave a small speed boost. FreeImage also has a ton of image conversion functions, which may come in handy if you want to extend this into an editor.
Public Shared Function GetFrameFromTiff2 _
    (ByVal Filename As String, ByVal FrameNumber As Integer) As Image
    ' Open the multi-page image
    Dim dib As FIMULTIBITMAP = FreeImage.OpenMultiBitmapEx(Filename)
    ' Lock the requested page and convert it to a GDI+ bitmap
    Dim page As FIBITMAP = FreeImage.LockPage(dib, FrameNumber)
    GetFrameFromTiff2 = FreeImage.GetBitmap(page)
    ' Clear the page handle and close the multi-page image
    page.SetNull()
    FreeImage.CloseMultiBitmapEx(dib)
End Function
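Speed comparisons like the one between the two routines above can be made with a small timing harness. This is a sketch under stated assumptions rather than the measurement method actually used for the article; AverageMilliseconds and SampleWork are hypothetical names, and SampleWork stands in for a call to one of the frame-extraction functions.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module FrameTiming
    ' Hypothetical helper: average milliseconds per call over several iterations.
    Public Function AverageMilliseconds(ByVal work As Action, ByVal iterations As Integer) As Double
        If iterations < 1 Then
            Throw New ArgumentOutOfRangeException("iterations")
        End If
        ' Warm-up call so one-time JIT and file-cache costs are not measured
        work()
        Dim timer As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            work()
        Next
        timer.Stop()
        Return timer.Elapsed.TotalMilliseconds / iterations
    End Function

    ' Stand-in workload; replace the body with a frame-extraction call.
    Private Sub SampleWork()
        Thread.Sleep(5)
    End Sub

    Sub Main()
        Console.WriteLine(AverageMilliseconds(AddressOf SampleWork, 10))
    End Sub
End Module
```

When timing image routines, the returned Image should be disposed inside the workload so that memory pressure from earlier iterations does not distort later ones.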
I ended up implementing PDFLibNET, which gave a substantial speed boost because the number of file I/O operations was reduced. Another streamlined routine for extracting one page from a PDF was added to the Ghostscript utility class as well.
AFPDFLibUtil.vb
Public Shared Sub DrawImageFromPDF(ByRef pdfDoc As AFPDFLibNET.AFPDFDoc, _
    ByVal PageNumber As Integer, ByRef oPictureBox As PictureBox)
    If pdfDoc IsNot Nothing Then
        ' Position the document at the top-left corner of the requested page
        pdfDoc.CurrentPage = PageNumber
        pdfDoc.CurrentX = 0
        pdfDoc.CurrentY = 0
        pdfDoc.RenderDPI = RENDER_DPI
        pdfDoc.RenderPage(oPictureBox.Handle.ToInt32())
        oPictureBox.Image = Render(pdfDoc)
    End If
End Sub
Public Shared Function Render(ByRef pdfDoc As AFPDFLibNET.AFPDFDoc) As Bitmap
    If pdfDoc IsNot Nothing Then
        Dim backbuffer As New Bitmap(pdfDoc.PageWidth, pdfDoc.PageHeight)
        ' Using disposes the Graphics object, so no separate Dispose call is needed
        Using g As Graphics = Graphics.FromImage(backbuffer)
            Dim lhdc As Integer = g.GetHdc().ToInt32()
            pdfDoc.RenderHDC(lhdc)
            g.ReleaseHdc()
        End Using
        Return backbuffer
    End If
    Return Nothing
End Function
GhostScriptLib.vb
Public Shared Function GetPageFromPDF(ByVal filename As String, _
    ByVal PageNumber As Integer, Optional ByVal ToPrinter As Boolean = False) As Image
    Dim converter As New ConvertPDF.PDFConvert
    Dim Converted As Boolean = False
    converter.RenderingThreads = Environment.ProcessorCount
    converter.OutputToMultipleFile = False
    ' Convert only the requested page; page numbers start at 1
    If PageNumber > 0 Then
        converter.FirstPageToConvert = PageNumber
        converter.LastPageToConvert = PageNumber
    Else
        GetPageFromPDF = Nothing
        Exit Function
    End If
    converter.FitPage = False
    converter.JPEGQuality = 70
    ' Printing and viewing use different anti-aliasing and resolution settings
    If ToPrinter Then
        converter.TextAlphaBit = -1
        converter.GraphicsAlphaBit = -1
        converter.ResolutionX = PRINT_DPI
        converter.ResolutionY = PRINT_DPI
    Else
        converter.TextAlphaBit = 4
        converter.GraphicsAlphaBit = 4
        converter.ResolutionX = VIEW_DPI
        converter.ResolutionY = VIEW_DPI
    End If
    converter.OutputFormat = COLOR_PNG_RGB
    ' Render to a uniquely named temporary PNG file
    Dim input As System.IO.FileInfo = New FileInfo(filename)
    Dim output As String = System.IO.Path.GetTempPath & Now.Ticks & ".png"
    Converted = converter.Convert(input.FullName, output)
    If Converted Then
        GetPageFromPDF = New Bitmap(output)
        ImageUtil.DeleteFile(output)
    Else
        GetPageFromPDF = Nothing
    End If
End Function
In the PDFViewer code, a page number is specified and:
- The page is loaded from the PDF file and converted to a System.Drawing.Image object.
- The PictureBox is updated with the image.
Private Function ShowImageFromFile(ByVal sFileName As String, _
    ByVal iFrameNumber As Integer, ByRef oPictureBox As PictureBox, _
    Optional ByVal XPDFDPI As Integer = 0) As Image
    oPictureBox.Invalidate()
    If mUseXPDF Then
        ' XPDF (PDFLibNet) path: frame numbers are 0-based, PDF pages are 1-based
        If ImageUtil.IsPDF(sFileName) Then
            If XPDFDPI > 0 Then
                AFPDFLibUtil.DrawImageFromPDF(mPDFDoc, iFrameNumber + 1, _
                    oPictureBox, XPDFDPI)
            Else
                AFPDFLibUtil.DrawImageFromPDF(mPDFDoc, iFrameNumber + 1, oPictureBox)
            End If
        End If
    Else
        ' Ghostscript path for PDF files; TIFF files are read directly
        If ImageUtil.IsPDF(sFileName) Then
            oPictureBox.Image = ConvertPDF.PDFConvert.GetPageFromPDF(sFileName, _
                iFrameNumber + 1)
        ElseIf ImageUtil.IsTiff(sFileName) Then
            oPictureBox.Image = ImageUtil.GetFrameFromTiff(sFileName, iFrameNumber)
        End If
    End If
    oPictureBox.Update()
    Return oPictureBox.Image
End Function
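The XPDFDPI parameter lets the caller request a specific rendering resolution. One plausible way to choose that value so a page just fills the viewer is sketched below. It relies on the fact that PDF page sizes are measured in points (1/72 inch); the DpiToFit helper and its parameter names are hypothetical and are not taken from the project's source.

```vbnet
Imports System

Module FitDpi
    ' PDF user space units: 72 points per inch
    Private Const PointsPerInch As Double = 72.0

    ' Hypothetical helper: largest whole DPI at which the page still fits
    ' inside the viewer in both directions, preserving the aspect ratio.
    Public Function DpiToFit(ByVal pageWidthPoints As Double, ByVal pageHeightPoints As Double, _
                             ByVal viewWidthPixels As Integer, ByVal viewHeightPixels As Integer) As Integer
        Dim dpiForWidth As Double = viewWidthPixels * PointsPerInch / pageWidthPoints
        Dim dpiForHeight As Double = viewHeightPixels * PointsPerInch / pageHeightPoints
        Return CInt(Math.Floor(Math.Min(dpiForWidth, dpiForHeight)))
    End Function

    Sub Main()
        ' US Letter (612 x 792 points) shown in an 850 x 1100 pixel viewer
        Console.WriteLine(DpiToFit(612, 792, 850, 1100))  ' prints 100
    End Sub
End Module
```

Rendering at the smallest DPI that fills the window avoids producing pixels that would only be scaled away, which is where the speed gain from on-demand DPI selection comes from.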
Points of Interest
This project was made possible by various open source libraries that others were kind enough to distribute freely. I would like to thank all of the Ghostscript, FreeImage.NET, iTextSharp, TessNet, and AFPDFLib (PDFLibNet) developers for their efforts.
History
- 19th June, 2009: 1.0 Initial release
- 22nd June, 2009: Updated source code to correctly scale printed pages to the Printable Page Area of the printer that is selected
- 7th July, 2009: Updated source code to use AFPDFLib(XPDF) or Ghostscript for PDF rendering
- 15th July, 2009: Updated source code to use PDFLibNet(XPDF ver 3.02pl3) and added search/export options
- 22nd July, 2009: Added "Image to PDF" import, password prompt for encrypted PDF files, fallback rendering to Ghostscript if XPDF fails, latest version of PDFLibNet with various bug fixes applied, and LZW compression for "PDF to TIFF" export
- 20th August, 2009: Major changes:
  - Added the ability to convert images into a searchable PDF (OCR is English only for now)
  - Added the ability to export a PDF to an HTML Image Viewer
  - Pages are only rendered at the DPI needed to fill the Viewer window (good speed increase)
  - Rotated page settings are kept while viewing the document
  - Added the ability to convert images into an encrypted PDF
  - Changed bookmark tree generation to use recursion
  - Multiple bug fixes (see SVN log on the repository)
- 5th October, 2009:
  - Fixed problem with incorrect configuration error with PDFLibNet.dll
  - Removed dependencies on FreeImage