SVN repository on Google Code

Introduction

This article discusses how to create an ASP.NET PDF Viewer User Control that is not dependent on Acrobat software being installed.

Fundamental Concepts

Get the page count of the PDF document to define the page number boundaries (PDFLibNet - XPDF)
Render the requested page, on demand, to a raster image (PDFLibNet - XPDF)
Save the rendered page as a PNG file
Display the PNG file in an image on a web page

Several utility classes, some written for this project and some adapted from other authors, expose the functionality needed from the helper libraries:

AFPDFLibUtil.vb (contains methods to create Bookmark HTML, Search, get page count, convert PDF to PNG)
ImageUtil.vb (contains methods for image manipulation such as resize, rotation, conversion, etc.)
ASPPDFLib.vb (contains generic wrapper functions that call specific technologies)
PDFViewer.ascx.vb (contains code behind for the PDF Viewer User Control)
PDFViewer.ascx (contains client side HTML/JavaScript for the PDF Viewer User Control)

Using the Code

ASP Server Configuration Requirements

You must give the ASP.NET worker process account (IIS_IUSRS, ASPNET, or NETWORK SERVICE, depending on your IIS version) Modify (read/write) permission on the /PDF and /render directories.
You must give the same account Read & Execute permission on the /bin directory.
PDFLibNet.dll must be available to the page. Depending on your operating system, you might have to register it in the GAC to make it available to the application.
PDFLibNet.dll and PDFLibCmdLine.exe must both be compiled for the same architecture (x86 or x64).
If you use the x86 build on a 64-bit OS, you must set Enable 32-Bit Applications to True in the Advanced Settings of the application pool.

This project consists of 3 DLLs that must all be in the same directory:

PDFLibNet.dll
StatefullScrollPanel.dll
PDFViewASP.dll

To place a PDF Viewer User Control on a web page (after registering PDFViewer.ascx on that page with the uc1 tag prefix):

<uc1:PDFViewer ID="PDFViewer1" runat="server" />

In the code-behind, set the FileName property to the PDF file you want to view:

      Dim pdfFileName As String = Request.MapPath("PDF") & "\" & "myPDF.pdf"
      If ImageUtil.IsPDF(pdfFileName) Then
        ErrorLabel.Visible = False
        PDFViewer1.FileName = pdfFileName
      Else
        ErrorLabel.Text = "Only PDF files (*.pdf) are allowed to be viewed."
        ErrorLabel.Visible = True
      End If
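
The article does not show how ImageUtil.IsPDF decides whether a file is a PDF. As a rough illustration only, a check of this kind can look at the file signature rather than the extension, since PDF files normally begin with the ASCII bytes "%PDF-". The module and function names below are hypothetical and are not part of the project:

```vbnet
Imports System.IO
Imports System.Text

Module PdfCheckSketch
    ' Hypothetical helper, NOT the project's ImageUtil.IsPDF.
    ' Returns True when the file starts with the "%PDF-" signature.
    Public Function LooksLikePdf(ByVal filePath As String) As Boolean
        If Not File.Exists(filePath) Then Return False
        Dim header(4) As Byte ' upper bound 4 = 5 bytes, enough for "%PDF-"
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
            ' A file shorter than the signature cannot be a PDF
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        Return Encoding.ASCII.GetString(header) = "%PDF-"
    End Function
End Module
```

A signature check like this rejects renamed files that an extension test would accept, but it does not prove the document is well formed; the PDF library can still fail to load it.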

The essential part of this solution is extracting the current page to be viewed from a PDF file. Since we are using ASP.NET, I chose a file-based solution to avoid the memory management issues of persisting PDF byte streams for multiple clients. Each requested page is extracted from the PDF using PDFLibNet and stored on the file system as a PNG image. I chose PNG because it uses DEFLATE compression (the same algorithm used in ZIP files), which is lossless and keeps the file size small.

    'Modified for ASP usage
    Public Shared Function GetPageFromPDF(ByVal filename As String, _
	ByVal destPath As String, ByRef PageNumber As Integer, _
	Optional ByVal DPI As Integer = RENDER_DPI, _
	Optional ByVal Password As String = "", _
	Optional ByVal searchText As String = "", _
	Optional ByVal searchDir As SearchDirection = 0) As String
    GetPageFromPDF = ""
    Dim pdfDoc As New PDFLibNet.PDFWrapper
    pdfDoc.RenderDPI = 72
    pdfDoc.LoadPDF(filename)
    If pdfDoc IsNot Nothing Then
      pdfDoc.CurrentPage = PageNumber
      pdfDoc.SearchCaseSensitive = False
      Dim searchResults As New List(Of PDFLibNet.PDFSearchResult)
      If searchText <> "" Then
        Dim lFound As Integer = 0
        If searchDir = SearchDirection.FromBeginning Then
          lFound = pdfDoc.FindFirst(searchText, _
          	PDFLibNet.PDFSearchOrder.PDFSearchFromdBegin, False, False)
        ElseIf searchDir = SearchDirection.Forwards Then
          lFound = pdfDoc.FindFirst(searchText, _
          	PDFLibNet.PDFSearchOrder.PDFSearchFromCurrent, False, False)
        ElseIf searchDir = SearchDirection.Backwards Then
          lFound = pdfDoc.FindFirst(searchText, _
          	PDFLibNet.PDFSearchOrder.PDFSearchFromCurrent, True, False)
        End If
        If lFound > 0 Then
          If searchDir = SearchDirection.FromBeginning Then
            PageNumber = pdfDoc.SearchResults(0).Page
            searchResults = GetAllSearchResults(filename, searchText, PageNumber)
          ElseIf searchDir = SearchDirection.Forwards Then
            If pdfDoc.SearchResults(0).Page > PageNumber Then
              PageNumber = pdfDoc.SearchResults(0).Page
              searchResults = GetAllSearchResults(filename, searchText, PageNumber)
            Else
              searchResults = SearchForNextText(filename, searchText, _
						PageNumber, searchDir)
              If searchResults.Count > 0 Then
                PageNumber = searchResults(0).Page
              End If
            End If
          ElseIf searchDir = SearchDirection.Backwards Then
            If pdfDoc.SearchResults(0).Page < PageNumber Then
              PageNumber = pdfDoc.SearchResults(0).Page
              searchResults = GetAllSearchResults(filename, searchText, PageNumber)
            Else
              searchResults = SearchForNextText(filename, searchText, _
						PageNumber, searchDir)
              If searchResults.Count > 0 Then
                PageNumber = searchResults(0).Page
              End If
            End If
          End If
        End If
      End If
      Dim outGuid As Guid = Guid.NewGuid()
      Dim output As String = destPath & "\" & outGuid.ToString & ".png"
      Dim pdfPage As PDFLibNet.PDFPage = pdfDoc.Pages(PageNumber)
      Dim bmp As Bitmap = pdfPage.GetBitmap(DPI, True)
      bmp.Save(output, System.Drawing.Imaging.ImageFormat.Png)
      bmp.Dispose()
      GetPageFromPDF = output
      If searchResults.Count > 0 Then
        GetPageFromPDF = HighlightSearchCriteria(output, DPI, searchResults)
      End If
      pdfDoc.Dispose()
    End If
  End Function
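
The DPI argument passed to GetBitmap determines how large the rendered PNG is. PDF page sizes are measured in points, which default to 1/72 inch, so the pixel size grows linearly with DPI. The helper below is a hypothetical sketch of that arithmetic; how PDFLibNet rounds internally is an assumption, not something the article states:

```vbnet
Imports System

Module DpiSketch
    ' Hypothetical helper: converts a page dimension in points (1/72 inch)
    ' to pixels at the given DPI. Rounding up is an assumption.
    Public Function PointsToPixels(ByVal points As Double, ByVal dpi As Integer) As Integer
        Return CInt(Math.Ceiling(points * dpi / 72.0))
    End Function

    Sub Main()
        ' US Letter is 612 x 792 points
        Console.WriteLine(PointsToPixels(612, 72))   ' 612
        Console.WriteLine(PointsToPixels(612, 150))  ' 1275
        Console.WriteLine(PointsToPixels(792, 150))  ' 1650
    End Sub
End Module
```

Doubling the DPI roughly quadruples the pixel count, so a higher DPI gives sharper pages at the cost of larger PNG files and slower rendering.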

In the PDFViewer code, a page number is specified and:

The page is loaded from the PDF file and converted to a PNG file.
We add the PNG file name to the ASP.NET Cache with an expiration of 5 minutes to ensure that we don't leave rendered images lying around on the server.
The ImageUrl is updated with the path to the PNG file.

PDFViewer.ascx.vb

 Private Sub DisplayCurrentPage(Optional ByVal doSearch As Boolean = False)
   'Set how long to wait before deleting the generated PNG file
   Dim expirationDate As DateTime = Now.AddMinutes(5)
   Dim noSlide As TimeSpan = System.Web.Caching.Cache.NoSlidingExpiration
   Dim callBack As New CacheItemRemovedCallback(AddressOf OnCacheRemove)
   ResizePanels()
   CheckPageBounds()
   UpdatePageLabel()
   InitBookmarks()
   Dim destPath As String = Request.MapPath("render")
   Dim indexNum As Integer = (parameterHash("CurrentPageNumber") - 1)
   Dim numRotation As Integer = parameterHash("RotationPage")(indexNum)
   Dim imageLocation As String
   If doSearch = False Then
     imageLocation = ASPPDFLib.GetPageFromPDF(parameterHash("PDFFileName"), _
     destPath, parameterHash("CurrentPageNumber"), parameterHash("DPI"), _
     parameterHash("Password"), numRotation)
   Else
     imageLocation = ASPPDFLib.GetPageFromPDF(parameterHash("PDFFileName"), destPath _
                                              , parameterHash("CurrentPageNumber") _
                                              , parameterHash("DPI") _
                                              , parameterHash("Password") _
                                              , numRotation, parameterHash("SearchText") _
                                              , parameterHash("SearchDirection") _
                                              )
     UpdatePageLabel()
   End If
   ImageUtil.DeleteFile(parameterHash("CurrentImageFileName"))
   parameterHash("CurrentImageFileName") = imageLocation
   'Add full filename to the Cache with an expiration
   'When the expiration occurs, it will call OnCacheRemove which will delete the file
   'Guid.NewGuid() gives each entry a unique key (New Guid() is always all zeros)
   Cache.Insert(Guid.NewGuid().ToString() & "_DeleteFile", imageLocation, _
   	Nothing, expirationDate, noSlide, _
   	System.Web.Caching.CacheItemPriority.Default, callBack)
   'Escape the physical root path so the regex matches it literally
   Dim matchString As String = Regex.Escape(Request.MapPath(""))
   CurrentPageImage.ImageUrl = Regex.Replace(imageLocation, matchString & "\\", "~/")
 End Sub

 Private Sub OnCacheRemove(ByVal key As String, ByVal val As Object, _
  	ByVal reason As CacheItemRemovedReason)
   If Regex.IsMatch(key, "DeleteFile") Then
     ImageUtil.DeleteFile(val)
   End If
 End Sub
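
Cache entries live in memory, so if the application restarts before an entry expires, the removal callback may never run and the PNG can be left behind. One possible complement, which is not part of the original project, is a periodic sweep of the render directory that deletes images older than a chosen age. The module and function names here are hypothetical:

```vbnet
Imports System
Imports System.IO

Module RenderCleanupSketch
    ' Hypothetical helper: deletes *.png files in renderDir whose last write
    ' time is older than maxAge. Returns the number of files deleted.
    Public Function DeleteStaleImages(ByVal renderDir As String, ByVal maxAge As TimeSpan) As Integer
        Dim deleted As Integer = 0
        If Not Directory.Exists(renderDir) Then Return deleted
        Dim cutoff As DateTime = DateTime.UtcNow - maxAge
        For Each pngPath As String In Directory.GetFiles(renderDir, "*.png")
            Try
                If File.GetLastWriteTimeUtc(pngPath) < cutoff Then
                    File.Delete(pngPath)
                    deleted += 1
                End If
            Catch ex As IOException
                ' File is in use; leave it for the next sweep
            Catch ex As UnauthorizedAccessException
                ' No permission to delete; skip this file
            End Try
        Next
        Return deleted
    End Function
End Module
```

Such a sweep could be called from Application_Start, for example with Request.MapPath("render") replaced by the equivalent server path and a maximum age somewhat longer than the 5 minute cache expiration.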

ASPPDFLib.vb

    Public Shared Function GetPageFromPDF(ByVal sourceFileName As String _
                                        , ByVal destFolderPath As String _
                                        , ByRef iPageNumber As Integer _
                                        , Optional ByVal DPI As Integer = 0 _
                                        , Optional ByVal password As String = "" _
                                        , Optional ByVal rotations As Integer = 0 _
                                        , Optional ByVal searchText As String = "" _
                                        , Optional ByVal searchDir As Integer = _
                                        	AFPDFLibUtil.SearchDirection.FromBeginning _
                                        ) As String
    GetPageFromPDF = AFPDFLibUtil.GetPageFromPDF(sourceFileName, _
	destFolderPath, iPageNumber, DPI, password, searchText, searchDir)
    ImageUtil.ApplyRotation(GetPageFromPDF, rotations)
  End Function
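
The wrapper above renders the page first and then rotates the saved PNG in place. ImageUtil.ApplyRotation itself is not listed in this article; the sketch below shows one way such a step could be written with System.Drawing. It assumes that the rotations value counts clockwise quarter turns, which the article does not confirm, and the names used are hypothetical:

```vbnet
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

Module RotationSketch
    ' Hypothetical stand-in for ImageUtil.ApplyRotation.
    ' Assumes "rotations" is a count of clockwise 90-degree turns.
    Public Sub RotatePng(ByVal pngPath As String, ByVal rotations As Integer)
        ' Normalize to 0..3, also for negative counts
        Dim quarterTurns As Integer = ((rotations Mod 4) + 4) Mod 4
        If quarterTurns = 0 Then Return

        Dim flip As RotateFlipType
        Select Case quarterTurns
            Case 1
                flip = RotateFlipType.Rotate90FlipNone
            Case 2
                flip = RotateFlipType.Rotate180FlipNone
            Case Else
                flip = RotateFlipType.Rotate270FlipNone
        End Select

        ' Load from memory so the file is not locked when it is overwritten
        Dim bytes As Byte() = File.ReadAllBytes(pngPath)
        Using ms As New MemoryStream(bytes)
            Using img As Image = Image.FromStream(ms)
                img.RotateFlip(flip)
                img.Save(pngPath, ImageFormat.Png)
            End Using
        End Using
    End Sub
End Module
```

Rotating the rendered image keeps the PDF library call unchanged, at the cost of one extra read and write of the PNG for every rotated page.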

Points of Interest

This project was made possible by various open source libraries that others were kind enough to distribute freely. I would like to thank the PDFLibNet developer Antonio Sandoval and Foo Labs (XPDF) for their efforts.

History

1.0 - Initial version
1.1 - Added Search capabilities, reduced DLL dependencies, made PDF subsystem use XPDF only
1.2 - Replaced PDFLibNet.dll to fix incorrect configuration error 0x800736B1
1.3 - Optimized search routines
1.4 - Removed outdated links to legacy content
1.5 - Updated permissions information