PDF to BMP using Adobe Reader and API Functions






Extract pages from a .pdf file and save as bitmaps
Introduction
For one of my projects, I needed to extract pages from .pdf files. I needed the images to be as sharp as possible, which means keeping the original resolution (DPI). A lot of software claims to extract images from PDFs, and I tried several solutions, but they do a poor job of saving an image at its original DPI. One might think that the original size is the same as a 100% zoom level view in Adobe Reader, but often that is not true. And when the image is saved as, for example, .jpg, it becomes even more distorted. So I decided to write my own.
Background
I was looking for a free solution. Adobe Reader is free and comes with an ActiveX control that can be embedded in VB6. However, the available functions are very limited (as opposed to the ActiveX control that comes with Adobe Pro). So it was a challenge to extract the pages, and I had to turn to API calls to find (child) windows and send them messages. I also wanted to hide the ActiveX Reader window, because it is not pleasant to watch it being selected, deselected, and resized all the time. Sending keystrokes and mouse clicks to a hidden window, or getting it to repaint, requires extra coding.
Using the Code
The code addresses several topics:
- Find the handle of any window within the application by its Classname and Text
- Find the number of pages in a .pdf
- Get the DPI of any page in a .pdf
- Two methods to extract a page from a .pdf:
  - Send a mouse click to a hidden window
  - Simulate a Control-C input with API function keybd_event
  - Get data from the Clipboard with API functions
- Two methods to get another DPI than the original:
  - Resize a PictureBox.Picture to a high-quality image in another PictureBox
  - Paint a hidden window's content to a PictureBox
- Save as image with various image format options (bmp, gif, jpg, png, tif) using GDI+
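The Control-C step from the list above can be sketched as follows. This is a minimal illustration, not the code from the download: the Sub name SendCtrlC is hypothetical, and the sketch assumes the target window already has keyboard focus, because keybd_event injects keystrokes into the system input stream rather than sending them to a specific window.

```vb
Option Explicit

Private Declare Sub keybd_event Lib "user32" (ByVal bVk As Byte, ByVal bScan As Byte, _
    ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Private Const VK_CONTROL As Long = &H11       ' virtual-key code for Ctrl
Private Const VK_C As Long = &H43             ' virtual-key code for the C key
Private Const KEYEVENTF_KEYUP As Long = &H2   ' flag: key is being released

' Hypothetical helper: press Ctrl, press C, then release both in reverse order
Private Sub SendCtrlC()
    keybd_event VK_CONTROL, 0, 0, 0
    keybd_event VK_C, 0, 0, 0
    keybd_event VK_C, 0, KEYEVENTF_KEYUP, 0
    keybd_event VK_CONTROL, 0, KEYEVENTF_KEYUP, 0
    ' Give the focused window a chance to process the keystrokes
    DoEvents
End Sub
```

After the call, the copied page should be available on the Clipboard, provided the focused window handles Control-C as a copy command.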
Download the source code to see all of these topics with explanatory comments. The following snippets address two of them.
The first snippet finds the handle of any window within the application by its Classname and Text:
Private Function FindWindowHandle(ByVal hwnd As Long, _
        SelectClass As String, SelectText As String, bSelect As Boolean) As Long
    '' A recursive function to go through all the descendant windows of window with handle hwnd
    '' Returns handle for window with Classname = SelectClass and Window Text
    '' that either contains SelectText if bSelect = True; or does not contain SelectText if bSelect = False
    '' (There often is more than one window with the same class name)
    '' SelectText may be empty ("") and then this function only searches for a Classname
    '' Note : hwnd has to be ByVal
    Dim sClass As String, sText As String
    Dim sLen As Long
    Dim ParentHwnd As Long
    Dim FoundHwnd As Long
    FoundHwnd = 0
    ''Get Class name of window with handle hwnd
    sClass = Space(64)
    sLen = GetClassName(hwnd, sClass, 63)
    sClass = Left(sClass, sLen)
    ''Compare class names case-insensitively (1 = vbTextCompare)
    If StrComp(sClass, SelectClass, 1) = 0 Then
        If SelectText <> "" Then
            ''Get Window Text of window
            sText = Space(256)
            sLen = SendMessageS(hwnd, WM_GETTEXT, 255, sText)
            sText = Left(sText, sLen)
            ''If bSelect = True : If the text matches we have found the window
            If bSelect = True Then
                If InStr(sText, SelectText) > 0 Then
                    ''FoundHwnd is the handle for the window with the required Classname and Text
                    FoundHwnd = hwnd
                End If
            Else
                ''If bSelect = False : If the text does not match we have found the window
                If InStr(sText, SelectText) = 0 Then
                    FoundHwnd = hwnd
                End If
            End If
        Else
            FoundHwnd = hwnd
        End If
    End If
    '' If the window is found, return its handle and exit
    If FoundHwnd <> 0 Then
        FindWindowHandle = FoundHwnd
        Exit Function
    End If
    '' If the window is not found, look for the first child window
    ParentHwnd = hwnd
    hwnd = FindWindowX(hwnd, 0, 0, 0)
    Do While hwnd
        ''Recursion : this function calls itself to find child windows of the child windows,
        ''so all descendants, not just one level of child windows
        FoundHwnd = FindWindowHandle(hwnd, SelectClass, SelectText, bSelect)
        If FoundHwnd <> 0 Then
            Exit Do
        End If
        '' FindWindowX is called repeatedly to find the next child window
        hwnd = FindWindowX(ParentHwnd, hwnd, 0, 0)
    Loop
    FindWindowHandle = FoundHwnd
End Function
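The function relies on a few API declarations that the snippet does not show. A plausible set is sketched below. The aliases FindWindowX and SendMessageS are taken from the calls above, but the exact declarations in the download may differ. The class name and text in the usage example ("AVL_AVView" and "AVPageView") are assumptions about how Adobe Reader names its page view window, and Command1 is a hypothetical button; verify the names with Spy++ for your Reader version.

```vb
' Assumed declarations, placed at the top of the form module
Private Declare Function GetClassName Lib "user32" Alias "GetClassNameA" _
    (ByVal hwnd As Long, ByVal lpClassName As String, ByVal nMaxCount As Long) As Long
' Class and window name are declared As Long so that 0 can be passed as a null pointer
Private Declare Function FindWindowX Lib "user32" Alias "FindWindowExA" _
    (ByVal hWndParent As Long, ByVal hWndChildAfter As Long, _
    ByVal lpszClass As Long, ByVal lpszWindow As Long) As Long
' SendMessage variant whose lParam is a string buffer
Private Declare Function SendMessageS Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String) As Long
Private Const WM_GETTEXT As Long = &HD

' Hypothetical usage: search all descendants of the form for the page view window
Private Sub Command1_Click()
    Dim PageViewhWnd As Long
    PageViewhWnd = FindWindowHandle(Me.hwnd, "AVL_AVView", "AVPageView", True)
    If PageViewhWnd = 0 Then
        MsgBox "Page view window not found"
    End If
End Sub
```

Starting the search at the form's own handle keeps it inside the application, so windows of a separately running Reader instance are not matched by accident.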
The second snippet shows how to send a mouse click to a hidden window:
Private Sub SendLeftClick(ByVal hwnd As Long, ByVal hwnd2 As Long, x As Long, y As Long)
    ''Send left mouse click to invisible window with handle hwnd and with top-level parent window hwnd2
    Dim position As Long
    ''Set window as active window
    Call SetActiveWindow(hwnd)
    ''Calculate lParam to pass the mouse's x and y position in the window (x and y in pixels):
    ''y goes in the high word, x in the low word
    position = y * &H10000 + x
    ''The required messages with their wParam and lParam were found by using Spy++
    ''For the first two messages, lParam is HTCLIENT (= 1, low word) and WM_LBUTTONDOWN (= &H201, high word)
    Call SendMessage(hwnd, WM_MOUSEACTIVATE, ByVal hwnd2, ByVal CLng(&H2010001))
    Call SendMessage(hwnd, WM_SETCURSOR, ByVal CLng(0), ByVal CLng(&H2010001))
    ''wParam 1 = MK_LBUTTON (left button is down)
    Call SendMessage(hwnd, WM_LBUTTONDOWN, ByVal CLng(1), ByVal position)
    Call SendMessage(hwnd, WM_LBUTTONUP, ByVal CLng(0), ByVal position)
End Sub
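The message constants used above have fixed values in the Windows headers, and the lParam packing can be wrapped in a small helper. The sketch below is an illustration rather than part of the download: MakeLParam is a hypothetical name, and the SendMessage declaration with As Any parameters is an assumption inferred from the ByVal keywords at the call sites.

```vb
' Assumed declarations (the download may declare these differently)
Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hwnd As Long, ByVal wMsg As Long, wParam As Any, lParam As Any) As Long
Private Declare Function SetActiveWindow Lib "user32" (ByVal hwnd As Long) As Long

Private Const WM_SETCURSOR As Long = &H20
Private Const WM_MOUSEACTIVATE As Long = &H21
Private Const WM_LBUTTONDOWN As Long = &H201
Private Const WM_LBUTTONUP As Long = &H202

' Hypothetical helper: pack x into the low word and y into the high word of a Long.
' Masking each value to 16 bits avoids the overflow error that y * &H10000
' raises once y reaches 32768.
Private Function MakeLParam(ByVal x As Long, ByVal y As Long) As Long
    Dim hiWord As Long
    ' Shift the lower 15 bits of y into the high word
    hiWord = (y And &H7FFF&) * &H10000
    ' Bit 15 of y becomes the sign bit of the Long, so set it separately
    If (y And &H8000&) <> 0 Then hiWord = hiWord Or &H80000000
    MakeLParam = hiWord Or (x And &HFFFF&)
End Function
```

For example, MakeLParam(10, 20) returns &H14000A, the same value as 20 * &H10000 + 10, so the helper could stand in for the position calculation in SendLeftClick.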
Points of Interest
The application was written in VB6. I still like it a lot more than .NET, and IMHO it shows that anything can be done with VB6 and a few API calls. Of course, if you have another favorite programming language, the source can be rewritten. That should be fairly easy if you are familiar with the Windows API, because the API functions are the core of this application.
History
Update: Adapted the code for Acrobat Reader DC. The DC ActiveX does not work with VB6 (nor with Visual Basic 2015, for that matter). Solution: Rename C:\Program Files (x86)\Common Files\Adobe\Acrobat\ActiveX\ and add a new directory (... )\ActiveX\ with the Acropdf.dll from the download in it. (The code also still works with previous versions of the Reader.) Also added: saving images in various image formats, and a routine to make PrintWindow() work with Adobe. The API function PrintWindow is notorious for returning black images with some applications, like Adobe. So I needed to add a check for black results, but more can be done to optimize the result.
Call RedrawWindow(PageViewhWnd, ByVal 0&, ByVal 0&, _
    RDW_ERASE Or RDW_INVALIDATE Or RDW_FRAME Or RDW_ALLCHILDREN Or RDW_UPDATENOW)
'' PrintWindow often returns a black screen,
'' especially with some applications (Adobe, among others)
'' The problem is that the window has not finished
'' its asynchronous painting when PrintWindow is done
'' The following functions seem to improve the PrintWindow result
'' by adding to the time that PrintWindow is busy
'' Also, when printing a large window the chance that it returns black increases
'' It seems that 1024 x 1024 pixel blocks can be returned in most cases,
'' so PicSrc (container for AcroPDF) is 1024 x 1024 and the full-size AcroPDF
'' (with PageView window) is moved (for example, AcroPDF1.Top = -1024)
For i = 1 To 5
    '' Repeatedly call PrintWindow
    PrintWindow picSrc.hWnd, PicTemp.hDC, 0&
    For j = 1 To 1000
        '' Send extra WM_PAINTs (or call RedrawWindow in this loop, that also works but takes longer)
        retVal = PostMessage(PageViewhWnd, WM_PAINT, PicTemp.hDC, 0&)
    Next j
Next i
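The check for a black result mentioned above is not shown in this snippet. One way to do it, sketched here under assumptions, is to sample a grid of pixels from the target device context with GetPixel and treat the capture as failed when no sample has a non-black color. The function name LooksBlack and the grid size are hypothetical, and a page that really is black would be misreported.

```vb
Private Declare Function GetPixel Lib "gdi32" _
    (ByVal hdc As Long, ByVal x As Long, ByVal y As Long) As Long

' Hypothetical helper: returns True when none of the sampled pixels has a non-black color.
' widthPx and heightPx are the size of the captured area in pixels (not twips).
Private Function LooksBlack(ByVal hdc As Long, ByVal widthPx As Long, _
        ByVal heightPx As Long) As Boolean
    Const GRID As Long = 16   ' assumed sample count per axis
    Dim ix As Long, iy As Long
    Dim pixelColor As Long
    LooksBlack = True
    For iy = 0 To GRID - 1
        For ix = 0 To GRID - 1
            ' Spread the samples evenly from the first to the last pixel
            pixelColor = GetPixel(hdc, (ix * (widthPx - 1)) \ (GRID - 1), _
                (iy * (heightPx - 1)) \ (GRID - 1))
            ' Black is 0; GetPixel returns CLR_INVALID (-1 as a Long) on failure,
            ' so only a positive value is a valid non-black color
            If pixelColor > 0 Then
                LooksBlack = False
                Exit Function
            End If
        Next ix
    Next iy
End Function
```

With the 1024 x 1024 block size described in the comments above, a call such as LooksBlack(PicTemp.hDC, 1024, 1024) after the loop could decide whether to repeat the PrintWindow attempt.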