Click here to Skip to main content
15,886,919 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have the following code, however, to run it, the prerequisite is to have the PDFsharp library installed and I'm not able to add the reference in Excel VBA.

Sub BulkPDFExtraction()
' Reference: PDFsharp library

' Define the folder path containing the PDF files
Dim folderPath As String
folderPath = "C:\Users\vlad.zaicescu\Desktop\PDF-to-Excel\sample_invoice"

' Create a new Excel workbook and worksheet
Dim wb As Workbook
Set wb = Workbooks.Add

Dim ws As Worksheet
Set ws = wb.Sheets(1)

' Set the starting row for inserting the extracted data
Dim startRow As Integer
startRow = 1

' Get all PDF files in the folder
Dim pdfFiles() As String
pdfFiles = GetPDFFilesInFolder(folderPath)

' Extract data from each PDF file and insert into Excel
Dim i As Integer
For i = LBound(pdfFiles) To UBound(pdfFiles)
Dim pdfPath As String
pdfPath = folderPath & pdfFiles(i)

Dim extractedData As String
extractedData = ExtractDataFromPDF(pdfPath)

' Insert the extracted data into the Excel worksheet
ws.Cells(startRow, 1).Value = pdfFiles(i) ' Insert the PDF file name
ws.Cells(startRow, 2).Value = extractedData ' Insert the extracted data

startRow = startRow + 1
Next i

' Auto-fit the columns in the Excel worksheet
ws.Columns.AutoFit

' Save the workbook
Dim savePath As String
savePath = "C:\Path\to\save\output.xlsx"

wb.SaveAs savePath

' Close the workbook and clean up
wb.Close
Set ws = Nothing
Set wb = Nothing

MsgBox "Bulk extraction completed. Output saved to: " & savePath
End Sub

Function GetPDFFilesInFolder(ByVal folderPath As String) As String()
' Get all PDF files in the specified folder

Dim files() As String
Dim fileCount As Integer
fileCount = 0

Dim fileName As String
fileName = Dir(folderPath & "*.pdf")

Do While fileName <> ""
ReDim Preserve files(0 To fileCount)
files(fileCount) = fileName

fileCount = fileCount + 1
fileName = Dir
Loop

GetPDFFilesInFolder = files
End Function

Function ExtractDataFromPDF(ByVal pdfPath As String) As String
' Extract data from a PDF file using PDFsharp

' Create a PDF document
Dim document As New PdfSharp.Pdf.PdfDocument

' Open the PDF file
document.Open(pdfPath)

' Extract the text from each page
Dim extractedData As String
extractedData = ""

Dim pageCount As Integer
pageCount = document.PageCount

Dim i As Integer
For i = 0 To pageCount - 1
Dim page As PdfSharp.Pdf.PdfPage
page = document.Pages(i)

Dim content As PdfSharp.Pdf.Content.PdfContentReader
content = New PdfSharp.Pdf.Content.PdfContentReader(page)

While content.Read()
If content.GetType() Is GetType(PdfSharp.Pdf.Content.PdfLiteral) Then
Dim literal As PdfSharp.Pdf.Content.PdfLiteral
literal = DirectCast(content, PdfSharp.Pdf.Content.PdfLiteral)

extractedData = extractedData & literal.Value
End If
End While
Next i

' Close the PDF document
document.Close

ExtractDataFromPDF = extractedData
End Function

What I have tried:

I have created a class in visual studio, installed the PDFsharp library through NuGet package manager and the build the solution to receive a DLL that I can use as a reference in Excel VBA. However, when I try to add the DLL I cannot. Furthermore, I tried just adding the PDFsharp DLL from the NuGet package, which also failed.
Please see the repro video here: https://app.screencast.com/MJmEkYKUsnGsw
Posted
Comments
Dave Kreskowiak 29-Dec-23 16:25pm    
Did you write your .DLL to be exposed through COM? If not, VBA cannot use it.

Also, just creating a Class Library project and adding PDFSharp to it is not going to work. You MUST write a bunch of code, with classes and methods exposed through COM, and a wrapper around the PDFSharp functionality you want to use. VBA cannot directly interact with PDFSharp because that library itself is not a COM server.

https://learn.microsoft.com/en-us/dotnet/core/native-interop/expose-components-to-com

1 solution

To learn the basics start with a really simple example and once you know how to make that work then you can start the more complex implementation you outlined above. There are lots of examples out there, here's once :

Create a new Class Library project

Make the Class COM-Visible:

In the class that you want to expose, add the ComClass attribute to make it COM-visible.
VB.NET
Imports System.Runtime.InteropServices

<ComClass(MyClass.ClassId, MyClass.InterfaceId, MyClass.EventsId)>
Public Class MyClass

    Public Const ClassId As String = "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    Public Const InterfaceId As String = "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    Public Const EventsId As String = "XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

    ' Add your methods and properties here

End Class
Note: You need to replace the ClassId, InterfaceId, and EventsId with your own GUIDs. You can generate these GUIDs using Visual Studio's "Create GUID" tool.

Write Methods and Properties:

Add the methods and properties you want to expose through COM in your class.

VB
Public Function AddNumbers(ByVal a As Integer, ByVal b As Integer) As Integer
    Return a + b
End Function

Register for COM Interop:

In the project properties, go to the "Compile" tab and check the "Register for COM Interop" option.

Build the Project

Excel Add-In Configuration:

In Excel, go to "Developer" -> "Excel Add-ins" -> "Browse" and select the DLL file generated in the bin folder of your project.

Testing:

Now you can use your class library from Excel as an add-in. In a cell, you can use a formula like =MyClass.AddNumbers(3, 4) to get the sum.
 
Share this answer
 
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900