Hi,
I am referencing the PDFBox and IKVM DLLs in a website to convert a PDF file to a text file.
I am using
Protected Function parseUsingPDFBox(ByVal filename As String) As String
Dim doc As PDDocument = PDDocument.load(filename)
Dim stripper As New PDFTextStripper()
Return stripper.getText(doc)
End Function
in my ASP.NET page, and it shows the message "child could not be evaluated". Please let me know how to use PDFBox to convert a PDF file to text.
Thanks in advance.
|
|
|
|
 |
|
 |
Hi, I am trying to scrape text from a PDF file to HTML, but I am getting the following error:
" Could not load file or assembly 'FontBox-0.1.0-dev, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. A strongly-named assembly is required. (Exception from HRESULT: 0x80131044)":"FontBox-0.1.0-dev, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null "
The error says "A strongly-named assembly is required", but all of the DLLs already have a strong name.
I am using the following DLLs, each with a strong name:
PDFBox-0.7.3.dll
bcprov-jdk14-132.dll
FontBox-0.1.0-dev.dll
IKVM.GNU.Classpath.dll
IKVM.Runtime.dll
and the following namespaces:
using org.pdfbox.util;
using org.pdfbox.pdmodel;
using org.pdfbox.examples.pdmodel;
using org.pdfbox.exceptions;
using org.pdfbox.pdmodel.documentinterchange;
using org.pdfbox.pdmodel.interactive.documentnavigation.outline;
using org.pdfbox.pdmodel.interactive.documentnavigation.destination;
using org.pdfbox.pdfparser;
using org.pdfbox.pdmodel.encryption;
using java.lang;
using java.io;
using java.util;
using org.pdfbox.pdfwriter;
using org.pdfbox.cos;
using IKVM.Attributes;
using IKVM.Runtime;
using org.pdfbox.pdmodel.font;
C# code:
1. PDDocument doc = new PDDocument();
2. PDFText2HTML PDFhtml = new PDFText2HTML();
3. doc = PDDocument.load(FileLocation + FileName);
4. string FileText = PDFhtml.getText(doc);
On the 4th line I get the above error.
Please suggest a solution. I have searched a lot on the internet but could not find a proper one.
|
|
|
|
 |
|
 |
Hi Dan!!
Thanks for the excellent article.
I wanted to read a PDF through VB.NET. Earlier, I had tried iTextSharp (using SimpleTextExtractionStrategy), but the returned PDF contents were malformed with erroneous spaces, forcing me to look for alternatives. That's when I found this article as the top hit for a solution.
I see that this article is quite old (it references PDFBox 0.7.2), while the current version of PDFBox is 1.5. I also read about some people struggling to convert PDFBox 1.5 to a DLL, since they get loads of warning messages when using IKVMC. I faced the same ordeal, but I decided to use the IKVMC-created PDFBox 1.5 DLL anyway, and it worked perfectly. The PDF contents were as expected, with no malformations whatsoever.
Thanks again!
Regards,
Tushar P
|
|
|
|
 |
|
 |
Hello Tushar,
So you got a 1.5 .NET version of PDFBox? Can you please share it?
Thanks
|
|
|
|
 |
|
 |
Hi,
Sorry for the late reply.
I'm pretty new to CodeProject. Is there a way I can upload the PDFBox 1.5 DLL here?
Regards,
Tushar
|
|
|
|
 |
|
 |
Can you please post a download link for the PDFBox 1.5.0 DLL version?
Regards, Dirk
|
|
|
|
 |
|
 |
Hello,
I'm having real trouble trying to get hold of the latest PDFBox DLL for .NET. I see in your post that you have been able to compile it. Would you be kind enough to send the 1.5 or 1.6 file to me?
You would be a life saver if you could!
Thanks.
|
|
|
|
 |
|
 |
Hi Tusharap!!
Can you please share the 1.5 .NET version of PDFBox?
Please do the needful.
|
|
|
|
 |
|
 |
Very good! I tested version 0.7.3, and a reference to FontBox-0.1.0-dev.dll has to be added to make it work.
|
|
|
|
 |
|
 |
I found the same. Adding the FontBox DLL to the bin directory did the trick!
|
|
|
|
 |
|
 |
This article helps me a lot... THANK YOU
|
|
|
|
 |
|
 |
I am using the following code to convert a PDF document into text:
PDDocument doc = PDDocument.load(strPDFFilesNameWithPath);
PDFTextStripper pdfStripper = new PDFTextStripper();
string strContent = pdfStripper.getText(doc);
It works fine for most PDF documents. For some documents it throws the error
"Object reference not set to an instance of an object" at the bolded line of the code shown above.
The document that throws the error is PDF version 1.5.
Can you help me overcome this problem?
Regards,
Kumaran
|
|
|
|
 |
|
 |
I am having the same issue with PDF version 1.5.
PDF version 1.4 works fine for me, at least in the few tests I've thrown at it. I'll update this if I discover a solution.
|
|
|
|
 |
|
 |
I was able to extract text from PDF version 1.5 files, but I had to take PDFBox 1.5 and use IKVM to create the .NET DLLs I needed. It was definitely a time-consuming job, and although it does extract the text from the PDF files, I have now realized that it does not extract it in the same layout as it is displayed in the PDF file.
|
|
|
|
 |
|
 |
I discovered that this exception occurs when you try to extract text from a PDF file with some type of protection.
Try PDFs with no protection and this code will work fine.
Unfortunately I don't know a solution for this, or which type of protection causes it...
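
To check whether protection really is the cause for a given file, a small diagnostic along these lines might help. It is only a sketch: the class and method names are made up for illustration, and it assumes the PDFBox 0.7.3 namespaces (org.pdfbox...); IKVM builds of later PDFBox versions use org.apache.pdfbox... instead. It reports whether PDFBox considers the document encrypted and whether text extraction succeeds, and does not attempt to remove the protection.

```csharp
using System;
using org.pdfbox.pdmodel;   // assumption: PDFBox 0.7.3 namespaces
using org.pdfbox.util;

public static class PdfDiagnostics   // hypothetical helper, not part of PDFBox
{
    // Returns a one-line summary for a single PDF file.
    public static string Describe(string path)
    {
        PDDocument doc = PDDocument.load(path);
        try
        {
            bool isProtected = doc.isEncrypted();
            try
            {
                string text = new PDFTextStripper().getText(doc);
                return "encrypted=" + isProtected + ", extracted " + text.Length + " chars";
            }
            catch (Exception ex)
            {
                // Report the failure type instead of crashing, so it can be
                // compared against the encrypted flag.
                return "encrypted=" + isProtected + ", failed: " + ex.GetType().Name;
            }
        }
        finally
        {
            // Always release the document, even when extraction fails.
            doc.close();
        }
    }
}
```

If the failing files all report encrypted=True and the working ones report encrypted=False, that would support the observation above.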
|
|
|
|
 |
|
 |
This method is very useful. I am a beginner in .NET, and I need to convert many PDF files from a folder to text, not only one chosen at random.
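
One way to handle a whole folder is to loop over the files and call the single-file routine for each one. The sketch below is an assumption about how that could look, not code from the article: PdfBatch, ConvertFolder and the extractText parameter are made-up names, and extractText stands for whatever single-file method already works for you (for example the PDFTextStripper code from the article).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PdfBatch   // hypothetical helper
{
    // Converts every *.pdf in sourceDir to a .txt file in targetDir.
    // Returns the files that failed, mapped to the error message.
    public static Dictionary<string, string> ConvertFolder(
        string sourceDir, string targetDir, Func<string, string> extractText)
    {
        var failures = new Dictionary<string, string>();
        Directory.CreateDirectory(targetDir);

        foreach (string pdfPath in Directory.GetFiles(sourceDir, "*.pdf"))
        {
            string txtPath = Path.Combine(
                targetDir, Path.GetFileNameWithoutExtension(pdfPath) + ".txt");
            try
            {
                File.WriteAllText(txtPath, extractText(pdfPath));
            }
            catch (Exception ex)
            {
                // Keep going: one bad PDF should not stop the whole batch.
                failures[pdfPath] = ex.Message;
            }
        }
        return failures;
    }
}
```

Passing the extraction routine in as a delegate keeps the loop independent of the PDFBox version in use, and the returned dictionary shows which files need a closer look.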
|
|
|
|
 |
|
 |
Hi all,
I found this article very useful. I have tried out the sample code and it's really very good. I found that it parses the PDF row by row from the beginning of the file.
But I am looking for a solution where I can get only the field values I need. Below is my requirement.
I need to parse one PDF document and capture the required fields from the PDF into an Excel sheet using C#.
Please help me out with this, and also let me know if PDFBox has any property that would pin down this issue.
Thanks in Advance,
Auro...
|
|
|
|
 |
|
 |
Made it work really nicely. It also extracted the special characters of the Swedish alphabet, which iTextSharp doesn't.
|
|
|
|
 |
|
 |
I am having a nightmare trying to get this to work in a VS2010 .NET 4 WinForms app, just to test it. My code is as simple as you can get, and regardless it keeps throwing a file-not-found error from IKVM.GNU.Classpath.
I have tested creating a FileStream and received the same exception, and creating a File instance and testing whether it exists also returns false, despite the path being 100% correct. Any thoughts would be appreciated!
Chris
|
|
|
|
 |
|
 |
I solved this issue, or, if it has multiple origins, I solved one of them.
You need to add another dll file to the "bin" directory.
For me it was "FontBox-0.1.0-dev.dll"
//Kristoffer
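
A quick way to spot a missing DLL like this is to check the output folder at startup. The following is a rough sketch under a few assumptions: DependencyCheck and FindMissing are made-up names, and the file list is simply the set of DLLs mentioned earlier in this thread for PDFBox 0.7.3, so it needs adjusting for other versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DependencyCheck   // hypothetical helper
{
    // DLL names as listed earlier in this thread for PDFBox 0.7.3.
    public static readonly string[] RequiredDlls =
    {
        "PDFBox-0.7.3.dll",
        "bcprov-jdk14-132.dll",
        "FontBox-0.1.0-dev.dll",
        "IKVM.GNU.Classpath.dll",
        "IKVM.Runtime.dll"
    };

    // Returns the names from 'required' that are not present in binDir.
    public static List<string> FindMissing(string binDir, IEnumerable<string> required)
    {
        var missing = new List<string>();
        foreach (string name in required)
        {
            if (!File.Exists(Path.Combine(binDir, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

For a WinForms app, AppDomain.CurrentDomain.BaseDirectory is usually the folder to pass as binDir; for a website it would be the "bin" folder underneath it.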
|
|
|
|
 |
|
 |
Incomplete and imprecise.