 |
|
 |
I created a new C# project just to test PDFBox, and it works. But the old project still throws the same exception.
How can I fix this problem in that specific C# project?
|
|
|
|
 |
|
 |
Hi friend,
Could I have your VB.NET code for the PDF-to-text conversion, please?
|
|
|
|
 |
|
 |
The file that is missing when you see this error is IKVM.Runtime.dll.
|
|
|
|
 |
|
 |
Without the FontBox-*-dev.dll library you will receive the same error.
So my working directory is as follows:
19.02.2009 14:17 16 384 Pdf2Text.exe
12.10.2006 12:20 4 653 056 PDFBox-0.7.3.dll
10.08.2006 10:17 9 568 256 IKVM.GNU.Classpath.dll
19.02.2009 14:14 1 290 714 sample.pdf
12.10.2006 12:20 86 016 FontBox-0.1.0-dev.dll
10.08.2006 10:14 344 064 IKVM.Runtime.dll
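For anyone who wants to check their own setup, the file list above can be verified in code before PDFBox is first touched. This is only a rough sketch, not part of PDFBox or IKVM: the class name DependencyCheck and the method FindMissing are made up for illustration, and the DLL names are simply the ones from the listing above, so adjust them for other versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of PDFBox or IKVM.
static class DependencyCheck
{
    // File names taken from the directory listing above (PDFBox 0.7.3 build).
    public static readonly string[] RequiredDlls =
    {
        "PDFBox-0.7.3.dll",
        "FontBox-0.1.0-dev.dll",
        "IKVM.GNU.Classpath.dll",
        "IKVM.Runtime.dll"
    };

    // Returns the names from 'fileNames' that do not exist in 'directory'.
    public static List<string> FindMissing(string directory, IEnumerable<string> fileNames)
    {
        List<string> missing = new List<string>();
        foreach (string name in fileNames)
        {
            if (!File.Exists(Path.Combine(directory, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }

    static void Main()
    {
        // BaseDirectory is the folder the running exe was loaded from.
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;
        foreach (string name in FindMissing(baseDir, RequiredDlls))
        {
            Console.WriteLine("Missing: " + name);
        }
    }
}
```

Running this first means a missing DLL is reported by name, rather than surfacing later as a type initializer or assembly load exception.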
|
|
|
|
 |
|
|
 |
|
 |
Did any of you use the GAC?
|
|
|
|
 |
|
 |
Why is it you all are able to load a PDFBox-0.7.x.dll into the GAC? Are you compiling your own PDFBox-0.7.x.dll with a .snk? If so, from what PDFBox-0.7.x.dll source? If not, where can I locate, for download, a strongly named PDFBox-0.7.[2 or 3 or whatever].dll?
I've tried PDFBox versions .2 and .3 and my gacutil.exe fails on adding either assembly to the cache.
But you guys appear to have no problem with that. I'm using .NET SDK v2.0.
BTW, is anybody even using the GAC? Or are you allowed to just drop these DLLs directly into a directory path and start compiling the PDF sample?
Thanks for any replies.
modified on Friday, December 26, 2008 8:10 PM
|
|
|
|
 |
|
|
 |
|
 |
Hello, when I convert a PDF file to text, the text is displayed in a different font.
Amitkumar Prajapati
Anjar(Kutch)/Baroda
|
|
|
|
 |
|
 |
I want to convert PDF to DOC. I am using PDFBox-0.7.3 from CodeProject.
I am using C# .NET. When converting the file from PDF to DOC, the text is converted correctly but without formatting, and I can't get the images from the PDF file.
|
|
|
|
 |
|
 |
I want to read/parse tabular data from PDF documents. I have found some third-party software that can convert the entire PDF to text while preserving the layout (display). But none of the tools provide predefined separators/delimiters between the text of each cell.
I have also investigated some tools that can convert the PDF to HTML, which could then be parsed. But even in the HTML files the entire table is represented by absolutely positioned divs. It would have been easy to parse the tables if they were real HTML tables.
Is there any way I can read the tables from a PDF document into some object (which can easily be represented in terms of rows and columns)? Or is there any third-party developer library with which I can easily read the cells of a table in a PDF?
Please let me know even if there is just some third-party software that converts a PDF containing tabular data to an HTML document that represents it as HTML tables.
Thanks in anticipation.
|
|
|
|
 |
|
 |
Hi All
I wand read text from a PDF file, but my PDF file has a password.
What must I do?
thanks...
|
|
|
|
 |
|
 |
wand = want
|
|
|
|
 |
|
 |
Hi,
Trying to run your code, I'm getting this error at run time:
Could not load file or assembly 'bcprov-jdk14-132, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. The system cannot find the file specified.
It occurs on the first line of the given code:
PDDocument doc = PDDocument.load(filename);
I've copied:
bcprov-jdk14-132.dll
FontBox-0.1.0-dev.dll
IKVM.GNU.Classpath.dll
IKVM.Runtime.dll
PDFBox-0.7.3.dll
from the PDFBox-0.7.3 bin directory to my project, but the problem persists.
Any suggestions?
|
|
|
|
 |
|
 |
Hi,
Can you tell me how to convert PDF to Word?
|
|
|
|
 |
|
 |
I had copied the DLL files to the bin directory, imported the Classpath and PDFBox DLL file references, and put in the namespaces
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using org.pdfbox.util;
using org.pdfbox.pdmodel;
but it still was not working. It threw a System.IO.File exception on my input file.
The problem was the later version of PDFBox (0.7.3).
I used the following files from
http://www.netlikon.de/docs/PDFBox-0.7.2/bin/?C=M;O=A :
IKVM.Runtime.dll (9/7/2005 356K)
IKVM.GNU.Classpath.dll (9/7/2005 6.8M)
PDFBox-0.7.2.dll (9/11/2005 8.1M)
and this fixed it, along with a rewrite. The rewrite on its own, before I changed the files, did NOT solve the issue, so it wasn't the real fix, but the code does make more sense this way. This is all in C#.
This assumes the input and output files have been created and are in the same directory as your built exe file. As I said in the subject, your input PDF file CANNOT be a URL path, as this is NOT supported.
static void Main() // string[] args
{
    // Write the extracted text to output.txt next to the exe.
    using (StreamWriter writer = File.CreateText("output.txt"))
    {
        writer.WriteLine(TransformPdfToText("input.pdf"));
    }
}

static string TransformPdfToText(string SourceFile)
{
    PDDocument doc = PDDocument.load(SourceFile);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally
    {
        doc.close(); // release the PDF even if extraction fails
    }
}
Happy coding!!
-Tom
|
|
|
|
 |
|
 |
It works,
Thanks,
Can we convert to a Word file also?
|
|
|
|
 |
|
 |
http://www.pdftoword.com/ is a good way to convert from PDF to Word. If you are converting to text in C#, instead of naming the text file with a ".txt" extension, put a ".doc" on the end to make it a "Word document", or at least, open in Word. It will only appear as plain text, though. For Rich Text/Images, use the link I gave you.
Enjoy,
Tom
|
|
|
|
 |
|
 |
I'm using Visual Studio 2005 and C# .NET.
I put the DLL files into the bin directory, added the files as References, added these namespaces as directed in this forum:
using org.pdfbox.util;
using org.pdfbox.pdmodel;
and am trying to use this code in my Main function:
string filename = "test.pdf";
PDDocument doc = PDDocument.load(filename);
PDFTextStripper stripper = new PDFTextStripper();
StreamWriter writer = File.CreateText("output.txt");
writer.Write(stripper.getText(doc));
writer.Close();
Wouldn't this be the right way to do it? No matter what I put for filename (i.e. C:\\test.pdf, http://localhost/test.pdf when I put it in my C:\inetpub\wwwroot directory) it throws an exception on the PDDocument line: The type initializer for 'java.io.File' threw an exception.
Any help? Thanks in advance.
-Tom
|
|
|
|
 |
|
 |
The problem was the later version (0.7.3).
I used the following files from
http://www.netlikon.de/docs/PDFBox-0.7.2/bin/?C=M;O=A :
IKVM.Runtime.dll (9/7/2005 356K)
IKVM.GNU.Classpath.dll (9/7/2005 6.8M)
PDFBox-0.7.2.dll (9/11/2005 8.1M)
and this fixed it, along with a rewrite following Dan's lead with his VB, which I converted back to C#.
This assumes the files are in the same directory as your built exe file.
static void Main() // string[] args
{
    // Write the extracted text to output.txt next to the exe.
    using (StreamWriter writer = File.CreateText("output.txt"))
    {
        writer.WriteLine(TransformPdfToText("input.pdf"));
    }
}

static string TransformPdfToText(string SourceFile)
{
    PDDocument doc = PDDocument.load(SourceFile);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally
    {
        doc.close(); // release the PDF even if extraction fails
    }
}
Happy coding!!
-Tom
|
|
|
|
 |
|
 |
Guys,
This program DOES NOT work for putting in web paths (URLs) to input PDFs so you can convert them to text. I know because what worked for a local file does NOT work for a URL and I get the error "URI formats are not supported." They should have put that in the documentation, but they didn't. Sorry, gents.
-Tom
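Since only local paths work, it may help to detect a URL up front instead of waiting for the "URI formats are not supported." error. A minimal sketch, assuming plain http/https links are the only web inputs you care about; the class PdfSource and the method IsWebUrl are hypothetical names for illustration, not anything PDFBox provides:

```csharp
using System;

// Hypothetical helper for illustration; not part of PDFBox.
static class PdfSource
{
    // True only when 'input' parses as an absolute http or https URL.
    // Relative names such as "input.pdf" and local paths return false.
    public static bool IsWebUrl(string input)
    {
        Uri uri;
        return Uri.TryCreate(input, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    static void Main()
    {
        string[] samples = { "http://localhost/test.pdf", "input.pdf" };
        foreach (string sample in samples)
        {
            string kind = IsWebUrl(sample) ? "URL, download it first" : "local path, load directly";
            Console.WriteLine(sample + ": " + kind);
        }
    }
}
```

With a check like this, a web input can be routed to a download step and only the resulting local file handed to PDDocument.load.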
|
|
|
|
 |
|
 |
Make sure your test.pdf can be opened.
|
|
|
|
 |
|
 |
Yeah, the file could be opened, just not by my program when it was at the URL path. I had IIS permissions set for Everyone to have full access to the file and that still didn't work. I even set Everyone to have full access to the folder the file was in and still got that error about URIs not being supported.
I did come up with a nice solution to the problem, though. I have my C# program download a copy of the file to a temporary directory using the user's CredentialCache.DefaultCredentials, then have it do the PDF conversion on the local file in the temp folder. Works fine:
private static string pdfDownloader(string file, string tempFolderPath)
{
    // Download the PDF using the current user's credentials.
    using (WebClient client = new WebClient())
    {
        client.Credentials = CredentialCache.DefaultCredentials;
        try
        {
            client.DownloadFile(file, tempFolderPath + "newcontent.pdf");
            return "ok";
        }
        catch (WebException ex)
        {
            return ex.Message;
        }
    }
}
which downloads the PDF file as the file "newcontent.pdf" in the Temp folder, and then you can strip out the text:
string tempFolderPath = Path.GetTempPath();
string status = "";
int attempts = 0;
// Download the PDF ('file' holds its URL). Give up after 3 failed
// attempts instead of retrying forever.
while (status != "ok" && attempts < 3)
{
    status = pdfDownloader(file, tempFolderPath);
    attempts++;
}
if (File.Exists(tempFolderPath + "newcontent.pdf"))
{
    using (StreamWriter writeTextFile = File.CreateText(tempFolderPath + "content.txt"))
    {
        writeTextFile.WriteLine(TransformPdfToText(tempFolderPath + "newcontent.pdf"));
    }
    // Done with the downloaded PDF file; we have our text now.
    File.Delete(tempFolderPath + "newcontent.pdf");
}
which calls the actual stripping code:
private static string TransformPdfToText(string SourceFile)
{
    string content = "";
    PDDocument doc = PDDocument.load(SourceFile);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        content = stripper.getText(doc);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        doc.close(); // close exactly once, whether or not extraction succeeded
    }
    return content;
}
Enjoy,
Tom
modified on Monday, July 13, 2009 5:24 PM
|
|
|
|
 |
|
 |
Hi.
I have some PDF files created by a tool we are using (TXTextControl .NET), which for some reason cannot be parsed by PDFBox. The error is always:
java.io.IOException: Unknown colorspace array type:COSName{DeviceRGB}
On the SourceForge help forum for PDFBox, someone said that they fixed this problem by replacing 'throw IOException' with 'return null' in the source code (line 116, PDColorSpaceFactory.java).
So I want to do the same, but I didn't manage to create the .NET library. I can compile it with Ant, and in the properties I included the IKVM directory, but no libraries are created in the build process.
Do you know how to make such a small modification in the PDFBox code and then recompile it as a .NET DLL?
Thanks.
|
|
|
|
 |
|
 |
For anyone interested, I fixed it by making that modification at line 116, BUT I had to use an older version of PDFBox (0.7.3), which uses GNU Classpath, because the most recent one, which uses OpenJDK, was giving error messages when I tried to build it with Ant.
|
|
|
|
 |