Hi, how can I get the content of selected text in a PDF file using VB or C#? Is it possible to open a PDF file in a Windows Form and, when the user selects text with the Select Text tool, get that content?
---
Hi, when I convert a PDF version 9.0 file, it throws the exception "Object reference not set to an instance of an object". Can anyone tell me why it does not convert PDF 9.0 files? How can I convert files in the latest PDF version?
---
I converted your C# code to VB.NET and it works like a dream:
Imports System
Imports System.IO
Imports Org.pdfbox.pdmodel
Imports Org.pdfbox.util

Public Class clsPdfToText
    Public Sub PdfToText(ByVal SourceFile As String, ByVal TargetFile As String)
        ' Using closes the writer even if extraction throws
        Using writer As New StreamWriter(TargetFile)
            writer.WriteLine(TransformPdfToText(SourceFile))
        End Using
    End Sub

    Private Function TransformPdfToText(ByVal SourceFile As String) As String
        Dim document As PDDocument = PDDocument.load(SourceFile)
        Try
            Dim stripper As New PDFTextStripper()
            Return stripper.getText(document)
        Finally
            ' PDFBox expects close() to be called on every loaded PDDocument
            document.close()
        End Try
    End Function
End Class
---
Hi, thank you for your article, but there are some PDFs that I can't convert to text, and I don't know why. Sometimes IFilter doesn't work well either. Can you help me? Thanks.
---
Hi, your article is very useful. Is it possible to extract the content of a specific page? For example, with a 20-page PDF document, how do I extract the content of page 5? Any help?
---
Hi Dan, I tried to run your PDF scanner, but I got: An unhandled exception of type 'System.IO.FileNotFoundException' occurred in pdfbox-0.7.3.dll. Additional information: File or assembly name 'FontBox-0.1.0-dev', or one of its dependencies, was not found.
Did I do something wrong? Thanks! Andreas
Ariadne
---
I was working specifically with 0.7.2 and it seems that getting it working with newer versions requires a little bit of tweaking (I haven't tried it myself).
-- My open-source ASP.NET 2.0 controls: DayPilot - Outlook-like calendar/scheduling control MenuPilot - Hover context menu
---
Hello,
I have a PDF file that contains a table. The table is really a form where people enter their details in text boxes and drop-down combo boxes. Is there any way to read these fields?
Thank you.
---
Hello,
I have a PDF document that I created by scanning some documents. The scanned documents consist of forms and plain text documents.
When I extract text from this PDF document, it only extracts the text from the plain pages and not from the forms.
Why is that? Can anyone help with this? It is a basic requirement for me.
Thanks in advance.
Rakesh Singh. http://www.4colordesign.com
Rakesh
---
Thanks for this article, it's really useful. In the example you posted, you should close the document; this is mentioned in the Java version documentation, and I believe the same applies to the .NET version: "This is the in-memory representation of the PDF document. You need to call close() on this object when you are done using it!!"
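In C#, the simplest way to make sure close() runs even when text extraction throws is a try/finally block. If the IKVM-converted PDDocument in your build does not implement IDisposable (an assumption worth checking against your PDFBox version), a small adapter also lets you use a `using` block. CloseOnDispose below is a hypothetical helper written for this sketch; it is not part of PDFBox or the .NET Framework:

```csharp
using System;

// Hypothetical adapter: wraps a cleanup action (for example a PDDocument's
// close method) in IDisposable so it can be used with a C# using block.
public sealed class CloseOnDispose : IDisposable
{
    private Action _close;

    public CloseOnDispose(Action close)
    {
        if (close == null) throw new ArgumentNullException("close");
        _close = close;
    }

    public void Dispose()
    {
        // Run the cleanup at most once, even if Dispose is called repeatedly.
        Action close = _close;
        _close = null;
        if (close != null) close();
    }
}

// Example use with PDFBox (assumes the org.pdfbox namespaces are imported):
//   PDDocument doc = PDDocument.load(fileName);
//   using (new CloseOnDispose(doc.close))
//   {
//       string text = new PDFTextStripper().getText(doc);
//   }
```

Because the action is cleared after the first call, disposing twice does not close the document twice.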
---
While I do not need to extract the text from a PDF, I do need to check whether the PDF contains text, so we can keep image-only PDFs out of our document archive system. I added a method to this class that returns False at the first instance of text found, which speeds up the check because it typically does not have to read past the first page.
If anyone else is interested, the method is as follows:
// TBO 10/11/07 - added method to return false on the first instance of text.
// Assumes the worst: if there is a file I/O error, it returns true.
// ExtractTextFromPDFBytes is an existing method of the class this was added to (not shown here).
public bool IsImageOnly(string inFileName)
{
    PdfReader reader = null;
    try
    {
        reader = new PdfReader(inFileName);
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Stop at the first page whose content yields any text
            if (ExtractTextFromPDFBytes(reader.GetPageContent(page)).Length > 0)
            {
                return false;
            }
        }
        return true;
    }
    catch
    {
        return true;
    }
    finally
    {
        if (reader != null) reader.Close();
    }
}
---
In my project I have two forms. The second form loads successfully the first time, but when it is loaded again it causes this error (C# 2005): Cannot access a disposed object. Object name: 'Searchform'.
---
How can I save PDF pages as JPEG using a DLL? (Right now I am using the AcroPDDoc.GetClipboard() method to save as JPEG.)
Thanks in advance, Govindaraj R
---
Hi, is it possible to read tables from a PDF file using this great mechanism?
Thanks,
marquito
---
To be honest, I don't know. This is just a quick hack to extract the text content for indexing purposes. But you could take a look at the iTextSharp library. It allows you to parse the structure of PDF files, and it doesn't require IKVM.NET (it's a full port to C#).
---
Please tell me the steps to follow to convert a PDF to TXT format.
MAHESH KUMAR BR
---
Hello all, according to the author of this article, it is overly complicated to use iTextSharp for text extraction. However, I found it wasn't that bad. These are the steps I used to extract text from PDFs with iTextSharp:
1. Create an instance of the PdfReader object.
2. For each PDF page, use the reader to get the page content bytes.
3. Create an instance of PRTokeniser and pass in the page content bytes as the argument.
4. Use this PRTokeniser to loop through all tokens in the page. Test each token to see whether its type is String; if it is, append the token's StringValue to the output.
Here is the function I wrote in VB.NET:
Imports iTextSharp.text.pdf
Imports System.Windows.Forms

Public Function ParsePdfText(ByVal sourcePDF As String, _
                             Optional ByVal fromPageNum As Integer = 0, _
                             Optional ByVal toPageNum As Integer = 0) As String
    Dim sb As New System.Text.StringBuilder()
    Dim reader As PdfReader = Nothing
    Try
        reader = New PdfReader(sourcePDF)
        Dim pageBytes() As Byte = Nothing
        Dim token As PRTokeniser = Nothing

        ' 0 means "not specified": default to the first/last page
        If fromPageNum = 0 Then
            fromPageNum = 1
        End If
        If toPageNum = 0 Then
            toPageNum = reader.NumberOfPages
        End If

        If fromPageNum > toPageNum Then
            Throw New ApplicationException("Parameter error: The value of fromPageNum can " & _
                                           "not be larger than the value of toPageNum")
        End If

        For i As Integer = fromPageNum To toPageNum Step 1
            pageBytes = reader.GetPageContent(i)
            If Not IsNothing(pageBytes) Then
                token = New PRTokeniser(pageBytes)
                ' Keep only string tokens (the operands of text operators)
                While token.NextToken()
                    If token.TokenType() = PRTokeniser.TK_STRING Then
                        sb.Append(token.StringValue)
                    End If
                End While
            End If
        Next i
    Catch ex As Exception
        MessageBox.Show("Exception occurred. " & ex.Message)
        Return String.Empty
    Finally
        If reader IsNot Nothing Then reader.Close()
    End Try
    Return sb.ToString()
End Function
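To make step 4 more concrete: in a page's content stream, text is passed to operators such as Tj and TJ as string operands, most often literal strings in parentheses, e.g. `(Hello) Tj`. The C# sketch below is not iTextSharp's PRTokeniser; it is a hypothetical stand-alone helper (ContentStreamStrings and ExtractLiteralStrings are names made up for this example) that assumes the content stream is already decompressed and held in a string with one char per byte. It collects literal strings only, handling nested parentheses, backslash escapes and % comments, and ignores hex strings (<...>) and font encodings:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch: collects literal string operands, e.g. (Hello) in "(Hello) Tj",
// from an already-decompressed PDF content stream. Hex strings <...>,
// font encodings and text positioning are deliberately ignored.
public static class ContentStreamStrings
{
    public static List<string> ExtractLiteralStrings(string content)
    {
        var result = new List<string>();
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            if (c == '%')
            {
                // Comment: skip to the end of the line
                while (i < content.Length && content[i] != '\n' && content[i] != '\r') i++;
            }
            else if (c == '(')
            {
                result.Add(ReadLiteral(content, ref i));
            }
            else
            {
                i++;
            }
        }
        return result;
    }

    // Reads a literal string starting at content[i] == '(' and leaves i just
    // past the matching ')'. Handles balanced nested parentheses and escapes.
    private static string ReadLiteral(string content, ref int i)
    {
        var sb = new StringBuilder();
        int depth = 0;
        i++; // skip the opening '('
        while (i < content.Length)
        {
            char c = content[i++];
            if (c == '\\' && i < content.Length)
            {
                char e = content[i++];
                switch (e)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    case 'b': sb.Append('\b'); break;
                    case 'f': sb.Append('\f'); break;
                    case '\r': // line continuation; also swallow a following '\n'
                        if (i < content.Length && content[i] == '\n') i++;
                        break;
                    case '\n': break; // line continuation
                    default:
                        if (e >= '0' && e <= '7')
                        {
                            // \ddd: up to three octal digits
                            int code = e - '0';
                            for (int k = 0; k < 2 && i < content.Length && content[i] >= '0' && content[i] <= '7'; k++)
                                code = code * 8 + (content[i++] - '0');
                            sb.Append((char)(code & 0xFF));
                        }
                        else
                        {
                            sb.Append(e); // \( \) \\ and unknown escapes yield the char itself
                        }
                        break;
                }
            }
            else if (c == '(') { depth++; sb.Append(c); }
            else if (c == ')')
            {
                if (depth == 0) break; // end of this string
                depth--;
                sb.Append(c);
            }
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `ExtractLiteralStrings("BT /F1 12 Tf 72 712 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET")` yields "Hello", "Wor" and "ld". Like the VB function, it returns the raw string bytes, so text drawn with fonts that use custom or CID encodings will not come out as readable characters this way.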
I'm now experimenting with image extraction using iTextSharp. Although my initial research on this was very disappointing (most iText experts say it's impossible), I would still like to give it a try. If I'm successful, I'll share it with everyone.
---
Hey, that is excellent! Thanks for the code, it worked perfectly. I threw in
sb.Append(token.StringValue & vbTab) to separate out the fields in my PDF.
This is much more lightweight than all of the assemblies for PDFBox.
Thanks!
Byron
---
Nice solution, but it seems it doesn't work with all PDFs.
I think the problem is that strings are sometimes compressed.
But I really like your solution (and iTextSharp is a great library).
---
Thanks a lot! It's funny that such a famous lib barely has any useful tutorials. You should upload that somewhere!
Cheers
---
I'd like to have a form on my website that can be filled in by a visitor, printed by the visitor, and saved on my site.
My site is built with Visual Studio .NET and Basic.
Can this be done?