I want to get the coordinates of each word in a text PDF using iTextSharp. I have uploaded the PDF at the link below. I am new to iTextSharp; does anybody have sample code for getting word coordinates? The code below extracts the text, but I don't know how to get the word coordinates. https://docs.google.com/file/d/0B3ZAyYMW9DEMMUNQVEFYNWRDZjg/edit?usp=sharing

VB
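Imports System.IO
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' Extracts the text of every page, shows it in the pdftext control, and saves it to a file.
Public Sub GetPDFText(ByVal pdfpath As String)
    Dim reader As New PdfReader(pdfpath)
    Dim output As New StringWriter()
    Try
        For i As Integer = 1 To reader.NumberOfPages
            output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, New SimpleTextExtractionStrategy()))
        Next
    Finally
        ' Release the PDF file even if extraction fails
        reader.Close()
    End Try
    pdftext.Text = output.ToString()
    Dim filenam As String = "D:\Temp\itext\test.txt"
    ' Using closes the output file even if the write fails
    Using testss As New StreamWriter(filenam)
        testss.Write(pdftext.Text)
    End Using
End Sub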
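
One possible direction, offered as a rough sketch rather than a tested solution: iTextSharp 5.x passes each piece of rendered text to an `IRenderListener` as a `TextRenderInfo`, which exposes a baseline with start and end points. The listener below groups characters into words at whitespace or when the baseline height changes. The names `WordBox`, `WordLocationListener`, and `WordCoordinateDemo` are hypothetical, the 1-unit line tolerance is an arbitrary assumption, and `GetCharacterRenderInfos` is assumed to be available in the installed iTextSharp version. PDFs that position words without space characters would need gap-based splitting, which this sketch does not attempt.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' Hypothetical container: one word and its baseline position in PDF user-space units.
Public Class WordBox
    Public Text As String
    Public Page As Integer
    Public X1 As Single   ' left end of the baseline
    Public X2 As Single   ' right end of the baseline
    Public Y As Single    ' baseline height
End Class

' Hypothetical listener: builds words from the characters iTextSharp renders.
Public Class WordLocationListener
    Implements IRenderListener

    Public ReadOnly Words As New List(Of WordBox)()
    Public CurrentPage As Integer
    Private current As WordBox
    Private buffer As New StringBuilder()

    Public Sub RenderText(ByVal info As TextRenderInfo) Implements IRenderListener.RenderText
        For Each ch As TextRenderInfo In info.GetCharacterRenderInfos()
            Dim s As String = ch.GetText()
            Dim startPt As Vector = ch.GetBaseline().GetStartPoint()
            Dim endPt As Vector = ch.GetBaseline().GetEndPoint()
            ' Whitespace ends the current word
            If String.IsNullOrWhiteSpace(s) Then
                Flush()
                Continue For
            End If
            ' A jump in baseline height (assumed tolerance: 1 unit) also ends it
            If current IsNot Nothing AndAlso Math.Abs(current.Y - startPt(Vector.I2)) > 1.0F Then Flush()
            If current Is Nothing Then
                current = New WordBox With {.Page = CurrentPage, .X1 = startPt(Vector.I1), .Y = startPt(Vector.I2)}
            End If
            buffer.Append(s)
            current.X2 = endPt(Vector.I1)
        Next
    End Sub

    ' Stores the word collected so far, if any, and resets the buffer.
    Public Sub Flush()
        If current IsNot Nothing Then
            current.Text = buffer.ToString()
            Words.Add(current)
        End If
        current = Nothing
        buffer.Clear()
    End Sub

    Public Sub BeginTextBlock() Implements IRenderListener.BeginTextBlock
    End Sub

    Public Sub EndTextBlock() Implements IRenderListener.EndTextBlock
        Flush()
    End Sub

    Public Sub RenderImage(ByVal info As ImageRenderInfo) Implements IRenderListener.RenderImage
    End Sub
End Class

Module WordCoordinateDemo
    Sub Main(ByVal args As String())
        Dim reader As New PdfReader(args(0))   ' path to the PDF
        Try
            Dim parser As New PdfReaderContentParser(reader)
            Dim listener As New WordLocationListener()
            For i As Integer = 1 To reader.NumberOfPages
                listener.CurrentPage = i
                parser.ProcessContent(i, listener)
                listener.Flush()
            Next
            For Each w As WordBox In listener.Words
                Console.WriteLine("{0}: page {1}, x {2} to {3}, y {4}", w.Text, w.Page, w.X1, w.X2, w.Y)
            Next
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

The coordinates are in PDF user space, with the origin normally at the bottom-left corner of the page. `GetAscentLine()` and `GetDescentLine()` on `TextRenderInfo` could be used in the same way if a full bounding box is needed rather than just the baseline.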
Comments
Sergey Alexandrovich Kryukov 17-Aug-13 3:06am    
Without even looking thoroughly into it, I can tell you in advance that getting the coordinates of the words would be extremely difficult, if possible at all, because this is not how the components that lay out the text flow work. Figuratively speaking, they don't "know" those coordinates themselves. Such information is hardly exposed to the API.

So, I wonder, why on Earth would you need that? Maybe your whole concept is not so good? At the very least, I would need to know your idea.

—SA
jai_mca 17-Aug-13 3:16am    
I need to write the word coordinates in XML format.
ledtech3 17-Aug-13 10:46am    
You might be able to use Regex to get the "Index" location of each word.

http://msdn.microsoft.com/en-us/library/gg578045.aspx

Then output them to your xml file.
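
A minimal sketch of that suggestion, assuming the text has already been extracted into a string: `Regex.Matches` with the pattern `\S+` yields one `Match` per whitespace-delimited word, and `Match.Index` gives its character offset. Note that this offset is a position within the extracted text, not an x/y coordinate on the page. The module name `WordIndexDemo`, the helper `WordsToXml`, and the XML element and attribute names are all hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

Module WordIndexDemo
    ' Hypothetical helper: builds one <word> element per whitespace-delimited token,
    ' recording its character offset and length within the input text.
    Function WordsToXml(ByVal text As String) As XElement
        Dim root As New XElement("words")
        For Each m As Match In Regex.Matches(text, "\S+")
            root.Add(New XElement("word",
                                  New XAttribute("index", m.Index),
                                  New XAttribute("length", m.Length),
                                  m.Value))
        Next
        Return root
    End Function

    Sub Main()
        ' In practice the input would be the text returned by GetTextFromPage.
        Dim xml As XElement = WordsToXml("Hello PDF world")
        Console.WriteLine(xml)
        ' xml.Save("words.xml") would write the same content to a file
    End Sub
End Module
```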

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)