Hi, can anyone tell me how to convert a PDF to a searchable PDF using iTextSharp or any other open-source library?

Thanks in advance.
Comments
joshrduncan2012 7-Oct-13 9:07am    
What have you tried so far to accomplish this?

1 solution

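Try the code below. Note that it extracts the text of each page; it does not turn a scanned PDF into a searchable one.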

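// At the top of the file:
using System;
using System.IO;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Inside the method that receives the PDF path in "filename":
if (File.Exists(filename))
{
    PdfReader pdfReader = null;
    try
    {
        StringBuilder text = new StringBuilder();
        pdfReader = new PdfReader(filename);

        // Page numbers in iTextSharp are 1-based
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a new extraction strategy for each page
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

            text.AppendLine();
            text.AppendLine("Page Number: " + page);
            // currentText is already a .NET string, so no re-encoding is needed
            text.Append(currentText);
        }

        // pdftext is assumed to be a TextBox control on the form (not part of iTextSharp)
        pdftext.Text += text.ToString();
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error: " + ex.Message, "Error");
    }
    finally
    {
        // Close the reader once, after all pages have been read (not inside the loop)
        if (pdfReader != null)
            pdfReader.Close();
    }
}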


For more info:

Using iTextSharp

I hope this helps.
 
 
Comments
Santhosh Kumar 8-Oct-13 2:03am    
Hi Sampath,
Thanks for your reply.
I have already tried the code above, but it only converts the PDF to text. What I want is to convert my non-searchable PDF file into a searchable PDF without changing its design: the images should be displayed as they are, and the text in them should be searchable.
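A minimal sketch of one common approach, assuming iTextSharp 5.x: keep every scanned page exactly as it is and write the recognized words on top of it as invisible text (PDF text rendering mode 3). Viewers can then search and select the text while only the original images remain visible. iTextSharp does not perform OCR itself, so the sketch assumes the words and their positions (already converted to PDF points, origin at the bottom-left of the page) come from a separate OCR engine such as Tesseract. `OcrWord`, `SearchablePdf` and `AddInvisibleTextLayer` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical type: one word produced by an external OCR engine, with its
// position already converted to PDF user-space units (points).
public class OcrWord
{
    public string Text;
    public float X;       // left edge of the word
    public float Y;       // text baseline
    public float Height;  // approximate glyph height, used as the font size
}

public static class SearchablePdf
{
    // Copies inputPath to outputPath and writes each page's OCR words on top of
    // the page as invisible text. The page images are left untouched.
    public static void AddInvisibleTextLayer(
        string inputPath,
        string outputPath,
        IDictionary<int, IList<OcrWord>> wordsByPage)  // key: 1-based page number
    {
        PdfReader reader = new PdfReader(inputPath);
        try
        {
            using (FileStream output = new FileStream(outputPath, FileMode.Create))
            {
                PdfStamper stamper = new PdfStamper(reader, output);
                BaseFont font = BaseFont.CreateFont(
                    BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);

                foreach (KeyValuePair<int, IList<OcrWord>> entry in wordsByPage)
                {
                    // Draw over the existing page content (the scanned image)
                    PdfContentByte canvas = stamper.GetOverContent(entry.Key);
                    canvas.BeginText();
                    // Mode 3: glyphs are neither filled nor stroked, so nothing is painted
                    canvas.SetTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE);
                    foreach (OcrWord word in entry.Value)
                    {
                        canvas.SetFontAndSize(font, word.Height);
                        canvas.SetTextMatrix(word.X, word.Y);
                        canvas.ShowText(word.Text);
                    }
                    canvas.EndText();
                }

                // Closing the stamper writes the modified PDF to the output stream
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The non-embedded Helvetica font with CP1252 encoding only covers Western European characters, and its glyph widths will not match the scanned words exactly, so search highlights may be slightly offset. Embedding a Unicode font (for example with `BaseFont.IDENTITY_H`) and adjusting each word with `SetHorizontalScaling` are possible refinements.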
El Guru 17-Jan-14 19:59pm    
Hi, I have two questions, if possible:
1) Shouldn't pdfReader.Close() be outside the for loop?
2) Is pdftext a variable or an iTextSharp object?
Thanks a lot! Regards
Sampath Lokuge 18-Jan-14 1:52am    
Please try it yourself and verify. I took the code from the link mentioned above, so you can also refer to that article.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900