Index was outside the bounds of the array while reading a PDF using iTextSharp

Question

I'm using the open-source library iTextSharp to read a PDF file in my ASP.NET MVC 3 application, which is written in C#.

Below is my code.

C#

filePath = Path.Combine(
    AppDomain.CurrentDomain.BaseDirectory,
    Path.GetFileName(Infile.FileName));
if (System.IO.File.Exists(filePath))
{
    System.IO.File.Delete(filePath);
}
Infile.SaveAs(filePath);
var pdfdoc = new iTextSharp.text.Document();
PdfReader reader2 = new PdfReader((string)filePath);
string strText = string.Empty;

for (int page = 1; page <= reader2.NumberOfPages; page++)
{
    iTextSharp.text.pdf.parser.ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
    PdfReader reader = new PdfReader((string)filePath);
    String s = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, page,its);

    s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
    strText = strText + s;
    reader.Close();
}

I'm getting the error on this line:

C#

String s = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, page,its);

The error is "Index was outside the bounds of the array."
Regards.

Posted 20-Jan-12 20:55pm

Member 8121187

Comments

mnandikanti 3-Feb-12 18:03pm

I am having this very same issue. Does anyone know a solution for this problem? In my case I am able to read some PDF files, but for others I get this "Index was outside ....." error.

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2012-01-20T22:28:00

It looks like there is no such page.

Most probably, the problem is here: instead of

C#

for (int page = 1; page <= reader2.NumberOfPages; page++) {/*...*/}

you need

C#

for (int page = 0; page < reader2.NumberOfPages; page++) {/*...*/}

Remember: in most cases indexing of elements is zero-based.

Next time use the Debugger; you will be able to dig out the problem in no time, with some minimal experience.

—SA

weecom · Answer 2 · 2014-05-29T15:13:00

I encountered the same problem. It occurs if the PDF file contains images. This thread helped solve my problem: http://itextsharp.10939.n7.nabble.com/Possible-bug-in-CMapAwareDocumentFont-ProcessUni2Byte-iTextSharp-5-4-3-td4480.html
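
If the exception only shows up for certain PDFs, as described here and in the comment on the question, one possible stopgap is to catch it per page so that a single unparseable page does not abort the whole extraction. The sketch below is only an illustration: `PdfTextHelper`, `ExtractAllPages` and `failedPages` are hypothetical names, not part of iTextSharp, and the per-page extraction is passed in as a delegate so the loop itself has no library dependency. It keeps the 1-based page numbers from the question's loop, which as far as I know is what iTextSharp's `PdfReader` expects. It does not fix whatever the linked thread identifies as the underlying cause.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PdfTextHelper
{
    // Hypothetical helper, not an iTextSharp API.
    // extractPage returns the text of one page; pages are numbered from 1.
    // Pages that throw IndexOutOfRangeException are recorded in failedPages
    // and skipped, so the remaining pages are still extracted.
    public static string ExtractAllPages(
        int pageCount,
        Func<int, string> extractPage,
        IList<int> failedPages)
    {
        var text = new StringBuilder();
        for (int page = 1; page <= pageCount; page++)
        {
            try
            {
                text.Append(extractPage(page));
            }
            catch (IndexOutOfRangeException)
            {
                // "Index was outside the bounds of the array":
                // remember the page number and carry on.
                failedPages.Add(page);
            }
        }
        return text.ToString();
    }
}
```

With iTextSharp 5.x this could be wired up by opening one `PdfReader` before the call, passing `reader.NumberOfPages` as the page count and `p => PdfTextExtractor.GetTextFromPage(reader, p, new SimpleTextExtractionStrategy())` as the delegate, then calling `reader.Close()` afterwards. The list of failed pages then shows which pages trigger the error, which may help when comparing against the thread above or when testing a newer iTextSharp build.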
