I have developed a project, and in it I get the following error:

"Index was outside the bounds of the array."

This project works for PDF files whose content is in English, but whenever I process PDFs whose content is in Arabic, the error is raised.

The error is raised at the following line:

page = PdfTextExtractor.GetTextFromPage(r, i, Strategy);

My code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace test
{
    public partial class Form1 : Form
    {
        string filename;
        string path;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.CheckFileExists = true;
            openFileDialog.AddExtension = true;
            openFileDialog.Filter = "PDF files (*.pdf)|*.pdf";
            DialogResult result = openFileDialog.ShowDialog();
            if (result == DialogResult.OK)
            {
                //data1 = openFileDialog.FileNames.Select(x => new FileInfo(x)).ToArray();
                // Fully qualified: some iTextSharp versions also define a Path type in iTextSharp.text.pdf.parser.
                filename = System.IO.Path.GetFileName(openFileDialog.FileName);
                path = System.IO.Path.GetDirectoryName(openFileDialog.FileName);
                textBox1.Text = System.IO.Path.Combine(path, filename);
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            string s = Form1.ExtractTextFromPdf(textBox1.Text);
            // Reverse the extracted text character by character.
            string reverseValue = new string(s.Select((c, index) => new { c, index })
                                              .OrderByDescending(x => x.index)
                                              .Select(x => x.c)
                                              .ToArray());
            richTextBox1.Text = reverseValue;
        }

        public static string ExtractTextFromPdf(string filename)
        {
            using (PdfReader r = new PdfReader(filename))
            {
                StringBuilder text = new StringBuilder();
                ITextExtractionStrategy Strategy = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
                for (int i = 1; i <= r.NumberOfPages; i++)
                {
                    string page = "";
                    page = PdfTextExtractor.GetTextFromPage(r, i, Strategy); // the exception is thrown here
                    string[] lines = page.Split('\n');
                    foreach (string line in lines)
                    {
                        // Append each extracted line to the result (assumed loop body).
                        text.AppendLine(line);
                    }
                }
                string first = text.ToString();
                return first;
            }
        }
    }
}


Please help me.

Thank you.
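Separately from the exception, the reversal in button3_Click reverses UTF-16 `char` values one at a time. For Arabic text this can detach combining marks (such as harakat) from their base letters, and it also splits surrogate pairs. The sketch below is a hypothetical helper (TextReversal.ReverseTextElements is not part of any library) that reverses by text elements using System.Globalization.StringInfo instead. It assumes the reversal is meant to fix the visual order of the extracted text; reversing the whole string also reverses any embedded English words and digits, so it is only a rough approximation of proper bidirectional handling.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TextReversal
{
    // Hypothetical helper: reverses a string by text elements (user-perceived
    // characters) instead of by UTF-16 code units, so surrogate pairs and base
    // letters followed by combining marks stay together.
    public static string ReverseTextElements(string s)
    {
        if (string.IsNullOrEmpty(s))
            return string.Empty;

        // Collect the text elements in their original order.
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
            elements.Add(e.GetTextElement());

        // Append them back in reverse order.
        var sb = new StringBuilder(s.Length);
        for (int i = elements.Count - 1; i >= 0; i--)
            sb.Append(elements[i]);
        return sb.ToString();
    }
}
```

In button3_Click this would replace the LINQ expression, e.g. `richTextBox1.Text = TextReversal.ReverseTextElements(s);`.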
Simon_Whale 27-Aug-15 9:32am    
Have you made sure that r, i and Strategy are not null?
Also, what is the PdfTextExtractor.GetTextFromPage method? Is it something you have created, or is it a third-party API?
What is the exact error message that you are getting?
F-ES Sitecore 27-Aug-15 10:05am    
It might be because GetTextFromPage uses a 0-based index.

for (int i = 0; i < r.NumberOfPages; i++)
CHill60 27-Aug-15 10:32am    
That's what I thought too, but it looks like page numbering *does* begin with 1 :(
CHill60 27-Aug-15 10:35am    
If you change LocationTextExtractionStrategy to SimpleTextExtractionStrategy, do you get the same error? I can't see where you're defining the text location.
Herman<T>.Instance 27-Aug-15 11:08am    
This is your third question about the same code. You expand your code each time, and whenever the next problem appears you are back here. Please Google some questions first; some of your problems are too easy to tackle, like this one. Why don't you debug the loop and see whether i or r is the problem, and in which case?

And please accept the solutions given to you in the other questions.
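One way to follow the suggestion to debug the loop is to isolate each page, so that one failing page does not abort the whole document and you learn exactly which page numbers throw. The helper below is a hypothetical sketch, not part of iTextSharp: it takes the page count and a delegate that extracts one page, loops over 1-based page numbers (matching iTextSharp's numbering, as noted above), and records the pages whose extraction throws IndexOutOfRangeException.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PageDiagnostics
{
    // Hypothetical helper: calls extractPage for pages 1..pageCount and
    // collects the numbers of pages that throw IndexOutOfRangeException,
    // instead of letting the first failure abort the whole document.
    public static string ExtractAllPages(
        int pageCount,
        Func<int, string> extractPage,
        List<int> failedPages)
    {
        var text = new StringBuilder();
        for (int page = 1; page <= pageCount; page++)
        {
            try
            {
                text.AppendLine(extractPage(page));
            }
            catch (IndexOutOfRangeException)
            {
                // Remember the failing page number and continue with the next page.
                failedPages.Add(page);
            }
        }
        return text.ToString();
    }
}
```

With iTextSharp 5.x the delegate could be `p => PdfTextExtractor.GetTextFromPage(r, p, new LocationTextExtractionStrategy())`. Creating a new strategy for each page is deliberate: LocationTextExtractionStrategy keeps the text chunks it has already collected, so reusing one instance across pages, as the question's code does, makes each page's result also contain the text of earlier pages. This does not by itself explain the exception, but the list of failing pages may help narrow down whether the problem is specific to certain pages of the Arabic PDFs.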

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
