To get all the words from a line that contains many words, I have used Split.
Everything is correct when I use English letters, but with a foreign language like Arabic things become different: I have found that there must be at least 3 spaces between each pair of Arabic words (1 space is not enough) in order to use

ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

Look at the program below.
You will find that the method private void EnglishIf() works fine without any errors.
You will find that the method private void ArabicIf() also works fine if I comment out Line B and uncomment Line A.
And you will find that private void ArabicIf() does not work if I comment out Line A and uncomment Line B.

MY QUESTION IS:
Is there any way to obtain J=20 if I use Line B with

ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

while ssFeedo[0] contains several Arabic words separated by only a single space?
//-----------
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
namespace testIf
{
    public partial class Form1 : Form
    {
        StreamWriter sw = null;
        FileStream fsW = null;
        public Form1()
        {
            InitializeComponent();
            // FileMode.Open: pursuit.txt must already exist
            fsW = new FileStream("pursuit.txt", FileMode.Open, FileAccess.Write);
            sw = new StreamWriter(fsW);
            EnglishIf();
            ArabicIf();
            sw.Close(); fsW.Close();
        }
        private void EnglishIf()
        {
            string[] ss2 = new string[120];
            string[] ssFeedo = new string[2000];
            int J = -1;
            ssFeedo[0] = "hellow0 hellow1 hellow2";
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            sw.Write(ss2.Length.ToString() + "\n");
            // Log every part; reading ss2[1] or ss2[2] here would throw
            // IndexOutOfRangeException whenever Split returns fewer than 3 parts
            sw.Write("ss2 = [" + string.Join("] [", ss2) + "]\n");
            if (ss2.Length == 3)
            {
                sw.Write("yes ss2[0]=" + ss2[0] + "\n");
                sw.Write("yes ss2[1]=" + ss2[1] + "\n");
                sw.Write("yes ss2[2]=" + ss2[2] + "\n");
                if (ss2[2] == "hellow2")
                {
                    J = 10;
                    // MessageBox.Show(text, caption): pass the whole message as the text
                    MessageBox.Show("J=" + J.ToString());
                    sw.Write("J=" + J.ToString() + "\n");
                }
            }//if ss2.Length==3
        }//EnglishIf
        private void ArabicIf()
        {
            string[] ss2 = new string[120];
            string[] ssFeedo = new string[2000];
            int J = -1;
            //there are 3 spaces between the words
            //ssFeedo[0] = "مرحبا0   مرحبا1  مرحبا2";//<--------------- Line A

            //there is only one space between the words
            ssFeedo[0] = "مرحبا0 مرحبا1 مرحبا2";// <------------------- Line B
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            sw.Write(ss2.Length.ToString() + "\n");
            // Log every part without assuming Split returned 3 of them
            sw.Write("ss2 = [" + string.Join("] [", ss2) + "]\n");
            if (ss2.Length == 3)
            {
                sw.Write("yes ss2[0]=" + ss2[0] + "\n");
                sw.Write("yes ss2[1]=" + ss2[1] + "\n");
                sw.Write("yes ss2[2]=" + ss2[2] + "\n");
                if (ss2[2] == "مرحبا2")
                {
                    J = 20;
                    //you will never get J=20 if you use Line B instead of Line A
                    MessageBox.Show("J=" + J.ToString());
                    sw.Write("J=" + J.ToString() + "\n");
                }
            }//if ss2.Length==3
        }//ArabicIf
    }
}


[edit]Code block added - OriginalGriff[/edit]
Posted; updated 10-May-11 10:18am (v4)
Comments
Khalid Sabtan 10-May-11 13:57pm
   
I wrote 3 spaces in Line A to separate the words, but the CodeProject server changed them to only one space, so make sure that you read it correctly.
Keith Barrow 10-May-11 16:45pm
   
The "missing spaces" are due to the browser: it contracts multiple spaces down to 1. I tried to fix this, but the form doesn't play with mixed LTR / RTL languages very well.
Khalid Sabtan 10-May-11 14:22pm
   
I am still waiting for an answer. I had to write the above comment to make things clearer, but I think the server mistakenly voted 4 out of 5 without notifying me.
Keith Barrow 10-May-11 16:14pm
   
It is rude to try and hurry people up; they will just down-vote your question. A vote of 4 is still positive, but whoever voted *probably* didn't give you 5 because you didn't put your code into pre tags, making it hard to read.
Khalid Sabtan 10-May-11 16:18pm
   
What do you suggest? I am thinking of posting my question again; to me the answer is very important.
Fabio V Silva 10-May-11 16:21pm
   
It might be very important to you, but it's not to others, and asking the same question twice will only make it worse...
Keith Barrow 10-May-11 16:53pm
   
I suggest patience! We are all unpaid; your question might be important to you, but it isn't to anyone else here. Re-posting will be worse and will probably get you flamed. This is how *all* newsgroups work. You are being cut some slack as you are new; if I did what you did, I'd be in real bother.

Running this:

C#
//The Arabic is fine on my machine; it might be dodgy here!!!
string foo = "مرحب مرحبا مرحبا";
string[] bar = foo.Split(' ');
Console.WriteLine(bar.Length);

This outputs 3 (the three Arabic "welcomes" are split).
Replacing the first line with
string foo = "مرحبا0 مرحبا1 مرحبا2";
also gives 3, as expected, as does
string[] bar = foo.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

Note that I have used only single spaces.

This suggests (but does not prove) that there is something wrong with your code, but I cannot see what.

The other thing to do is to check that the space you are splitting on is the same as the one in the text. I notice that the Windows Arabic character set (see link below) has a non-breaking space at 0xA0 as well as the normal space at 0x20. These are not the same character, so the split would fail! This doesn't explain why two spaces work, unless you copied a standard space or typed the extra spaces with the keyboard set to "en". The best way to check is to copy the string you are splitting and output the hex code value of each character: if the two spaces are different, there's your problem. You can fix it by adding the Arabic space to the list of split characters.
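A minimal sketch of the check and the fix described above. It assumes the "Arabic space" is U+00A0, the no-break space that Windows-1256 places at 0xA0; the thread does not confirm which code point it actually was. `SpaceCheck`, `HexDump` and `SplitWords` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

static class SpaceCheck
{
    // Returns every UTF-16 code unit of s as hex, e.g. " \u00A0" -> "0x0020 0x00A0".
    public static string HexDump(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("0x").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Splits on the ordinary space (U+0020) AND the assumed "Arabic" space (U+00A0).
    public static string[] SplitWords(string s)
    {
        return s.Split(new char[] { ' ', '\u00A0' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Hypothetical Line B: words separated by no-break spaces instead of ' '.
        string lineB = "مرحبا0\u00A0مرحبا1\u00A0مرحبا2";

        Console.WriteLine(HexDump(" \u00A0"));   // 0x0020 0x00A0
        Console.WriteLine(HexDump(lineB));       // the separators show up as 0x00A0

        // Splitting on ' ' alone finds no delimiter, so the whole line comes back as one part.
        Console.WriteLine(lineB.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length); // 1
        Console.WriteLine(SplitWords(lineB).Length); // 3
    }
}
```

Printing the code points is what exposes the problem: both characters look like a blank on screen, but only the hex values show that they differ.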

http://en.wikipedia.org/wiki/Windows-1256
   
Comments
Khalid Sabtan 10-May-11 17:07pm
   
Hi Mr Keith, you have put your finger on it. I realized that I had used the English space when I wrote .Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
and the Arabic space (which is different) when I wrote Line B. Thank you very, very much.
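An alternative sketch, not taken from the thread: instead of listing each space character explicitly, pass a null separator to `String.Split`. The .NET documentation states that a null (or empty) separator makes every character for which `Char.IsWhiteSpace` returns true a delimiter, which covers both U+0020 and U+00A0 (in .NET Framework 4 and later the set is the same one `Trim()` uses). `WhitespaceSplit` and `SplitOnAnyWhitespace` are hypothetical names, and the `\u00A0` in the sample stands in for the OP's Arabic space, which is assumed rather than confirmed.

```csharp
using System;

static class WhitespaceSplit
{
    // A null separator means "split on any Unicode white space".
    // The (char[]) cast avoids ambiguity with the Split(string, ...) overloads in newer .NET.
    public static string[] SplitOnAnyWhitespace(string s)
    {
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Hypothetical Line B mixing a no-break space and an ordinary space.
        string lineB = "مرحبا0\u00A0مرحبا1 مرحبا2";
        string[] ss2 = SplitOnAnyWhitespace(lineB);

        int J = -1;
        if (ss2.Length == 3 && ss2[2] == "مرحبا2")
        {
            J = 20;
        }
        Console.WriteLine("J=" + J); // prints J=20
    }
}
```

The trade-off is that tabs, line breaks and other Unicode spaces also become separators; if only the two space characters should count, the explicit `new char[] { ' ', '\u00A0' }` list is the narrower choice.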
Hmm, this is a difficult one. I pasted your code into VS 2010, using .NET 4.0, and adapted it slightly (removing parts not immediately related to the problem, e.g. the file I/O), and it runs as expected. The words are correctly split into 3 elements; the number of spaces separating them makes no difference.

C#
using System;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            English();
            Arabic("مرحبا0   مرحبا1  مرحبا2"); //<--------------- Line A (3 spaces between the words)
            Arabic("مرحبا0 مرحبا1 مرحبا2"); // <------------------- Line B (only one space between the words)
            Console.ReadLine(); // Pause window before it disappears
        }

        private static void English()
        {
            Console.WriteLine("---------English---------");
            string[] ss2; // = new string[120]; <------ This is unnecessary, the array created here will be overwritten by String.Split
            string[] ssFeedo = new string[2000];
            ssFeedo[0] = "hellow0 hellow1 hellow2";
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine("ss2[0]=" + ss2[0] + "  ss2[1]=" + ss2[1] + "  ss2[2]=" + ss2[2] + "\n");
            if (ss2.Length == 3)
            {
                Console.WriteLine("Correct");
            }
        }

        private static void Arabic(string text)
        {
            Console.WriteLine("---------Arabic---------");
            string[] ss2; // = new string[120]; <------ This is unnecessary, the array created here will be overwritten by String.Split
            string[] ssFeedo = new string[2000];
            int J = -1;
            ssFeedo[0] = text;
            ss2 = ssFeedo[0].Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine("ss2[0]=" + ss2[0] + "  ss2[1]=" + ss2[1] + "  ss2[2]=" + ss2[2] + "\n");
            if (ss2.Length == 3)
            {
                Console.WriteLine("Correct");
                if (ss2[2] == "مرحبا2")
                {
                    J = 20;
                    //you will never get J=20 if you used the line B instead of line A
                    Console.WriteLine("J=" + J.ToString() + "\n");
                }
            }//if ss2.length==3
        }
    }
}


In each of the Arabic text cases J is indeed set to 20.

Would you mind trying this little app on your system (it's a console application) and checking whether you also get correct output? Granted, the Arabic characters display as question marks in the console window, but the contents of the variables are nevertheless correct, as verified under a debugger.

If you don't get the same results, then I wonder whether our systems differ in some regional settings.

My output:
---------English---------
ss2[0]=hellow0  ss2[1]=hellow1  ss2[2]=hellow2
Correct
---------Arabic---------
ss2[0]=?????0  ss2[1]=?????1  ss2[2]=?????2
Correct
J=20
---------Arabic---------
ss2[0]=?????0  ss2[1]=?????1  ss2[2]=?????2
Correct
J=20
   
Comments
Keith Barrow 10-May-11 16:49pm
   
See my comment below; I had the same result. The Arabic character set has two spaces: the "normal" ASCII one at 0x20 and a non-breaking one at 0xA0. This might be the problem, but we can't check. It is curious!
R. Hoffmann 10-May-11 16:57pm
   
Ah, very good find!
I'm getting the same result with both lines...

What do you get in ss2[2] when it fails?
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



