Hi,

I am facing an issue while importing special characters from an Excel sheet.

The special characters include (¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÜÝÞßàáâãäåæçèéêëìíîïðñòóôö÷øùúûüýþÿ)


I am using StreamReader to import this text, and no encoding parameter is passed to StreamReader, so the default is UTF-8.
After import, the special characters are displayed as "?".
When Encoding.Default is passed as a parameter to StreamReader, the special characters display properly.


But I came across this MSDN page, which says that using Encoding.Default is not advisable.

https://msdn.microsoft.com/en-us/library/system.text.encoding.default(v=vs.110).aspx

Can Encoding.Default differ between two systems, and will it handle special characters from other languages?

My code looks like this:

FileStream fs = new FileStream(sFileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
StreamReader sw = new StreamReader(fs);
String sbuf = sw.ReadLine();

When I read the CSV file containing the special characters mentioned above, sbuf shows them as ?

It works fine when I use StreamReader like this:
StreamReader sw = new StreamReader(fs, Encoding.Default);

I just want to know whether there will be any issues if I use Encoding.Default.

How does it behave if characters from a different code page appear? Does this result in data loss?


Can someone share their thoughts on this?
Updated 17-Apr-19 23:19pm
Comments
Richard MacCutchan 16-Jul-15 7:51am    
I suspect the characters are not being changed at all, but your applications are not using the font that is required to display them.
NavyaKrishna51 16-Jul-15 8:42am    
Hi Richard,
Can you suggest which encoding I need to use to display the special characters?
Richard MacCutchan 16-Jul-15 8:45am    
No, because I don't know what the characters are supposed to be. And if you do not know then you need to ask whoever created the file.
Sergey Alexandrovich Kryukov 16-Jul-15 14:35pm    
No, Richard.
With an improper font, it would show a box in place of the character that cannot be rendered. A '?' indicates that an attempt was made to interpret a Unicode-encoded character as non-Unicode and that, by the time the data got back to Unicode, the actual character data was already lost. Please see Solution 1.
—SA
Richard Deeming 16-Jul-15 8:58am    
Why not find out the name of the encoding that Encoding.Default returns on your computer, where it works, and then use that encoding explicitly in your code?
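
A minimal sketch of that check, assuming a console application; the class name DefaultEncodingInfo and the helper Describe are made up for illustration. It only reports which encoding Encoding.Default resolves to on the current machine. On .NET Framework that is the system's active ANSI code page, so it can differ between machines with different regional settings; on .NET Core and .NET 5+ it is always UTF-8.

```csharp
using System;
using System.Text;

public static class DefaultEncodingInfo
{
    // Builds a one-line description of an encoding:
    // display name, IANA (web) name, and code page number.
    public static string Describe(Encoding encoding)
    {
        return string.Format("{0} ({1}, code page {2})",
            encoding.EncodingName, encoding.WebName, encoding.CodePage);
    }

    public static void Main()
    {
        // Run this on the machine where the import works and note the result.
        Console.WriteLine(Describe(Encoding.Default));
    }
}
```

The reported code page number can then be passed to Encoding.GetEncoding in the import code, so the behavior no longer depends on the machine's settings.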

Solution 1

First of all, those characters are not special; I don't really know what a "special character" is, probably it's just someone's fantasy. A character can be "special" only in relation to some specific format, not by itself.

Now, forget all the "default" rubbish and find out the real encoding of the text file you are reading. In many cases of Unicode-based encodings (UTFs), the text file has a BOM indicating which one it is. Text editors will then tell you what it is on "Save As". Please see: http://unicode.org/faq/utf_bom.html.
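
The BOM check can also be done in code. The following is only a sketch: the class BomSniffer and its methods are hypothetical names, and it recognizes just the UTF-8, UTF-16 and UTF-32 byte order marks described in the Unicode FAQ. A null result means "no BOM found", not "not Unicode".

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null if there is none.
    public static Encoding DetectFromBom(byte[] head)
    {
        // Test the 4-byte UTF-32 marks first: UTF-32 LE also starts with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;                      // UTF-32 little-endian
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true);       // UTF-32 big-endian
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;                    // UTF-16 little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;           // UTF-16 big-endian
        return null;
    }

    // Reads up to four bytes from the start of the file and checks them for a BOM.
    public static Encoding DetectFromFile(string path)
    {
        byte[] head = new byte[4];
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            int read = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, read);
        }
        return DetectFromBom(head);
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of the text file to inspect.");
            return;
        }
        Encoding found = DetectFromFile(args[0]);
        Console.WriteLine(found == null ? "No BOM found" : found.WebName);
    }
}
```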

However, what to do if there is no BOM or the encoding is not Unicode? My "secret weapon" is any good Web browser. Rename your input file to *.HTML and open it. If it is not readable, use the View / Character Encoding menu of your browser to quickly find the right option.

And then use the proper encoding when you instantiate StreamReader:
https://msdn.microsoft.com/en-us/library/system.io.streamreader.streamreader%28v=vs.110%29.aspx (pick one with an Encoding argument),
https://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx.
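
As an illustration only, here is a sketch that assumes the file turned out to be ISO-8859-1 (Latin-1), which covers every character listed in the question. That encoding is an assumption, not something established in this thread; if the file is really Windows-1252, Encoding.GetEncoding(1252) is the analogous call (on .NET Core and .NET 5+ that code page needs CodePagesEncodingProvider registered first). The class and method names are hypothetical, and a MemoryStream stands in for the CSV file.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExplicitEncodingDemo
{
    // Reads the first line of a stream using the encoding the caller names.
    public static string ReadFirstLine(Stream stream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        // Assumption: the source encoding is ISO-8859-1 (code page 28591).
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Stand-in for the file content: "£é" stored as the single bytes A3 E9.
        byte[] fileBytes = latin1.GetBytes("\u00A3\u00E9");

        string right = ReadFirstLine(new MemoryStream(fileBytes), latin1);
        string wrong = ReadFirstLine(new MemoryStream(fileBytes), Encoding.UTF8);

        // Decoded with the matching encoding, the text survives intact.
        Console.WriteLine(right == "\u00A3\u00E9");          // True

        // Decoded as UTF-8, the bytes are invalid and are replaced by U+FFFD,
        // so the original characters can no longer be recovered from the string.
        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0);     // True
    }
}
```

The replacement character U+FFFD is often displayed as "?", which matches what the question describes.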

As simple as that.

—SA
 
Solution 2

Use the code below to strip every character outside the printable ASCII range:

text = Regex.Replace(cell.Text, @"[^\u0020-\u007E]", string.Empty);

You need to add the namespace:
using System.Text.RegularExpressions;
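
To see what this replacement does to the characters from the question, here is a small self-contained sketch. The class name AsciiFilterDemo and the sample string are made up, and a string literal takes the place of cell.Text. The pattern removes the characters rather than converting them, and it also removes tabs and line breaks, since those lie below U+0020.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilterDemo
{
    // Removes every character outside U+0020..U+007E (printable ASCII).
    public static string StripNonAscii(string input)
    {
        return Regex.Replace(input, @"[^\u0020-\u007E]", string.Empty);
    }

    public static void Main()
    {
        // Input is "Café £5"; the é and the £ are deleted, not transliterated.
        Console.WriteLine(StripNonAscii("Caf\u00E9 \u00A35"));   // prints "Caf 5"
    }
}
```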
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


