
Encoding Accented Characters

By Simon Hughes, 22 May 2007
 

Introduction

Exporting accented characters to text files can cause problems, because some programs cannot import or correctly display them. You therefore need to choose the encoding used to write the plain text file with care. However, there are a LOT of encodings, so which one should you use when the receiving program can only cope with plain, unaccented letters?

Here's How

The answer is: iso-8859-8.

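That is the Hebrew (ISO-Visual) encoding, which the .NET Framework supports out of the box. The Hebrew character set contains no accented Latin letters, so when the encoder meets one it substitutes the closest unaccented letter (é becomes e, ç becomes c) instead of a question mark. This appears to come from the best-fit mapping that .NET applies to characters a code page cannot represent. As the example below shows, the other standard encoders do not do this: they either keep the accents in some form or turn them into question marks.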
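The conversion described above can be wrapped in a small helper. This is a minimal sketch, not the author's code. The class and method names (`AccentFlattener`, `ToPlainLetters`) are hypothetical, and the sketch assumes a recent .NET (Core) runtime, where iso-8859-8 only becomes available after `CodePagesEncodingProvider` is registered. On the .NET Framework, the code page is built in and the registration line can be removed. The accent stripping relies on the encoder's best-fit fallback, so if a runtime does not apply it, unmapped letters will come out as `?` instead.

```csharp
using System.Text;

// Hypothetical helper wrapping the article's iso-8859-8 round trip.
public static class AccentFlattener
{
    static AccentFlattener()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages such as
        // iso-8859-8 must be registered before GetEncoding can find them.
        // Not needed (and not available without a package) on the .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string ToPlainLetters(string text)
    {
        Encoding hebrew = Encoding.GetEncoding("iso-8859-8");

        // Characters outside iso-8859-8 (such as é and ç) go through the
        // encoder fallback; with best-fit mapping they become e and c.
        byte[] bytes = hebrew.GetBytes(text);

        // Decode the same bytes; ASCII-range bytes map back to themselves.
        return hebrew.GetString(bytes);
    }
}
```

On the .NET Framework, the table below suggests that `AccentFlattener.ToPlainLetters("Frédéric François")` gives `Frederic Francois`.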

Example

Converting the string "Frédéric François" with each encoding gives the following results:

Encoding       Description                    Output
ASCII                                         Fr?d?ric Fran?ois
Default                                       Frédéric François
UTF7           Unicode (UTF-7)                Fr+AOk-d+AOk-ric Fran+AOc-ois
UTF8           Unicode (UTF-8)                Frédéric François
iso-8859-1     Western European (ISO)         Frédéric François
iso-8859-8     Hebrew (ISO-Visual)            Frederic Francois
us-ascii       US-ASCII                       Fr?d?ric Fran?ois
Windows-1252   Western European (Windows)     Frédéric François

Example of Code Using Encoding

// Requires: using System.IO;
StreamWriter sw = new StreamWriter(
    "somefile.txt", false, System.Text.Encoding.GetEncoding("iso-8859-8"));

A Full Example for the Beginner

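using System.IO;
using System.Text;

// Write the table as pipe-delimited text; accented characters in the
// descriptions are reduced to plain letters by the iso-8859-8 encoder.
using (StreamWriter sw = new StreamWriter(
    "somefile.txt", false, Encoding.GetEncoding("iso-8859-8")))
{
    // DataSet1 and binsTA are the typed DataSet and TableAdapter from the
    // author's own project; substitute your own data source.
    DataSet1TableAdapters.binsTA ta = new DataSet1TableAdapters.binsTA();
    DataSet1.binsDataTable dt = ta.GetData();
    foreach (DataSet1.binsRow row in dt.Rows)
    {
        // One line per row: ID|description
        sw.Write(row.ID.ToString());
        sw.Write("|");
        sw.WriteLine(row.description);
    }
}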
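The full example depends on the author's typed DataSet, so it will not compile on its own. The sketch below writes the same `ID|description` layout from an in-memory list instead. `PipeExport` and its parameters are hypothetical names. The sketch uses `File.WriteAllLines(string, IEnumerable<string>, Encoding)` in place of a `StreamWriter`, and it assumes a recent .NET (Core) runtime, where iso-8859-8 must first be registered through `CodePagesEncodingProvider`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-alone version of the export: plain (Id, Description)
// pairs stand in for the author's typed DataSet rows.
public static class PipeExport
{
    public static void Write(string path, IEnumerable<(int Id, string Description)> rows)
    {
        // Assumes .NET Core 3.0+ / .NET 5+: legacy code pages such as
        // iso-8859-8 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding("iso-8859-8");

        var lines = new List<string>();
        foreach (var row in rows)
        {
            // Same layout as the article: ID|description
            lines.Add(row.Id + "|" + row.Description);
        }

        // Every line is encoded with iso-8859-8 as it is written.
        File.WriteAllLines(path, lines, encoding);
    }
}
```

A call such as `PipeExport.Write("somefile.txt", new[] { (1, "Frédéric François") });` produces a file containing only single-byte ASCII-range characters for Latin text like this.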

History

  • 22nd May, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Simon Hughes
Software Developer (Senior) www.ByBox.com
United Kingdom
C++ and C# Developer for 21 years. Microsoft Certified.
 
UK Senior software developer / team leader.
 
I've been writing software since 1985. I pride myself on designing and creating software that is first class. That means it has to be fast, scalable, and with good use of design patterns.
 
I have done everything from risk analysis and explosion modelling, banking systems, to highly scalable multi-threaded arrival and departure screens in many leading airports, to state of the art wireless warehouse systems.


Comments and Discussions

 
Doesn't work for me (DEGT, 27 Aug '08 - 3:15)
I have been trying this in a method that converts a string parameter into a "visually correct" string, in which these odd characters do not appear but are replaced by plain equivalents (as in "Frederique XYZ" without accents). However, I don't get the right results.
 
In particular I wanted to weed out all accented characters used in languages such as Spanish, French, German, Czech, etc.
 
Perhaps a sample method to convert a "dirty" string to a "visual" string would be a good idea.
 
http://www.PanamaSights.com/
http://www.coralys.com/
http://www.virtual-aviation.info/

Re: Doesn't work for me (seanicongroup, 7 May '09 - 11:18)
The idea is that you need to take the string, which is represented in Unicode (I believe .NET represents strings internally as UTF-16), convert it to bytes using the ISO-8859-8 encoding, and then convert those same bytes back to a string using ASCII. It would look like this (VB.NET):
 
Dim TestString As String = "Frédéric François"
Dim TemporaryBytes() As Byte = Text.Encoding.GetEncoding("ISO-8859-8").GetBytes(TestString)
Dim FinalString = Text.Encoding.ASCII.GetString(TemporaryBytes)
 
'You'll see here that FinalString shows up as "Frederic Francois"
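For the sample method requested in the first comment, a common alternative that does not depend on a legacy code page is Unicode normalization. Each accented letter is decomposed into its base letter plus combining marks, and the marks are then dropped. This is a different technique from the article's iso-8859-8 approach, and `Diacritics.Remove` is a hypothetical name. A minimal sketch in C#:

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: strips accents via Unicode decomposition,
// a different technique from the article's iso-8859-8 round trip.
public static class Diacritics
{
    public static string Remove(string text)
    {
        // FormD splits "é" into "e" followed by U+0301 (combining acute accent).
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining marks themselves.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever remains into its canonical composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

`Diacritics.Remove("Frédéric François")` returns `Frederic Francois`, and Czech letters such as č, ř and š reduce in the same way. Letters that have no Unicode decomposition, such as German ß, Danish ø or Polish ł, pass through unchanged. They would need an explicit replacement table.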


Last Updated 22 May 2007
Article Copyright 2007 by Simon Hughes
Everything else Copyright © CodeProject, 1999-2013