Posted 15 Jun 2015

How to Convert Accented Characters to HTML Codes

B. Clay Shannon, 15 Jun 2015
Replacing accented characters in a text file with their HTML code equivalents

Avoiding the Dreaded Mystery Character

One of the most common utterances programmers make when informed their code is not working right is, "It works on my machine!"

An even more revolting development (than when your code works on your machine, but not on someone else's) is when it works on your machine, but also does not work on your machine.

Let me explain.

I created a "ginormous" jsfiddle of an entire Robert Louis Stevenson book in English ("Treasure Island") and Spanish ("La Isla del Tesoro"). The Spanish displays as desired in jsfiddle: the characters peculiar to written Spanish, such as "ñ", "¿", "¡", etc., display just fine.

However, when I -- in preparation for making this bilingual work available as a paperback/Kindle pair -- copied the CSS and HTML to a text file, and changed the extension from .txt to .html, the file displayed more-or-less as desired in my browser, except that, on encountering the accented characters, the browser threw up its virtual hands and replaced those accented characters with the "I-don't-know-what-the-heck-this-is-so-I'm-going-to-replace-it-with-a-fallback-symbol" character, namely "�".
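One plausible cause of the "�" symptom (an assumption on my part, not something verified against the original file) is an encoding mismatch: the text file was saved in a single-byte encoding, and the browser then read it as UTF-8. The sketch below reproduces that effect in isolation. The class and method names are hypothetical, and ISO-8859-1 stands in for whatever encoding the editor actually used.

```csharp
using System.Text;

public static class MojibakeDemo
{
    // Encodes text the way a (hypothetical) Latin-1 editor would save it,
    // then decodes those same bytes the way a UTF-8 reader would.
    public static string SaveAsLatin1ThenReadAsUtf8(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] bytes = latin1.GetBytes(text);    // "ñ" becomes the single byte 0xF1
        return Encoding.UTF8.GetString(bytes);   // 0xF1 on its own is not valid UTF-8
    }
}
```

Calling MojibakeDemo.SaveAsLatin1ThenReadAsUtf8("año") returns "a", then the replacement character U+FFFD, then "o". Plain ASCII letters survive because both encodings store them as the same bytes, which is why only the accented characters were affected.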

This wouldn't do - the Real Academia Española would likely issue a warrant for my arrest and deportation, and then force me to eat Spanish food (bland-bah!) instead of Mexican (spicy-awesome!) fare, which I can enjoy practically "at will" in my native California.

Being scared out of my wits at that prospect, I wrote a utility that replaces accented characters with their HTML code equivalents. Once this is accomplished, the modified text displays as desired in my (and your) browser. Here is the crux of it (both the source and the .exe are included as downloads):

// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text; using System.Windows.Forms;
private void buttonReplaceCharsWithCodes_Click(object sender, EventArgs e)
{
    // Bail out if the user cancels the dialog; there is no file to convert.
    if (openFileDialog1.ShowDialog() != DialogResult.OK)
    {
        return;
    }

    String fallName = openFileDialog1.FileName;
    List<string> linesModified = new List<string>();

    try
    {
        // On the .NET Framework, Encoding.Default is the system ANSI code page;
        // a byte order mark in the file, if present, overrides it.
        // The using block closes the reader even if reading fails.
        using (StreamReader file = new StreamReader(fallName, Encoding.Default, true))
        {
            String line;
            while ((line = file.ReadLine()) != null)
            {
                linesModified.Add(line);
            }
        }

        progressBar1.Maximum = linesModified.Count;
        progressBar1.Step = 1; // one step per line (the default Step is 10)
        progressBar1.Value = 0;
        labelProgFeedback.Text = "Replacing accented chars with HTML codes";

        for (int i = 0; i < linesModified.Count; i++)
        {
            // Spanish
            linesModified[i] = linesModified[i].Replace("á", "&aacute;");
            linesModified[i] = linesModified[i].Replace("Á", "&Aacute;");
            linesModified[i] = linesModified[i].Replace("é", "&eacute;");
            linesModified[i] = linesModified[i].Replace("É", "&Eacute;");
            linesModified[i] = linesModified[i].Replace("í", "&iacute;");
            linesModified[i] = linesModified[i].Replace("Í", "&Iacute;");
            linesModified[i] = linesModified[i].Replace("ñ", "&ntilde;");
            linesModified[i] = linesModified[i].Replace("Ñ", "&Ntilde;");
            linesModified[i] = linesModified[i].Replace("ó", "&oacute;");
            linesModified[i] = linesModified[i].Replace("Ó", "&Oacute;");
            linesModified[i] = linesModified[i].Replace("ú", "&uacute;");
            linesModified[i] = linesModified[i].Replace("Ú", "&Uacute;");
            linesModified[i] = linesModified[i].Replace("ü", "&uuml;");
            linesModified[i] = linesModified[i].Replace("Ü", "&Uuml;");
            linesModified[i] = linesModified[i].Replace("¿", "&iquest;");
            linesModified[i] = linesModified[i].Replace("¡", "&iexcl;");
            // German (the U umlauts are already among the Spanish characters above)
            linesModified[i] = linesModified[i].Replace("Ä", "&Auml;");
            linesModified[i] = linesModified[i].Replace("ä", "&auml;");
            linesModified[i] = linesModified[i].Replace("Ö", "&Ouml;");
            linesModified[i] = linesModified[i].Replace("ö", "&ouml;");
            linesModified[i] = linesModified[i].Replace("ß", "&szlig;");
            // If you need to add French, Greek, Hawaiian, Italian, Polish, Romanian,
            // Turkish, Braille, or other special characters, see http://character-code.com/
            // French (encountered in Poe stories, such as "The Murders in the Rue Morgue")
            linesModified[i] = linesModified[i].Replace("â", "&acirc;");
            linesModified[i] = linesModified[i].Replace("ê", "&ecirc;");
            linesModified[i] = linesModified[i].Replace("ô", "&ocirc;");
            progressBar1.PerformStep();
        }
        progressBar1.Value = 0;

        // Show and save the result only when the conversion succeeded.
        textBoxMassagedResults.Text = string.Join(Environment.NewLine, linesModified);
        String massagedFileName = String.Format("{0}_Massaged.txt", fallName);
        File.WriteAllLines(massagedFileName, linesModified, Encoding.UTF8);
        buttonCopyTextToClipboard.Enabled = true;
        labelProgFeedback.Text = String.Format
            ("Finished! Massaged text below and saved as {0}", massagedFileName);
    }
    catch (Exception ex)
    {
        MessageBox.Show(String.Format("Exception {0}", ex.Message));
    }
}
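The chain of Replace calls above can also be written as a lookup table, which scans each string once instead of once per character. What follows is a minimal sketch of that alternative, not the code in the download. The AccentEncoder class and ReplaceAccents method are hypothetical names, and the sketch assumes the text uses precomposed characters (a single "á" rather than "a" followed by a combining accent).

```csharp
using System.Collections.Generic;
using System.Text;

public static class AccentEncoder
{
    // Same character-to-entity pairs as the Replace calls in the event handler.
    private static readonly Dictionary<char, string> Entities = new Dictionary<char, string>
    {
        { 'á', "&aacute;" }, { 'Á', "&Aacute;" }, { 'é', "&eacute;" }, { 'É', "&Eacute;" },
        { 'í', "&iacute;" }, { 'Í', "&Iacute;" }, { 'ñ', "&ntilde;" }, { 'Ñ', "&Ntilde;" },
        { 'ó', "&oacute;" }, { 'Ó', "&Oacute;" }, { 'ú', "&uacute;" }, { 'Ú', "&Uacute;" },
        { 'ü', "&uuml;" },   { 'Ü', "&Uuml;" },   { '¿', "&iquest;" }, { '¡', "&iexcl;" },
        { 'Ä', "&Auml;" },   { 'ä', "&auml;" },   { 'Ö', "&Ouml;" },   { 'ö', "&ouml;" },
        { 'ß', "&szlig;" },  { 'â', "&acirc;" },  { 'ê', "&ecirc;" },  { 'ô', "&ocirc;" }
    };

    // Copies text into a new string, swapping each mapped character for its entity.
    public static string ReplaceAccents(string text)
    {
        StringBuilder result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            string entity;
            if (Entities.TryGetValue(c, out entity))
            {
                result.Append(entity);
            }
            else
            {
                result.Append(c); // unmapped characters pass through unchanged
            }
        }
        return result.ToString();
    }
}
```

With this in place, the loop body in the handler would shrink to linesModified[i] = AccentEncoder.ReplaceAccents(linesModified[i]); and supporting another language means adding entries to the table.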

You may note that the method above also handles German characters. Other special characters can be easily added to the code, as needed, if you want to support other languages/special characters. As for me, since English, Spanish, and German are the only (human) languages I know, they are the only ones whose special characters I need to support in this sort of endeavor.
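If listing every character by hand becomes tedious, one option is to fall back on numeric character references (for example, "&#241;" for "ñ"), which HTML accepts for any Unicode code point. The sketch below is one way that could look. NumericEntityEncoder and EncodeNonAscii are hypothetical names, and the input is assumed to be well-formed UTF-16 (no unpaired surrogates).

```csharp
using System.Globalization;
using System.Text;

public static class NumericEntityEncoder
{
    // Replaces every non-ASCII character with a decimal reference such as &#241;.
    public static string EncodeNonAscii(string text)
    {
        StringBuilder result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] < 128)
            {
                result.Append(text[i]); // ASCII passes through unchanged
                continue;
            }

            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Characters outside the Basic Multilingual Plane occupy two chars.
                codePoint = char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate just consumed
            }
            else
            {
                codePoint = text[i];
            }

            result.Append("&#")
                  .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                  .Append(';');
        }
        return result.ToString();
    }
}
```

The trade-off is readability: "&ntilde;" is easier to recognize in the HTML source than "&#241;", so a named-entity table for the common characters with a numeric fallback for the rest may be a reasonable middle ground.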

Thar's Gold in Them Thar Caves!

And so, voila! (I don't really know French; I just pretend I do on the Interwebs sometimes.) I was able to generate the document with the characters represented as they should be, and the English/Spanish version of "Treasure Island / La Isla del Tesoro" is now available in both paperback and Kindle formats.

If You See Matched Pairs in the Code...

I have updated this tip several times so that each Replace call shows the single accented character as the text to find and the HTML code as the replacement, but the site keeps converting the codes back into characters. If you see matched pairs (the same character on both sides of a call), just download the source code, and you will see what each call really needs to be.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

B. Clay Shannon
Founder Across Time & Space
United States
I am in the process of morphing from a software developer into a portrayer of Mark Twain. My monologue (or one-man play, entitled "The Adventures of Mark Twain: As Told By Himself" and set in 1896) features Twain giving an overview of his life up till then. The performance includes the relating of interesting experiences and humorous anecdotes from Twain's boyhood and youth, his time as a riverboat pilot, his wild and woolly adventures in the Territory of Nevada and California, and experiences as a writer and world traveler, including recollections of meetings with many of the famous and powerful of the 19th century - royalty, business magnates, fellow authors, as well as intimate glimpses into his home life (his parents, siblings, wife, and children).

Peripatetic and picaresque, I have lived in eight states; specifically, besides my native California (where I was born and where I now again reside) in chronological order: New York, Montana, Alaska, Oklahoma, Wisconsin, Idaho, and Missouri.

I am also a writer of both fiction (for which I use a nom de plume, "Blackbird Crow Raven", as a nod to my Native American heritage - I am "½ Cowboy, ½ Indian") and nonfiction, including a two-volume social and cultural history of the U.S. which covers important events from 1620-2006: http://www.lulu.com/spotlight/blackbirdcraven


Last Updated 15 Jun 2015
Article Copyright 2015 by B. Clay Shannon