How to Insert Spaces Between Words Even When They Begin or End with "Strange" Characters Using C# and the DocX Library

7 Jan 2014, CPOL, 2 min read
Inserting spaces between words using C# and the DocX Library

"Weird" Characters

In an earlier tip, I showed how to wedge a space between words that were run together, such as "DennisRodman" or "theWorm", making them "Dennis Rodman" and "the Worm" respectively (so to speak).

BTW: It's not funny, anymore, Dennis; you're not Marilyn Monroe, and that homicidal maniac is not JFK. 

That tip, though, only dealt with the "normal" English alphabet (a..z and A..Z). Since I'm currently working with foreign language documents (Spanish and German, with French and perhaps Italian and Dutch coming later), I realized that I need to consider other possible characters, too, both as the ending lowercase letter or other ending character (such as é, í, ñ, ?, !, ", », and ß) and as the beginning uppercase letter or other beginning character (such as ¿, ¡, ", and «).

So, if you had a sentence such as this:

quéSera, Sera. Was zumTeuful ist hier los!¿se habla aleman?¡No!He said«Hola, muchacha»Das ist gewißMerkwürdig!

...running it through this helper method would "aerate" it like so:

qué Sera, Sera. Was zum Teuful ist hier los! ¿se habla aleman? ¡No! He said «Hola, muchacha» Das ist gewiß Merkwürdig!

Rather than clutter up and complicate the previous code, I wrote another helper function to handle those situations.

Preliminary Setting Up of Figurative Chairs

Follow these steps to prepare for the code to follow:

  1. Download the DocX DLL library from here
  2. In your Visual Studio project, right-click References, select "Add Reference..." and add docx.dll to the project from wherever you saved it.
  3. Add this to your using section:

C#
using System.Collections.Generic; // for List<string>, if not already present
using Novacode;

Add this code to the top of your class, too:

C#
// 65..90 are A..Z; 97..122 are a..z
const int FIRST_CAP_POS = 65;
const int LAST_CAP_POS = 90;
const int FIRST_LOWER_POS = 97;
const int LAST_LOWER_POS = 122;

List<string> specialWordEndings;
List<string> specialWordBeginnings;

string soughtCombo = string.Empty;
string desiredCombo = string.Empty;
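
To see why the A..Z and a..z ranges are not enough on their own, here is a minimal sketch that tests a few characters against the same bounds. The class and method names (CharCodeCheck, IsPlainUpper, IsPlainLower) are illustrative only and are not part of the tip's download; the accented characters are written as Unicode escapes so the example does not depend on the source file's encoding.

```csharp
using System;

public static class CharCodeCheck
{
    // Same bounds as the constants above: 65..90 are A..Z; 97..122 are a..z
    const int FIRST_CAP_POS = 65;
    const int LAST_CAP_POS = 90;
    const int FIRST_LOWER_POS = 97;
    const int LAST_LOWER_POS = 122;

    // True only for the unaccented capitals A..Z
    public static bool IsPlainUpper(char c)
    {
        return c >= FIRST_CAP_POS && c <= LAST_CAP_POS;
    }

    // True only for the unaccented lowercase letters a..z
    public static bool IsPlainLower(char c)
    {
        return c >= FIRST_LOWER_POS && c <= LAST_LOWER_POS;
    }

    public static void Main()
    {
        Console.WriteLine(IsPlainLower('m'));       // True
        Console.WriteLine(IsPlainLower('\u00E9'));  // False: é has code 233
        Console.WriteLine(IsPlainLower('\u00DF'));  // False: ß has code 223
        Console.WriteLine(IsPlainUpper('\u00C9'));  // False: É has code 201
    }
}
```

Characters such as é, ß, and É fall outside both ranges, which is why they go into the two special lists instead.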

As usually happens, this ends up being a little more complicated than I first reckoned, because I have to deal with four different situations:

  1. An "odd" character at the end of a word followed by a "normal" uppercase (A..Z) character
  2. A "normal" lowercase (a..z) character at the end of a word followed by an "odd" character
  3. A combination of "odd" characters (an "odd" ending followed by an "odd" beginning)
  4. A combination of "normal" characters (a..z followed by A..Z)
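
The four situations can be tried out on a plain string before involving a Word document at all. The following is only a sketch under that assumption: it uses string.Replace in place of DocX, and the names AerateSketch, Range, InsertSpaces, and Aerate are made up for illustration. The two character lists mirror the ones used in this tip, and the passes run in the same order as the calling code.

```csharp
using System;
using System.Collections.Generic;

public static class AerateSketch
{
    static readonly string[] Endings = { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };
    static readonly string[] Beginnings = { "¿", "¡", "\"", "É", "«" };

    // Yields each character from first to last (inclusive) as a one-character string.
    static IEnumerable<string> Range(char first, char last)
    {
        for (char c = first; c <= last; c++)
        {
            yield return c.ToString();
        }
    }

    // For every ending/beginning pair, replaces "eb" with "e b".
    static string InsertSpaces(string text, IEnumerable<string> ends, IEnumerable<string> begins)
    {
        foreach (string e in ends)
        {
            foreach (string b in begins)
            {
                text = text.Replace(e + b, e + " " + b);
            }
        }
        return text;
    }

    public static string Aerate(string text)
    {
        text = InsertSpaces(text, Endings, Beginnings);                // odd + odd
        text = InsertSpaces(text, Endings, Range('A', 'Z'));           // odd + normal
        text = InsertSpaces(text, Range('a', 'z'), Beginnings);        // normal + odd
        text = InsertSpaces(text, Range('a', 'z'), Range('A', 'Z'));   // normal + normal
        return text;
    }

    public static void Main()
    {
        string scrunched = "quéSera, Sera. Was zumTeuful ist hier los!¿se habla aleman?¡No!He said«Hola, muchacha»Das ist gewißMerkwürdig!";
        Console.WriteLine(Aerate(scrunched));
    }
}
```

Run against the sample sentence from earlier, this should produce the same "aerated" text shown there. The DocX version that follows has the same shape, with document.ReplaceText doing the work that text.Replace does here.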

And now, without further ado, adieux, or adios, straight from Carmel Valley, California, comes the illustrious and much-ballyhooed and anticipated code, entering from stage left, welcome:

The Nitty Gritty Prettifier/Aerator

C#
        private void Popul8UnusualCharLists()
        {   
            specialWordEndings = new List<string>() { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };

            specialWordBeginnings = new List<string>() { "¿", "¡", "\"", "É", "«" };
        }

        private void AerateUnusualCombo(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", endChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", endChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        private void AerateUnusualEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    for (int i = FIRST_CAP_POS; i <= LAST_CAP_POS; i++)
                    {
                        char upperChar = (char)i;
                        soughtCombo = string.Format("{0}{1}", endChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", endChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        private void AerateNormalEndUnusualBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", lowerChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        private void AerateNormalEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    for (int j = FIRST_CAP_POS; j <= LAST_CAP_POS; j++)
                    {
                        char upperChar = (char)j;
                        soughtCombo = string.Format("{0}{1}", lowerChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }
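
It can help to know how much work these four methods ask of DocX. The sketch below only counts the sought/desired pairs the loops generate. It does not touch a document, and the names ReplacementPairs and BuildPairs are hypothetical, not part of the tip's source.

```csharp
using System;
using System.Collections.Generic;

public static class ReplacementPairs
{
    static readonly string[] Endings = { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };
    static readonly string[] Beginnings = { "¿", "¡", "\"", "É", "«" };

    // One-character strings for every code from first to last, inclusive.
    static List<string> Letters(int first, int last)
    {
        var letters = new List<string>();
        for (int i = first; i <= last; i++)
        {
            letters.Add(((char)i).ToString());
        }
        return letters;
    }

    // Appends a (sought, desired) pair for each ending/beginning combination.
    static void AddPairs(List<KeyValuePair<string, string>> pairs,
                         IEnumerable<string> ends, IEnumerable<string> begins)
    {
        foreach (string e in ends)
        {
            foreach (string b in begins)
            {
                pairs.Add(new KeyValuePair<string, string>(e + b, e + " " + b));
            }
        }
    }

    // Builds the pairs in the same order as the four Aerate* methods are called.
    public static List<KeyValuePair<string, string>> BuildPairs()
    {
        List<string> caps = Letters(65, 90);
        List<string> lowers = Letters(97, 122);
        var pairs = new List<KeyValuePair<string, string>>();
        AddPairs(pairs, Endings, Beginnings);  // 12 x 5
        AddPairs(pairs, Endings, caps);        // 12 x 26
        AddPairs(pairs, lowers, Beginnings);   // 26 x 5
        AddPairs(pairs, lowers, caps);         // 26 x 26
        return pairs;
    }

    public static void Main()
    {
        Console.WriteLine(BuildPairs().Count); // 1178
    }
}
```

That comes to 1,178 ReplaceText calls and four load/save cycles per file, which is one reason the calling code below shows a wait cursor.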

Call it like so:

C#
Cursor.Current = Cursors.WaitCursor;
try
{
    Popul8UnusualCharLists();
    string filename = string.Empty;
    DialogResult result = openFileDialog1.ShowDialog();
    if (result == DialogResult.OK)
    {
        filename = openFileDialog1.FileName;
    }
    else
    {
        MessageBox.Show("No file selected - exiting");
        return;
    }
    AerateUnusualCombo(filename);
    AerateUnusualEndNormalBegin(filename);
    AerateNormalEndUnusualBegin(filename);
    AerateNormalEndNormalBegin(filename);
}
finally
{
    Cursor.Current = Cursors.Default;
}
MessageBox.Show("Scrunched together words have been normalized!");

A Parting Plaintive Plea

If you find this tip useful, pay it forward and do something nice for somebody today, even if it surprises them.

Note: I have added two source code files: the smaller one is just for this tip; the larger one contains all the DocX code for various articles I wrote on CodeProject December 2013 and January 2014.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder Across Time & Space
United States
I am in the process of morphing from a software developer into a portrayer of Mark Twain. My monologue (or one-man play, entitled "The Adventures of Mark Twain: As Told By Himself" and set in 1896) features Twain giving an overview of his life up till then. The performance includes the relating of interesting experiences and humorous anecdotes from Twain's boyhood and youth, his time as a riverboat pilot, his wild and woolly adventures in the Territory of Nevada and California, and experiences as a writer and world traveler, including recollections of meetings with many of the famous and powerful of the 19th century - royalty, business magnates, fellow authors, as well as intimate glimpses into his home life (his parents, siblings, wife, and children).

Peripatetic and picaresque, I have lived in eight states; specifically, besides my native California (where I was born and where I now again reside) in chronological order: New York, Montana, Alaska, Oklahoma, Wisconsin, Idaho, and Missouri.

I am also a writer of both fiction (for which I use a nom de plume, "Blackbird Crow Raven", as a nod to my Native American heritage - I am "½ Cowboy, ½ Indian") and nonfiction, including a two-volume social and cultural history of the U.S. which covers important events from 1620-2006: http://www.lulu.com/spotlight/blackbirdcraven
