How to Insert Spaces Between Words Even When They Begin or End with "Strange" Characters Using C# and the DocX Library

B. Clay Shannon, 7 Jan 2014, CPOL
Inserting spaces between words using C# and the DocX Library

"Weird" Characters

In a previous tip, I showed how to wedge a space between words that were run together, such as "DennisRodman" or "theWorm", making them "Dennis Rodman" and "the Worm" respectively (so to speak).

BTW: It's not funny, anymore, Dennis; you're not Marilyn Monroe, and that homicidal maniac is not JFK. 

That tip, though, only dealt with the "normal" English alphabet (a..z and A..Z). Since I'm currently working with foreign-language documents (Spanish and German, with French and perhaps Italian and Dutch coming later), I realized that I need to consider other possible characters, too: both as the ending lowercase letter or other ending character (such as é, í, ñ, ?, !, ", », and ß) and as the beginning uppercase letter or other beginning character (such as ¿, ¡, ", and «).

So, if you had a sentence such as this:

quéSera, Sera. Was zumTeuful ist hier los!¿se habla aleman?¡No!He said«Hola, muchacha»Das ist gewißMerkwürdig!

...running it through this helper method would "aerate" it like so:

qué Sera, Sera. Was zum Teuful ist hier los! ¿se habla aleman? ¡No! He said «Hola, muchacha» Das ist gewiß Merkwürdig!

Rather than clutter up and complicate the previous code, I wrote another helper function to handle those situations.

Preliminary Setting Up of Figurative Chairs

Follow these steps to prepare for the code to follow:

  1. Download the DocX DLL library from here
  2. In your Visual Studio project, right-click References, select "Add Reference..." and add docx.dll to the project from wherever you saved it.
  3. Add this to your using section:

     using Novacode;

Add this code to the top of your class, too:

// 65..90 are A..Z; 97..122 are a..z
const int FIRST_CAP_POS = 65;
const int LAST_CAP_POS = 90;
const int FIRST_LOWER_POS = 97;
const int LAST_LOWER_POS = 122;

List<string> specialWordEndings;
List<string> specialWordBeginnings;

string soughtCombo = string.Empty;
string desiredCombo = string.Empty;
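To see why the A..Z and a..z ranges above cannot cover the foreign-language cases, here is a minimal sketch that checks a few characters against those ranges. The class and method names (CharCodeDemo, IsAsciiLetter) are made up for illustration and are not part of the tip's source code.

```csharp
using System;

static class CharCodeDemo
{
    // True only for the plain ranges the constants describe: 65..90 and 97..122.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static void Main()
    {
        // Accented letters and inverted punctuation have codes above 127,
        // so a loop from 65 to 90 or from 97 to 122 never reaches them.
        foreach (char c in "aZéñß¿«")
        {
            Console.WriteLine("code {0}: in A..Z/a..z = {1}, char.IsLetter = {2}",
                (int)c, IsAsciiLetter(c), char.IsLetter(c));
        }
    }
}
```

The letters é, ñ, and ß still count as letters for char.IsLetter, but they sit outside both ranges, which is why they need their own lists.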

As usually happens, this ends up being a little more complicated than I first reckoned, because I have to deal with four different situations:

  1. An "odd" character at the end of a word followed by a "normal" (A..Z) character
  2. A "normal" (a..z) character at the end of a word followed by an "odd" character
  3. A combination of "odd" characters
  4. A combination of "normal" characters
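The four situations can be tried out on an ordinary string before touching a Word document. The following is only a sketch: it uses string.Replace instead of DocX, and the names AerateSketch, InsertSpaces, and Letters are invented for the example. The character lists and the sample sentence are the ones used in this tip.

```csharp
using System;
using System.Collections.Generic;

static class AerateSketch
{
    static readonly string[] Endings =
        { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };
    static readonly string[] Beginnings = { "¿", "¡", "\"", "É", "«" };

    // Returns every letter from first to last as a one-character string.
    static List<string> Letters(char first, char last)
    {
        List<string> letters = new List<string>();
        for (char c = first; c <= last; c++) letters.Add(c.ToString());
        return letters;
    }

    // Puts a space between every end/begin pair found in the text.
    static string InsertSpaces(string text, IList<string> ends, IList<string> begins)
    {
        foreach (string end in ends)
            foreach (string begin in begins)
                text = text.Replace(end + begin, end + " " + begin);
        return text;
    }

    public static string Aerate(string text)
    {
        List<string> upper = Letters('A', 'Z');
        List<string> lower = Letters('a', 'z');
        text = InsertSpaces(text, Endings, Beginnings); // situation 3: odd + odd
        text = InsertSpaces(text, Endings, upper);      // situation 1: odd + normal
        text = InsertSpaces(text, lower, Beginnings);   // situation 2: normal + odd
        text = InsertSpaces(text, lower, upper);        // situation 4: normal + normal
        return text;
    }

    static void Main()
    {
        string sample = "quéSera, Sera. Was zumTeuful ist hier los!¿se habla aleman?"
            + "¡No!He said«Hola, muchacha»Das ist gewißMerkwürdig!";
        Console.WriteLine(Aerate(sample));
    }
}
```

Run against the sample sentence, this should produce the "aerated" sentence shown earlier.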

And now, without further ado, adieux, or adios, straight from Carmel Valley, California, comes the illustrious and much-ballyhooed and anticipated code, entering from stage left, welcome:

The Nitty Gritty Prettifier/Aerator

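        private void Popul8UnusualCharLists()
        {
            // Characters (other than a..z) that can end a word or sentence
            specialWordEndings = new List<string>() { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };

            // Characters (other than A..Z) that can begin a word or sentence
            specialWordBeginnings = new List<string>() { "¿", "¡", "\"", "É", "«" };
        }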

        // Situation 3: an "odd" ending followed by an "odd" beginning, such as !¿
        private void AerateUnusualCombo(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", endChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", endChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Situation 1: an "odd" ending followed by A..Z, such as éS
        private void AerateUnusualEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                foreach (string endChar in specialWordEndings)
                {
                    for (int i = FIRST_CAP_POS; i <= LAST_CAP_POS; i++)
                    {
                        char upperChar = (char)i;
                        soughtCombo = string.Format("{0}{1}", endChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", endChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Situation 2: a..z followed by an "odd" beginning, such as d«
        private void AerateNormalEndUnusualBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    foreach (string beginChar in specialWordBeginnings)
                    {
                        soughtCombo = string.Format("{0}{1}", lowerChar, beginChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, beginChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }

        // Situation 4: a..z followed by A..Z, such as mT
        private void AerateNormalEndNormalBegin(string filename)
        {
            using (DocX document = DocX.Load(filename))
            {
                for (int i = FIRST_LOWER_POS; i <= LAST_LOWER_POS; i++)
                {
                    char lowerChar = (char)i;
                    for (int j = FIRST_CAP_POS; j <= LAST_CAP_POS; j++)
                    {
                        char upperChar = (char)j;
                        soughtCombo = string.Format("{0}{1}", lowerChar, upperChar);
                        desiredCombo = string.Format("{0} {1}", lowerChar, upperChar);
                        document.ReplaceText(soughtCombo, desiredCombo);
                    }
                }
                document.Save();
            }
        }
}
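For comparison only, the same boundaries can be found in one pass with a regular expression. This is a different technique from the pair-by-pair ReplaceText calls above, it works on a plain string rather than a DocX document, and the name RegexAerateSketch is invented for the example. For ordinary text it should give the same result as the four methods, though unusual runs (three quotation marks in a row, say) may come out differently.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexAerateSketch
{
    // Lookbehind: a..z or one of the "odd" word endings.
    // Lookahead: A..Z or one of the "odd" word beginnings.
    // The match itself is empty, so Replace simply inserts a space there.
    static readonly Regex Boundary = new Regex(
        "(?<=[a-zéíñ?!,.:;\"»ß])(?=[A-Z¿¡\"É«])");

    public static string Aerate(string text)
    {
        return Boundary.Replace(text, " ");
    }

    static void Main()
    {
        Console.WriteLine(Aerate("He said«Hola, muchacha»Das ist gewißMerkwürdig!"));
    }
}
```

The two character classes are just the two lists from Popul8UnusualCharLists plus the a..z and A..Z ranges, so adding a character for another language means adding it in one place.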

Call it like so:

            Cursor.Current = Cursors.WaitCursor;
            try
            {
                Popul8UnusualCharLists();
                string filename = string.Empty;
                DialogResult result = openFileDialog1.ShowDialog();
                if (result == DialogResult.OK)
                {
                    filename = openFileDialog1.FileName;
                }
                else
                {
                    MessageBox.Show("No file selected - exiting");
                    return;
                }
                AerateUnusualCombo(filename);
                AerateUnusualEndNormalBegin(filename);
                AerateNormalEndUnusualBegin(filename);
                AerateNormalEndNormalBegin(filename);
            }
            finally
            {
                Cursor.Current = Cursors.Default;
            }
            MessageBox.Show("Scrunched together words have been normalized!");
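Each of the four methods above loads and saves the document on its own. If that turns out to be slow, one option is to gather every sought/desired pair up front and then open the file only once. The sketch below builds the pairs; ReplacementPairs, Build, AddPairs, and Letters are hypothetical names, and the DocX usage appears only as a comment because it reuses the calls already shown in this tip.

```csharp
using System;
using System.Collections.Generic;

static class ReplacementPairs
{
    static List<string> Letters(char first, char last)
    {
        List<string> letters = new List<string>();
        for (char c = first; c <= last; c++) letters.Add(c.ToString());
        return letters;
    }

    static void AddPairs(List<KeyValuePair<string, string>> pairs,
                         IList<string> ends, IList<string> begins)
    {
        foreach (string end in ends)
            foreach (string begin in begins)
                pairs.Add(new KeyValuePair<string, string>(end + begin, end + " " + begin));
    }

    // Key = run-together text to look for, Value = the spaced replacement.
    // The order matches the order of the four Aerate... calls.
    public static List<KeyValuePair<string, string>> Build(
        IList<string> endings, IList<string> beginnings)
    {
        List<string> upper = Letters('A', 'Z');
        List<string> lower = Letters('a', 'z');
        var pairs = new List<KeyValuePair<string, string>>();
        AddPairs(pairs, endings, beginnings);
        AddPairs(pairs, endings, upper);
        AddPairs(pairs, lower, beginnings);
        AddPairs(pairs, lower, upper);
        return pairs;
    }

    static void Main()
    {
        var endings = new List<string> { "é", "í", "ñ", "?", "!", ",", ".", ":", ";", "\"", "»", "ß" };
        var beginnings = new List<string> { "¿", "¡", "\"", "É", "«" };
        Console.WriteLine(Build(endings, beginnings).Count); // 1178 pairs for these lists

        // Possible use with DocX (not compiled here):
        // using (DocX document = DocX.Load(filename))
        // {
        //     foreach (var pair in Build(specialWordEndings, specialWordBeginnings))
        //         document.ReplaceText(pair.Key, pair.Value);
        //     document.Save();
        // }
    }
}
```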

A Parting Plaintive Plea

If you find this tip useful, pay it forward and do something nice for somebody today, even if it surprises them.

Note: I have added two source code files: the smaller one is just for this tip; the larger one contains all the DocX code for various articles I wrote on CodeProject December 2013 and January 2014.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

B. Clay Shannon
Founder "Across Time & Space"
United States
Ideaman and Coder at Across Time & Space, creator of "Mark Twain Central" at http://twaincentral.azurewebsites.net/

Peripatetic and picaresque, I have lived in eight states; specifically, besides my native California (where I was born and where I now again reside) in chronological order: New York, Montana, Alaska, Oklahoma, Wisconsin, Idaho, and Missouri.

I am also a writer of both fiction (for which I use a nom de plume, "Blackbird Crow Raven", as a nod to my Native American heritage - I am "½ Cowboy, ½ Indian") and nonfiction: http://www.lulu.com/spotlight/blackbirdcraven

Last Updated 7 Jan 2014
Article Copyright 2014 by B. Clay Shannon