How to Replace a List of Words in a .DOCX File using the DocX Library

B. Clay Shannon, 16 Dec 2014, CPOL
Replace words in .docx files using the DocX Library

Mission

I recently came upon a tedious need to replace a slew (not slough) of words in a long document (John Ormsby's 1885 translation of Miguel de Cervantes' Don Quixote). 

Specifically, I wanted to replace the British spellings of words with their American dialect equivalents. For example, I wanted to replace "colour" with "color", "centre" with "center", "plough" with "plow", etc.

I could replace these one word at a time using Find > Replace, but that quickly becomes a pain in the arse...I mean, donkey. After all, WE BE PROGRAMMERS!

So, I located a handy library for working with .docx files named, appropriately if dully or even near-redundantly, DocX.

Commission

To use the DocX library, simply download it (DocX.dll) from here, add a reference to it in your project, and then add this using directive:

using Novacode;

You first need to load the document that contains the "fawlty" spellings, like so (this assumes you have dropped an OpenFileDialog control on a Windows Forms form, and kept the default name, openFileDialog1):

string filename = string.Empty;
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.OK)
{
    filename = openFileDialog1.FileName;
}
else
{
    MessageBox.Show("No file selected - hasta la vista and Ciao, baby!");
    return;
}
using (DocX document = DocX.Load(filename))
{
    document.ReplaceText("travelled", "traveled");
    document.Save();
}

But, of course, we want to do all the words at once. First, we need to have that list of words, so code like this is needed:

List<string> wordPairs;
public Form1()
{
    InitializeComponent();
    Popul8WordPairs();
}
. . .
private void Popul8WordPairs()
{
    wordPairs = new List<string>();
    ExpandWordPairs("æroplane", "airplane");
    ExpandWordPairs("æsthetic", "esthetic");
    ExpandWordPairs("ageing", "aging");
    ExpandWordPairs("aluminium", "aluminum");
    ExpandWordPairs("amœba", "ameba");
    ExpandWordPairs("anæmia", "anemia");
    ExpandWordPairs("anæsthesia", "anesthesia");
    ExpandWordPairs("analyse", "analyze");
    . . .
    ExpandWordPairs("victual", "vittle");
    ExpandWordPairs("vigour", "vigor");
    ExpandWordPairs("vigourous", "vigorous");
    ExpandWordPairs("vigourously", "vigorously");
    ExpandWordPairs("whisky", "whiskey");
    ExpandWordPairs("woollen", "woolen");
    ExpandWordPairs("yoghurt", "yogurt");
}

But hold on there, pard! What is this "ExpandWordPairs" jazz? Well, if a "word" (sequence of letters) in the list appears in the middle of another word, we don't want to "mess with it", so as to avoid any potentially embarrassing mishaps. So we want to look for the word and only the word, and so it is bookended with spaces. But then again, what if it is at the start of a sentence (capitalized), or at the end of a sentence or clause and does not have a space after it, but some form of punctuation, such as a comma or period, etc.?

Those are the situations the ExpandWordPairs() method handles, thusly:

private void ExpandWordPairs(string britSpelling, string amiSpelling)
{
    wordPairs.Add(SpacesForeAndAft(britSpelling, amiSpelling));
    wordPairs.Add(CapitalizedAndSpaceAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForePeriodAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeCommaAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeColonAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeSemicolonAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeDashAft(britSpelling, amiSpelling));
}

An example of the methods it calls is shown here:

private string SpaceForeDashAft(string britSpelling, string amiSpelling)
{
    return string.Format(" {0}-# {1}-", britSpelling, amiSpelling);
}
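
The other six helpers are only in the download, but they presumably follow the same pattern. Here is a minimal sketch of how all seven variants could be built; the class name WordPairExpander and the Pair(), Capitalize(), and Expand() methods are hypothetical names invented for illustration, not the code from the accompanying file:

```csharp
using System;
using System.Collections.Generic;

public static class WordPairExpander
{
    // Builds "brit#ami", wrapping both spellings in the same prefix and suffix.
    private static string Pair(string prefix, string brit, string ami, string suffix)
    {
        return string.Format("{0}{1}{2}#{0}{3}{2}", prefix, brit, suffix, ami);
    }

    // Upper-cases the first letter only; the rest of the word is left alone.
    private static string Capitalize(string word)
    {
        if (string.IsNullOrEmpty(word)) return word;
        return char.ToUpperInvariant(word[0]) + word.Substring(1);
    }

    // Returns the seven "brit#ami" variants for one pair of spellings.
    public static List<string> Expand(string brit, string ami)
    {
        var pairs = new List<string>();
        pairs.Add(Pair(" ", brit, ami, " "));                         // spaces fore and aft
        pairs.Add(Pair("", Capitalize(brit), Capitalize(ami), " "));  // capitalized, space aft
        foreach (string mark in new[] { ".", ",", ":", ";", "-" })
        {
            pairs.Add(Pair(" ", brit, ami, mark));                    // space fore, punctuation aft
        }
        return pairs;
    }
}
```

Calling WordPairExpander.Expand("colour", "color") returns seven strings, the first of which is " colour # color ".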

This way, passing "scrutinising" and "scrutinizing" to ExpandWordPairs() means that all of the following will be found and replaced with their American equivalents ("zing" ending instead of "sing"):

  • " scrutinising "
  • "Scrutinising "
  • " scrutinising."
  • " scrutinising,"
  • " scrutinising;"
  • " scrutinising:"
  • " scrutinising-"

Note: I used a pound/hash sign as a separator (instead of the traditional comma or semicolon) between the "bad" (British English) and the "good" (American English) spellings because those punctuation marks (comma and semicolon) would then be more difficult to handle. Using a "#" was simply easier. I could have used a tilde or the symbol that represents The Artist Formerly Known As Prince, or something else just as well. Well, maybe not just as well. That's why I stuck with the "#" over TAFKAP.
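
Pulling a "#"-separated pair apart again is simple enough. The code later in this tip calls GetFirstHalf() and GetSecondHalf(), whose bodies are only in the download; the versions below are just a guess at what they might look like, and the WordPairSplitter class is a hypothetical wrapper added so the sketch compiles on its own:

```csharp
using System;

public static class WordPairSplitter
{
    // Assumption: "#" never appears inside either spelling.
    private const char Separator = '#';

    // Everything before the first "#" (the British spelling).
    public static string GetFirstHalf(string pair)
    {
        int pos = pair.IndexOf(Separator);
        return pos < 0 ? pair : pair.Substring(0, pos);
    }

    // Everything after the first "#" (the American spelling).
    public static string GetSecondHalf(string pair)
    {
        int pos = pair.IndexOf(Separator);
        return pos < 0 ? string.Empty : pair.Substring(pos + 1);
    }
}
```

Because the split is on "#" alone, the leading space and trailing punctuation in each half survive intact, which is the whole point of not using a comma or semicolon as the separator.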

At any rate, the spartan code shown earlier now becomes this:

string filename = string.Empty;
string britSpelling = string.Empty;
string amiSpelling = string.Empty;
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.OK)
{
    filename = openFileDialog1.FileName;
}
else
{
    MessageBox.Show("No file selected - cheerio and later days, dude!!");
    return;
}
using (DocX document = DocX.Load(filename))
{
    foreach (string s in wordPairs)
    {
        britSpelling = GetFirstHalf(s);
        amiSpelling = GetSecondHalf(s);
        document.ReplaceText(britSpelling, amiSpelling);
    }
    document.Save();
}
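
The seven variants cover the common cases, but a word followed by, say, a question mark or a closing quote will slip through. A different technique (not the one used in this tip) is a regular expression with word boundaries. The sketch below works on a plain string only; I have not checked it against DocX's own API, and the WholeWordReplacer class and ReplaceWholeWord() method are hypothetical names. It assumes neither spelling is empty:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordReplacer
{
    // Replaces brit with ami only where brit stands alone as a whole word,
    // keeping a leading capital if the matched word had one.
    public static string ReplaceWholeWord(string text, string brit, string ami)
    {
        // \b matches at a word boundary, so "colour" inside "colourful" is skipped.
        string pattern = @"\b" + Regex.Escape(brit) + @"\b";
        return Regex.Replace(
            text,
            pattern,
            match => char.IsUpper(match.Value[0])
                ? char.ToUpperInvariant(ami[0]) + ami.Substring(1)
                : ami,
            RegexOptions.IgnoreCase);
    }
}
```

With this approach, a single pattern per word stands in for all of the space-and-punctuation variants, at the cost of running the replacement on text you have extracted yourself.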

Completion

All of the code is available in the accompanying file. All you need to do for it to run (besides what was already mentioned) is to drop a button on the form, retaining its default name (button1), and name the project "AmericanizeBritSpeak" (or name it whatever you want and replace that name with yours). 

This just scratches the surface of what can be accomplished when using the DocX library to work with .docx files. Download it from here, and donate if you find it useful and are able to.

If you cannot, or do not want to, create a utility based on the source code, you can download the .exe, which I have zipped up and added to this tip. It looks like this when you run it:

Just click the button and you will be able to load a document and have its British English spellings replaced with the spellings used in American English. As you can see, the utility also contains links to two of my web sites as well as to all three dual-language (Spanish and English) volumes of Don Quixote assembled and generated by "Found in the Translation".

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

B. Clay Shannon
Founder "Across Time & Space"
United States
Ideaman and Coder at Across Time & Space, creator of the Windows Store App "Photrax", which can be downloaded as a trial (7 days) from http://apps.microsoft.com/windows/en-us/app/photrax/75c18e6c-96bd-4607-ac43-531aab098ab4
 
Peripatetic and picaresque, I have lived in eight states; specifically, besides my native California (where I was born and where I now again reside) in chronological order: New York, Montana, Alaska, Oklahoma, Wisconsin, Idaho, and Missouri.
 
I am also a writer of both fiction (for which I use a nom de plume, "Blackbird Crow Raven", as a nod to my Native American heritage - I am "½ Cowboy, ½ Indian") and nonfiction: http://www.lulu.com/spotlight/blackbirdcraven

Last Updated 16 Dec 2014
Article Copyright 2013 by B. Clay Shannon