
How to Replace a List of Words in a .DOCX File using the DocX Library

16 Dec 2014 | CPOL | 3 min read
Replace words in .docx files using the DocX Library

Mission

I recently came upon a tedious need to replace a slew (not slough) of words in a long document (John Ormsby's 1885 translation of Miguel de Cervantes' Don Quixote). 

Specifically, I wanted to replace the British spellings of words with their American dialect equivalents. For example, I wanted to replace "colour" with "color", "centre" with "center", "plough" with "plow", etc.

I could replace these one word at a time using Find > Replace, but that quickly becomes a pain in the arse...I mean, donkey. After all, WE BE PROGRAMMERS!

So, I located a handy library for working with .docx files named, appropriately if dully or even near-redundantly, DocX.

Commission

To use the DocX library, simply download it (docx.dll) from here, add a reference to it in your project, and then add this using directive:

C#
using Novacode;

You first need to load the document that contains the "fawlty" spellings, like so. This assumes you have dropped an OpenFileDialog control on a Windows Forms form and kept its default name, openFileDialog1:

C#
string filename = string.Empty;
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.OK)
{
    filename = openFileDialog1.FileName;
}
else
{
    MessageBox.Show("No file selected - hasta la vista and Ciao, baby!");
    return;
}
using (DocX document = DocX.Load(filename))
{
    document.ReplaceText("travelled", "traveled");
    document.Save();
}

But, of course, we want to do all the words at once. First, we need to have that list of words, so code like this is needed:

C#
List<string> wordPairs;
public Form1()
{
    InitializeComponent();
    Popul8WordPairs();
}
. . .
private void Popul8WordPairs()
{
    wordPairs = new List<string>();
    ExpandWordPairs("æroplane", "airplane");
    ExpandWordPairs("æsthetic", "esthetic");
    ExpandWordPairs("ageing", "aging");
    ExpandWordPairs("aluminium", "aluminum");
    ExpandWordPairs("amœba", "ameba");
    ExpandWordPairs("anæmia", "anemia");
    ExpandWordPairs("anæsthesia", "anesthesia");
    ExpandWordPairs("analyse", "analyze");
    . . .
    ExpandWordPairs("victual", "vittle");
    ExpandWordPairs("vigour", "vigor");
    ExpandWordPairs("vigourous", "vigorous");
    ExpandWordPairs("vigourously", "vigorously");
    ExpandWordPairs("whisky", "whiskey");
    ExpandWordPairs("woollen", "woolen");
    ExpandWordPairs("yoghurt", "yogurt");
}

But hold on there, pard! What is this "ExpandWordPairs" jazz? Well, if a "word" (sequence of letters) in the list appears in the middle of another word, we don't want to "mess with it", so as to avoid any potentially embarrassing mishaps. So we want to look for the word and only the word, and so it is bookended with spaces. But then again, what if it is at the start of a sentence (capitalized), or at the end of a sentence or clause and does not have a space after it, but some form of punctuation, such as a comma or period, etc.?

Those are the situations the ExpandWordPairs() method handles, thusly:

C#
private void ExpandWordPairs(string britSpelling, string amiSpelling)
{
    wordPairs.Add(SpacesForeAndAft(britSpelling, amiSpelling));
    wordPairs.Add(CapitalizedAndSpaceAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForePeriodAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeCommaAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeColonAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeSemicolonAft(britSpelling, amiSpelling));
    wordPairs.Add(SpaceForeDashAft(britSpelling, amiSpelling));
}

An example of the methods it calls is shown here:

C#
private string SpaceForeDashAft(string britSpelling, string amiSpelling)
{
    return string.Format(" {0}-# {1}-", britSpelling, amiSpelling);
}

In this way, by passing "scrutinising" and "scrutinizing" to ExpandWordPairs(), all of the following will be found and replaced with their equivalents ("zing" ending instead of "sing"):

  • " scrutinising "
  • "Scrutinising "
  • " scrutinising."
  • " scrutinising,"
  • " scrutinising;"
  • " scrutinising:"
  • " scrutinising-"
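
The seven variants listed above can be generated in one place. What follows is only a sketch: the article shows the body of SpaceForeDashAft() alone, so the other format strings are guesses modeled on it, and the names WordPairVariants, Expand, and Capitalize are hypothetical rather than taken from the download.

```csharp
using System;
using System.Collections.Generic;

static class WordPairVariants
{
    // Builds the seven "british#american" entries for one word pair.
    // Assumption: every entry follows the " {0}-# {1}-" pattern shown in
    // SpaceForeDashAft(); the other helper bodies are not shown in the article.
    public static List<string> Expand(string brit, string ami)
    {
        var pairs = new List<string>
        {
            // Space before and after the word.
            string.Format(" {0} # {1} ", brit, ami),
            // Capitalized (start of a sentence), space after.
            string.Format("{0} #{1} ", Capitalize(brit), Capitalize(ami)),
        };

        // Space before, punctuation mark after: period, comma, colon, semicolon, dash.
        foreach (char mark in new[] { '.', ',', ':', ';', '-' })
        {
            pairs.Add(string.Format(" {0}{2}# {1}{2}", brit, ami, mark));
        }
        return pairs;
    }

    // Hypothetical helper: upper-cases the first letter only.
    static string Capitalize(string word)
    {
        return string.IsNullOrEmpty(word)
            ? word
            : char.ToUpperInvariant(word[0]) + word.Substring(1);
    }

    static void Main()
    {
        // Brackets make the leading and trailing spaces visible.
        foreach (string pair in Expand("scrutinising", "scrutinizing"))
        {
            Console.WriteLine("[" + pair + "]");
        }
    }
}
```

Looping over the punctuation marks keeps the list of contexts in one spot, so adding, say, a question mark is a one-character change. Contexts not on the list (a capitalized word followed by a comma, for instance) are still missed, just as they are with the seven separate methods.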

Note: I used a pound/hash sign as a separator (instead of the traditional comma or semicolon) between the "bad" (British English) and the "good" (American English) spellings because those punctuation marks (comma and semicolon) would then be more difficult to handle. Using a "#" was simply easier. I could have used a tilde or the symbol that represents The Artist Formerly Known As Prince, or something else just as well. Well, maybe not just as well. That's why I stuck with the "#" over TAFKAP.
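
The code further on calls GetFirstHalf() and GetSecondHalf() to pull the two spellings back out of each "#"-separated entry. Their bodies are not shown in this tip, so the following is a minimal sketch of how they might look, assuming every entry contains exactly one "#" and neither spelling contains one itself. The class name PairSplitter is hypothetical.

```csharp
using System;

static class PairSplitter
{
    const char Separator = '#';

    // Everything before the '#': the text to search for.
    // Assumption: the entry always contains a '#'; otherwise Substring throws.
    public static string GetFirstHalf(string pair)
    {
        return pair.Substring(0, pair.IndexOf(Separator));
    }

    // Everything after the '#': the replacement text.
    public static string GetSecondHalf(string pair)
    {
        return pair.Substring(pair.IndexOf(Separator) + 1);
    }

    static void Main()
    {
        string pair = " colour,# color,";
        // Brackets make the leading space visible.
        Console.WriteLine("[" + GetFirstHalf(pair) + "] -> [" + GetSecondHalf(pair) + "]");
    }
}
```

The leading space and trailing punctuation survive the split on both sides, which is what lets a plain text replacement put them back exactly as they were.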

At any rate, the spartan code shown earlier now becomes this:

C#
string filename = string.Empty;
string britSpelling = string.Empty;
string amiSpelling = string.Empty;
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.OK)
{
    filename = openFileDialog1.FileName;
}
else
{
    MessageBox.Show("No file selected - cheerio and later days, dude!!");
    return;
}
using (DocX document = DocX.Load(filename))
{
    foreach (string s in wordPairs)
    {
        britSpelling = GetFirstHalf(s);
        amiSpelling = GetSecondHalf(s);
        document.ReplaceText(britSpelling, amiSpelling);
    }
    document.Save();
}
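
For comparison, here is a different technique from the one used in this tip: a regular expression with word boundaries (\b), which matches a word wherever it stands alone without listing every combination of surrounding spaces and punctuation. This sketch works on a plain string only; it makes no assumption about whether or how the DocX ReplaceText() method handles regular expressions. The names WholeWordReplace and Replace are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordReplace
{
    // Replaces brit with ami only where brit stands alone as a word,
    // keeping a leading capital letter if the match had one.
    public static string Replace(string text, string brit, string ami)
    {
        // \b matches the boundary between a word character and a non-word character.
        string pattern = @"\b" + Regex.Escape(brit) + @"\b";
        return Regex.Replace(
            text,
            pattern,
            m => char.IsUpper(m.Value[0])
                ? char.ToUpperInvariant(ami[0]) + ami.Substring(1)
                : ami,
            RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string sample = "Colour me surprised: the colour? Discolour stays.";
        Console.WriteLine(Replace(sample, "colour", "color"));
        // prints "Color me surprised: the color? Discolour stays."
    }
}
```

The trade-off is that a word written entirely in capitals comes back with only its first letter capitalized, and this version does nothing for a document until it is wired up to whatever text the library exposes.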

Completion

All of the code is available in the accompanying file. All you need to do for it to run (besides what was already mentioned) is to drop a button on the form, retaining its default name (button1), and name the project "AmericanizeBritSpeak" (or name it whatever you want and replace that name with yours). 

This just scratches the surface of what can be accomplished when using the DocX library to work with .docx files. Download it from here, and donate if you find it useful and are able to.

If you cannot, or do not want to, create a utility based on the source code, you can download the .exe, which I have zipped up and added to this tip. It looks like this when you run it:

Image 1

Just click the button and you will be able to load a document and have its British English spellings replaced with the spellings used in American English. As you can see, the utility also contains links to two of my web sites as well as to all three volumes of the dual-language (Spanish and English) edition of Don Quixote assembled and generated by "Found in the Translation".

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Founder Across Time & Space
United States
I am in the process of morphing from a software developer into a portrayer of Mark Twain. My monologue (or one-man play, entitled "The Adventures of Mark Twain: As Told By Himself" and set in 1896) features Twain giving an overview of his life up till then. The performance includes the relating of interesting experiences and humorous anecdotes from Twain's boyhood and youth, his time as a riverboat pilot, his wild and woolly adventures in the Territory of Nevada and California, and experiences as a writer and world traveler, including recollections of meetings with many of the famous and powerful of the 19th century - royalty, business magnates, fellow authors, as well as intimate glimpses into his home life (his parents, siblings, wife, and children).

Peripatetic and picaresque, I have lived in eight states; specifically, besides my native California (where I was born and where I now again reside) in chronological order: New York, Montana, Alaska, Oklahoma, Wisconsin, Idaho, and Missouri.

I am also a writer of both fiction (for which I use a nom de plume, "Blackbird Crow Raven", as a nod to my Native American heritage - I am "½ Cowboy, ½ Indian") and nonfiction, including a two-volume social and cultural history of the U.S. which covers important events from 1620-2006: http://www.lulu.com/spotlight/blackbirdcraven
