Posted 19 Jul 2004

SpokenWord - Text-To-Speech and Office Automation in 1!

GWSyZyGy, 19 Jul 2004, CPOL
.NET + MSWord + BabelFish + Speech SDK 5.1 = FUN!


This little application basically just "reads" a Word document back to you. Why? Why not? Actually, I figured someone else must have done it already, but couldn’t find anything. There’s a plug-in for IE. Plenty of cut-n-paste text readers. Microsoft Reader (mentioned below) is only for e-Books (and it sucks). There’s even a reader for Excel! But none for Word. So I wrote one. (It was a slow week at the office).


Actually I started down this path after emailing a set of technical documents to the team. One wise-guy said, "Great, something else to read. Hey, can't my computer read this for me..?" I said, "Sure! Why not?"

Famous last words.

So I looked around for a Word plugin. XP comes with Speech capabilities built in, so surely there must be something for the world's most popular word processor? (No snide remarks, please.)

Famous last words.

Well, I Googled, and found roughly ... squat. At least nothing that was actually useful. Notepad-to-Speech. Clipboard-to-Speech. Trinkets, really. Or gigantic, expensive, monolithic programs that ... well, that I didn't try. So I thought, "I have .NET ... I'll just write one. Piece of cake!"

Famous last words....


The first step, obviously, is to determine an appropriate speech technology.

I knew there was a new (more or less) speech SDK for .NET: at the time it was Microsoft Speech Application SDK 1.0 Beta 4 (and is currently Microsoft Speech Server 2004). So I downloaded it (all 800,000Gb of it), installed, started playing around with it, and realized ... this ain't Text-to-Speech. Yes, it does have speech capability, but it's really for telephony -- you know, automated phone services. Wrong answer. <Sigh>

So off to CodeProject I went, looking for alternatives. There were several articles about MSAgent. I also found the Agent documentation on MSDN, and ran across a site where you can download additional characters. I played around with agents for a while, and while they could be used for what I wanted, they were causing too many nightmarish Clippy flashbacks. Plus, there were too many components to download and install, and the text-to-speech engine wanted to "s-p-e-l-l t-h-i-n-g-s o-u-t" way too often.

Next I tried the Microsoft Reader, which is a free "reader" for e-Books. I downloaded and installed it, then went searching for something to read. It will only open e-Books, the vast majority of which must be purchased. I finally found some free e-Books on Amazon (Edgar Allan Poe's The Raven, plus others), downloaded one ... and found out the "reader" doesn't read it for you. It just shows it to you. To hear it, you have to download and install the TTS (Text-To-Speech) package (doh!). Did that, and it now reads aloud ... with pathetic meter (it seems to ignore all punctuation). Plus it only reads e-Books. There is a converter in the SDK, but the format of the source Word document is rather strict. So back to the drawing board!

So I was down to one last lead -- good ol' Speech SDK 5.1. Again, there are several articles here (even a whole category for it!). You may notice this has been around for quite a while (current download is from August 2001!), and many of you have probably downloaded it at one time or another. But just because it's old, it doesn't mean it's obsolete. In fact, I found it superior to the alternatives in several ways:

  • Fewer downloads and installs (as few as zero!),
  • Better handling of "unknown" words,
  • No annoying animated characters,
  • Easily integrated into .NET,
  • Lots of samples and code, since it's been around for a while.

Finally having settled on the speech engine, I then focused on getting the text. Having done Office Automation before (back in the dark ages of VB6), I knew basically what needed to be done. Hook into Word's COM interface, grab the ActiveDocument, and start pulling text.

Actually, I had hoped there was a newer .NET hook into Word, but apparently Microsoft hasn't written/released a managed-code library yet. No biggie. Just right-click on References, select the COM tab, and scroll down to "Microsoft Word 9.0 Object Library." When you select it, you may notice a couple of other references tag along -- "Microsoft Office 9.0 Object Library" (gee, wonder what that is), and "Microsoft Visual Basic for Applications Extensibility." Then you're hooked in. Unless you don't have Word installed ... at which point you run to the store, buy Office, and install. Simple, right?

While you're in the COM tab (you didn't close it already, did you?), go ahead and select "Microsoft Speech Object Library" too. If you can't find it, jump to the Install section. If you have XP, I think it should already be there.

Installing Components

  • .NET Framework 1.1 (if you don't have it already).
  • Microsoft Word 2000 / XP (haven't tested with OfficeXP yet).
  • Speech 5.1:
    1. XP: You don't really need to install anything to run the exe.
      · If you want the full SDK, go ahead and download SpeechSDK51.exe from the link below.
      · If you just want extra voices, download Sp5TTIntXP.exe.
    2. 2000: If you have a "Speech" icon in your control panel, you're good to go.
    3. Else: If you don't have it, or if you want to install the full SDK, visit Microsoft.
      · If you want the full SDK, download SpeechSDK51.exe.
      · If you just want the TTS engine, download SpeechSDK51MSM.exe.
  • (optional) More Voices:
    • XP only comes with "Microsoft Sam," as does the redistributable (SpeechSDK51MSM.exe).
    • The SDK installs Sam, Mike, and Mary (my favorite). Mike and Mary are also in the XP-ONLY Sp5TTIntXP.exe.
    • Find the elusive "LH Michelle" and "LH Michael" by installing Microsoft Reader and the TTS add-on.
    • Somewhere amongst my downloads I acquired the linguistically challenged "Sample TTS Voice"....
    • If anyone knows where to find more, please let me know!
  • SpokenWord (download exe or source above): just stick it anywhere.
  • Finally, you'll need something to read. Something in Word. Fortunately for all, I have included a .doc in the download (either one). (At long last, a "ReadMe" that really can be read!)

Using the code


  1. open a Word doc,
  2. open SpokenWord,
  3. press "Play".

Anything more complicated than that is covered in the ReadMe. But that's not why you are reading this, my CodeProject friend, you want to know the nuts and bolts ... the tips and tricks ... the ins and outs ... the gotchas and workarounds ....

Well, given that the two technologies we're using (Word 9.0 Library and Speech 5.1) both predate .NET, there are bound to be a few quirks. To be honest, this app would have been easier to write in VB6, but that just wouldn't be interesting, would it?

Problem 1: "Hooking" into Word:

First things first ... how can you access a Word document opened in another process? Do you iterate the processes found by System.Diagnostics.Process.GetProcesses() until you find WINWORD.EXE? Then maybe use some lovely Win32 API calls to attach to the process..?

Actually, it's easier than that. All you have to do is use GetObject() from the Microsoft.VisualBasic namespace (it lives in the Interaction module, so VB code can call it unqualified):

Dim app As Word.Application
app = CType(GetObject(Nothing, "Word.Application"), Word.Application)

If Word.Application is already running, GetObject() will return it. If not, it will throw a "Cannot create ActiveX component" exception, so you should wrap it in a Try / Catch block. If there's more than one instance (Word document) currently running, I believe it will just return the first one (lowest ProcessID).
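The Try / Catch wrapper described above might look roughly like the sketch below. It is an illustration only, not the SpokenWord source: it assumes the classic .NET Framework on Windows, uses late binding (Option Strict Off) so that it compiles without the Word interop reference, and the module and function names are made up for the example.

```vbnet
Option Strict Off ' late binding, so no Word interop reference is needed

Imports System
Imports Microsoft.VisualBasic

Module AttachToWordSketch
    ' Hypothetical helper: returns the running Word.Application object,
    ' or Nothing when GetObject() cannot find or create one.
    Function TryGetRunningWord() As Object
        Try
            Return GetObject(, "Word.Application")
        Catch ex As Exception
            ' Typically "Cannot create ActiveX component" when Word is not running.
            Return Nothing
        End Try
    End Function

    Sub Main()
        Dim app As Object = TryGetRunningWord()
        If app Is Nothing Then
            Console.WriteLine("Word is not running.")
        Else
            ' Documents.Count is resolved at run time through late binding.
            Console.WriteLine("Attached. Open documents: " & CStr(app.Documents.Count))
        End If
    End Sub
End Module
```

Returning Nothing instead of rethrowing lets the caller poll for Word on a timer without an exception handler at every call site.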

Well, after you're done with a reference, you also need to clean it up. This is especially true if your code activated Word (by using app = new Word.ApplicationClass), but occasionally GetObject() would create an instance, so you should always clean up, just in case. Traditionally, you would exit Office automation by calling Word.Application.Quit(). But you probably don't want to terminate if visible, because the user might be interacting with the document. Plus, in .NET, it turns out that because of the COM Interop wrapper class, the COM reference counts on IUnknown do not decrement correctly, so you need to add a ReleaseComObject() call to correct for that. Finally, since Word automation is such a fragile beast, wrap it in Try / Catch:

<System.STAThread()> Public Sub ExitWord()
    If app Is Nothing Then Exit Sub
    Try
        ' only close word if currently hidden!
        If app.Visible Then Exit Sub
        app.Quit()
        ' decrement COM reference
        System.Runtime.InteropServices.Marshal.ReleaseComObject(app)
    Catch ex As Exception
        RaiseEvent Status("Error in ExitWord: " & ex.Message)
    End Try
    app = Nothing
End Sub

You may have also noted that the InitWord() and ExitWord() methods are tagged as <System.STAThread()>. The Word library is single-threaded, so it seemed like the safe thing to do. (Strictly speaking, .NET only honors this attribute on the application's entry-point method, so on other methods it serves as a reminder rather than a guarantee.) It might work without the STA attributes, but why rock the boat? This is also the source of the frequent System.Windows.Forms.Application.DoEvents() calls -- we're interacting with a foreign user process, so it's a good idea to yield the CPU when possible.

Once you've found a Word document, you need to access the text in it. A couple of methods are easily implemented:

  • ActiveDocument.ActiveWindow.Selection will return the highlighted portion, and
  • ActiveDocument.ActiveWindow.Range.WholeStory will return all the text in the document.

It gets a bit trickier if you want to "step through" the document. To do that, you need to set the selection to the end of the current selection, then Collapse() the selection to remove any "non-visual" characters (like formatting tags), then inch the selection ahead by one character and Expand() by the desired increment (WdUnits.wdSentence) until you have something selected. Unfortunately, it gets more complicated if you are inside a table, or selecting embedded objects, etc. It also doesn't help that Range.Collapse() occasionally responds with "call was rejected by the Callee"! So the final implementation of NavNext() looks like:

 ' navigate in Word to next block of text.
 ' (GetSelection, initialStart, initialEnd and SELECT_UNIT are members
 ' of the enclosing class.)
 Public Function NavNext(Optional ByVal Recurse As Boolean = True) As Boolean
     Dim sel As Word.Selection = GetSelection()
     If sel Is Nothing Then Return False
     Dim oldstart As Integer = sel.Range.Start
     Dim oldEnd As Integer = sel.Range.End
     Dim i As Integer = 1
     Do ' don't ask.  sometimes it gets stuck.
         sel.Start = sel.End
         Try
             sel.Collapse() ' occasionally "rejected by the callee".
         Catch ' useless errors.
         End Try
         sel.End += i
         If sel.End >= initialEnd Then
             ' at end of original selection.
             sel.End = initialEnd
             Exit Do
         End If
         i += 1
     Loop Until sel.End > oldEnd                               ' YES, ALL of these
     If sel.Start < initialStart Then sel.Start = initialStart ' are necessary.
     If sel.Start < oldstart Then sel.Start = oldstart         ' Navigation in Word
     If sel.Start < oldEnd Then sel.Start = oldEnd             ' can be a pain-in-the
     If sel.Start < oldEnd Then                                ' (you-know-what)!
         ' This happens sometimes in a table cell ...
         ' it refuses to select a portion.
         ' have to decrement the end as well to make it work.
         sel.End -= 1
         sel.Start = oldEnd
         If sel.Start = sel.End Then
             sel.Expand(SELECT_UNIT) ' ok ... now we're stuck in a cell.
             ' expand to get it all THEN navnext.
             If Recurse Then Return NavNext(False) Else Return False
         End If
     End If

     ' Force doc to show selection.
     If sel.End > sel.Start Then Return True
     Return False
 End Function

Now that we have what to read, we have to get it into a readable format. Range.Text will return the unformatted text in Word, but unfortunately that does not include any kind of format information, including any automatically incrementing numbers (like section headers or numbered lists). To preserve most of the formatting, I use Range.Copy() to copy the selection to the clipboard, then Paste to a RichTextBox control. RichText is a much easier format to decode than Word, and the intent was to look for simple formatting like bold, italics, and underlining to modify the voice inflection. Of course, that part was never written....

But I digress. It turns out the numbering information is available in the RTF copy, but unfortunately (there's that word again) the numbering always resets. Fortunately, the correct numbering is available in the "raw text" clipboard copy.

If you want to learn more about programming with Word Automation, see the Reference Documentation on MSDN.

Problem 2: Don't "steal" focus:

Now that I had a way to step through a Word document, I wanted to keep focus on that Word document while still allowing interaction with the SpokenWord controls. In order to do this, SpokenWord needs to be a non-activating window, like most tool windows. This is achieved via:

Protected Overrides ReadOnly Property CreateParams() As _
        System.Windows.Forms.CreateParams
    Get
        ' This sets the window up as not "stealing" focus (NOACTIVATE)
        ' Unfortunately, the dropdown lists force activation!!
        Const WS_EX_NOACTIVATE As Integer = &H8000000
        Dim Result As CreateParams
        Result = MyBase.CreateParams
        Result.ExStyle = Result.ExStyle Or WS_EX_NOACTIVATE
        Return Result
    End Get
End Property

As you can see in the comments, it is not perfect, due to a bug in .NET dropdown lists. It also means the window will not shuffle to the top of the ZOrder, since this normally happens when the window gets focus. We have two ways to fix this: 1) make the window TopMost (always on top), or 2) force it to the top at an appropriate time. I decided to use the latter. When is an appropriate time? How about when the user clicks on the titlebar ... well, there's no event for that, so you have to override WndProc:

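Private Const WM_NCLBUTTONDOWN As Integer = &HA1 ' left button down in the non-client area
Private Const HTCAPTION As Integer = 2           ' hit-test code for the title bar

Protected Overrides Sub WndProc(ByRef m As System.Windows.Forms.Message)
    If m.Msg = WM_NCLBUTTONDOWN Then
        If (m.WParam.ToInt32 = HTCAPTION) Then
            ' briefly toggle TopMost to pull the window to the front.
            Me.TopMost = True
            Application.DoEvents() ' wait for it....
            Me.TopMost = False
        End If
    End If
    MyBase.WndProc(m)  ' allow ancestor to handle it.
End Sub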

Problem 3: "steal" focus!!:

Now that the application no longer steals the focus, a new problem turned up. Apparently, the Word.Application is not accessible until after the Word document has lost focus. Why? Who knows? The fact remains that a document opened after SpokenWord will remain invisible to it until it loses focus.

So the easiest solution, therefore, is to "force" focus onto SpokenWord! This is encapsulated in the form "frmThief", using a series of API calls including SetForegroundWindow. All it does is pop up, make itself the active foreground window, and then go away. The code is supposed to set focus back to the prior foreground window, but that isn't working. If anyone can fix it, please let me know how.

Problem 4: Text-To-Speech:

This is actually the easiest problem so far. You could simply grab the default voice: Dim m_voice As New SpeechLib.SpVoiceClass and tell it to speak: m_voice.Speak(Text). However, that wouldn't be much fun, and the "playback" would be less than ideal. SpokenWord processes each segment of text in several steps (which you can see if you click on "options"):

  1. Orig: this is actually just the RTF copy from Word, corresponding to the "original" text.
  2. Pre: this is the output of Parser.PreProcess(Text), which basically does some character replacement/expansion.
  3. Effect: this is the output of the selected effect (if any). All the effects are public webservices, and are just for fun. I have no control over their success or failure!
  4. Post: this is the output of Parser.PostProcess(Text). Not currently implemented, but the intent is to reformat acronyms into something more understandable (for instance, SpokenWord reads "XP" as "Ixpee", not "Ex Pee").
  5. SAPI: this is the output from VoiceHelper.SAPIfy(Text). Currently, it just expands certain punctuation into "pauses", but ideally it would add emphasis and inflection where appropriate. SAPI accepts a simple XML markup (similar in spirit to HTML) for "formatting" text-to-speech. See the SAPI help file for more information on valid tags.
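To make the Pre and SAPI steps a little more concrete, here is a rough sketch of what such a pipeline could look like. It is not the code behind Parser.PreProcess or VoiceHelper.SAPIfy. The replacement table, the 300 ms pause length, and the choice of ";" and ":" as pause points are all assumptions made for illustration. The only SAPI-specific piece is the <silence msec="..."/> tag from the SAPI 5 XML markup.

```vbnet
Imports System
Imports System.Text

Module SapiMarkupSketch
    ' Hypothetical pre-process step: expand a few symbols into speakable
    ' words, and blank out angle brackets so they are not mistaken for tags.
    Function PreProcess(ByVal text As String) As String
        Return text.Replace("&", " and ") _
                   .Replace("%", " percent") _
                   .Replace("<", " ") _
                   .Replace(">", " ")
    End Function

    ' Hypothetical SAPIfy step: follow selected punctuation with a
    ' <silence> tag so the voice pauses there.
    Function SAPIfy(ByVal text As String) As String
        Dim sb As New StringBuilder()
        For Each ch As Char In text
            sb.Append(ch)
            Select Case ch
                Case ";"c, ":"c
                    sb.Append("<silence msec=""300""/>")
            End Select
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Pre-process first, so the markup added by SAPIfy is left untouched.
        Console.WriteLine(SAPIfy(PreProcess("Note: R&D grew 50%")))
    End Sub
End Module
```

Running the steps in this order matters: if PreProcess ran after SAPIfy, it would strip the angle brackets from the tags it had just been given.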

The last tab is a running log of what is happening.

So the only thing left to explain is how to control the voice. SpokenWord lets you adjust:

  • Voice: this dropdown lets you change the current speaker. See the Installation section to add voices.
  • Speed: this slider will let you adjust the speed.
  • Pitch: this slider will let you adjust the pitch.
  • As well as Pause, Stop, and Repeat (on the toolbar).
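One way sliders like these could be fed to the speech engine is through the SAPI 5 XML tags <rate absspeed="..."/> and <pitch absmiddle="..."/>, both of which accept values from -10 to 10. The sketch below shows only that idea and is not necessarily how SpokenWord does it (SpVoice also has a Rate property, for instance). The module and function names are invented for the example.

```vbnet
Imports System

Module VoiceSettingsSketch
    ' Clamp a slider value into the -10..10 range used by SAPI 5 rate and pitch.
    Function ClampSetting(ByVal value As Integer) As Integer
        Return Math.Max(-10, Math.Min(10, value))
    End Function

    ' Hypothetical helper: prefix the text with rate and pitch tags.
    Function WithRateAndPitch(ByVal text As String, _
                              ByVal rate As Integer, _
                              ByVal pitch As Integer) As String
        Return "<rate absspeed=""" & ClampSetting(rate).ToString() & """/>" & _
               "<pitch absmiddle=""" & ClampSetting(pitch).ToString() & """/>" & _
               text
    End Function

    Sub Main()
        ' A rate of 20 is out of range, so it is clamped to 10.
        Console.WriteLine(WithRateAndPitch("Hello from SpokenWord.", 20, -3))
    End Sub
End Module
```

Since there is no matching pitch property on the voice object, putting both settings in the markup keeps them in one place.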

Stop warrants special mention, since the SpeechLib doesn't directly support a stop action. However, it does have Skip, which I use to simply skip through the current playback buffer until it is empty.

Speaking (ahem) of stopping, you may wonder: how does the application know when to start the next sentence? This is achieved by injecting SAPI bookmarks into the playback stream. When the SpeechLib encounters a bookmark during playback, it raises an event. So SpokenWord adds a bookmark to the end of the text. Then when the bookmark event is raised, it knows the playback is done, so it moves to the next sentence.

Unfortunately (ugh), SpeechLib occasionally raises some aberrant bookmarks, specifically if it encounters currency amounts in the text (e.g. $100). These "unknown" bookmarks are ignored by the app.
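The bookmark bookkeeping can be sketched as two small helpers: one that appends a SAPI <bookmark mark="..."/> tag to the outgoing text, and one that a Bookmark event handler could call to decide whether to advance. The bookmark name and both function names are hypothetical, and the actual event wiring (which needs the SpeechLib reference) is left out.

```vbnet
Imports System

Module BookmarkSketch
    ' Hypothetical bookmark name marking the end of a spoken segment.
    Const END_MARK As String = "SpokenWordEnd"

    ' Append the end-of-segment bookmark to text that is about to be spoken.
    Function AppendEndBookmark(ByVal sapiText As String) As String
        Return sapiText & "<bookmark mark=""" & END_MARK & """/>"
    End Function

    ' Only the expected bookmark should advance to the next sentence;
    ' anything else (such as the stray ones near currency amounts) is ignored.
    Function IsEndOfSegment(ByVal bookmarkName As String) As Boolean
        Return String.Equals(bookmarkName, END_MARK, StringComparison.Ordinal)
    End Function

    Sub Main()
        Console.WriteLine(AppendEndBookmark("It costs $100."))
        Console.WriteLine(IsEndOfSegment("SpokenWordEnd")) ' True
        Console.WriteLine(IsEndOfSegment(""))              ' False
    End Sub
End Module
```

Checking against a known name, rather than advancing on any bookmark, is what keeps the aberrant events from skipping sentences.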

Areas of Improvement

There are several areas that can be improved (or at least implemented) in the code:

  • Externalize Pre-process "dictionary" to an XML file for user customization.
  • Actually implement the Post-process.
  • Parse the RTF text to add SAPI "formatting". For example, if the original text is bold, wrap the word in <EMPH> (emphasis) tags.
  • Better synchronization between playback and navigation. If you look at frmHover.m_voice_Bookmark you'll see that I tried to use the bookmarks to set the Word selection to the correct position, but that proved more trouble than it was worth. Currently, the selection and playback can get out of sync, especially inside tables. The only way to re-sync it is to stop playback and restart.
  • Detect and ignore sections like Table of Contents and Indexes (see WordHelper.IsAnIndexBlock()).


  • Initial version posted July 19, 2004


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Team Leader
United States
No Biography provided


Last Updated 20 Jul 2004
Article Copyright 2004 by GWSyZyGy