
Introduction
This little application basically just "reads" a Word
document back to you. Why? Why not? Actually, I figured someone else must have
done it already, but couldn't find anything. There's a plug-in for IE. Plenty of
cut-n-paste text readers. Microsoft Reader (mentioned below) is only for e-Books
(and it sucks). There's even a reader for Excel! But none for Word. So I wrote
one. (It was a slow week at the office.)
Background
Actually I started down this path after emailing a set of technical documents
to the team. One wise-guy said, "Great, something else to read. Hey, can't my
computer read this for me..?" I said, "Sure! Why not?"
Famous last words.
So I looked around for a Word plugin. XP comes with Speech capabilities built
in, so surely there must be something for the world's most popular
word processor? (No snide remarks, please.)
Famous last words.
Well, I Googled, and found roughly ... squat. At least nothing that
was actually useful. Notepad-to-Speech. Clipboard-to-Speech. Trinkets, really.
Or gigantic, expensive, monolithic programs that ... well, that I didn't try. So
I thought, "I have .NET ... I'll just write one. Piece of cake!"
Famous last words....
Research
The first step, obviously, is to determine an appropriate speech technology.
I knew there was a new (more or less) speech SDK for .NET: at the time it was
Microsoft Speech Application SDK 1.0 Beta 4 (it is currently Microsoft Speech Server
2004). So I downloaded it (all 800,000GB of it), installed, started playing
around with it, and realized ... this ain't Text-to-Speech. Yes, it does have
speech capability, but it's really for telephony -- you know,
automated phone services. Wrong answer. <Sigh>
So off to CodeProject I went, looking for alternatives. There were several
articles about MSAgent. I also found the Agent documentation on MSDN, and ran
across a site where you can download additional characters. I played around with
agents for a while, and while they could be used for what I wanted, they were
causing too many nightmarish Clippy flashbacks. Plus, there were too many
components to download and install, and the text-to-speech engine wanted to
"s-p-e-l-l t-h-i-n-g-s o-u-t" way too often.
Next I tried the Microsoft Reader, which is a free "reader" for e-Books. I
downloaded and installed it, then went searching for something to read. It will
only open e-Books, the vast majority of which must be purchased. I
finally found some free e-Books on Amazon (Edgar Allan Poe's The Raven,
plus others), downloaded one ... and found out the "reader"
doesn't read it for you. It just shows it to you. To hear it, you
have to download and install the TTS (Text-To-Speech) package (doh!). Did that,
and it now reads aloud ... with pathetic meter (it seems to ignore all
punctuation). Plus it only reads e-Books. There is a converter in the SDK, but
the required format of the source Word document is rather strict. So back to the
drawing board!
So I was down to one last lead -- good ol' Speech SDK 5.1. Again, there are
several articles here (even a whole category for it!). You may notice
this has been around for quite a while (the current download is from August
2001!), and many of you have probably downloaded it at one time or another. But
just because it's old, it doesn't mean it's obsolete. In fact, I found it
superior to the alternatives in several ways:
- Fewer downloads and installs (as few as zero!),
- Better handling of "unknown" words,
- No annoying animated characters,
- Easily integrated into .NET,
- Lots of samples and code, since it's been around for a while.
Finally having settled on the speech engine, I then focused on getting the
text. Having done Office Automation before (back in the dark ages of
VB6), I knew basically what needed to be done. Hook into Word's COM interface,
grab the ActiveDocument, and start pulling text.
Actually, I had hoped there was a newer .NET hook into Word, but apparently
Microsoft hasn't written/released a managed-code library yet. No biggie. Just
right-click on References, select the COM tab, and scroll down to
"Microsoft Word 9.0 Object Library." When you select it, you may notice a couple
of other references tag along -- "Microsoft Office 9.0 Object Library" (gee,
wonder what that is), and "Microsoft Visual Basic for Applications
Extensibility." Then you're hooked in. Unless you don't have Word installed ...
at which point you run to the store, buy Office, and install. Simple, right?
While you're in the COM tab (you didn't close it already, did you?), go ahead
and select "Microsoft Speech Object Library" too. If you can't find it, jump to
the Install section. If you have XP, I think it should
already be there.
Installation
- .NET Framework 1.1 (if you don't have it already).
- Microsoft Word 2000 / XP (haven't tested with OfficeXP yet).
- Speech 5.1:
  - XP: You don't really need to install anything to run the exe.
    - If you want the full SDK, go ahead and download SpeechSDK51.exe from the link below.
    - If you just want extra voices, download Sp5TTIntXP.exe.
  - 2000: If you have a "Speech" icon in your control panel, you're good to go.
  - Else: If you don't have it, or if you want to install the full SDK, visit Microsoft.
    - If you want the full SDK, download SpeechSDK51.exe.
    - If you just want the TTS engine, download SpeechSDK51MSM.exe.
- (optional) More Voices:
  - XP comes with only "Microsoft Sam," as does the redistributable
    (SpeechSDK51MSM.exe).
  - The SDK installs Sam, Mike, and Mary (my favorite). Mike and Mary are also
    in the XP-ONLY Sp5TTIntXP.exe.
  - Find the elusive "LH Michelle" and "LH Michael" by installing Microsoft
    Reader and the TTS add-on.
  - Somewhere amongst my downloads I acquired the linguistically challenged
    "Sample TTS Voice"....
  - If anyone knows where to find more, please let me know!
- SpokenWord (download exe or source above): just stick it anywhere.
- Finally, you'll need something to read. Something in Word. Fortunately for
  all, I have included a .doc in the download (either one). (At long last, a
  "ReadMe" that really can be read!)
Using the code
Basically:
- open a Word doc,
- open SpokenWord,
- press "Play".
Anything more complicated than that is covered in the ReadMe. But that's not
why you are reading this, my CodeProject friend. You want to know the nuts and
bolts ... the tips and tricks ... the ins and outs ... the gotchas and
workarounds ....
Well, given that the two technologies we're using (Word 9.0 Library and
Speech 5.1) both predate .NET, there are bound to be a few quirks. To be
honest, this app would have been easier to write in VB6, but that just wouldn't
be interesting, would it?
Problem 1: "Hooking" into Word:
First things first ... how can you access a Word document opened in another
process? Do you iterate the processes found by
System.Diagnostics.Process.GetProcesses() until you find WINWORD.EXE? Then maybe
use some lovely Win32 API calls to attach to the process..?
Actually, it's easier than that. All you have to do is use
Microsoft.VisualBasic.GetObject():
Dim app As Word.Application
app = CType(GetObject(Nothing, "Word.Application"), Word.Application)
If Word.Application is already running, GetObject() will return it. If not, it
will throw a "Cannot create ActiveX component" exception, so you should wrap it
in a Try / Catch block. If there's more than one instance (Word document)
currently running, I believe it will just return the first one (lowest
ProcessID).
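The Try / Catch wrapping described above might look roughly like the sketch below. It is only an illustration, not the code from the download: the module and function names are made up for this example, it uses late binding (a plain Object) so that it compiles without the Word interop reference, and it assumes a .NET runtime on Windows where Microsoft.VisualBasic's GetObject() is available.

```vbnet
Imports System

Module WordAttachSketch
    ' Hypothetical helper (not part of SpokenWord): returns the running
    ' Word.Application as a late-bound Object, or Nothing if none is available.
    Public Function TryGetRunningWord() As Object
        Try
            ' Omitting the path name asks for an already-running instance.
            Return GetObject(, "Word.Application")
        Catch ex As Exception
            ' Typically "Cannot create ActiveX component" when Word is not running.
            Return Nothing
        End Try
    End Function
End Module
```

A caller would check the result for Nothing before touching it, and cast it with CType(..., Word.Application) if the Word object library is referenced.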
Well, after you're done with a reference, you also need to clean it up. This
is especially true if your code activated Word (by using
app = New Word.ApplicationClass), but occasionally GetObject() would create an
instance, so you should always clean up, just in case.
Traditionally, you would exit Office automation by calling
Word.Application.Quit(). But you probably don't want to terminate Word if it is
visible, because the user might be interacting with the document. Plus, in .NET,
the COM Interop wrapper class holds its own reference to the underlying COM
object, and that reference is not released until the wrapper is
garbage-collected, so you need to add a ReleaseComObject() call to release it
promptly. Finally, since Word automation is such a fragile beast, wrap it in
Try / Catch:
<System.STAThread()> Public Sub ExitWord()
    If app Is Nothing Then Exit Sub
    Try
        ' Leave a visible Word alone: the user may still be working in it.
        If Not app.Visible Then app.Quit()
    Catch ex As Exception
        RaiseEvent Status("Error in ExitWord: " & ex.Message)
    Finally
        ' Always drop our COM reference, even if Quit() failed or was skipped.
        System.Runtime.InteropServices.Marshal.ReleaseComObject(app)
        app = Nothing
    End Try
End Sub
You may have also noted that the InitWord() and ExitWord() methods are tagged
as <System.STAThread()>. The Word library is single-threaded, so it seemed like
the safe thing to do. Strictly speaking, STAThread only has an effect on an
application's entry-point method, so here it mostly documents intent. It might
work without the STA attributes, but why rock the boat? This is also the source
of the frequent System.Windows.Forms.Application.DoEvents() calls -- we're
interacting with a foreign user process, so it's a good idea to yield the CPU
when possible.
Once you've found a Word document, you need to access the text in it. A
couple of methods are easily implemented: ActiveDocument.ActiveWindow.Selection
will return the highlighted portion, and calling WholeStory() on a range or
selection will expand it to cover all the text in the document.
It gets a bit trickier if you want to "step through" the document. To do
that, you need to set the selection to the end of the current selection, then
Collapse() the selection to remove any "non-visual" characters (like formatting
tags), then inch the selection ahead by one character and Expand() by the
desired increment (WdUnits.wdSentence) until you have something selected.
Unfortunately, it gets more complicated if you are inside a table, or selecting
embedded objects, etc. It also doesn't help that Range.Collapse() occasionally
responds with "call was rejected by the Callee"! So the final implementation of
NavNext() looks like:
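Public Function NavNext(Optional ByVal Recurse As Boolean = True) As Boolean
    ' initialStart, initialEnd and SELECT_UNIT are declared elsewhere in the class.
    Dim sel As Word.Selection = GetSelection()
    If sel Is Nothing Then Return False
    Dim oldstart As Integer = sel.Range.Start
    Dim oldEnd As Integer = sel.Range.End
    Dim i As Integer = 1
    Do
        ' Jump to the end of the current selection.
        sel.Start = sel.End
        Try
            sel.Collapse()
        Catch
            ' Collapse() occasionally fails ("call was rejected by callee"); carry on.
        End Try
        ' Inch ahead by i characters, but never past the end of the region being read.
        sel.End += i
        If sel.End >= initialEnd Then
            sel.End = initialEnd
            Exit Do
        End If
        i += 1
        ' Grow the selection to a whole unit (e.g. a sentence).
        sel.Expand(SELECT_UNIT)
    Loop Until sel.End > oldEnd
    ' Clamp the start so the new selection does not overlap what was already read.
    If sel.Start < initialStart Then sel.Start = initialStart
    If sel.Start < oldstart Then sel.Start = oldstart
    If sel.Start < oldEnd Then sel.Start = oldEnd
    ' If the start still sits before the old end, nudge the selection and retry.
    If sel.Start < oldEnd Then
        sel.End -= 1
        sel.Start = oldEnd
        If sel.Start = sel.End Then
            sel.Expand(SELECT_UNIT)
            ' Recurse at most once to avoid looping forever.
            If Recurse Then Return NavNext(False) Else Return False
        End If
    End If
    sel.Document.ActiveWindow.ScrollIntoView(sel.Range)
    ' Report success only if something is actually selected.
    If sel.End > sel.Start Then Return True
    Return False
End Function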
Now that we have what to read, we have to get it into a readable
format. Range.Text will return the unformatted text in Word, but unfortunately
that does not include any kind of format information, including any
automatically incrementing numbers (like section headers or numbered lists). To
preserve most of the formatting, I use Range.Copy() to copy the selection to the
clipboard, then Paste to a RichTextBox control. RichText is a much easier
format to decode than Word, and the intent was to look for simple
formatting like bold, italics, and underlining to modify the voice inflection.
Of course, that part was never written....
But I digress. It turns out the numbering information is available in the RTF
copy, but unfortunately (there's that word again) the numbering always
resets to 1.0.0.0.0. Fortunately, the correct numbering is
available in the "raw text" clipboard copy.
If you want to learn more about programming with Word Automation, see the Reference Documentation on MSDN.
Problem 2: Don't "steal" focus:
Now that I had a way to step through a Word document, I wanted to keep focus
on that Word document while still allowing interaction with the SpokenWord
controls. In order to do this, SpokenWord needs to be a non-activating window,
like most tool windows. This is achieved via:
Protected Overrides ReadOnly Property CreateParams() As _
        System.Windows.Forms.CreateParams
    Get
        ' WS_EX_NOACTIVATE: clicking the window does not make it the foreground window.
        Const WS_EX_NOACTIVATE As Integer = &H8000000
        Dim Result As CreateParams
        Result = MyBase.CreateParams
        Result.ExStyle = Result.ExStyle Or WS_EX_NOACTIVATE
        Return Result
    End Get
End Property
As noted in the comments in the source, it is not perfect, due to a bug in .NET
dropdown lists. It also means the window will not shuffle to the top of the
ZOrder, since this normally happens when the window gets focus. We have two ways
to fix this: 1) make the window TopMost (always on top), or 2) force it to the
top at an appropriate time. I decided to use the latter. When is an appropriate
time? How about when the user clicks on the titlebar ... well, there's no event
for that, so you have to override the WndProc:
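Protected Overrides Sub WndProc(ByRef m As System.Windows.Forms.Message)
    ' Win32 values: left button pressed in the non-client area, over the title bar.
    Const WM_NCLBUTTONDOWN As Integer = &HA1
    Const HTCAPTION As Integer = 2
    If m.Msg = WM_NCLBUTTONDOWN Then
        If (m.WParam.ToInt32 = HTCAPTION) Then
            ' Briefly toggling TopMost pulls the window to the top of the ZOrder.
            Me.TopMost = True
            Application.DoEvents()
            Me.TopMost = False
        End If
    End If
    MyBase.WndProc(m)
End Sub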
Problem 3: "steal" focus!!:
Now that the application no longer steals the focus, a new problem turned up.
Apparently, the Word.Application is not accessible until after the Word
document has lost focus. Why? Who knows? The fact remains that a document opened
after SpokenWord will remain invisible to it until it loses focus.
So the easiest solution, therefore, is to "force" focus onto SpokenWord! This
is encapsulated in the form "frmThief", using a series of API calls including
SetForegroundWindow. All it does is pop up, make itself the active
foreground window, and then go away. The code is supposed to set focus
back to the prior foreground window, but that isn't working. If anyone
can fix it, please let me know how.
Problem 4: Text-To-Speech:
This is actually the easiest problem so far. You could simply grab the
default voice (Dim m_voice As New SpeechLib.SpVoiceClass) and tell it to speak
(m_voice.Speak(Text)). However, that wouldn't be much fun, and the "playback"
would be less than ideal. SpokenWord processes each segment of text in several
steps (which you can see if you click on "options"):
- Orig: this is actually just the RTF copy from Word, corresponding to the
  "original" text.
- Pre: this is the output of Parser.PreProcess(Text), which basically does some
  character replacement/expansion.
- Effect: this is the output of the selected effect (if any). All the effects
  are public webservices, and are just for fun. I have no control over their
  success or failure!
- Post: this is the output of Parser.PostProcess(Text). Not currently
  implemented, but the intent is to reformat acronyms into something more
  understandable (for instance, SpokenWord reads "XP" as "Ixpee", not "Ex Pee").
- SAPI: this is the output from VoiceHelper.SAPIfy(Text). Currently, it just
  expands certain punctuation into "pauses", but ideally it would add emphasis
  and inflection where appropriate. SAPI accepts XML markup (much like HTML) for
  "formatting" text-to-speech. See the SAPI help file for more information on
  valid tags.
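To make the Pre and SAPI steps a little more concrete, here is a minimal sketch of what such string processing could look like. It is not the code behind Parser.PreProcess() or VoiceHelper.SAPIfy(): the module name, function names, replacement table, and pause lengths are all invented for illustration. The only SAPI-specific piece is the <silence msec="..."/> tag from the SAPI 5 XML markup.

```vbnet
Imports System

Module SpeechTextSketch
    ' Hypothetical replacement table (symbol, spoken form); the real
    ' SpokenWord dictionary is not shown in the article.
    Private ReadOnly Replacements As String(,) = {
        {"&", " and "},
        {"%", " percent "}
    }

    ' Sketch of a pre-processing pass: expand symbols into words.
    Public Function PreProcessSketch(ByVal text As String) As String
        Dim result As String = text
        For i As Integer = 0 To Replacements.GetUpperBound(0)
            result = result.Replace(Replacements(i, 0), Replacements(i, 1))
        Next
        Return result
    End Function

    ' Sketch of a "SAPIfy" pass: turn some punctuation into explicit pauses.
    ' The 300 ms pause length is an arbitrary choice for this example.
    Public Function SapifySketch(ByVal text As String) As String
        Const Pause As String = "<silence msec=""300""/>"
        Return text.Replace(";", Pause).Replace(":", Pause)
    End Function
End Module
```

Running the symbol expansion first matters in this sketch: a literal "&" in the source text would otherwise end up inside the XML handed to the voice. Text containing such tags is normally passed to Speak() with the SVSFIsXML flag so the engine treats it as markup.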
The last tab is a running log of what is happening.
So the only thing left to explain is how to control the voice. SpokenWord
lets you adjust:
- Voice: this dropdown lets you change the current speaker. See the Installation section to add voices.
- Speed: this slider will let you adjust the speed.
- Pitch: this slider will let you adjust the pitch.
- As well as Pause, Stop, and Repeat (on the toolbar).
Stop warrants special mention, since the SpeechLib doesn't
directly support a stop action. However, it does have Skip, which I use to
simply skip through the current playback buffer until it is empty.
Speaking (ahem) of stopping, you may wonder: how does the application know
when to start the next sentence? This is achieved by injecting SAPI
bookmarks into the playback stream. When the SpeechLib encounters a
bookmark during playback, it raises an event. So SpokenWord adds a bookmark to
the end of the text. Then when the bookmark event is raised, it knows the
playback is done, so it moves to the next sentence.
Unfortunately (ugh), SpeechLib occasionally raises some aberrant
bookmarks, specifically if it encounters currency amounts in the text (e.g.
$100). These "unknown" bookmarks are ignored by the app.
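The bookmark bookkeeping can be sketched without touching the speech engine at all. The helper below is an assumption-laden illustration, not the code in frmHover: the module name, function names, and the "SpokenWordEnd:" prefix are invented here. It appends a SAPI <bookmark mark="..."/> tag to a segment, and then recognizes only that marker when a bookmark event arrives, so stray bookmarks (like the currency ones) fall through and are ignored.

```vbnet
Imports System

Module BookmarkSketch
    ' Hypothetical marker prefix, chosen so stray bookmarks cannot match by accident.
    Private Const EndMarkPrefix As String = "SpokenWordEnd:"

    ' Append an end-of-segment bookmark to text that is already SAPI markup.
    Public Function WithEndBookmark(ByVal sapiText As String, ByVal segmentId As Integer) As String
        Return sapiText & "<bookmark mark=""" & EndMarkPrefix & segmentId.ToString() & """/>"
    End Function

    ' True only for the bookmark we injected for this segment;
    ' anything else (including Nothing) is treated as unknown.
    Public Function IsEndBookmark(ByVal bookmark As String, ByVal segmentId As Integer) As Boolean
        Return String.Equals(bookmark, EndMarkPrefix & segmentId.ToString())
    End Function
End Module
```

In a bookmark event handler, a check like IsEndBookmark(Bookmark, currentSegment) would decide whether to advance to the next sentence or to ignore the event.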
Areas of Improvement
There are several areas that can be improved (or at least implemented)
in the code:
- Externalize Pre-process "dictionary" to an XML file for user customization.
- Actually implement the Post-process.
- Parse the RTF text to add SAPI "formatting". For example, if the original
text is bold, wrap the word in <EMPH> (emphasis) tags.
- Better synchronization between playback and navigation. If you look at
  frmHover.m_voice_Bookmark you'll see that I tried to use the
  bookmarks to set the Word selection to the correct position, but that proved
  more trouble than it was worth. Currently, the selection and playback can get
  out of sync, especially inside tables. The only way to re-sync it is to stop
  playback and restart.
- Detect and ignore sections like Table of Contents and Indexes (see
  WordHelper.IsAnIndexBlock()).
History
- Initial version posted July 19, 2004