Introduction
This article explains how to build a text-to-speech application with SAPI and the Microsoft Agent control.
When you finish reading this article, I hope you will know the following:
- How to set the HotKey for an application
- How to use Microsoft Agent Control
- How to use the Speech API
- Finally, how to use visemes with TTS
What Do We Need To Do This
The sample given above was created in Visual Studio .NET 2003. In order to run the sample, we need a set of APIs installed on our system.
Requirements are:
Sample Application
The sample application given above converts the selected text to speech when a HotKey is pressed, regardless of which application the text is in. To achieve this, we need to do the following:
- Setting HotKey
- Referencing the Speech API and Microsoft Agent Control
- Setting visemes
Setting HotKey
Setting a HotKey for an application means registering the key with the operating system so that the application's window receives a message whenever the key is pressed.
For this, we use DllImports of user32.dll.
[DllImport("user32.dll", SetLastError = true)]
public static extern bool RegisterHotKey(
    IntPtr hWnd,
    int id,
    KeyModifiers fsModifiers,
    Keys vk
);
[DllImport("user32.dll", SetLastError = true)]
public static extern bool UnregisterHotKey(
    IntPtr hWnd,
    int id
);
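The declarations above use a KeyModifiers type that the article does not show. A minimal sketch of what it might look like, assuming it is a flags enum whose values mirror the Win32 MOD_* constants that RegisterHotKey accepts (the member names and the Combine helper are illustrative, not taken from the sample):

```csharp
using System;

// Assumed definition: the article uses KeyModifiers but never lists it.
// The numeric values are the Win32 MOD_* constants for RegisterHotKey.
[Flags]
public enum KeyModifiers
{
    None = 0,
    Alt = 1,      // MOD_ALT
    Control = 2,  // MOD_CONTROL
    Shift = 4,    // MOD_SHIFT
    Windows = 8   // MOD_WIN
}

public static class KeyModifiersDemo
{
    // Hypothetical helper: ORs the given modifiers together and returns
    // the integer that would be passed as fsModifiers.
    public static int Combine(params KeyModifiers[] modifiers)
    {
        KeyModifiers result = KeyModifiers.None;
        foreach (KeyModifiers m in modifiers)
        {
            result |= m;
        }
        return (int)result;
    }
}
```

With this enum, KeyModifiers.None registers a bare key such as F9, while KeyModifiers.Control | KeyModifiers.Shift would require both modifiers to be held.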
This sample application registers the F9 key as the HotKey in the constructor of the application and unregisters it when disposing.
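// In the constructor:
bool bcheck = RegisterHotKey(Handle, HOTKEY_ID, KeyModifiers.None, Keys.F9);
// When disposing:
bool bcheck = UnregisterHotKey(Handle, HOTKEY_ID);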
We are done with registering the HotKey. Now we need to handle the message from the operating system. For that, we override the method WndProc and check for the corresponding message:
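const int WM_HOTKEY = 0x312;
protected override void WndProc(ref Message msg)
{
    switch (msg.Msg)
    {
        case WM_HOTKEY:
            // The HotKey was pressed: copy the selection and speak it here.
            break;
    }
    base.WndProc(ref msg);
}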
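When a WM_HOTKEY message arrives, Windows places the hot key's identifier in WParam and packs the modifier flags and the virtual-key code into the low and high words of LParam. The sample registers only one key, so it does not need to look at them, but an application with several hot keys would. A small sketch of the decoding, using a hypothetical HotKeyMessage helper class that is not part of the sample:

```csharp
using System;

// Hypothetical helper for unpacking the LParam of a WM_HOTKEY message.
public static class HotKeyMessage
{
    public const int WM_HOTKEY = 0x0312;

    // Low word of LParam: the MOD_* modifier flags that were held.
    public static int GetModifiers(IntPtr lParam)
    {
        return (int)(lParam.ToInt64() & 0xFFFF);
    }

    // High word of LParam: the virtual-key code of the hot key.
    public static int GetVirtualKey(IntPtr lParam)
    {
        return (int)((lParam.ToInt64() >> 16) & 0xFFFF);
    }
}
```

Inside WndProc, these would be called as HotKeyMessage.GetVirtualKey(msg.LParam) and compared against (int)Keys.F9, while msg.WParam.ToInt32() gives the id that was passed to RegisterHotKey.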
When the HotKey is pressed, we copy the selection by sending key events with SendKeys.SendWait("^(c)"); and then read the text from the clipboard. Now we have the text that needs to be converted to speech.
Referencing the Speech API and Microsoft Agent Control
In order to use the Speech API, we need to reference it in our application. This is done as given below.
First, we add a reference to the Microsoft Speech Library 5.0 from the COM tab of the Add Reference dialog.
SpeechLib then appears among the project's references.
We have finished referencing the Speech library. Now we have to reference Microsoft Agent Control.
We can add Microsoft Agent Control directly to the Toolbox by selecting Add/Remove Items from the menu and then selecting the COM Components tab in the Customize Toolbox dialog. From that dialog, we select the Microsoft Agent Control.
Now, we can just drag and drop the control from the Toolbox onto our form.
Then we need to use the Speech library and Microsoft Agent in our code.
First we go with the Speech library and import SpeechLib:
using SpeechLib;
Then we create a Voice object:
voice = new SpVoice();
Now, we need to make it talk. This is done as follows:
voice.Speak("Whatever it is", SpeechVoiceSpeakFlags.SVSFlagsAsync);
We use SVSFlagsAsync, which makes Speak return immediately instead of blocking until the speech finishes, because we are going to use visemes in the sample. This will be explained later.
Setting a Different Voice
If we have installed different voices, then we can make use of them in our sample application.
In order to list out all the available voices in the system, we do this:
foreach (ISpeechObjectToken t in voice.GetVoices("", ""))
{
    Console.WriteLine(t.GetAttribute("Name"));
}
We can set the voices according to our preference as follows:
voice.Voice =
voice.GetVoices("Name="+VoiceCombo.Items[0].ToString(), "Language=409").Item(0);
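The "Language=409" attribute in the call above is the locale identifier for US English written in hexadecimal without a prefix; it is the same value that appears as 0x409 elsewhere in the sample. A minimal sketch of building these attribute strings from values instead of hard-coding them (VoiceQuery and its methods are hypothetical names, not part of SpeechLib):

```csharp
using System;

// Hypothetical helper for composing the attribute strings passed to GetVoices.
public static class VoiceQuery
{
    // Formats a locale identifier the way the sample writes it:
    // hexadecimal digits with no "0x" prefix, e.g. 0x409 -> "Language=409".
    public static string LanguageAttribute(int localeId)
    {
        return "Language=" + localeId.ToString("X");
    }

    // Builds the "Name=..." attribute for a voice name taken from a list.
    public static string NameAttribute(string voiceName)
    {
        return "Name=" + voiceName;
    }
}
```

The call from the article would then read voice.GetVoices(VoiceQuery.NameAttribute(name), VoiceQuery.LanguageAttribute(0x409)).Item(0), where name is whatever was selected in VoiceCombo.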
Making Use of Agent
Now we can also make use of Microsoft Agent and make the Agent speak for us.
This is done as follows:
axAgent2.Characters.Load("Genie", (object)"C:/Speaker/chars/GENIE.acs");
Character = axAgent2.Characters["Genie"];
Character.LanguageID = 0x409;
Character.Show(null);
Character.Speak(txt, null);
Character.Hide(null);
Setting Visemes
A viseme is the mouth shape that goes with a spoken sound; in this sample, each viseme is shown as an image of a facial expression. We can have 13 images with different expressions related to phonetics for achieving this. For setting visemes in a SAPI application, we need images expressing silence and the sounds ae, aa, ao, ey, er, y, w, ow, aw, oy, ay, h, r, l, s, sh, th, f, d, k and p.
Then, we need to set a viseme handler for Voice as follows:
voice.Viseme+=
new _ISpeechVoiceEvents_VisemeEventHandler(VisemeEvent);
The VisemeEvent method sets the image that matches the sound currently being pronounced:
private void VisemeEvent(int StreamNo, object StreamPos, int duration,
    SpeechLib.SpeechVisemeType nextVisemetype,
    SpeechLib.SpeechVisemeFeature visemeFeature,
    SpeechLib.SpeechVisemeType currentVisemetype)
{
    int i = int.Parse(currentVisemetype.ToString().Replace("SVP_", ""));
    pictureBox1.Image = selectedList.Images[i];
}
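The handler turns a viseme name such as SVP_12 into the index 12 by editing the string. SAPI's viseme types run from SVP_0 (silence) to SVP_21, so the index can be larger than the number of images available. The sketch below shows the same conversion with a plain cast and a bounds check. It uses a stand-in enum so that it compiles without the SpeechLib reference; the enum, the VisemeIndex class and the fall-back to image 0 are assumptions for illustration, not part of the sample.

```csharp
using System;

// Stand-in for SpeechLib.SpeechVisemeType (assumed to number its members 0..21).
public enum VisemeTypeStandIn
{
    SVP_0, SVP_1, SVP_2, SVP_3, SVP_4, SVP_5, SVP_6, SVP_7,
    SVP_8, SVP_9, SVP_10, SVP_11, SVP_12, SVP_13, SVP_14, SVP_15,
    SVP_16, SVP_17, SVP_18, SVP_19, SVP_20, SVP_21
}

public static class VisemeIndex
{
    // The article's approach: strip the "SVP_" prefix and parse the number.
    public static int FromName(string name)
    {
        return int.Parse(name.Replace("SVP_", ""));
    }

    // Alternative: cast the enum value directly, avoiding the string round trip.
    public static int FromEnum(VisemeTypeStandIn viseme)
    {
        return (int)viseme;
    }

    // Keeps the index inside the image list; out-of-range values
    // fall back to image 0, which is assumed to be the silence image.
    public static int ClampToImages(int index, int imageCount)
    {
        return (index >= 0 && index < imageCount) ? index : 0;
    }
}
```

In the handler, this would become pictureBox1.Image = selectedList.Images[VisemeIndex.ClampToImages(i, selectedList.Images.Count)], which avoids an out-of-range exception when fewer images than viseme types are loaded.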
I was not able to find suitable images for the visemes, so I created my own.
Conclusion
I hope I have covered the basic things about SAPI and Microsoft Agent Control. Please feel free to email me at beniton@gmail.com if you find any problems or have suggestions for this article. Thank you!