Converting Text-To-Speech and using mouth motion animation






Nov 28, 2001
This program shows how to convert text to speech and use mouth motion
Introduction
Parts of this code are based on James Matthews' original article Visemes: Representing Mouth Positions.
Figure 1: GUI of application program
This project shows how to convert text into speech accompanied by mouth motion. Before you build this project, I suggest you read my article, "Simple Program for Text to Speech Using SAPI". These are the steps for building the project:
- New Application
- Setting SAPI on project
- Building GUI of project
- Coding
- Testing
1. New Application
Create your application with MFC AppWizard (exe) and type SpeakMouth as the project name. Click the OK button. In Step 1 of MFC AppWizard, choose Dialog based and click the Finish button.
Figure 2 : New application.
Figure 3: Application based on Dialog. Click Finish button
2. Setting SAPI on Project
Set up SAPI in this project as described in the article "Simple Program for Text to Speech Using SAPI".
3. Building GUI of project
Figure 4: GUI Model
Build and design the GUI as shown in figure 4. These are the IDs and properties of the GUI components:
No | Component Name | ID | Add variable using ClassWizard |
1 | Picture | IDC_MOUTH_IMG | CStatic :: m_cMouth |
2 | Group Box | IDC_STATIC | - |
3 | Edit Box | IDC_TEXT | CString :: m_sText |
4 | Button | IDC_SPEAK_BTN | - |
The GUI model of this project is shown in figure 4. After you build the GUI, import the bitmap files that contain the mouth graphics (figure 5) into the Resource View of the project. Choose the bitmap files in the folder res (figure 6). I have put the mouth-shape bitmap files in the folder res of this project, and you can also find them in your Speech SDK. These are the bitmap files you must import:
- mic eyes closed.bmp
- mic.bmp
- mic_eyes_narrow.bmp
- mic_mouth_2.bmp
- mic_mouth_3.bmp
- mic_mouth_4.bmp
- mic_mouth_5.bmp
- mic_mouth_6.bmp
- mic_mouth_7.bmp
- mic_mouth_8.bmp
- mic_mouth_9.bmp
- mic_mouth_10.bmp
- mic_mouth_11.bmp
- mic_mouth_12.bmp
- mic_mouth_13.bmp
Figure 5: Import file bitmap into project
Figure 6: Choose the bitmap file
After you have put all the bitmap files into the Resource View of the project, you must also set their properties, i.e. assign an ID to each bitmap. Right-click each bitmap (figure 7) and you will see a window like figure 8. Here are the IDs of the bitmap files:
File Bitmap | ID |
mic eyes closed.bmp | IDB_MICEYESCLO |
mic.bmp | IDB_MICFULL |
mic_eyes_narrow.bmp | IDB_MICEYESNAR |
mic_mouth_2.bmp | IDB_MICMOUTH2 |
mic_mouth_3.bmp | IDB_MICMOUTH3 |
mic_mouth_4.bmp | IDB_MICMOUTH4 |
mic_mouth_5.bmp | IDB_MICMOUTH5 |
mic_mouth_6.bmp | IDB_MICMOUTH6 |
mic_mouth_7.bmp | IDB_MICMOUTH7 |
mic_mouth_8.bmp | IDB_MICMOUTH8 |
mic_mouth_9.bmp | IDB_MICMOUTH9 |
mic_mouth_10.bmp | IDB_MICMOUTH10 |
mic_mouth_11.bmp | IDB_MICMOUTH11 |
mic_mouth_12.bmp | IDB_MICMOUTH12 |
mic_mouth_13.bmp | IDB_MICMOUTH13 |
Figure 7: Setting properties of file bitmap
Figure 8: Properties of file bitmap
4. Coding
There are three classes in this project (figure 9):
CAboutDlg
CSpeakMouthApp
CSpeakMouthDlg
Figure 9: Classes of project
Call ClassWizard to add a variable to the class CSpeakMouthDlg: type m_sError as the name of the variable, with type CString and protected access.
The coding steps are listed below. The code for each step follows this list, in the same order.
- In the file StdAfx.h, add the include lines that give access to the SAPI COM interfaces.
- At the top of the file SpeakMouthDlg.h, add the SAPI includes and the mouth constants. At the top of the file SpeakMouthDlg.cpp, add the WM_RECOEVENT definition and the viseme mapping array.
- In the class CSpeakMouthDlg, add three member methods (right-click the class and choose Add Member Method): InitMouthImageList(), InitializationSAPI() and DestroySAPI(). Each returns BOOL and each is protected. Besides that, add the member variables shown in the declarations.
- Implement the methods that you have created.
- In the class CSpeakMouthDlg, the method OnInitDialog() must contain the code for SAPI initialization.
- In the constructor CSpeakMouthDlg::CSpeakMouthDlg(CWnd* pParent=NULL), initialize m_iMouthBmp.
- Create the method that handles the event that turns speech into mouth motion. The voice notifies the dialog with the message WM_RECOEVENT, which you have defined as WM_USER + 101. Declare the handler in the file SpeakMouthDlg.h and add it to the message map in the file SpeakMouthDlg.cpp.
- Add a method to handle the button click (IDC_SPEAK_BTN) event. Type OnSpeak as the name of the method and add the code lines shown as its implementation.
- Debug and run your program.
// In StdAfx.h
#include <atlbase.h>
extern CComModule _Module;
#include <atlcom.h>

// At the top of SpeakMouthDlg.h
#include "sapi.h"
#include <sphelper.h>

// CONSTANTS OF MOUTH
#define CHARACTER_WIDTH 128
#define CHARACTER_HEIGHT 128
#define WEYESNAR 14 // eye positions
#define WEYESCLO 15

Besides that, add these code lines at the top of the dialog class implementation (file SpeakMouthDlg.cpp):
#define WM_RECOEVENT WM_USER+101

/////////////////////////////////////////////////////////
// Mouth Mapping Array (from Microsoft's TTSApp Example)
const int g_iMapVisemeToImage[22] = {
    0,  // SP_VISEME_0 = 0, // Silence
    11, // SP_VISEME_1,     // AE, AX, AH
    11, // SP_VISEME_2,     // AA
    11, // SP_VISEME_3,     // AO
    10, // SP_VISEME_4,     // EY, EH, UH
    11, // SP_VISEME_5,     // ER
    9,  // SP_VISEME_6,     // y, IY, IH, IX
    2,  // SP_VISEME_7,     // w, UW
    13, // SP_VISEME_8,     // OW
    9,  // SP_VISEME_9,     // AW
    12, // SP_VISEME_10,    // OY
    11, // SP_VISEME_11,    // AY
    9,  // SP_VISEME_12,    // h
    3,  // SP_VISEME_13,    // r
    6,  // SP_VISEME_14,    // l
    7,  // SP_VISEME_15,    // s, z
    8,  // SP_VISEME_16,    // SH, CH, JH, ZH
    5,  // SP_VISEME_17,    // TH, DH
    4,  // SP_VISEME_18,    // f, v
    7,  // SP_VISEME_19,    // d, t, n
    9,  // SP_VISEME_20,    // k, g, NG
    1   // SP_VISEME_21,    // p, b, m
};
////////////////////////////////////////////////////////////
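The table is indexed directly by the viseme id reported with each event, so a value outside 0..21 would read past the end of the array. The following is a minimal, standalone sketch of a guarded lookup. The helper name ImageForViseme and the choice of image 0 (the silence image) as the fallback are assumptions made for illustration; they are not part of the original project or of SAPI.

```cpp
// Copy of the article's mapping table: index = viseme id, value = image index.
const int kMapVisemeToImage[22] = {
    0, 11, 11, 11, 10, 11, 9, 2, 13, 9, 12,
    11, 9, 3, 6, 7, 8, 5, 4, 7, 9, 1
};

// Hypothetical helper: returns the image index for a viseme id,
// or 0 (silence) when the id is outside the table.
int ImageForViseme(int viseme)
{
    const int count =
        static_cast<int>(sizeof(kMapVisemeToImage) / sizeof(kMapVisemeToImage[0]));
    if (viseme < 0 || viseme >= count)
        return 0; // assumed fallback: show the silence image
    return kMapVisemeToImage[viseme];
}
```

In the dialog, a helper like this could replace the direct array access in the event handler.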
CString m_sError;
BOOL InitMouthImageList();
BOOL DestroySAPI();
BOOL InitializationSAPI();

///////////////////////////////////////////////////////
// Speech API Variables
CComPtr<ISpVoice> IpVoice;
CImageList m_cMouthList;
int m_iMouthBmp;
CRect m_cMouthRect;
///////////////////////////////////////////////////////
void CSpeakMouthDlg::OnDestroy()
{
    DestroySAPI();
    CDialog::OnDestroy();
}

BOOL CSpeakMouthDlg::InitializationSAPI()
{
    if (FAILED(CoInitialize(NULL)))
    {
        m_sError = _T("Error initializing COM");
        return FALSE;
    }

    HRESULT hRes;
    hRes = IpVoice.CoCreateInstance(CLSID_SpVoice);
    if (FAILED(hRes))
    {
        m_sError = _T("Error creating voice");
        return FALSE;
    }

    // Ask the voice to queue viseme events only
    hRes = IpVoice->SetInterest(SPFEI(SPEI_VISEME), SPFEI(SPEI_VISEME));
    if (FAILED(hRes))
    {
        m_sError = _T("Error creating interest...seriously");
        return FALSE;
    }

    // Have the voice post WM_RECOEVENT to this dialog when events arrive
    hRes = IpVoice->SetNotifyWindowMessage(m_hWnd, WM_RECOEVENT, 0, 0);
    if (FAILED(hRes))
    {
        m_sError = _T("Error setting notification window");
        return FALSE;
    }

    return TRUE;
}

BOOL CSpeakMouthDlg::DestroySAPI()
{
    if (IpVoice)
    {
        IpVoice.Release();
    }
    return TRUE;
}

BOOL CSpeakMouthDlg::InitMouthImageList()
{
    // Remember where the picture control sits, in dialog client coordinates
    m_cMouth.GetClientRect(&m_cMouthRect);
    m_cMouth.ClientToScreen(&m_cMouthRect);
    ScreenToClient(&m_cMouthRect);

    CBitmap bmp;
    m_cMouthList.Create(CHARACTER_WIDTH, CHARACTER_HEIGHT,
                        ILC_COLOR32 | ILC_MASK, 1, 0);

    // Add the bitmaps in a fixed order; RGB(255,0,255) is the mask color
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICFULL));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH2));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH3));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH4));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH5));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH6));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH7));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH8));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH9));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH10));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH11));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH12));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICMOUTH13));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICEYESNAR));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();
    bmp.LoadBitmap(MAKEINTRESOURCE(IDB_MICEYESCLO));
    m_cMouthList.Add(&bmp, RGB(255,0,255));
    bmp.Detach();

    // Register the mouth and eye images as overlays
    m_cMouthList.SetOverlayImage(1, 1);
    m_cMouthList.SetOverlayImage(2, 2);
    m_cMouthList.SetOverlayImage(3, 3);
    m_cMouthList.SetOverlayImage(4, 4);
    m_cMouthList.SetOverlayImage(5, 5);
    m_cMouthList.SetOverlayImage(6, 6);
    m_cMouthList.SetOverlayImage(7, 7);
    m_cMouthList.SetOverlayImage(8, 8);
    m_cMouthList.SetOverlayImage(9, 9);
    m_cMouthList.SetOverlayImage(10, 10);
    m_cMouthList.SetOverlayImage(11, 11);
    m_cMouthList.SetOverlayImage(12, 12);
    m_cMouthList.SetOverlayImage(13, 13);
    m_cMouthList.SetOverlayImage(14, WEYESNAR);
    m_cMouthList.SetOverlayImage(15, WEYESCLO);

    return TRUE;
}
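The fifteen load-and-add blocks in InitMouthImageList() differ only in the resource ID, so the same sequence could be driven from a table of IDs. The sketch below is standalone C++ rather than MFC, and it only illustrates the idea: AddAllImages and its callback are hypothetical names, and in the dialog the callback would load one bitmap and add it to m_cMouthList. The order of the IDs matters, because the position in the list becomes the image index.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Calls addImage once per id, in order, and stops at the first failure.
// Returns the number of images that were added successfully.
std::size_t AddAllImages(const std::vector<int>& ids,
                         const std::function<bool(int)>& addImage)
{
    std::size_t added = 0;
    for (int id : ids) {
        if (!addImage(id))
            break;          // report a partial load instead of continuing
        ++added;
    }
    return added;
}
```

A caller can compare the returned count with ids.size() to detect a bitmap that failed to load, which the original code does not check.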
// Set the icon for this dialog.
// The framework does this automatically
// when the application's main window
// is not a dialog
SetIcon(m_hIcon, TRUE); // Set big icon
SetIcon((HICON)(LoadImage(AfxGetResourceHandle(),
        MAKEINTRESOURCE(IDR_MAINFRAME), IMAGE_ICON, 16, 16, 0)), FALSE);

m_sError = _T("");
if (!InitializationSAPI())
{
    AfxMessageBox(m_sError);
    DestroySAPI();
}
InitMouthImageList();

return TRUE;
m_iMouthBmp = 0;
// Generated message map functions
//{{AFX_MSG(CSpeakMouthDlg)
virtual BOOL OnInitDialog();
afx_msg void OnSysCommand(UINT nID, LPARAM lParam);
afx_msg void OnPaint();
afx_msg HCURSOR OnQueryDragIcon();
afx_msg void OnDestroy();
afx_msg void OnSpeak();
//}}AFX_MSG
// add message handler
afx_msg LRESULT OnMouthEvent(WPARAM, LPARAM);
DECLARE_MESSAGE_MAP()

And in the file SpeakMouthDlg.cpp, add these code lines:
BEGIN_MESSAGE_MAP(CSpeakMouthDlg, CDialog)
    //{{AFX_MSG_MAP(CSpeakMouthDlg)
    ON_WM_SYSCOMMAND()
    ON_WM_PAINT()
    ON_WM_QUERYDRAGICON()
    ON_WM_DESTROY()
    ON_BN_CLICKED(IDC_SPEAK_BTN, OnSpeak)
    //}}AFX_MSG_MAP
    // add message handler
    ON_MESSAGE(WM_RECOEVENT, OnMouthEvent)
END_MESSAGE_MAP()
This is the implementation of the method OnMouthEvent():
LRESULT CSpeakMouthDlg::OnMouthEvent(WPARAM wParam, LPARAM lParam)
{
    CSpEvent event;

    // Process every event that is waiting in the voice's queue
    while (event.GetFrom(IpVoice) == S_OK)
    {
        switch (event.eEventId)
        {
        case SPEI_VISEME:
            // Pick the mouth image for this viseme and repaint the picture area
            m_iMouthBmp = g_iMapVisemeToImage[event.Viseme()];
            InvalidateRect(m_cMouthRect, false);
            break;
        }
    }
    return 0;
}
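The handler loops until no event is left, since more than one event can be waiting by the time the window message is processed. The following standalone sketch shows that drain-the-queue pattern, with a std::queue standing in for the voice's event queue. VoiceEvent, EventKind, and DrainEvents are hypothetical names used only for this illustration; they are not SAPI types.

```cpp
#include <queue>

enum class EventKind { Viseme, Other };

struct VoiceEvent {
    EventKind kind;
    int viseme;   // meaningful only when kind == EventKind::Viseme
};

// Pops events until the queue is empty and returns the last viseme seen,
// or `current` unchanged if no viseme event was queued.
int DrainEvents(std::queue<VoiceEvent>& pending, int current)
{
    while (!pending.empty()) {
        const VoiceEvent ev = pending.front();
        pending.pop();
        if (ev.kind == EventKind::Viseme)
            current = ev.viseme;   // later visemes overwrite earlier ones
    }
    return current;
}
```

The listings in this article do not include the drawing code. Presumably OnPaint() draws image m_iMouthBmp from m_cMouthList inside m_cMouthRect, which is why the handler only records the index and invalidates that rectangle.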
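void CSpeakMouthDlg::OnSpeak()
{
    UpdateData(); // copy the Edit Box contents into m_sText

    if (!IpVoice)
        return; // SAPI initialization failed, so there is no voice to use

    // AllocSysString() returns a BSTR that the caller owns, so free it
    // after the call instead of leaking it on every click
    BSTR text = m_sText.AllocSysString();
    IpVoice->Speak(text, SPF_ASYNC, NULL);
    ::SysFreeString(text);
}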
5. Testing
To test your application, run it and type a word into the Edit Box. After that, click the Speak button.
Reference
Speech SDK 5.1 from Microsoft Inc.