[If the package you downloaded did not include a setup.bat, please run "regsvr32.exe a3dapi.dll" yourself to register the A3D COM component. I'm really sorry about this: to keep the package small, I did not include the necessary executables in src_sdi.zip, so you have to copy the required files yourself.]
Legal Information
SDI is based on the following products: A3D, ViaVoice TTS, DirectX, and MIDI. A3D is owned by Creative Technology Ltd. (previously by Aureal) and is royalty-free. ViaVoice TTS is a registered trademark of IBM. DirectX is a registered trademark of Microsoft Corporation. MIDI is owned by the MMA (MIDI Manufacturers Association).
Introduction
SDI (Sound Device Interface) is designed for Auditory Display, or AUI (Audio User Interface), which academic work has shown to be a good complement to GUI (Graphical User Interface). At present, SDI implements two sound objects: Speech and Earcon. Speech positions synthesized speech in a 3D world by combining ViaVoice TTS and A3D. Research suggests that this kind of positioning helps the user pick the desired information out of two concurrently "displayed" sounds; this is the so-called "cocktail party effect". It can also improve recall by adding a location cue to each sound. Earcon, as its name suggests, displays earcons easily. An earcon is a kind of musical motif, something like an icon. (In fact, "earcon" is short for "ear icon". Note that there is another term, "auditory icon", which is an actual sound of a real thing; an earcon is more abstract than an auditory icon.) You can compose MIDI using a single text string!
Installation
- Make sure you have installed DirectX 8 or higher. If not, download it from the Microsoft website.
- Download an evaluation version of ViaVoice TTS runtime engine (TTS! Not just "ViaVoice") from IBM's partner. You can choose from Chinese, US English, and UK English.
- [Optional] You can download ViaVoice TTS SDK Evaluation Edition if you want to learn more about ViaVoice TTS.
- [Optional] Download the free A3D 3.0 SDK. (I'm not sure whether I can legally include it in my distribution of SDI, so I leave it to you.)
- Double-click setup.bat to register the A3D component. (If you have installed the A3D 3.0 SDK, this registration step is not compulsory.)
- Done. You can check if everything is OK by running the sample program.
Using SDI
- If you want to learn how to use SDI in a minute, please refer to the sample program.
- If you are Chinese or can speak Chinese, you can turn to NewsEverywhere for a complete online news reader, including its source code. You can experience two "persons" presenting different news to you at the same time. It may sound unnatural, but it really works.
Because sound can't be rendered in a flash, SDI uses many threads and synchronization objects, and even the simplistic Sleep(XXXX) delay. The system therefore suffers from this intensive threading and can become very unstable, and debugging it is a nightmare.
Here are some guidelines to help you reduce the number of frustrating problems when you use SDI:
- Keep in mind that SDI is not fully thread-safe.
- ECI (the API provided by ViaVoice TTS) seems to limit some functions (possibly because of the evaluation version), so one extra thread is used for each Speech object. In addition, ECI does not support reentrant calls from multiple threads, so please make sure there are intervals between operations. (It's not a MediaPlayer anyway ^_^ )
- Get/Set Position/Volume are always safe, because they are handed over to A3D or DirectX internally. You can achieve "sound animation" by using them.
- Earcon is more robust than Speech, except for its restrictions on input data. Refer to Play/ParseNotation and PlaySegment to see how to use Earcon correctly.
- Some knowledge of MIDI and MIDI files is highly recommended. Please refer to the "MIDI Specification" and the "MIDI File Format" for detailed information on them.
For more information about SDI, see sdi_readme.txt and SDI v1.0 document in the zip package.
A brief demo on how to use SDI
First, initialize the SDI environment.
int APIENTRY WinMain(HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpCmdLine,
int nCmdShow)
{
......
br=InitializeSDI(NULL, NULL, hInstance, 0);
......
Then you can create an Earcon object for playing music, and set its position and instrument:
HEARCON hEarcon1=CreateEarcon(666666L, 1024,
MAKE_TIMESIGNATURE(4,2,0x18,0x08),MAKE_KEYSIGNATURE(0,0));
pos.x=1.0f;
pos.z=1.0f;
br=SetPosition(hEarcon1, &pos);
......
br=SetChannelInstrument(hEarcon1, 0, 0, NULL);
br=PlaySegment(hEarcon1, pData,1);
......
You can compose your own MIDI using a single text string. I hope you are excited about this.
PTSTR psNotation=
    "@I00 E42 E42 F42 G42 G42 F42 E42 D42 C42 "
    "C42 D42 E42 E42. D43 D41;@I34 E40 G40 C40 E40";
br=PlayNotation(hEarcon2, psNotation);
......
You can even output the composed MIDI to a standard MIDI file.
ParseNotation(hEarcon2,
    "@I00 E42 E42 F42 G42 G42 F42 E42 D42 "
    "C42 C42 D42 E42 E42. D43 D41;@I34 E40 G40 C40 E40",
    data);
HANDLE hFile=CreateFile("test.mid", GENERIC_READ|GENERIC_WRITE, 0,
NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
_ASSERTE(hFile!=INVALID_HANDLE_VALUE);
DWORD dwWritten;
WriteFile(hFile, data, nLen, &dwWritten, NULL);
CloseHandle(hFile);
......
Now create a Speech object, and position it on your left. Speech positioning works much better than Earcon positioning.
HSPEECH hSpeech1=CreateSpeech(SDI_LANGUAGE_GENERAL_AMERICAN_ENGLISH,
SDI_VOICE_ADULTMALE2, 0);
Sleep(1000);
pos.x=-1.0f;
pos.z=0;
br=SetPosition(hSpeech1, &pos);
br=PlayText(hSpeech1, "The sound is coming from the left. ");
......
You can animate it by changing its position in a loop.
for(i=-20; i<20; i++)
{
SDIVECTOR animation={(FLOAT)i/20, 0, 1.0f};
SetPosition(hSpeech1, &animation);
Sleep(100);
}
SetVolume(hSpeech1, 0.4f);
......
Start another Speech object, and you can distinguish the two clearly. This would be impossible without the positioning. You can try placing both of them right in front of you to see whether you can still pick out one of them. :-)
HSPEECH hSpeech2=CreateSpeech(SDI_LANGUAGE_GENERAL_AMERICAN_ENGLISH,
SDI_VOICE_ADULTFEMALE1, 0);
pos.x=-1.0f;
pos.z=0;
SetPosition(hSpeech2, &pos);
PlayText(hSpeech2, "Everything is possible--Li Ling");
......
Finally, release the SDI environment. This operation deletes all remaining sound objects.
ReleaseSDI();
return 0;
}
Points of Interest
Possibly due to the evaluation version of the ViaVoice TTS SDK, I found that eciSpeaking didn't work! So I added one extra thread to each Speech object, only to find that I couldn't implement a real Pause function on the Speech object, because ECI (the API provided by ViaVoice TTS) does not support reentrant calls from another thread. For Earcon, I found something really interesting and, well, frustrating indeed. When you play a DirectSound buffer, the play cursor erases the data it has read. The illustration below shows what the DirectSound buffer looks like when a specific notification point is reached. (It took me quite a while to investigate the DirectSound buffer ^o^)
From the picture above, you can see that I have only about 70 ms to lock the buffer and copy the data between two notification points to the A3D source buffer if I want to achieve the best positioning effect. The test was disappointing: because I couldn't finish processing within 70 ms, some notifications were lost and the sound coming out of my earphones was dreadful. (The code in the WAIT_OBJECT_0+1 section of SDIEarconThread was used for this heavy work.) So I had to fall back on DirectX itself. However, even though I specify DS3DALG_HRTF_FULL for the DirectSound buffers, the positioning is still unsatisfactory. I've thought about creating the Microsoft synthesizer object manually and calling IDirectMusicSynth::Render to pull the synthesized wave data; that way I should have enough time to transfer the data to A3D. But this method would require me to fill in the instrument data manually, too! That means I would have to study the Downloadable Sounds Specification, and... you know, I'm just a university student and I have more important things to do. So, if you are interested in this, you can implement a better Earcon. And do let me know when you've finished, OK? ^_^
Enjoy!
A student at Zhejiang University, Zhejiang, China.
Majoring in Automation.
I now want to study machine vision and robotics, but I'm really torn between hardware and software, and between research and engineering.
I'll be glad if you can give me some suggestions.