[If the package you downloaded did not include a setup.bat, please run "regsvr32.exe a3dapi.dll" yourself to register the A3D COM component. I'm really sorry about this: to keep the package small, I did not include the necessary executables in src_sdi.zip, so you have to copy the required files yourself.]
Legal Information
SDI is based on the following products: A3D, ViaVoice TTS, DirectX, and MIDI. A3D is owned by Creative Technology Ltd. (previously by Aureal) and is royalty-free. ViaVoice TTS is a registered trademark of IBM. DirectX is a registered trademark of Microsoft Corporation. MIDI is owned by the MMA (MIDI Manufacturers Association).
Introduction
SDI (Sound Device Interface) is designed for Auditory Display, or AUI (Audio User Interface), which academic work has shown to be a good complement to GUI (Graphical User Interface). At present, SDI implements two sound objects: Speech and Earcon. Speech positions synthesized speech in a 3D world by combining ViaVoice TTS and A3D. Research suggests that this kind of positioning helps the user pick the desired information out of two concurrently "displayed" sounds; this is the so-called "cocktail party effect". It can also improve recall by adding a location cue to each sound. Earcon, as its name suggests, displays earcons easily. An earcon is a kind of musical motif, something like an icon. (In fact, "earcon" is short for "ear icon". Note that there is another term, "auditory icon", which is an actual sound of a real thing; an earcon is more abstract than an auditory icon.) You can compose MIDI using a single text string!
Installation
- Make sure you have installed DirectX 8 or higher. If not, download it from the Microsoft website.
- Download an evaluation version of ViaVoice TTS runtime engine (TTS! Not just "ViaVoice") from IBM's partner. You can choose from Chinese, US English, and UK English.
- [Optional] You can download ViaVoice TTS SDK Evaluation Edition if you want to learn more about ViaVoice TTS.
- [Optional] Download the free A3D 3.0 SDK. (I'm not sure whether I can legally include it in my distribution of SDI, so I leave it to you.)
- Double-click setup.bat to register the A3D component. (If you have installed the A3D 3.0 SDK, this registration step is not compulsory.)
- Done. You can check if everything is OK by running the sample program.
Using SDI
- If you want to learn how to use SDI in a minute, please refer to the sample program.
- If you are Chinese or can speak Chinese, you can turn to NewsEverywhere for a complete online news reader, including its source code. You can experience two "persons" presenting different news to you at the same time. It may sound unnatural, but it really works.
Because sound can't be rendered in a flash, SDI uses many threads and synchronization objects, and even the simplistic Sleep(XXXX) delay. The system therefore suffers from this intensive threading and can become very unstable, and debugging it is a nightmare.
Here are some guidelines to help you reduce the number of frustrating problems when you use SDI:
- Keep in mind that SDI is not fully thread-safe.
- ECI (the API provided by ViaVoice TTS) seems to limit some functions (possibly because of the evaluation version), so one extra thread is used for each Speech object. In addition, ECI does not support reentrant calls from multiple threads, so please make sure there are intervals between operations. (It's not a MediaPlayer anyway ^_^ )
- Get/Set Position/Volume are always safe, because they are handed over to A3D or DirectX internally. You can achieve "sound animation" by using them.
- Earcon is more robust than Speech, except for its restrictions on input data. Refer to Play/ParseNotation and PlaySegment to see how to use Earcon correctly.
- Some knowledge of MIDI and MIDI files is highly recommended. Please refer to the "MIDI Specification" and the "MIDI File Format" for detailed information on them.
For more information about SDI, see sdi_readme.txt and SDI v1.0 document in the zip package.
A brief demo on how to use SDI
First, initialize the SDI environment.
int APIENTRY WinMain(HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpCmdLine,
int nCmdShow)
{
......
br=InitializeSDI(NULL, NULL, hInstance, 0);
......
Then you can create an Earcon object for playing music, and set its position and instrument:
HEARCON hEarcon1=CreateEarcon(666666L, 1024,
MAKE_TIMESIGNATURE(4,2,0x18,0x08),MAKE_KEYSIGNATURE(0,0));
pos.x=1.0f;
pos.z=1.0f;
br=SetPosition(hEarcon1, &pos);
......
br=SetChannelInstrument(hEarcon1, 0, 0, NULL);
br=PlaySegment(hEarcon1, pData,1);
......
You can compose your own MIDI using a single text string. I hope you are excited about this.
PTSTR psNotation=
    "@I00 E42 E42 F42 G42 G42 F42 E42 D42 C42 "
    "C42 D42 E42 E42. D43 D41;@I34 E40 G40 C40 E40";
br=PlayNotation(hEarcon2, psNotation);
......
You can even output the composed MIDI to a standard MIDI file.
ParseNotation(hEarcon2,
    "@I00 E42 E42 F42 G42 G42 F42 E42 D42 "
    "C42 C42 D42 E42 E42. D43 D41;@I34 E40 G40 C40 E40",
    data);
HANDLE hFile=CreateFile("test.mid", GENERIC_READ|GENERIC_WRITE, 0,
NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
_ASSERTE(hFile!=INVALID_HANDLE_VALUE);
DWORD dwWritten;
WriteFile(hFile, data, nLen, &dwWritten, NULL);
CloseHandle(hFile);
......
Now create a Speech object, and position it on your left. Speech positioning works much better than Earcon positioning.
HSPEECH hSpeech1=CreateSpeech(SDI_LANGUAGE_GENERAL_AMERICAN_ENGLISH,
SDI_VOICE_ADULTMALE2, 0);
Sleep(1000);
pos.x=-1.0f;
pos.z=0;
br=SetPosition(hSpeech1, &pos);
br=PlayText(hSpeech1, "The sound is coming from the left. ");
......
You can animate it by changing its position in a loop.
for(i=-20; i<20; i++)
{
SDIVECTOR animation={(FLOAT)i/20, 0, 1.0f};
SetPosition(hSpeech1, &animation);
Sleep(100);
}
SetVolume(hSpeech1, 0.4f);
......
Start another Speech object, and you can distinguish the two clearly. This would be impossible without the positioning. You can try placing both of them right in front of you to see whether you can still pick out one of them. :-)
HSPEECH hSpeech2=CreateSpeech(SDI_LANGUAGE_GENERAL_AMERICAN_ENGLISH,
SDI_VOICE_ADULTFEMALE1, 0);
pos.x=-1.0f;
pos.z=0;
SetPosition(hSpeech2, &pos);
PlayText(hSpeech2, "Everything is possible--Li Ling");
......
Finally, release the SDI environment. This operation deletes all remaining sound objects.
ReleaseSDI();
return 0;
}
Points of Interest
Possibly due to the evaluation version of the ViaVoice TTS SDK, I found that eciSpeaking didn't work! So I added one extra thread to each Speech object, only to find that I couldn't implement a real Pause function on the Speech object, because ECI (the API provided by ViaVoice TTS) does not support reentrant calls from another thread. For Earcon, I found something really interesting and, well, frustrating indeed. When you play a DirectSound buffer, the play cursor erases the data it has read. The illustration below shows what the DirectSound buffer looks like when a specific notification point is reached. (It took me quite a while to investigate the DirectSound buffer ^o^)
From the picture above, you can see that I have only about 70 ms to lock the buffer and copy the data between two notification points to the A3D source buffer if I want to achieve the best positioning effect. The test was disappointing: because I couldn't finish processing within 70 ms, some notifications were lost and the sound coming out of my earphones was dreadful. (The code in the WAIT_OBJECT_0+1 section of SDIEarconThread was used for this heavy work.) So I had to fall back on DirectX itself. However, even though I specify DS3DALG_HRTF_FULL for the DirectSound buffers, the positioning is still unsatisfactory. I've thought about creating the Microsoft synthesizer object manually and calling IDirectMusicSynth::Render to pull the synthesized wave data; that way I should have enough time to transfer the data to A3D. But this method would require me to fill in the instrument data manually, too! That means I would have to study the Downloadable Sounds Specification, and... you know, I'm just a university student and I have more important things to do. So, if you are interested in this, you can implement a better Earcon. And do let me know when you've finished, OK? ^_^
Enjoy!
A student at Zhejiang University, Zhejiang, China.
Majoring in Automation.
I now want to study machine vision and robotics, but I'm really torn between hardware and software, and between research and engineering.
I'll be glad if you can give me some suggestions.