Using DynamicSoundEffectInstance to Create Sounds at Runtime on Windows Phone 7

Joel Ivory Johnson

Oct 7, 2010

CPOL

A brief introduction to the DynamicSoundEffectInstance class for Windows Phone 7 and a demonstration on how to communicate with the class and control pitch.

Download the code - 93.1 KB

After an Atlanta Silverlight Users meeting, I was eating with a couple of other MVPs and we were talking about the things we were doing and would like to do with Windows Phone 7. I mentioned that I would like to have direct access to the sound buffer used in XNA. James Ashley immediately responded with "DynamicSoundEffectInstance!" At the time, James had never used it and I had only just discovered it, so I needed more information on how it works. That night, I stayed up a little later than usual to figure it out. With the documentation for the class still in an early form, I didn't find everything that I wanted to know, but I was able to work out the rest.

In writing this, I'm going to assume that you know a bit about the mechanics of how sound and speakers work. If not, you'll want to read the Wikipedia article on Digital to Analog converters.

In this post, I simply want to get to the point of being able to play a tone and control its frequency. From a high level view, this is what we will need to do:

1. Create a few byte buffers that will hold the next part of the sound to be played
2. Populate one of the byte buffers with the wave form to be played
3. Give the buffer to a DynamicSoundEffectInstance
4. Tell the DynamicSoundEffectInstance to start playing
5. In response to the BufferNeeded event, populate the next buffer and submit it
6. Go to step 5

Now to convert those steps into something more concrete. Let's start with allocating the buffers.

Allocate the Buffers

The size of the buffer you choose is largely going to be driven by what type of latency you want your sounds to have and the desired quality of the sound you are generating. In general, low latency is good as there is less of a time difference from when your program generates a sound to when the user hears it. If you made a program to simulate a piano, you would want low latency so that the user perceives that the device is playing sound as soon as they press a key on the screen. Naturally, you will also want high quality. But there are trade-offs as you aim for higher quality and lower latency just as there are trade-offs in aiming for low quality and high latency.

To produce higher quality sounds, you will need a higher sample rate. If you raise the sample rate used to play back a sound, then you will either need to increase the size of your buffer (so more memory is consumed) or populate and supply smaller buffers more frequently (so more CPU time is consumed). Lower quality uses less memory and less CPU time, but the downside is evident: your program won't sound as good. If you are aiming for lower latency, you will need to use smaller buffers, but that also means the DynamicSoundEffectInstance requests new buffers more often (once again, more CPU time). My suggestion for the quality of a sound is to aim for something that is good enough. Don't start off at a 48 kHz sample rate. Start instead at around 22 kHz or lower and see how well that works for you. As for latency in an XNA program, aim for a latency that is determined by the frame rate of your game. If your game is made to run at 30 frames per second, then make buffers that are big enough to play 1/30 of a second of sound. A sound can also be in stereo or mono. It goes without saying that a stereo sound needs twice the memory of a mono sound.

For now, let's assume that we are creating a DynamicSoundEffectInstance with a sample rate of 22 kHz in mono. We could instantiate one with the following:

var dynamicSoundEffectInstance = new DynamicSoundEffectInstance(22000, AudioChannels.Mono);

We can calculate the size of the buffers in one of two ways. The DynamicSoundEffectInstance always plays 16-bit sound samples (2 bytes each). If I wanted to play 1/30th of a second of sound at a 22 kHz sample rate, the number of bytes needed for this buffer would be 22000*(1/30)*2*1 = 1466 (rounded down to a whole number of samples). The last two numbers in the equation (2*1) are the number of bytes in a sample multiplied by the number of channels to be played. Were I playing in stereo, the second of those numbers would have been 2 instead of 1. Alternatively, I could ask the DynamicSoundEffectInstance to calculate the size of the needed buffer.

int bufferSize = dynamicSoundEffectInstance.GetSampleSizeInBytes(
	TimeSpan.FromSeconds(1d/30d));
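The manual arithmetic above can be sketched as a small helper. This is only an illustration: the class name AudioBufferMath and the method BufferSizeInBytes are hypothetical and not part of the attached code, and the result is rounded down to a whole number of sample frames so that the buffer never ends halfway through a sample.

```csharp
using System;

public static class AudioBufferMath
{
    // DynamicSoundEffectInstance plays 16-bit PCM, so each sample is 2 bytes.
    public const int BytesPerSample = 2;

    // Hypothetical helper: bytes needed to hold `seconds` of audio,
    // rounded down to a whole number of sample frames.
    public static int BufferSizeInBytes(int sampleRate, int channelCount, double seconds)
    {
        int frames = (int)(sampleRate * seconds);   // whole sample frames that fit
        return frames * BytesPerSample * channelCount;
    }
}
```

With these assumptions, AudioBufferMath.BufferSizeInBytes(22000, 1, 1d / 30d) gives 1466 for mono, and passing 2 as the channel count doubles it to 2932.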

Populate the Buffer

The data that you put into the buffer is derived from the sound that you are playing. If you've been reading astutely, you may have noticed that I've stated that DynamicSoundEffectInstance consumes an array of bytes (8 bits each) but the audio must be composed of 16-bit samples. In C++, one might just pass a pointer to whatever array held the data. The language would let you do that, even if doing so made no sense. In C#, one can do the same by wrapping the code in an unsafe block. But many feel that code wrapped in unsafe blocks is potentially not safe (I wonder why), and Silverlight won't let you do such things. So it's necessary to convert your 16-bit data to byte data using other means. There's a method available for doing so, but I'll also describe how to do it manually.

A 16-bit (two byte) number has a high order byte and a low order byte. High and low order can also be read as more significant and less significant. In the decimal number 39, the three is in a more significant position than the nine; it has more of an impact on the final value. The same concept transfers to numbers composed of bytes. Our bytes need to have little endian ordering, so the low order byte must be placed in our array before the high order byte. The low order byte can be singled out with a bit mask and the high order byte with bit shifting.

byte lowOrder = (byte)(SomeNumber & 0xFF);
byte highOrder = (byte)((SomeNumber >> 8) & 0xFF);
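Applied to a whole array of samples, the same masking and shifting might look like the sketch below. The class SampleConverter and its method are hypothetical names used only for illustration; they do not come from the attached code.

```csharp
using System;

public static class SampleConverter
{
    // Hypothetical helper: converts 16-bit samples to little endian bytes by hand.
    public static byte[] ToLittleEndianBytes(short[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; ++i)
        {
            bytes[2 * i] = (byte)(samples[i] & 0xFF);            // low order byte first
            bytes[2 * i + 1] = (byte)((samples[i] >> 8) & 0xFF); // then high order byte
        }
        return bytes;
    }
}
```

For example, the sample 0x1234 becomes the byte pair 0x34, 0x12, and the sample -2 (0xFFFE) becomes 0xFE, 0xFF.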

Now that you know what needs to be done, here's the utility method that will essentially do the same thing.

Buffer.BlockCopy(
                   sourceBuffer
                 , sourceStartIndex
                 , destinationBuffer
                 , destinationStartIndex
                 , ByteCount)

The sourceBuffer argument in this case would be the array of 16-bit integers. The destinationBuffer would be the destination byte buffer. A few things to note. First, the destination buffer must have twice the number of elements as the source buffer (since bytes are half the size of short integers). Second, the two start positions and the last argument are all measured in bytes and not in array elements. If you get this wrong, you'll either get an ArgumentException or something that sounds pretty bad.
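As a concrete illustration of that call, here is a minimal sketch with real arrays. BlockCopyExample and ToBytes are hypothetical names. Buffer.BlockCopy copies the bytes in the machine's native order, so this relies on running on little endian hardware, which is an assumption you can confirm with BitConverter.IsLittleEndian.

```csharp
using System;

public static class BlockCopyExample
{
    // Hypothetical helper: copies every 16-bit sample into a new byte array.
    public static byte[] ToBytes(short[] samples)
    {
        // The destination needs two bytes for every 16-bit sample.
        byte[] destination = new byte[samples.Length * sizeof(short)];
        // Both offsets and the count are measured in bytes, not in array elements.
        Buffer.BlockCopy(samples, 0, destination, 0, destination.Length);
        return destination;
    }
}
```

On a little endian machine, the output matches the manual mask-and-shift approach byte for byte.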

Start Playing the Sound

Once the DynamicSoundEffectInstance has a buffer, I call Play to get things rolling.

Submitting the Buffers to the DynamicSoundEffectInstance

The DynamicSoundEffectInstance has an event called BufferNeeded that is raised when the object is ready for more sound data. If you are making an XNA program, you may want to avoid the object getting to the point where it needs to raise this event. You can reduce overhead by feeding the class data at the same rate at which it consumes it. This is easily done by making the buffers big enough to hold as much sound as is played in one cycle of your game loop. If you are making a Silverlight application, you'll be making use of this event. From what I've found, the DynamicSoundEffectInstance class will hold up to two buffers: it plays from one and keeps the other queued to be played next. So I prefer to make three buffers, so that I have a third buffer into which I can render the next block of sound. When the BufferNeeded event is raised, I populate the next buffer and pass it to the SubmitBuffer method. I use the same buffers in a round robin fashion.
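The round robin rotation can be sketched independently of XNA. BufferRing below is a hypothetical helper, not the SoundManager class from the attached code, and the wiring to BufferNeeded is shown only as a comment because it needs the XNA assemblies to compile.

```csharp
using System;

// Hypothetical helper that hands out a fixed set of buffers in round robin order.
public sealed class BufferRing
{
    private readonly byte[][] _buffers;
    private int _nextIndex;

    public BufferRing(int bufferCount, int bufferSizeInBytes)
    {
        _buffers = new byte[bufferCount][];
        for (int i = 0; i < bufferCount; ++i)
            _buffers[i] = new byte[bufferSizeInBytes];
    }

    // Returns the next buffer to fill and advances, wrapping back to the first.
    public byte[] Next()
    {
        byte[] buffer = _buffers[_nextIndex];
        _nextIndex = (_nextIndex + 1) % _buffers.Length;
        return buffer;
    }
}

// Possible wiring inside an XNA or Silverlight project (not compiled here;
// FillWithSound is a hypothetical method that renders the next block of samples):
//
//   var ring = new BufferRing(3, bufferSize);
//   instance.BufferNeeded += (sender, e) =>
//   {
//       byte[] buffer = ring.Next();
//       FillWithSound(buffer);
//       instance.SubmitBuffer(buffer);
//   };
```

With three buffers, the buffer handed out on the fourth call is the same array as on the first. By then it should no longer be queued, assuming the instance really holds no more than two buffers at a time.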

FrameworkDispatcher.Update()

This is only needed if you are using the class from within Silverlight. FrameworkDispatcher.Update will need to be called at least once before playing your sound and must continue to be called periodically. The Windows Phone documentation already walks one through a class that will do this. Take a look at this article to see how this class works.

My Sound Function and Controlling the Frequency

While the sound data passed to DynamicSoundEffectInstance must be signed 16-bit integers, I wanted to keep my sound generating functions decoupled from this constraint and also decoupled from the specific frequency being played. I achieved these goals in a class I've created named SoundManager. While SoundManager contains the code to generate a sine wave, the actual sound function used is assigned to the property SoundFunction. One only needs to assign a different function to this property to generate a different sound.

To decouple the function from the data format, I've created my program so that it expects the sound function to return its data as a double. The value returned by the sound function should be in the range [-1..1]. I'm not doing range checking, to avoid the overhead (so if you use my code, it's up to you to make sure your code behaves). The function consumes two parameters: a double value to represent time and an integer value to represent channel. Channel would presumably be 0 for the left channel and 1 for the right channel. For generating mono sound, this parameter can be ignored.

The time parameter indicates which part of the cycle of a sound wave is being requested. The values returned by the sound function for times from 0 to 1 would be for one cycle of the sound. From 1 to 2 would be for the second cycle of the sound, and so on. Since the time parameter is being used to represent the position within a cycle instead of actual time, the sound function is insulated from the actual frequency being generated. I can change the frequency of the sound being played by increasing or decreasing the intervals between the time values passed. Smaller intervals will lead to lower frequencies. Larger intervals will lead to higher frequencies.

Note that the highest frequency that you can create is going to be no higher than half the sample rate. So with a 22 kHz sample rate, you would only be able to generate sounds with frequency components as high as 11 kHz. Given that most sounds we hear are a complex mixture of sound components, keep in mind that there may be some frequency components higher than what may be recognized as the dominant frequency. So playing such sounds at a high frequency could result in some of the higher frequency components being lost or distorted (aliased). You can find more information on this concept under the topic Nyquist Rate.

The method FillBuffer will call this function for each sample that it needs to fill the next buffer.

double MySin(double time, int channel) { return Math.Sin(time*Math.PI*2.0d); }
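Since one unit of the time parameter is one full cycle, the step between samples for a given tone works out to the frequency divided by the sample rate. The sketch below shows that relationship and a second sound function with the same signature. ToneMath, DeltaTime, Square and Render are hypothetical names, and scaling by short.MaxValue is my assumption standing in for MaxWaveMagnitude.

```csharp
using System;

public static class ToneMath
{
    // Cycles advanced per sample: one unit of "time" is one full cycle,
    // so a tone of `frequency` Hz advances frequency / sampleRate per sample.
    public static double DeltaTime(double frequency, int sampleRate)
    {
        return frequency / sampleRate;
    }

    // Same signature as the sine function above; a square wave as an alternative.
    public static double Square(double time, int channel)
    {
        double phase = time - Math.Floor(time);   // position within the current cycle, [0..1)
        return phase < 0.5 ? 1.0 : -1.0;
    }

    // Renders mono 16-bit samples by stepping the cycle position once per sample.
    public static short[] Render(Func<double, int, double> soundFunction,
                                 double frequency, int sampleRate, int sampleCount)
    {
        short[] samples = new short[sampleCount];
        double time = 0.0;
        double deltaTime = DeltaTime(frequency, sampleRate);
        for (int i = 0; i < sampleCount; ++i)
        {
            // Assumption: scale the [-1..1] result to the full signed 16-bit range.
            samples[i] = (short)(short.MaxValue * soundFunction(time, 0));
            time += deltaTime;
        }
        return samples;
    }
}
```

For a 440 Hz tone at a 22000 Hz sample rate the step is 0.02, so one cycle spans 50 samples. Doubling the step to 0.04 would raise the pitch by an octave without touching the sound function.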

The code for filling the sound buffer is as follows:

void FillBuffer()
{
    if (SoundFunction == null)
        throw new InvalidOperationException("SoundFunction must be assigned before filling a buffer.");

    // Take the next buffer in round robin order.
    byte[] destinationBuffer = _audioBufferList[CurrentFillBufferIndex];
    if (++CurrentFillBufferIndex >= _audioBufferList.Length)
        CurrentFillBufferIndex = 0;

    // A frame holds one sample for every channel.
    int bytesPerFrame = ChannelCount * BytesPerSample;
    int frameCount = destinationBuffer.Length / bytesPerFrame;

    for (int i = 0; i < frameCount; ++i)
    {
        for (int c = 0; c < ChannelCount; ++c)
        {
            // Scale the [-1..1] output of the sound function to a signed 16-bit sample.
            short result = (short)(MaxWaveMagnitude * SoundFunction(_Time, c));

            #if(MANUAL_COPY)
            // Little endian: low order byte first, then the high order byte.
            int byteIndex = i * bytesPerFrame + c * BytesPerSample;
            destinationBuffer[byteIndex] = (byte)(0xFF & result);
            destinationBuffer[byteIndex + 1] = (byte)(0xFF & (result >> 0x8));
            #else
            _renderingBuffer[i * ChannelCount + c] = result;
            #endif
        }
        // Advance the position within the wave once per frame.
        _Time += _deltaTime;
    }
    #if(!MANUAL_COPY)
    // The last argument is a byte count, not an element count.
    Buffer.BlockCopy(_renderingBuffer, 0, destinationBuffer,
            0, _renderingBuffer.Length * sizeof(short));
    #endif
    OnPropertyChanged("Time");
    OnPropertyChanged("PendingBufferCount");
}

If you deploy the code attached to this entry, you'll have a program that can play a sine wave. Pretty boring, I know. But I wanted to keep the sound that I was playing in this first mention of DynamicSoundEffectInstance simple. The next time I mention it, I want to talk about generating more complex sounds and will probably say little about using the class itself beyond referencing this entry.
