
Rendering the audio captured by a Windows Phone device

12 Mar 2012 · CPOL · 5 min read
A small app that captures the audio from the microphone of a Windows Phone device and displays it as a continuous waveform on the screen using the XNA framework.

Introduction

This small app captures the audio from the microphone of a Windows Phone device and displays it as a continuous waveform on the screen using the XNA framework.

A slightly modified version of the app (it changes color when you touch the screen) can be found in the Windows Phone Marketplace.

 

Background

In order to capture audio on a Windows Phone device, you need a reference to the default microphone (Microphone.Default), you decide how often you want samples using the BufferDuration property, and you hook up the BufferReady event. You then control the capturing with the Start() and Stop() methods.

The microphone delivers samples at a fixed rate of 16,000 Hz, i.e., 16,000 samples per second; the SampleRate property reports this value. According to the sampling theorem, this means you won't be able to capture audio at frequencies above 8,000 Hz without distortion.
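
To get a feel for these numbers, here is a plain-C# sanity check of the buffer arithmetic (no XNA types involved; the 16,000 Hz rate and the 16-bit sample size are the ones stated above):

```csharp
// Buffer arithmetic for a 100 ms buffer at 16,000 Hz with 16-bit samples.
// (On the device, Microphone.GetSampleSizeInBytes does this for you.)
const int SampleRate = 16000;       // Microphone.SampleRate on Windows Phone
const int BytesPerSample = 2;       // signed 16-bit samples
const int BufferMilliseconds = 100;

int sampleCount = SampleRate * BufferMilliseconds / 1000;  // 1600 samples
int bufferBytes = sampleCount * BytesPerSample;            // 3200 bytes
```

This matches the size of the byte array the constructor allocates later via GetSampleSizeInBytes.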

You are also limited when choosing a value for the BufferDuration property: it must be between 0.1 and 1 seconds (100 - 1000 ms), in 10 ms steps. This means you must choose a value of 100, 110, 120, ..., 990, or 1000 milliseconds.
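
The rule is easy to capture in a small helper. IsValidBufferDuration here is a hypothetical illustration, not part of the app:

```csharp
// Hypothetical helper: checks whether a duration is legal for
// Microphone.BufferDuration (100-1000 ms, in 10 ms steps).
static bool IsValidBufferDuration(int milliseconds)
{
    return milliseconds >= 100
        && milliseconds <= 1000
        && milliseconds % 10 == 0;
}

// IsValidBufferDuration(100)  -> true
// IsValidBufferDuration(105)  -> false (not a 10 ms step)
// IsValidBufferDuration(1500) -> false (too long)
```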

When the microphone's BufferReady event fires, you should call microphone.GetData(myBuffer) to copy the samples from the microphone's internal buffer into a buffer of your own. The recorded audio comes in the form of a byte array, but since the samples are actually signed 16-bit integers (i.e., integers in the range -32,768 to 32,767), you will probably need to do some conversion before you can process them.
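
That conversion can be sketched like this (plain C#; the buffer holds little-endian 16-bit PCM, which matches what BitConverter.ToInt16 expects on the phone). The RenderSamples method further down does the same conversion inline:

```csharp
// Sketch: turn the microphone's byte buffer into signed 16-bit samples.
static short[] ToSamples(byte[] buffer)
{
    var samples = new short[buffer.Length / 2];
    for (var i = 0; i < samples.Length; i++)
    {
        // Each sample occupies two consecutive bytes, least significant first.
        samples[i] = System.BitConverter.ToInt16(buffer, i * 2);
    }
    return samples;
}

// Example: the byte pair { 0x00, 0x80 } decodes to -32768,
// and { 0xFF, 0x7F } decodes to 32767.
```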

Using the code

The application keeps a fixed number of narrow images, here called "(image) slices", arranged in a linked list. The images are rendered on the screen and smoothly moved from right to left. When the leftmost slice has gone off the screen, it is moved to the far right (still outside the screen) to create the illusion of an unlimited number of images.
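
The wrap-around can be expressed with LinkedListNode.Next and the null-coalescing operator, which is the same pattern the app uses later in Update() and Draw(). A minimal sketch, with ints standing in for the texture slices:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of the circular walk over a LinkedList
// (ints stand in for the RenderTarget2D slices).
var slices = new LinkedList<int>();
for (var i = 0; i < 3; i++) { slices.AddLast(i); }

var current = slices.First;
for (var step = 0; step < 5; step++)
{
    Console.Write(current.Value + " ");
    // When we fall off the end of the list, wrap around to the first node.
    current = current.Next ?? slices.First;
}
// Prints: 0 1 2 0 1
```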

Each slice holds the rendered samples from the contents of one microphone buffer. When the buffer is filled by the microphone, the rightmost slice (outside the screen) is rendered with the new samples and starts moving onto the screen.

The speed at which the slices move across the screen is tied to the buffer duration in such a way that the slices move a total of one slice width during the time the microphone captures the next buffer.
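
With the defaults used further down (an 800-pixel-wide landscape screen, 100 ms slices, and 5 seconds for a sample to cross the screen), the arithmetic works out like this:

```csharp
// Slice geometry and speed with the article's default constants.
const int LandscapeWidth = 800;          // screen width in landscape mode
const int SliceMilliseconds = 100;       // duration of one microphone buffer
const double ScreenMilliseconds = 5000;  // time for a sample to cross the screen

var slicesOnScreen = (int)System.Math.Ceiling(ScreenMilliseconds / SliceMilliseconds); // 50
var sliceWidth = (int)System.Math.Ceiling((float)LandscapeWidth / slicesOnScreen);     // 16

// A slice must travel its own width during one buffer duration:
var pixelsPerSecond = sliceWidth / (SliceMilliseconds / 1000f);       // 160 px/s
var pixelsPerBuffer = pixelsPerSecond * (SliceMilliseconds / 1000f);  // 16 px = one slice width
```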

Since each buffer of captured audio is rendered onto a texture as soon as it is received, there is no reason to keep any old buffer data. The application therefore keeps only one buffer in memory, which is reused over and over.

A flag is set each time the microphone buffer is ready. Since the BufferReady event is fired on the main thread, there is no need for any lock mechanism.

In the Update() method of the XNA app, the flag is checked to see whether new data has arrived; if so, the next slice in line is drawn. In the Draw() method, the slices are drawn on the screen and moved slightly as time goes by.

Here's a description of the structure of the main Game class.

Some constants: 

C#
// The size of the screen.
private const int LandscapeWidth = 800;
private const int LandscapeHeight = 480;
 
// The number of milliseconds per time slice.
private const int SliceMilliseconds = 100;

Fields regarding the microphone and the captured data: 

C#
// The microphone and the sample data.
private readonly Microphone microphone;
private readonly byte[] microphoneData;

// The time it takes for a sample to pass over the screen.
private readonly TimeSpan screenMilliseconds = TimeSpan.FromSeconds(5);

Choose a color that is almost transparent (controlled by the last of the four parameters; they are the red, green, blue, and alpha components of the color). The reason is that many samples are drawn on top of each other, and keeping each individual sample almost see-through makes for an interesting visual effect.

C#
// The color of the samples.
private readonly Color sampleColor = new Color(0.4f, 0.9f, 0.2f, 0.07f);

The drawing classes. The white pixel texture does all the drawing.

C#
// The sprite batch and a white single-pixel texture for drawing.
private SpriteBatch spriteBatch;
private Texture2D whitePixelTexture;

The size of each image slice. 

C#
// The size in pixels of one time slice.
private int imageSliceWidth;
private int imageSliceHeight;

There's no need to keep a reference to the linked list itself; just the first and last nodes. These nodes keep references to their neighbors. The currentImageSlice is the one to draw on next.

C#
// The first, last and current image slices.
private LinkedListNode<RenderTarget2D> firstImageSlice;
private LinkedListNode<RenderTarget2D> lastImageSlice;
private LinkedListNode<RenderTarget2D> currentImageSlice;

The speed of the slices moving across the screen. 

C#
// The current speed of the samples.
private float pixelsPerSeconds;

In order to know how far the current samples should be moved, the application must keep track of when they appeared. 

C#
// The time in seconds when the current microphone data appeared.
private float microphoneDataAppearedAtSeconds;

The signal that tells the Update() method that there is new data to handle.

C#
// A flag telling whether new data has been read.
private bool hasNewMicrophoneData;

The density of samples per pixel.

C#
// The number of samples squeezed into the width of one pixel.
private int samplesPerPixel;

Here's the constructor. In it the graphics mode is set and the microphone is wired up and asked to start listening. 

C#
public Waveform()
{
    // Set the screen mode.
    new GraphicsDeviceManager(this)
    {
        PreferredBackBufferWidth = LandscapeWidth,
        PreferredBackBufferHeight = LandscapeHeight,
        IsFullScreen = true,
        SupportedOrientations =
            DisplayOrientation.Portrait |
            DisplayOrientation.LandscapeLeft |
            DisplayOrientation.LandscapeRight
    };

    // Standard setup.
    Content.RootDirectory = "Content";
    TargetElapsedTime = TimeSpan.FromTicks(333333);
    InactiveSleepTime = TimeSpan.FromSeconds(1);

    // Refer to the default microphone and hook the BufferReady-event.
    microphone = Microphone.Default;
    microphone.BufferReady += MicrophoneBufferReady;

    // Set the buffer duration to the length of one slice.
    microphone.BufferDuration = TimeSpan.FromMilliseconds(SliceMilliseconds);

    // Calculate the size in bytes of the sound buffer and create the byte array.
    var microphoneDataLength = microphone.GetSampleSizeInBytes(microphone.BufferDuration);
    microphoneData = new byte[microphoneDataLength];

    // Start listening.
    microphone.Start();
}

In XNA's LoadContent, nothing is actually loaded, since the app doesn't depend on any pre-drawn images. The SpriteBatch is created, the white pixel texture is generated, and the image slices are initialized (as black images).

C#
protected override void LoadContent()
{
    // Create a SpriteBatch for drawing.
    spriteBatch = new SpriteBatch(GraphicsDevice);

    // Create a 1x1 texture containing a white pixel.
    whitePixelTexture = new Texture2D(GraphicsDevice, 1, 1);
    var white = new[] { Color.White };
    whitePixelTexture.SetData(white);

    // Create the image slices.
    CreateSliceImages();
}

CreateSliceImages calculates how many slices are needed to cover the entire screen (plus two, so there is room for movement). At the end of the method, the regular RenderSamples method is called to initialize all the images. Since there is no data yet (all samples are zero), it generates black images.

C#
private void CreateSliceImages()
{
    // Calculate how many slices that fits the screen (rounding upwards).
    var imageSlicesOnScreenCount = (int)Math.Ceiling(screenMilliseconds.TotalMilliseconds / SliceMilliseconds);

    // Calculate the width of each slice.
    imageSliceWidth = (int)Math.Ceiling((float)LandscapeWidth / imageSlicesOnScreenCount);

    // Set the height of each slice to the largest screen size
    // (this way the full height is utilized in Portrait mode without stretching)
    imageSliceHeight = LandscapeWidth;

    // Create a linked list with the required number of slices, plus two
    // so that there's room for scrolling off-screen a bit.
    var imageSlices = new LinkedList<RenderTarget2D>();
    for (var i = 0; i < imageSlicesOnScreenCount + 2; i++)
    {
        var imageSlice = new RenderTarget2D(GraphicsDevice, imageSliceWidth, imageSliceHeight);
        imageSlices.AddLast(imageSlice);
    }

    // Reference the first, last and current slice.
    firstImageSlice = imageSlices.First;
    lastImageSlice = imageSlices.Last;
    currentImageSlice = imageSlices.Last;

    // Calculate the speed of the pixels for an image slice.
    pixelsPerSeconds = imageSliceWidth / (SliceMilliseconds / 1000f);

    // Since the byte-array buffer really holds 16-bit samples, the actual
    // number of samples in one buffer is the number of bytes divided by two.
    var sampleCount = microphoneData.Length / 2;

    // Calculate how many samples that should be squeezed in per pixel (width).
    samplesPerPixel = (int)Math.Ceiling((float)sampleCount / imageSliceWidth);

    // Iterate through all the image slices and render with the empty microphone buffer.
    var slice = firstImageSlice;
    while (slice != null)
    {
        RenderSamples(slice.Value);
        slice = slice.Next;
    }
}
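
Plugging in the defaults once more: a 100 ms buffer holds 1,600 samples (3,200 bytes), and with 16-pixel-wide slices that gives 100 samples per pixel column:

```csharp
// Sample density with the default constants.
const int SampleCount = 3200 / 2;  // 100 ms at 16,000 Hz, two bytes per sample
const int SliceWidth = 16;         // from the slice-width calculation above

var samplesPerPixel = (int)System.Math.Ceiling((float)SampleCount / SliceWidth); // 100
```
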

XNA's UnloadContent just cleans up what LoadContent created.

C#
protected override void UnloadContent()
{
    // Dispose the SpriteBatch.
    spriteBatch.Dispose();

    // Dispose the white pixel.
    whitePixelTexture.Dispose();

    // Dispose all the image slices.
    var slice = firstImageSlice;
    while (slice != null)
    {
        slice.Value.Dispose();
        slice = slice.Next;
    }
}

The event handler for the microphone's BufferReady event. It copies the data from the microphone's buffer and raises the flag that new data has arrived.

C#
private void MicrophoneBufferReady(object sender, EventArgs e)
{
    // New microphone data can now be fetched from its buffer.

    // Copy the samples from the microphone buffer to our buffer.
    microphone.GetData(microphoneData);

    // Raise the flag that new data has come.
    hasNewMicrophoneData = true;
}

XNA's Update method checks the phone's Back button to see if it's time to quit. After that, it checks the flag to see whether new data has been recorded. If so, the new samples are rendered by calling the RenderSamples method.

C#
protected override void Update(GameTime gameTime)
{
    // Exit the app if the user presses the back-button.
    if (GamePad.GetState(PlayerIndex.One).Buttons.Back == ButtonState.Pressed)
    {
        Exit();
    }

    // If new data has been captured, a new slice should be drawn.
    if (hasNewMicrophoneData)
    {
        // Reset the flag.
        hasNewMicrophoneData = false;

        // Express the current point in time as "seconds passed since start of app".
        var currentSeconds = (float)gameTime.TotalGameTime.TotalSeconds;

        // Remember the current time as "when the mic data appeared".
        microphoneDataAppearedAtSeconds = currentSeconds;

        // Render the new samples on the current image slice.
        RenderSamples(currentImageSlice.Value);

        // Select the next image slice as the new "current".
        currentImageSlice = currentImageSlice.Next ?? firstImageSlice;
    }

    base.Update(gameTime);
}

XNA's Draw method takes care of drawing the rendered slices. It handles the two screen orientations, landscape and portrait, by scaling the images accordingly: in landscape mode the height of the images is squeezed, and in portrait mode their width is squeezed.

When everything is set up, the method iterates through the images and renders them one by one on the screen, adjusted a bit along the X-axis to make up for the time that has passed.

C#
protected override void Draw(GameTime gameTime)
{
    // Clear the device. (Actually unnecessary since the whole screen will be painted below.)
    GraphicsDevice.Clear(Color.Black);

    // Calculate the "screen width-scale", to allow the app to be drawn both in
    // Landscape and Portrait mode.
    // In Landscape mode, the screenWidthScale will be 1.0 (i.e. original scale)
    // In Portrait mode, the screen must be squeezed.
    var screenWidthScale = (float)GraphicsDevice.Viewport.Width / LandscapeWidth;

    // Calculate the scaled width of one slice.
    var scaledWidth = (int)Math.Ceiling(imageSliceWidth * screenWidthScale);

    // Express the current point in time as "seconds passed since start of app".
    var currentSeconds = (float)gameTime.TotalGameTime.TotalSeconds;

    // Calculate how many seconds that has passed since the current microphone data was captured.
    var secondsPassed = currentSeconds - microphoneDataAppearedAtSeconds;

    // For a smooth movement of the pixels, calculate the offset of the current
    // microphone data (the offset is zero when the new data arrives, and then
    // grows to one full slice width).
    var drawOffsetX = secondsPassed * pixelsPerSeconds;

    // Since it is not certain that the next microphone data will come before the current
    // slice has moved its full distance, the offset needs to be truncated so it doesn't
    // move too far.
    if (drawOffsetX > scaledWidth)
    {
        drawOffsetX = scaledWidth;
    }

    try
    {
        // Start drawing the slices.
        spriteBatch.Begin();

        // Start with one slice before the current one, wrap if necessary.
        var imageSlice = currentImageSlice.Previous ?? lastImageSlice;

        // Prepare the rectangle to draw within, starting with the newest
        // slice far to the right of the screen (a bit outside, even).
        var destinationRectangle = new Rectangle(
            (int)(GraphicsDevice.Viewport.Width + scaledWidth - drawOffsetX),
            0,
            scaledWidth,
            GraphicsDevice.Viewport.Height);

        // Draw the slices in the linked list one by one from the right
        // to the left (from the newest sample to the oldest)
        // and move the destinationRectangle one slice-width at a time
        // until the full screen is covered.
        while (destinationRectangle.X > -scaledWidth)
        {
            // Draw the current image slice.
            spriteBatch.Draw(imageSlice.Value, destinationRectangle, Color.White);

            // Move the destinationRectangle one step to the left.
            destinationRectangle.X -= scaledWidth;

            // Select the previous image slice to draw next time, wrap if necessary.
            imageSlice = imageSlice.Previous ?? lastImageSlice;
        }
    }
    finally
    {
        // Drawing done.
        spriteBatch.End();
    }

    base.Draw(gameTime);
}

RenderSamples takes a RenderTarget2D as an argument: the texture to be drawn on. The routine iterates through the samples and renders them one by one.

C#
private void RenderSamples(RenderTarget2D target)
{
    try
    {
        // Redirect the drawing to the given target.
        GraphicsDevice.SetRenderTarget(target);

        // Clear the target slice.
        GraphicsDevice.Clear(Color.Black);

        // Begin to draw. Use Additive for an interesting effect.
        spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.Additive);

        // The X-variable points out to which column of pixels to
        // draw on.
        var x = 0;

        // Since the byte-array buffer really holds 16-bit samples, the actual
        // number of samples in one buffer is the number of bytes divided by two.
        var sampleCount = microphoneData.Length / 2;

        // The index of the current sample in the microphone buffer.
        var sampleIndex = 0;

        // The vertical mid point of the image (the Y-position
        // of a zero-sample and the height of the loudest sample).
        var halfHeight = imageSliceHeight / 2;

        // The maximum number of a 16-bit signed integer.
        // Dividing a signed 16-bit integer (the range -32768..32767)
        // by this value will give a value in the range of -1 (inclusive) to 1 (exclusive).
        const float SampleFactor = 32768f;

        // Iterate through the samples and render them on the image.
        for (var i = 0; i < sampleCount; i++)
        {
            // Advance the X-coordinate each time 'samplesPerPixel' samples
            // have been drawn.
            if ((i > 0) && ((i % samplesPerPixel) == 0))
            {
                x++;
            }

            // Convert the current sample (16-bit value) from the byte-array to a
            // floating point value in the range of -1 (inclusive) to 1 (exclusive).
            var sampleValue = BitConverter.ToInt16(microphoneData, sampleIndex) / SampleFactor;

            // Scale the sampleValue to its corresponding height in pixels.
            var sampleHeight = (int)Math.Abs(sampleValue * halfHeight);

            // The top of the column of pixels.
            // A positive sample should be drawn from the center and upwards,
            // and a negative sample from the center and downwards.
            // Since a rectangle is used to describe the "pixel column", the
            // top must be modified depending on the sign of the sample (positive/negative).
            var y = (sampleValue < 0)
                ? halfHeight
                : halfHeight - sampleHeight;

            // Create the 1 pixel wide rectangle corresponding to the sample.
            var destinationRectangle = new Rectangle(x, y, 1, sampleHeight);

            // Draw using the white pixel (stretching it to fill the rectangle).
            spriteBatch.Draw(
                whitePixelTexture,
                destinationRectangle,
                sampleColor);

            // Advance to the next two-byte sample.
            sampleIndex += 2;
        }
    }
    finally
    {
        // Drawing done.
        spriteBatch.End();

        // Restore the normal rendering target (the screen).
        GraphicsDevice.SetRenderTarget(null);
    }
} 

You can download the solution file from here.  

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer, Diversify
Sweden
I work as a senior software developer for the expert consulting firm Diversify in Malmö, Sweden.
