Recording Audio to WAV with WASAPI in Windows Store Apps

padmore

Jan 20, 2013

CPOL

Getting started recording audio to WAV with WASAPI in Windows Store apps

Download source - 195.8 KB

Introduction

To record audio in Windows Store apps, the Windows Runtime provides the MediaCapture class, which makes it quick and easy to get started. You are, however, limited to the output formats available through MediaEncodingProfile, and WAV is currently not one of them.

So to record to WAV, you need another solution and because you do not have access to the full .NET stack, your options are limited. WASAPI, included in Microsoft's Core Audio SDK, offers a solution.

Unfortunately, to use WASAPI, you are thrown out of a safe managed haven, to an unmanaged COM part of the woods. This drop can be quite deep with a steep learning curve to climb out of, if you are not used to dealing with unmanaged code in .NET. C#'s dynamic type won't ease the pain either, because the used COM interfaces don't work with it (they don't implement IDispatch).

Finally, writing the result to a WAV file also requires some low(er than normal) level code, which can also be an extra obstacle to overcome if you are not used to audio programming or its concepts.

This article is aimed at getting a C# developer that is a WASAPI novice up & running with a basic working solution.

Background

I wanted to try out an idea for a Windows Store app that deals with basic audio editing. For this, I wanted to use the WAV format for its lossless uncompressed characteristics and its compatibility with other audio software.

Since I consider WAV to be the default uncompressed audio format on Windows, I expected out of the box support for it in WinRT.

This is not the case, so I turned to NAudio as the solution. NAudio will do the heavy lifting, talking to the Core Audio SDK for you. Unfortunately, WinRT support in NAudio is in progress and not completed yet. It does include a working Windows Store app demo to record audio to WAV, but its built-in components to write the result to a WAV file are not available yet in WinRT.

I considered contributing to add the WinRT support I'm looking for. But that requires me to grasp a big part of the NAudio library, to be able to submit a patch that works nicely with the existing code base and its concepts.

Instead, as a starting point, I tried taking just what I needed out of NAudio and making a stripped-down, WinRT-compatible solution, but I quickly realized I didn't understand half of what was going on.

Finally, I accepted I had to learn dealing directly with the Core Audio SDK and the basics of writing WAV files.

While learning, I discovered that the official channels mostly use C++ by default on this topic, which introduces an extra barrier for a C# developer that is more than just a syntax difference.

So with this article, I set out to piece together a solution that demonstrates the basics. Obviously, I cannot guarantee that best practices are not violated.

Prerequisites

Basic knowledge of Windows Store app development is assumed.
Understanding of the MVVM pattern is assumed. I did not oversimplify the solution by stuffing everything from COM interop to UI logic in the codebehind, to avoid surprises when developing a more realistic solution.

Using the Code

Overview

The attached Visual Studio 2012 solution contains a Windows Store app project which demonstrates recording audio to WAV in an MVVM setup using the Core Audio SDK.

Notable namespaces:

CoreAudio namespace: contains the COM interop logic to interact with the Core Audio SDK
Services namespace: contains the business logic to record audio to a WAV file (to be honest there's more than business logic in there, such as the specifics of writing a WAV file which should be refactored out of there)

To just read the code without following this article, use the StartRecordingCommand in the ViewModels namespace as a starting point and follow the logical flow from there.

Capturing Audio via WASAPI

Select an Audio Device for Capturing

The goal is to get an IAudioCaptureClient to capture audio.

You get an IAudioCaptureClient through an IAudioClient. Both are part of the Core Audio SDK's WASAPI.

To get an IAudioClient, you use the Core Audio SDK's MMDevice API by activating an audio device.

public class WindowsMultimediaDevice
{
    [DllImport("Mmdevapi.dll", ExactSpelling = true, PreserveSig = false)]
    public static extern void ActivateAudioInterfaceAsync(
        [In, MarshalAs(UnmanagedType.LPWStr)] string deviceInterfacePath,
        [In, MarshalAs(UnmanagedType.LPStruct)] Guid riid,
        [In] IntPtr activationParams,
        [In] IActivateAudioInterfaceCompletionHandler completionHandler,
        out IActivateAudioInterfaceAsyncOperation createAsync);
}

The above definition exposes the relevant method to call. The native definition lives in a header file: if you install the Windows SDK for Windows 8.0, you can find it in Windows Kits\8.0\Include\um\mmdeviceapi.h.

The unmanaged function in Mmdevapi.dll is exposed with DllImport. The native library Mmdevapi.dll has shipped with Windows since Vista, although ActivateAudioInterfaceAsync itself was only added in Windows 8. Also, since the unmanaged code uses different types, a conversion is necessary, which is done by marshalling with the MarshalAs attribute.

public void Start()
{
    _isRecording = true;
 
    var defaultAudioCaptureId = MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default);
    var completionHandler = new ActivateAudioInterfaceCompletionHandler(StartCapture);
    IActivateAudioInterfaceAsyncOperation createAsync;
 
    WindowsMultimediaDevice.ActivateAudioInterfaceAsync(
        defaultAudioCaptureId, new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioClient), 
        IntPtr.Zero, completionHandler, out createAsync);
}

Used parameters explained:

The defaultAudioCaptureId identifies the default capture device and is easy to get through the MediaDevice class provided by the Windows Runtime.
The second parameter is the IID of the WASAPI COM interface we want to get, which is an IAudioClient in this case. The value for this IID can be found in the header file Windows Kits\8.0\Include\um\Audioclient.h.
No activation parameters are required, so IntPtr.Zero, the interop equivalent of a null pointer, is passed.
The completionHandler is the callback that will receive the IAudioClient, which is the goal. Its type is defined by the MMDevice API; view IActivateAudioInterfaceCompletionHandler for the details.
createAsync is not used here, but is passed to satisfy the method definition.

Start Capturing Audio

After calling ActivateAudioInterfaceAsync, in the ActivateAudioInterfaceCompletionHandler callback, use the activated IAudioClient to get an IAudioCaptureClient.

object audioCaptureClientInterface;
audioClient.GetService(new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioCaptureClient), 
                       out audioCaptureClientInterface);

var audioCaptureClient = (IAudioCaptureClient)audioCaptureClientInterface;
var sleepMilliseconds = CalculateCaptureDelay(waveFormat, bufferSize);
audioClient.Start();

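while (_isRecording)
{
    // Await the delay (the enclosing method must be async); an un-awaited
    // Task.Delay returns immediately and the loop would spin without pausing.
    await Task.Delay(sleepMilliseconds);
    CaptureAudioBuffer(waveFormat, bufferSize, audioCaptureClient, sleepMilliseconds);
}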
audioClient.Stop();

The actual audio capturing happens in the while loop. To be honest, the specifics are based entirely on an MSDN example in C++, with the NAudio Windows Store app demo as a guide for porting it to C#.

As I understand it, to optimize the process of capturing, a delay is executed on each pass to ensure the buffer can keep up. No point in hammering an empty buffer.
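As a rough illustration of that delay, here is a minimal sketch that waits for half of the buffer's duration between reads, which is the approach the MSDN C++ capture sample takes. The class and method names are hypothetical, and the CalculateCaptureDelay method in the attached solution may compute its value differently.

```csharp
using System;

public static class CaptureDelaySketch
{
    // Hypothetical helper: returns half of the buffer's duration in milliseconds.
    // bufferFrames is the buffer capacity in audio frames,
    // sampleRate is the number of frames per second.
    public static int HalfBufferMilliseconds(int bufferFrames, int sampleRate)
    {
        if (bufferFrames <= 0) throw new ArgumentOutOfRangeException("bufferFrames");
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException("sampleRate");

        // Full buffer duration in ms = 1000 * frames / framesPerSecond.
        long bufferMilliseconds = 1000L * bufferFrames / sampleRate;

        // Sleeping for half of that leaves headroom before the buffer fills up.
        return (int)(bufferMilliseconds / 2);
    }
}
```

With a buffer that holds one second of audio at 44100 Hz, this yields a delay of 500 milliseconds.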

Then, on each pass, the buffer is read for as long as there is something available (GetNextPacketSize > 0). The mix format of the audio device you're capturing with determines how to interpret the bytes in the buffer.
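To make the role of the format concrete, the sketch below decodes a captured byte buffer under the assumption that the device reports 32-bit IEEE float samples. That is common for shared-mode capture but not guaranteed, so real code should check the format first. The class and method names are hypothetical and not part of the attached solution.

```csharp
using System;

public static class SampleDecodingSketch
{
    // Interprets the first bytesRecorded bytes of buffer as 32-bit float samples.
    // Assumes little-endian byte order, which is what Windows uses.
    public static float[] DecodeFloat32(byte[] buffer, int bytesRecorded)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (bytesRecorded < 0 || bytesRecorded > buffer.Length)
            throw new ArgumentOutOfRangeException("bytesRecorded");

        // Each sample occupies four bytes; a trailing partial sample is ignored.
        int sampleCount = bytesRecorded / sizeof(float);
        var samples = new float[sampleCount];

        for (int i = 0; i < sampleCount; i++)
        {
            samples[i] = BitConverter.ToSingle(buffer, i * sizeof(float));
        }

        return samples;
    }
}
```

For a multi-channel format, the decoded samples are interleaved, so with stereo the even indices belong to the left channel and the odd indices to the right.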

Finally, any subscribed clients are signaled through an event, with the captured buffer as an argument.
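The signaling step could look roughly like the sketch below. The type and member names are hypothetical stand-ins, so the event in the attached solution may be shaped differently.

```csharp
using System;

// Hypothetical event arguments carrying one captured buffer.
public sealed class AudioDataEventArgs : EventArgs
{
    public AudioDataEventArgs(byte[] buffer, int bytesRecorded)
    {
        Buffer = buffer;
        BytesRecorded = bytesRecorded;
    }

    public byte[] Buffer { get; private set; }
    public int BytesRecorded { get; private set; }
}

// Hypothetical stand-in for the recorder's notification logic.
public sealed class CaptureNotifierSketch
{
    public event EventHandler<AudioDataEventArgs> DataAvailable;

    public void OnBufferCaptured(byte[] buffer, int bytesRecorded)
    {
        // Copy the delegate first so a concurrent unsubscribe cannot null it
        // between the check and the invocation.
        var handler = DataAvailable;
        if (handler != null)
        {
            handler(this, new AudioDataEventArgs(buffer, bytesRecorded));
        }
    }
}
```

A WAV writer can then subscribe to DataAvailable and append each buffer to the file as it arrives, which keeps the capture code unaware of the file format.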

Writing WAV Files

Basically, a WAV file consists of a header, in which the format details are specified, followed by the actual data. The different blocks are called chunks.
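To see the chunk layout in practice, the following sketch walks a RIFF/WAVE stream and lists each chunk's four-character id and size. It is an illustration only: the class and method names are hypothetical, and it assumes a seekable stream.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RiffChunkSketch
{
    // Returns "id:size" for each chunk that follows the 12-byte RIFF/WAVE preamble.
    public static List<string> ListChunks(Stream stream)
    {
        var chunks = new List<string>();
        var reader = new BinaryReader(stream);

        string riff = ReadId(reader);
        reader.ReadUInt32();                    // Overall RIFF size, not needed here
        string wave = ReadId(reader);
        if (riff != "RIFF" || wave != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE stream.");

        // Every chunk starts with a 4-byte id and a 4-byte little-endian size.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadId(reader);
            uint size = reader.ReadUInt32();
            chunks.Add(id + ":" + size);

            // Chunk bodies are padded to an even number of bytes.
            long skip = size + (size & 1);
            stream.Seek(skip, SeekOrigin.Current);
        }

        return chunks;
    }

    private static string ReadId(BinaryReader reader)
    {
        byte[] id = reader.ReadBytes(4);
        return Encoding.UTF8.GetString(id, 0, id.Length);
    }
}
```

Running this over a finished recording should list at least a "fmt " chunk and a "data" chunk, which are the two chunks written in the sections that follow.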

Create a WAV File to Store the Captured Audio

After getting a binary writer that points to a file path to output to, the file is prepared as a WAV file to write the captured audio in.
You can find this logic in WaveFileWriter.

private void WriteWavRiffHeader()
{
    _binaryWriter.Write("RIFF".ToCharArray());
    _binaryWriter.Write((uint)0);               // to be updated with length of file after this point
    _binaryWriter.Write("WAVE".ToCharArray());
}

The header starts with the main chunk, which specifies that this is a WAV file. The length of the file is unknown at this point and therefore initialized as zero.

private void WriteWavFormatChunkHeader(WaveFormat waveFormat)
{
    _binaryWriter.Write("fmt ".ToCharArray());

    uint samplesPerSecond = (uint)waveFormat.SampleRate;
    ushort channels = (ushort)waveFormat.Channels;
    ushort bitsPerSample = (ushort)waveFormat.BitsPerSample;
    ushort blockAlign = (ushort)(channels * (bitsPerSample / 8));
    uint averageBytesPerSec = (samplesPerSecond * blockAlign);

    _binaryWriter.Write((uint)(18 + waveFormat.ExtraSize));     // Size of the fmt chunk body in bytes
    unchecked { _binaryWriter.Write((short)0xFFFE); }           // Format tag, 65534 
                                                                // (WAVE_FORMAT_EXTENSIBLE)
    _binaryWriter.Write(channels);                              // Number of channels
    _binaryWriter.Write(samplesPerSecond);                      // Sample rate in Hz, e.g. 44100
    _binaryWriter.Write(averageBytesPerSec);                    // Bytes per second (sample rate * block align)
    _binaryWriter.Write(blockAlign);                            // Sample frame size, in bytes
    _binaryWriter.Write(bitsPerSample);                         // Bits per sample (container size)

    _binaryWriter.Write((short)waveFormat.ExtraSize);           // Size of the extension that follows
    _binaryWriter.Write(bitsPerSample);                         // Valid bits per sample; assumed equal to the container size
    _binaryWriter.Write((uint)3);                               // Channel mask; 3 = front left | front right, so stereo is assumed
    byte[] subformat = new Guid(KsMedia.WAVEFORMATEX).ToByteArray();
    _binaryWriter.Write(subformat, 0, subformat.Length);
}

The next chunk, shown above, specifies the details of the WAV file, using the format of the activated IAudioClient.
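The derived values in that chunk are simple arithmetic, which the sketch below isolates. The class and method names are hypothetical. The size calculation assumes the layout written above: 18 bytes for the basic format fields including the extra size field, plus the extension, which is 22 bytes for WAVE_FORMAT_EXTENSIBLE.

```csharp
public static class WaveFormatMathSketch
{
    // Size in bytes of one sample frame (one sample for every channel).
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Number of audio bytes produced per second of recording.
    public static int AverageBytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(channels, bitsPerSample);
    }

    // Size of the fmt chunk body: 18 bytes of basic fields plus the extension.
    public static int FormatChunkSize(int extraSize)
    {
        return 18 + extraSize;
    }
}
```

For 16-bit stereo at 44100 Hz, the block align is 4 bytes and the average rate is 176400 bytes per second. With a 22-byte extension, the fmt chunk body is 40 bytes long.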

private void WriteWavDataChunkHeader()
{
    // Write the data chunk
    _binaryWriter.Write("data".ToCharArray());                // Chunk id

    _dataSizePosition = _fileStream.Position;
    _binaryWriter.Write((uint)0);                             // to be updated with length of data
}

Finally, the header of the data chunk marks where the actual data starts and specifies its length, which is unknown at this point and therefore also initialized to zero.

Write the Captured Audio to the Wave File

Writing the captured audio is straightforward: the received bytes are appended raw to the file.

public void Write(byte[] buffer, int bytesRecorded)
{
    _fileStream.Write(buffer, 0, bytesRecorded);

    _dataChunkSize += bytesRecorded;
}

When capturing is done and the last buffer is written, the headers are updated with the required length according to the specification.

private void UpdateWavRiffHeader()
{
    _binaryWriter.Seek(4, SeekOrigin.Begin);
    _binaryWriter.Write((uint)(_binaryWriter.BaseStream.Length - 8));
}

private void UpdateDataChunkHeader()
{
    _binaryWriter.Seek((int)_dataSizePosition, SeekOrigin.Begin);
    _binaryWriter.Write((uint)_dataChunkSize);
}
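The whole write-then-patch sequence can be tried in memory with the sketch below. To stay short, it deliberately swaps the WAVE_FORMAT_EXTENSIBLE header used in this article for a plain 16-bit mono PCM header, and it assumes the data length is even so that no padding byte is needed. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class WavPatchSketch
{
    // Writes a 16-bit mono PCM WAV into memory, using placeholder sizes
    // that are patched once the amount of data is known.
    public static byte[] WritePcm16Mono(byte[] pcmData, int sampleRate)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Encoding.UTF8.GetBytes("RIFF"));
            writer.Write((uint)0);                      // Placeholder: file length - 8
            writer.Write(Encoding.UTF8.GetBytes("WAVE"));

            writer.Write(Encoding.UTF8.GetBytes("fmt "));
            writer.Write((uint)16);                     // Size of a plain PCM fmt body
            writer.Write((ushort)1);                    // Format tag 1 (WAVE_FORMAT_PCM)
            writer.Write((ushort)1);                    // Channels: mono
            writer.Write((uint)sampleRate);             // Sample rate in Hz
            writer.Write((uint)(sampleRate * 2));       // Bytes per second
            writer.Write((ushort)2);                    // Block align: 2 bytes per frame
            writer.Write((ushort)16);                   // Bits per sample

            writer.Write(Encoding.UTF8.GetBytes("data"));
            long dataSizePosition = stream.Position;    // Remember where the size lives
            writer.Write((uint)0);                      // Placeholder: data length
            writer.Write(pcmData);

            // Patch both placeholders now that the lengths are known.
            writer.Seek(4, SeekOrigin.Begin);
            writer.Write((uint)(stream.Length - 8));
            writer.Seek((int)dataSizePosition, SeekOrigin.Begin);
            writer.Write((uint)pcmData.Length);

            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

The RIFF size excludes the first 8 bytes (the "RIFF" id and the size field itself), which is why both this sketch and UpdateWavRiffHeader subtract 8 from the total length.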

History

1.0 - Initial version