Recording audio to WAV with WASAPI in Windows Store apps

By padmore.be, 19 Jan 2013
 

Introduction

To record audio in Windows Store apps, the Windows Runtime provides the MediaCapture class, which lets you get started quickly. You are, however, limited to the output formats available through MediaEncodingProfile, and WAV currently isn't one of them.

So to record to WAV you need another solution and because you do not have access to the full .NET stack your options are limited. WASAPI, included in Microsoft's Core Audio SDK, offers a solution.

Unfortunately, to use WASAPI you are thrown out of a safe managed haven, to an unmanaged COM part of the woods. This drop can be quite deep with a steep learning curve to climb out of, if you are not used to dealing with unmanaged code in .NET. C#'s dynamic type won't ease the pain either, because the used COM interfaces don't work with it (they don't implement IDispatch).

Finally, writing the result to a WAV file also requires some low(er than normal) level code, which can be an extra obstacle if you are not used to audio programming or its concepts.

This article is aimed at getting a C# developer who is a WASAPI novice up and running with a basic working solution.

Background

I wanted to try out an idea for a Windows Store app that deals with basic audio editing. For this, I wanted to use the WAV format for its lossless uncompressed characteristics and its compatibility with other audio software.

Since I consider WAV to be the default uncompressed audio format on Windows, I expected out of the box support for it in WinRT.

This is not the case, so I turned to NAudio as the solution. NAudio does the heavy lifting of talking to the Core Audio SDK for you. Unfortunately, WinRT support in NAudio is still in progress. It does include a working Windows Store app demo to record audio to WAV, but its built-in components to write the result to a WAV file are not available yet in WinRT.

I considered contributing to add the WinRT support I'm looking for. But that requires me to grasp a big part of the NAudio library, to be able to submit a patch that works nicely with the existing code base and its concepts.

Instead, as a starting point I tried taking just what I needed out of NAudio and making a stripped-down, WinRT-compatible solution, but I quickly realized I didn't understand half of what was going on.

Finally, I accepted I had to learn to deal directly with the Core Audio SDK and the basics of writing WAV files.

While learning, I discovered the official channels mostly use C++ on this topic, which introduces an extra barrier for a C# developer that is more than just a syntax difference.

So with this article I set out to piece together a solution that demonstrates the basics. I cannot guarantee that no best practices are violated.

Prerequisites:

  • Basic knowledge of Windows Store app development is assumed.
  • Understanding of the MVVM pattern is assumed. I did not oversimplify the solution by stuffing everything from COM interop to UI logic in the codebehind, to avoid surprises when developing a more realistic solution.

Using the code

Overview

The attached Visual Studio 2012 solution contains a Windows Store app project which demonstrates recording audio to WAV in an MVVM setup using the Core Audio SDK.

Notable namespaces:

  • CoreAudio namespace: contains the COM interop logic to interact with the Core Audio SDK
  • Services namespace: contains the business logic to record audio to a WAV file (to be honest there's more than business logic in there, such as the specifics of writing a WAV file, which should be refactored out of there)

To just read the code without following this article, use the StartRecordingCommand in the ViewModels namespace as a starting point and follow the logical flow from there.

Capturing audio via WASAPI

Select an audio device for capturing

The goal is to get an IAudioCaptureClient to capture audio.

You get an IAudioCaptureClient through an IAudioClient. Both are part of the Core Audio SDK's WASAPI.

To get an IAudioClient you use the Core Audio SDK's MMDevice API by activating an audio device.

public class WindowsMultimediaDevice
{
    [DllImport("Mmdevapi.dll", ExactSpelling = true, PreserveSig = false)]
    public static extern void ActivateAudioInterfaceAsync(
        [In, MarshalAs(UnmanagedType.LPWStr)] string deviceInterfacePath,
        [In, MarshalAs(UnmanagedType.LPStruct)] Guid riid,
        [In] IntPtr activationParams,
        [In] IActivateAudioInterfaceCompletionHandler completionHandler,
        out IActivateAudioInterfaceAsyncOperation createAsync);
}

The above definition exposes the relevant method to call. You can find the native definition in the header file mmdeviceapi.h; if you install the Windows SDK for Windows 8.0, it is located in Windows Kits\8.0\Include\um\mmdeviceapi.h

The unmanaged code in Mmdevapi.dll is exposed with DllImport; the native library Mmdevapi.dll is assumed to be available by default on Vista and up. Also, since the unmanaged code uses different types, a conversion is necessary, which is done by marshalling with the MarshalAs attribute.

public void Start()
{
    _isRecording = true;
 
    var defaultAudioCaptureId = MediaDevice.GetDefaultAudioCaptureId(AudioDeviceRole.Default);
    var completionHandler = new ActivateAudioInterfaceCompletionHandler(StartCapture);
    IActivateAudioInterfaceAsyncOperation createAsync;
 
    WindowsMultimediaDevice.ActivateAudioInterfaceAsync(
        defaultAudioCaptureId, new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioClient), 
        IntPtr.Zero, completionHandler, out createAsync);
}

Used parameters explained:

  • The defaultAudioCaptureId (first parameter) is easy to get through the MediaDevice class provided by the Windows Runtime.
  • The second parameter is the IID of the WASAPI COM interface we want to get, which is IAudioClient in this case. The value for this IID can be found in the header file Windows Kits\8.0\Include\um\Audioclient.h
  • No activation parameters are required, so IntPtr.Zero, the COM equivalent of null, is passed.
  • The completionHandler is the callback that will receive the IAudioClient, which is the goal. Its type implements IActivateAudioInterfaceCompletionHandler, another interface defined by the MMDevice API; view that interface for the details.
  • createAsync is not used here, but passed to satisfy the method definition.
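The two MMDevice API interfaces referred to above are not listed in this article. A minimal sketch of how they might be declared for COM interop follows, assuming the interface IDs and method signatures from mmdeviceapi.h; the declarations in the attached solution may differ in detail.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: interface IDs taken from mmdeviceapi.h (assumption: not
// copied from the attached solution).
[ComImport]
[Guid("41D949AB-9862-444A-80F6-C261334DA5EB")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IActivateAudioInterfaceCompletionHandler
{
    // Called by the system once activation has finished (or failed).
    void ActivateCompleted(IActivateAudioInterfaceAsyncOperation activateOperation);
}

[ComImport]
[Guid("72A22D78-CDE4-431D-B8CC-843A71199B6D")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IActivateAudioInterfaceAsyncOperation
{
    // activateResult is the HRESULT of the activation itself;
    // activatedInterface is the requested interface (IAudioClient here).
    void GetActivateResult(
        out int activateResult,
        [MarshalAs(UnmanagedType.IUnknown)] out object activatedInterface);
}
```

Inside ActivateCompleted, the handler would typically call GetActivateResult, check the HRESULT, and cast the returned object to IAudioClient.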

Start capturing audio

After calling ActivateAudioInterfaceAsync, in the ActivateAudioInterfaceCompletionHandler callback, use the activated IAudioClient to get an IAudioCaptureClient.

object audioCaptureClientInterface;
audioClient.GetService(new Guid(CoreAudio.Components.WASAPI.Constants.IID_IAudioCaptureClient), out audioCaptureClientInterface);

var audioCaptureClient = (IAudioCaptureClient)audioCaptureClientInterface;
var sleepMilliseconds = CalculateCaptureDelay(waveFormat, bufferSize);
audioClient.Start();

while (_isRecording)
{
    // Task.Delay only pauses when it is awaited, so the enclosing method must be async.
    await Task.Delay(sleepMilliseconds);
    CaptureAudioBuffer(waveFormat, bufferSize, audioCaptureClient, sleepMilliseconds);
}
audioClient.Stop();

The actual audio capturing happens in the while loop. To be honest, the specifics are entirely based on an MSDN example in C++ using the NAudio Windows Store app demo as a help for bringing it to C#.

As I understand it, to optimize the process of capturing, a delay is executed on each pass to ensure the buffer can keep up. No point in hammering an empty buffer.
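The body of CalculateCaptureDelay is not shown here. As a rough sketch of one way to compute such a delay, the MSDN C++ capture sample sleeps for half the duration of the buffer; the class and method names below are hypothetical and take the sample rate and buffer size (in frames) directly.

```csharp
using System;

public static class CaptureTiming
{
    // REFERENCE_TIME is expressed in 100-nanosecond units.
    public const long ReferenceTimesPerSecond = 10000000;
    public const long ReferenceTimesPerMillisecond = 10000;

    // Duration of a buffer of the given size, in REFERENCE_TIME units.
    public static long BufferDuration(int bufferSizeInFrames, int sampleRate)
    {
        return ReferenceTimesPerSecond * bufferSizeInFrames / sampleRate;
    }

    // Half the buffer duration in milliseconds, so the buffer is read
    // before it can fill up completely.
    public static int CaptureDelayMilliseconds(int bufferSizeInFrames, int sampleRate)
    {
        long duration = BufferDuration(bufferSizeInFrames, sampleRate);
        return (int)(duration / ReferenceTimesPerMillisecond / 2);
    }
}
```

For a one-second buffer (44100 frames at 44100 Hz), this yields a delay of 500 milliseconds.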

Each pass then drains the buffer: for as long as a packet is available (GetNextPacketSize > 0), the next packet is read. The mix format of the audio device you're capturing with determines how to interpret the bytes in the buffer.
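IAudioCaptureClient.GetBuffer hands back a pointer to unmanaged memory and a count of frames, not bytes. The sketch below shows how one packet might be copied into a managed array, assuming the block align (bytes per frame) is taken from the mix format; PacketCopy and ToManaged are hypothetical names, not part of the attached solution.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PacketCopy
{
    // Copies one captured packet out of the unmanaged buffer.
    // blockAlign is the size of one frame in bytes (one sample per channel).
    public static byte[] ToManaged(IntPtr data, int framesAvailable, int blockAlign)
    {
        var bytes = new byte[framesAvailable * blockAlign];
        Marshal.Copy(data, bytes, 0, bytes.Length);
        return bytes;
    }
}
```

With a mix format of 2 channels at 32 bits per sample, the block align is 8, so a packet of 441 frames occupies 3528 bytes.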

Finally, any subscribed clients are signaled through an event, with the captured buffer as an argument.

Writing WAV files

Basically, a WAV file consists of a header, in which the format details are specified, followed by the actual data. The different blocks are called chunks.
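To make the chunk layout concrete, here is a small sketch that walks a WAV stream and lists its chunks. It relies only on the general RIFF layout (a 4-byte id, a 4-byte little-endian size, then the body, padded to an even length); RiffInspector is a hypothetical helper and not part of the attached solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RiffInspector
{
    private static string ReadId(BinaryReader reader)
    {
        byte[] id = reader.ReadBytes(4);
        return Encoding.UTF8.GetString(id, 0, id.Length);
    }

    // Returns "id:size" for each chunk that follows the 12-byte RIFF/WAVE header.
    public static List<string> ListChunks(Stream stream)
    {
        var chunks = new List<string>();
        var reader = new BinaryReader(stream);

        string riff = ReadId(reader);
        reader.ReadUInt32();                 // RIFF size: file length minus 8
        string wave = ReadId(reader);
        if (riff != "RIFF" || wave != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE stream.");

        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadId(reader);
            uint size = reader.ReadUInt32();
            chunks.Add(id + ":" + size);
            // Skip the body; odd-sized bodies are followed by one pad byte.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        return chunks;
    }
}
```

Running this over a recorded file is a quick way to check that the sizes in the header match the data that was actually written.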

Create a WAV file to store the captured audio

After getting a binary writer that points to a file path to output to, the file is prepared as a WAV file to write the captured audio in.
You can find this logic in WaveFileWriter.

private void WriteWavRiffHeader()
{
    _binaryWriter.Write("RIFF".ToCharArray());
    _binaryWriter.Write((uint)0);               // to be updated with length of file after this point
    _binaryWriter.Write("WAVE".ToCharArray());
}

The header starts with the main chunk, which specifies that this is a WAV file. The length of the file is unknown at this point and therefore initialized as zero.

private void WriteWavFormatChunkHeader(WaveFormat waveFormat)
{
    _binaryWriter.Write("fmt ".ToCharArray());

    uint samplesPerSecond = (uint)waveFormat.SampleRate;
    ushort channels = (ushort)waveFormat.Channels;
    ushort bitsPerSample = (ushort)waveFormat.BitsPerSample;
    ushort blockAlign = (ushort)(channels * (bitsPerSample / 8));
    uint averageBytesPerSec = (samplesPerSecond * blockAlign);

    _binaryWriter.Write((uint)(18 + waveFormat.ExtraSize));               // Size of the fmt chunk body: 18-byte WAVEFORMATEX plus the extension
    unchecked { _binaryWriter.Write((short)0xFFFE); }                     // Format tag, 65534 (WAVE_FORMAT_EXTENSIBLE)
    _binaryWriter.Write(channels);                                        // Number of channels
    _binaryWriter.Write(samplesPerSecond);                                // Sample rate in Hz, e.g. 44100
    _binaryWriter.Write(averageBytesPerSec);                              // Average data rate in bytes per second
    _binaryWriter.Write(blockAlign);                                      // Sample frame size, in bytes
    _binaryWriter.Write(bitsPerSample);                                   // Container size of one sample, in bits

    _binaryWriter.Write((short)waveFormat.ExtraSize);                     // Size of the extension that follows
    _binaryWriter.Write(bitsPerSample);                                   // Should be valid bits per sample; assumed equal to the container size
    _binaryWriter.Write((uint)3);                                         // Should be channel mask; 3 = front left | front right, so this assumes stereo
    byte[] subformat = new Guid(KsMedia.WAVEFORMATEX).ToByteArray();
    _binaryWriter.Write(subformat, 0, subformat.Length);
}

The format chunk above specifies the details of the WAV file, using the format of the activated IAudioClient.
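The derived fields in the format chunk follow from simple arithmetic, and the listing hard-codes a channel mask of 3, which describes a stereo layout. The sketch below isolates that arithmetic and shows one possible way to pick a mask from the channel count, using the speaker constants from ksmedia.h. WaveFormatMath is a hypothetical helper; reading the mask from the device's own mix format would be the more faithful option.

```csharp
using System;

public static class WaveFormatMath
{
    // Size of one sample frame (one sample for every channel), in bytes.
    public static ushort BlockAlign(ushort channels, ushort bitsPerSample)
    {
        return (ushort)(channels * (bitsPerSample / 8));
    }

    // Data rate of the uncompressed stream, in bytes per second.
    public static uint AverageBytesPerSecond(uint sampleRate, ushort blockAlign)
    {
        return sampleRate * blockAlign;
    }

    // Default speaker masks for the two most common layouts.
    public static uint DefaultChannelMask(ushort channels)
    {
        switch (channels)
        {
            case 1: return 0x4;        // SPEAKER_FRONT_CENTER
            case 2: return 0x1 | 0x2;  // SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT
            default: return 0;         // no speaker positions assigned
        }
    }
}
```

For a 2 channel, 32 bit, 44100 Hz mix format this gives a block align of 8 and 352800 average bytes per second.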

private void WriteWavDataChunkHeader()
{
    // Write the data chunk
    _binaryWriter.Write("data".ToCharArray());                // Chunk id

    _dataSizePosition = _fileStream.Position;
    _binaryWriter.Write((uint)0);                             // to be updated with length of data
}

Finally, the header of the data chunk marks where the actual data starts and specifies its length, which is still unknown at this point. The position of the length field is remembered so it can be updated later.

Write the captured audio to the wave file

Writing the captured audio is straightforward: the received bytes are appended raw to the file.

public void Write(byte[] buffer, int bytesRecorded)
{
    _fileStream.Write(buffer, 0, bytesRecorded);

    _dataChunkSize += bytesRecorded;
}

When capturing is done and the last buffer is written, the headers are updated with the required length according to the specification.

private void UpdateWavRiffHeader()
{
    _binaryWriter.Seek(4, SeekOrigin.Begin);
    _binaryWriter.Write((uint)(_binaryWriter.BaseStream.Length - 8));
}

private void UpdateDataChunkHeader()
{
    _binaryWriter.Seek((int)_dataSizePosition, SeekOrigin.Begin);
    _binaryWriter.Write((uint)_dataChunkSize);
}
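The write-placeholder-then-patch approach can be tried end to end in memory. The sketch below is a simplified stand-in, not the article's WaveFileWriter: it assumes plain WAVE_FORMAT_PCM with a 16-byte format chunk instead of WAVE_FORMAT_EXTENSIBLE, and MinimalWav is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

public static class MinimalWav
{
    // Writes headers with zeroed sizes, appends the data, then patches both sizes.
    public static byte[] Build(byte[] pcmData, ushort channels, uint sampleRate, ushort bitsPerSample)
    {
        var stream = new MemoryStream();
        var writer = new BinaryWriter(stream);
        ushort blockAlign = (ushort)(channels * (bitsPerSample / 8));

        writer.Write(Encoding.UTF8.GetBytes("RIFF"));
        writer.Write((uint)0);                    // patched below: file length minus 8
        writer.Write(Encoding.UTF8.GetBytes("WAVE"));

        writer.Write(Encoding.UTF8.GetBytes("fmt "));
        writer.Write((uint)16);                   // size of a plain PCM format chunk
        writer.Write((ushort)1);                  // WAVE_FORMAT_PCM
        writer.Write(channels);
        writer.Write(sampleRate);
        writer.Write(sampleRate * blockAlign);    // average bytes per second
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);

        writer.Write(Encoding.UTF8.GetBytes("data"));
        long dataSizePosition = stream.Position;
        writer.Write((uint)0);                    // patched below: data length
        writer.Write(pcmData);
        writer.Flush();

        // Same two updates as UpdateWavRiffHeader and UpdateDataChunkHeader.
        writer.Seek(4, SeekOrigin.Begin);
        writer.Write((uint)(stream.Length - 8));
        writer.Seek((int)dataSizePosition, SeekOrigin.Begin);
        writer.Write((uint)pcmData.Length);
        writer.Flush();

        return stream.ToArray();
    }
}
```

With this layout the headers take 44 bytes, so the data length field sits at offset 40 and the RIFF length field at offset 4.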

History

1.0 - Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

padmore.be
Software Developer (Senior)
Belgium
No Biography provided


Comments and Discussions

Question: How to get PCM | member aspaitm | 7-Jun-13 20:39
how to get pcm data at following format
Sample Rate:8000
Channel:1(mono)
Bit Rate:-16
Question: How do I download the source code? | member ElmerBacon | 7-May-13 3:20
Hi,
 
I'm terribly sorry if I'm being thick or something, but since the site's redesign, I can't find a download link for the code as a single archive, nor a link to a code repository or anything.
And while the new "Browse Code" is a nice feature, I don't fancy copying and pasting every single file one by one...
Any help, please?
Answer: Re: How do I download the source code? | member padmore.be | 7-May-13 4:16
Answer[^] :)
Bug: cant play audio | member Member 9742541 | 21-Apr-13 22:00
Hi,
 
This seems to work great, but i can't play the resulting file. Is this a problem with the WAV header? the file size looks about right.
 
Thanks for your help,
 
Oli
General: Re: cant play audio | member padmore.be | 22-Apr-13 11:34
I don't know, "Works on my machine" :)
 
Have you tried different media players to play the file?
If it fails in all players, a mismatch between the actual data and the header seems like a possible explanation.
 
What is the default format that's set on your system? (Recording Devices > tab Recording > Properties of default device > tab Advanced > Default format; mine's 2 channel, 16 bit, 44100 Hz. If yours is different, can you try changing it to what I have, to troubleshoot the problem?)
Question: Re: cant play audio | member Member 9742541 | 24-Apr-13 0:59
Thanks for your reply, it was on the right lines.
 
I have changed the header information as follows;
 
uint samplesPerSecond = (uint)44100;
ushort channels = (ushort)2;
ushort bitsPerSample = (ushort)32;
 
It produces a playable audio file, but it plays twice as fast as it should for half the duration. I believe it is because I should have stereo, but when I change channels = (ushort)1, I get a gibberish audio file.
 
Does changing the number of channels here effect anything else?
i.e what is the 'channel mask' at the end of the header.
 
Thanks for your help! It's really helping us out!
Bug: Re: cant play audio | member Member 9742541 | 24-Apr-13 1:05
I seem to have found it, the channel mask is hard coded to 3, this should be changed to the channels var.
Seems to fix the problem.
 
Might need to update the code?
 
Thanks!
 

Bug Line example:
_binaryWriter.Write((uint)3); // Should be channel mask
General: Re: cant play audio | member padmore.be | 25-Apr-13 11:29
Glad you got it working on your end!
 
Here's more details about the channel mask:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff538802(v=vs.85).aspx[^]
 
As I understand it, it should not necessarily equal the number of channels.
General: My vote of 5 | member Member 9763976 | 11-Apr-13 11:23
Very useful. Thanks a lot
Question: Source code | member michalburger | 19-Mar-13 2:47
Hello, is it possible to download the whole source code or you don't want to allow that?
Maybe I am just missing a link somewhere.
Answer: Re: Source code | member padmore.be | 19-Mar-13 8:58
I had to look hard, but I found a link to the zip file:
http://www.codeproject.com/KB/audio-video/525620/RecordingWavFilesWithWasapiInWindowsRT.zip[^]
 
You must be logged in to download.
General: Re: Source code | member michalburger | 19-Mar-13 9:00
Oh, thank you very much. I wasn't able to find that link. Great!
Question: SetProperty inaccessible due to protection level | member jayinatlanta | 28-Feb-13 6:06
To anyone who tries to test the methods "as is" and gets a problem with setting _isRecording, find the bool SetProperty in the BindableBase and change it to a public bool instead.
 
Thanks padmore.be for a great article on this important ability.
Question: How to set sample rate ? | member akito0924 | 24-Jan-13 20:08
Thanks for your great article. I want to get lower latency but I have some questions.
 
1.
In the ActivateAudioInterfaceCompletionHandler's function InitializeAudioClient()
 
Marshal.ThrowExceptionForHR(audioClient.Initialize(AudioClientShareMode.Shared, AudioClientStreamFlags.None, hnsBufferDuration, 0, waveFormat,audioSessionGuid))
 
The waveFormat is fixed, if I change the property of waveFormat like BitsPerSample or SampleRate, then I will get ArgumentException (Value does not fall within the expected range), even though I Initialize audioClient in Exclusive mode.
 
How should I do ?
 
2.How to change AudioClient's BufferSize ?
Answer: Re: How to set sample rate ? | member padmore.be | 27-Jan-13 4:41
(Sorry for the slow reply, I didn't check back often because I expected an email for new comments, I'll check if there are alert settings)
 
1. I haven't changed the mixformat, so I can't speak from experience.
But as I understand, you should use IAudioClient's IsFormatSupported[^] to check if the format is supported, to check if you can initialize the audio client with it.
As you can see in the documentation, the method can also return the closest match if there is no exact match, which should be helpful.
 
2. You specify the buffer size you'd like in the Initialize method of IAudioClient, parameter hnsBufferDuration. Then after you called Initialize, you call GetBufferSize [^] to see what the actual buffer size is.
As I understand, IAudioClient will have a minimum buffer size to avoid glitches, so the buffer size might differ from what you requested.
 
I'm interested in how you eventually got it working :)
General: Re: How to set sample rate ? [modified] | member akito0924 | 28-Jan-13 19:31
Thanks for your reply.
 
1.I have use two different device to record audio, the MixFormat is the same :
 
AverageBytesPerSecond = 352800
BitsPerSample = 32
BlockAlign = 8
Channels = 2
ExtraSize = 22
SampleRate = 44100
WaveFormatTag = Extensible
 
I tried to use IAudioClient's IsFormatSupported,and set several format , one is:
 
AverageBytesPerSecond = 1024
BitsPerSample = 16
BlockAlign = 8
Channels = 1
ExtraSize = 22
SampleRate = 11025
WaveFormatTag = WaveFormatEncoding.Pcm
 
It always return -2004287480, and the closest match is always return null.
 
2.Even though I changed hnsBufferDuration(10000),the GetBufferSize always return 72765

modified 29-Jan-13 6:31am.

General: Re: How to set sample rate ? | member padmore.be | 29-Jan-13 9:55
1. I have the same problem, I'm experimenting with setting the mixformat and have not been successful yet.
When I change the format of my default capture device in Windows, the mixformat returned after Initialize matches it.
But when I try to Initialize it to one of those supported formats, it is not accepted.
I'll report back, if I get it working.
 
2. I guess that means 72765 is the smallest possible buffer size?
On my machine 882 is the smallest buffer size for mixformat 32 bit 44100Hz.
Try requesting a bigger buffer size, to verify that it works.
 
I have found this seems to hold up when requesting the buffer size:
Actual BufferSize = SampleRate * hnsBufferDuration in seconds (unless the resulting buffer size is smaller than the minimum, in which case it equals the minimum)
 
Since "hnsBufferDuration..is of type REFERENCE_TIME and is expressed in 100-nanosecond units", this means:
hnsBufferDuration in seconds = (hnsBufferDuration * 100) / 1 000 000 000
 
Finally, take into account that:
"If the client requests a buffer size (through the hnsBufferDuration parameter) that is not an integral number of audio frames, the method rounds up the requested buffer size to the next integral number of frames."
General: Re: How to set sample rate ? | member akito0924 | 30-Jan-13 15:28
Thanks for your reply.
 
1. Because I want to record audio and send it to someone by udp socket, I need to set
a low sample rate for low latency. Or could you give me some suggestions?
 
2.You are right, if I set hnsBufferDuration = Constants.REFTIMES_PER_MILLISEC * 10000,
the buffer size becomes 441000.
General: Re: How to set sample rate ? [modified] | member padmore.be | 31-Jan-13 10:29
I understand your problem and what you are trying to accomplish, but I'm afraid I don't have an answer immediately.
 
If changing sample rate won't work, I guess you could consider resampling the captured audio on the fly to a lower quality before sending it over the network.
But again, I don't have enough experience in this area to point you to a clear cut solution.
 
But are you sure you want to capture the raw audio data as WAV and send that over the network?
Remember, with the MediaCapture class[^] you can record directly to a compressed format like mp3 [^] for example.
I'd consider compressed audio instead of lowering the quality of uncompressed audio to cut down on the size.

modified 2-Feb-13 10:48am.

General: Re: How to set sample rate ? | member akito0924 | 3-Feb-13 21:06
Thanks for your help.
 
At the beginning, I use MediaCapture class to record mp3 and send that over the network
between windows 8 device successfully.
 
But now I want to communicate with other platform(iOS、Android), other platform record sound to WAV , so I have to record sound to WAV too.

Last Updated 20 Jan 2013
Article Copyright 2013 by padmore.be