It is surprising that .NET Framework 3.5 has no components for sound capture. Even the designers of WPF and Silverlight 2.0 focused so deeply on graphics that they forgot about applications that record sound from the user's microphone. It is said that the next version of Silverlight will provide such functionality.
However, what you often want is to store the recorded sound in an MP3 file (or send it as an MP3 stream). That is legally complicated due to MP3 patent constraints. For the same legal reason, we can assume that MP3 functionality will not appear in Microsoft technologies soon (there is WMA instead).
Here you will find an easy-to-use .NET 3.5 component that provides sound capturing functionality for a Windows application. It outputs data as raw PCM samples or as a regular WAV file. Alternatively, you can set a single property to use the LAME DLL and perform MP3 compression on the fly.
This article uses a subset of C# MP3 Compressor libraries written by Idael Cardoso which in turn are partially based on A low level audio player in C# by Ianier Munoz. See this website for technical and copyright information regarding the LAME project.
I chose Managed DirectX (MDX) 1.1 to capture sound. The MDX project is currently frozen, since Microsoft moved to XNA Game Studio Express (a rather inadequate solution if all you need is to capture bits of sound). MDX is succeeded by the open-source SlimDX project, which exposes roughly similar interfaces and delegates to the native DirectSound libraries.
MDX consists of .NET assemblies that delegate calls to the native DirectX DLLs of 2006. A version of DirectX that is backward compatible with the 2006 interfaces can be expected on virtually every Windows computer (yes, it also works on Vista).
The component captures sound from a sound card via MDX in raw PCM format. PCM is a simple sequence of sound sample values. Each sample is either 8-bit (0..255) or 16-bit (-32768..32767). Stereo sound is a sequence of sample pairs (left, right, left, right...). PCM is well suited to streaming raw data. Streaming means passing data in small chunks while the total volume of the data may be unknown. Raw PCM is the most basic type of output of the Mp3SoundCapture component. If you stream this format directly to a WAV file, you will not be able to play it, because a WAV file must additionally contain a RIFF header.
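The sample layout described above can be illustrated with a small, self-contained sketch. It is not part of the component: the class and method names (PcmSketch, Interleave, ToBytes) are made up for illustration, and the little-endian byte order is the one used for 16-bit PCM data in WAV files.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Istrib.Sound.Mp3.dll).
static class PcmSketch
{
    // Builds interleaved 16-bit stereo PCM: left, right, left, right...
    public static short[] Interleave(short[] left, short[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Both channels must have the same number of samples.");

        var interleaved = new short[left.Length * 2];
        for (int i = 0; i < left.Length; i++)
        {
            interleaved[2 * i] = left[i];       // even positions: left channel
            interleaved[2 * i + 1] = right[i];  // odd positions: right channel
        }
        return interleaved;
    }

    // Serializes 16-bit samples as little-endian bytes (low byte first).
    public static byte[] ToBytes(short[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            bytes[2 * i] = (byte)(samples[i] & 0xFF);
            bytes[2 * i + 1] = (byte)((samples[i] >> 8) & 0xFF);
        }
        return bytes;
    }
}
```

A stream of such bytes carries no information about its sample rate, bit depth or channel count, which is why a player cannot interpret raw PCM without being told the format separately.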
The WAV (RIFF) format requires the raw PCM data to be prefixed with a RIFF header which, apart from format information, also contains the total length of the PCM data in the file. That is why you cannot really stream RIFF data: the total size of the stream is usually unknown in advance. WAV (RIFF) is the second type of output of the Mp3SoundCapture component. You can stream this format directly to a WAV file.
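To make the role of the header concrete, here is a minimal sketch that writes the canonical 44-byte PCM WAV header followed by the sample data. It is an illustration only and does not show how Mp3SoundCapture produces its WAV output; the names WavSketch and WriteWav are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only (not part of Istrib.Sound.Mp3.dll).
static class WavSketch
{
    // Writes a 44-byte RIFF/WAVE header followed by the PCM data.
    // The data length must be known up front because it is stored in the header twice.
    public static void WriteWav(Stream output, byte[] pcmData,
                                int sampleRate, short bitsPerSample, short channels)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                   // bytes per second

        // BinaryWriter writes integers in little-endian order, as RIFF requires.
        // The writer is deliberately not disposed so the caller's stream stays open.
        var writer = new BinaryWriter(output, Encoding.ASCII);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + pcmData.Length);             // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                              // length of the PCM format chunk
        writer.Write((short)1);                        // format tag 1 = uncompressed PCM
        writer.Write(channels);
        writer.Write(sampleRate);
        writer.Write(byteRate);
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(pcmData.Length);                  // total length of the PCM data
        writer.Write(pcmData);
        writer.Flush();
    }
}
```

Both length fields depend on the total amount of PCM data, so a writer that does not know the length in advance has to buffer the data or seek back and patch the header once recording ends.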
The third type of Mp3SoundCapture component output is MP3. The bit rate (kbit/s) is the parameter that determines MP3 sound quality. Apart from that, the sound quality depends on the format of the PCM data being compressed. Not every combination of bit rate and sampling parameters is allowed. You can direct the MP3 stream to an MP3 file.
Using the Code
From your application, add a reference to the Istrib.Sound.Mp3.dll assembly (see Istrib.Sound.Example.WinForms for an example). MP3 compression also requires the lame_enc.dll library to be located in the binaries directory. The Istrib.Sound.Mp3.dll assembly references Managed DirectX assemblies (located in the DirectX subdirectory). The DirectX assemblies can also be installed in the GAC via the DirectX End-User Runtimes redistributable, available for download from the Microsoft site (here: Nov 2008 version).
If you experience a "Loader Lock" warning while debugging your application, see Points of Interest below for a workaround.
The Istrib.Sound.Mp3.dll assembly contains a single component: Mp3SoundCapture. You may think of this component as a sound recorder. You set up its properties prior to calling the Start() and Stop() methods (the "recorder buttons"). You provide a writable stream or a file path in each call to the Start method. A single instance of the component may capture sound many times, to many streams or files, one after another.
You may use Visual Studio Component Designer to drag and drop the
Mp3SoundCapture component from the Visual Studio toolbox to the component surface or create the component manually:
mp3SoundCapture = new Mp3SoundCapture();
The component is ready to use just after construction. The default output is MP3 128kbit/s sampled at 22kHz, 16bit, mono. You specify sampling parameters and output format by setting the component properties.
Capturing Device (e.g. A Microphone)
You may use a default Windows recording device:
mp3SoundCapture.CaptureDevice = SoundCaptureDevice.Default;
Or choose one of the installed system sound capture devices:
mp3SoundCapture.CaptureDevice = SoundCaptureDevice.AllAvailable.First();
You set one of the 3 output types:
Mp3SoundCapture.Outputs.Mp3 - MP3 format
Mp3SoundCapture.Outputs.RawPcm - Raw sample data (without a RIFF header)
Mp3SoundCapture.Outputs.Wav - WAV file data (including the RIFF header)
mp3SoundCapture.OutputType = Mp3SoundCapture.Outputs.Mp3;
For PCM or WAV output, you may select any of the sampling parameters supported by the sound card:
mp3SoundCapture.WaveFormat = PcmSoundFormat.StandardFormats.First();
... or if you wish to hardcode it:
mp3SoundCapture.WaveFormat = PcmSoundFormat.Pcm22kHz16bitMono;
Sampling parameters for MP3 format are restricted to values returned by
Mp3SoundFormat.AllSourceFormats. Not every combination of the sampling parameters and bit rate is allowed. If you choose the bit rate prior to sampling parameters, then you may use
Mp3BitRate.CompatibleSourceFormats property to list compatible values.
mp3SoundCapture.WaveFormat = myMp3BitRate.CompatibleSourceFormats.First();
MP3 Bit Rate
For the MP3 output format, you specify one of the available bit rates. Again, not every bit rate can be paired with every set of sampling parameters. If you choose the sampling parameters before the bit rate, you may use the PcmSoundFormat.GetCompatibleMp3BitRates() extension method to enumerate the compatible MP3 bit rates.
mp3SoundCapture.Mp3BitRate = myPcmSoundFormat.GetCompatibleMp3BitRates().First();
... or if you wish to hardcode it:
mp3SoundCapture.Mp3BitRate = Mp3BitRate.BitRate128;
Volume Normalization Option
Often, when an application records and stores many pieces of sound, their volume needs to be adjusted so that all of them are at a similar level. Mp3SoundCapture offers the NormalizeVolume property to perform this transformation for you. Setting it to true causes every recorded piece to be normalized, i.e. the volume of the loudest section of the piece is turned up to the highest possible level and all other sections are turned up proportionally.
mp3SoundCapture.NormalizeVolume = true;
Note that the normalization algorithm must read the whole stream to find the loudest place, then rewrite the whole stream, adjusting the volume of each sample. This means that the entire stream must be buffered before it is directed to the output. Mp3SoundCapture uses a temporary file to buffer the data when normalizing. MP3 compression, if applied, is done after the normalization. When you have recorded a sizeable piece of sound, the bulk of the processing takes place after the Stop() method is called, not on the fly (as it does when NormalizeVolume is false). It may take time. For this case, Mp3SoundCapture offers asynchronous stopping.
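The two-pass nature of normalization can be sketched as follows. This is a plain peak normalization over an in-memory array of 16-bit samples, written only to show why the whole recording has to be available before any output can be produced. It is an assumption about the general technique, not the component's actual implementation (which buffers to a temporary file), and the names NormalizeSketch and Normalize are hypothetical.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Istrib.Sound.Mp3.dll).
static class NormalizeSketch
{
    // Scales 16-bit samples so that the loudest one reaches full scale.
    public static short[] Normalize(short[] samples)
    {
        // Pass 1: find the loudest sample. Widen to int first,
        // because the absolute value of -32768 does not fit in a short.
        int peak = 0;
        foreach (short s in samples)
            peak = Math.Max(peak, Math.Abs((int)s));

        var result = new short[samples.Length];
        if (peak == 0)
            return result; // pure silence: nothing to scale

        // Pass 2: apply the same gain to every sample.
        double gain = 32767.0 / peak;
        for (int i = 0; i < samples.Length; i++)
        {
            double scaled = Math.Round(samples[i] * gain);
            result[i] = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, scaled));
        }
        return result;
    }
}
```

The gain is only known after the first pass has seen the last sample, so no normalized output can be written while recording is still in progress.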
To start capturing, just call the Start(Stream) method, passing an open, writable stream (you must close it yourself after capturing has stopped, which is not obvious when using asynchronous stopping). You may also call the Start(string) method, passing an output file name.
To stop capturing, just call the Stop() method.
As mentioned above, when normalizing, it may take some time after calling Stop() before all captured data is written to the output stream. Mp3SoundCapture has an option of immediately leaving the Capturing state and passing all buffer processing to a separate thread. You can then start the next recording session without waiting for the last bytes of data from the previous one. By default, the asynchronous behavior is disabled. To enable it, set:
mp3SoundCapture.WaitOnStop = false;
Note that you cannot close the output stream passed to the Start(Stream) method until the Mp3SoundCapture.Stopped event is fired. Use the Stopped event arguments to get a reference to the stream which is ready for closing or, if you used Start(string filePath), the path of the file which has just been closed by the component:
private void mp3SoundCapture_Stopped(object sender, Mp3SoundCapture.StoppedEventArgs e)
{
    // Fired once the captured data has been written and the output file closed.
    dataAvailableLbl.Text = "Data available in " + e.OutputFileName;
    dataAvailableLbl.Visible = true;
}
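Putting the pieces together, a complete recording session with asynchronous stopping might look roughly like the sketch below. It uses only the members described in this article, but it makes several assumptions: the namespace Istrib.Sound is a guess, the output file name and the five-second duration are arbitrary, and the sketch has not been verified against the library (for example, whether capturing works from a console application without a message loop). It also needs the referenced assemblies and lame_enc.dll in place, so treat it as an outline rather than a tested program.

```csharp
using System;
using System.Threading;
using Istrib.Sound; // assumed namespace; check the assembly for the actual one

static class AsyncStopSketch
{
    static void Main()
    {
        var capture = new Mp3SoundCapture();
        capture.CaptureDevice = SoundCaptureDevice.Default;
        capture.OutputType = Mp3SoundCapture.Outputs.Mp3;
        capture.WaveFormat = PcmSoundFormat.Pcm22kHz16bitMono;
        capture.Mp3BitRate = Mp3BitRate.BitRate128;
        capture.NormalizeVolume = true; // forces buffering, so Stop() has work left to do
        capture.WaitOnStop = false;     // Stop() returns immediately; processing continues in the background

        // Signalled when the component reports that the output file is complete.
        var finished = new ManualResetEvent(false);
        capture.Stopped += (sender, e) =>
        {
            Console.WriteLine("Data available in " + e.OutputFileName);
            finished.Set();
        };

        capture.Start("recording.mp3"); // hypothetical output path
        Thread.Sleep(5000);             // record for about five seconds
        capture.Stop();

        // Do not read or move the file before the Stopped event has fired.
        finished.WaitOne();
    }
}
```

With WaitOnStop left at its default, the call to Stop() would block until all data has been written, and the wait handle would not be needed.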
Points of Interest
In some development environment configurations, you may get "Loader Lock" error (which is really a warning) while starting your application under the debugger. It's a well-known design issue in Managed DirectX. You may disable this error in Visual Studio debugger settings (most people do this without observable consequences). I preferred not to do this. Instead I found a workaround: if the project which references Istrib.Sound.Mp3.dll also explicitly references Managed DirectX assemblies (
Microsoft.DirectX.DirectSound), then the warning is not raised. Otherwise the warning is shown by the debugger each time any assembly referencing Managed DirectX libraries is loaded into the Application Domain.
However, in my experience, you cannot use the Visual Studio Add Reference... wizard to add the DirectX GAC assembly reference. That is not an issue when you reference a local copy of the DirectX assemblies (as in the example: the DirectX subdirectory).
The workaround for the GAC problem is to add the reference to the GAC assemblies by manually editing your csproj file with a text editor:
The Visual Studio Add Reference... wizard generates identical XML, except that it includes full assembly version info. This should work as well... but it does not, at least on my machines.
- 29th November, 2008: Initial post
- 1st December, 2008: More MP3 formats available
- 19th October, 2009: Updated source code and demo project