Posted 28 Jun 2013

Low-Level Control of *.wav Data (Part II)

suendisra, 29 Jun 2013, CPOL
How to use the waveIn* functions to record low-level audio data


 As I stated in Part I of this series, I've been looking for a way to grab and control low-level data for audio playback. In the grand scheme of things, this research was all for a larger project (which I won't go over in these articles). The mini-project I used as a stepping stone is a program I call Wave Recorder, which, if you ask me, is about as fitting a name as one can get. This article is Part II of IV and introduces the waveIn API, which gives us low-level control of voice recording from a specified device (most likely your computer's default recording device). This article assumes you have a solid grounding in C/C++ programming, with a good knowledge of memory and pointers and of the overall Windows programming environment (that is to say, how to create a window, work with the message pump, and such). 

So, What's the Difference?

 Where waveOut, as the name implies, sends your sound outwards, waveIn brings your sound inwards. It works with sound devices much like waveOut does; the real difference is the direction the data travels. 

Think of recording sound in C/C++ like a host pouring water for guests. Let the sound recorded from the microphone be the pitcher of water and your empty buffers be the glasses, waiting to be filled. See, with waveOut, you started with a buffer that had data in it (yes, we supplied it, but it was data nonetheless; in the future it will come from a file), but with waveIn you start with an empty buffer that the device fills with data. So, being the brilliant host you are, you fill your guests' glasses with water. As each glass gets full, you stop filling that particular glass and move on to the next empty one.

 This is how recording works. We give the device an empty buffer of our own specified size, it fills that buffer with data from the microphone, and it hands the full buffer back to us. What we do with that data, well... that's to be determined. 

Let's Get Our Chicken Lo Mein

Because our meat and potatoes is giving me heartburn...

So, how do we get started? Let's take a look at the "to do list". You'll notice that the waveIn list is, if not simpler, just as simple as waveOut and, in fact, you'll find the steps to be very similar (who'd have thought?):

  1. Allocate your memory buffer. We'll cover this more later.
  2. Open the device.  
  3. Next, you prepare the headers and add them to the queue.  
  4. Start recording.  
  5. Do what you want with the returned data buffer.  
  6. Return the empty buffer.  
  7. Stop recording or let finish without adding back to the queue.  
  8. Unprepare your headers.  
  9. Close device.  
  10. Free your memory.
Take a quick look at step 6 for me. See what it says? We're returning the empty buffer to the device. Why are we doing this? Showing my green side here: we're recycling! But it's also necessary, for two main reasons: first, it cuts down on memory usage, and second, it keeps the device recording (there are other reasons, but these are the two we need to focus on). 

Remember back to the previous article: what happened when you added your buffer to the waveOut device and started playing? It played the sound, then stopped. Well, waveIn does the same: it records into the buffer you supplied until that buffer is full, and if no more buffers are queued, it stops recording (so keep this in mind when you're programming!).

O'Malley got his start on this yesterday...where were you?

So, enough talk and theory; let's jump in and see where it goes! Like with waveOut, I like to keep things as short, simple, and neat as possible. So there are a few conventions I use here to keep it all together, but if you feel you have a better way, then by all means use it!

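#define MAXBUFFERS 3          /* minimum recommended number of recording buffers */
struct wIn {
 int devnum, count;           /* device ID passed to waveInOpen, spare counter */
 HWAVEIN handle;              /* handle returned by waveInOpen */
 WAVEFORMATEX fmt;            /* recording format, filled in by setformat() */
 WAVEHDR hdrs[MAXBUFFERS];    /* one header per buffer */
 char *buffers[MAXBUFFERS];   /* the raw data buffers */
} mic;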

So, first things first: I create a struct to hold the variables needed for recording. I've defined MAXBUFFERS as 3, the minimum recommended number of buffers for recording. Now that that's out of the way, let's open a device and start recording.

On your mark, get set, allocate!

 Looking back up at the process involved, the first step is to allocate memory. That's all well and good, but why do I have to allocate more than one buffer? Why won't one buffer alone work? Well, without pictures, it's a bit hard to describe, but ultimately the answer is that you cannot work on the same buffer which is being used by the microphone. 

See, you give the recording device an empty buffer and say, "Give me X bytes of recorded data!" and the device cooperates: it takes that buffer and fills it. Key word here being, takes. It takes the buffer away from your control temporarily and uses it for you. If you were to use only one buffer, you'd supply it, it would get filled (recording stops once it's full), you'd process it (i.e. write it to a file), and then you'd hand it back to the device (upon which recording would resume). So sure, theoretically, you could record with only one buffer. Computers these days are phenomenally fast, but even so, the time from the moment recording stops to the moment you hand the buffer back is an eternity to a computer, and whatever the microphone picks up in that gap has nowhere to go, so you would hear a noticeable pause/click (like the click when you plug a studio mic into a live feed).

Well that's all well and good, but why then can't I just use two? I'll just swap them out! While this is a step in the right direction, it's still fragile. On most computers nowadays you could probably get away with it easily, but what if there's a hang-up and the buffer you're processing is held up by your processing routine? What then? You'll get a pause/click in the recording again and could even lose data. 

So here's where three is optimal: one waiting on the queue, one being recorded into, and one being processed in the meantime. With this arrangement it almost becomes like a factory line: 1, 2, 3; 1 gets full and goes to processing (and eventually comes back to the end of the line); then 2, 3, 1, and 2 gets full... etc. You get the picture--and hey, now that you've got it, let's get to it!
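To make the buffer-count argument concrete, here is a small, portable C simulation. It doesn't call waveIn at all, and it's a simplified model, not how any real driver schedules buffers. One "tick" is the time the device needs to fill one buffer; each filled buffer is then tied up in processing for `delay` ticks before it goes back to the end of the line. The function name count_dropouts and the tick model are assumptions made up for this sketch.

```c
#define MAXQ 16   /* upper bound on buffers in this model; nbuffers must not exceed it */

/* Hypothetical tick-based model of the recording "factory line".
   Returns how many ticks the device found the queue empty, i.e. the
   moments you would hear as a pause/click. */
static int count_dropouts(int nbuffers, int delay, int ticks)
{
    int queue[MAXQ], qlen = 0;   /* buffer IDs waiting to be filled (FIFO) */
    int back_at[MAXQ];           /* tick at which buffer i is requeued, or -1 */
    int dropouts = 0;

    for (int i = 0; i < nbuffers; i++) {
        queue[qlen++] = i;       /* the initial waveInAddBuffer calls */
        back_at[i] = -1;
    }

    for (int t = 0; t < ticks; t++) {
        /* buffers whose processing has finished go to the end of the line */
        for (int i = 0; i < nbuffers; i++)
            if (back_at[i] == t) {
                queue[qlen++] = i;
                back_at[i] = -1;
            }

        if (qlen == 0) {         /* nothing to record into this tick */
            dropouts++;
            continue;
        }

        int id = queue[0];       /* the front buffer is filled this tick */
        for (int k = 1; k < qlen; k++)
            queue[k - 1] = queue[k];
        qlen--;
        back_at[id] = t + 1 + delay;  /* full at the end of tick t, then processed */
    }
    return dropouts;
}
```

With these rules, count_dropouts(2, 2, 6) reports 2 empty ticks, while count_dropouts(3, 2, 6) reports none: the third buffer covers a processing hiccup that lasts two buffers' worth of time. A single buffer with even one tick of processing, count_dropouts(1, 1, 6), goes silent every other tick.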

 For my recording program, I like to allocate my memory when the dialog is created and free it when it's destroyed. This way I hold on to the memory for the whole life of the program and don't have to worry about sharing it. So, using the setformat() routine I created in the last article, I allocate my memory like this (again assuming you understand the Windows message pump):  
/* ... */

case WM_INITDIALOG: /* (or WM_CREATE) */
 setformat(&mic.fmt, 2, 44100, 16); /* setformat() from Part I: 2 channels, 44100 Hz, 16 bits */
 /* each buffer holds 1/100 of a second of audio */
 mic.buffers[0] = (char*)malloc(mic.fmt.nAvgBytesPerSec/100);
 mic.buffers[1] = (char*)malloc(mic.fmt.nAvgBytesPerSec/100);
 mic.buffers[2] = (char*)malloc(mic.fmt.nAvgBytesPerSec/100);

 if(mic.buffers[0] && mic.buffers[1] && mic.buffers[2])
 {
  /* only clear the buffers if all three allocations succeeded */
  memset(mic.buffers[0], 0, mic.fmt.nAvgBytesPerSec/100);
  memset(mic.buffers[1], 0, mic.fmt.nAvgBytesPerSec/100);
  memset(mic.buffers[2], 0, mic.fmt.nAvgBytesPerSec/100);
 }

/* ... */




Now, you'll see that I don't do too much in the way of error checking; I've already done it in my own library and I feel that if I were just to do it all here, you'd either learn nothing or would feel hampered about doing it my way. So do it your way (also, if you want to allocate your memory at a different time, or right before you start recording, that's fine too). 
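If you want to double-check what nAvgBytesPerSec/100 works out to, the PCM arithmetic behind a format like setformat(&mic.fmt, 2, 44100, 16) is easy to sketch. I don't reproduce Part I's setformat() here; pcm_format and pcm_setformat are hypothetical stand-ins that fill only the WAVEFORMATEX-style fields involved, using the standard PCM relationships nBlockAlign = nChannels * wBitsPerSample / 8 and nAvgBytesPerSec = nSamplesPerSec * nBlockAlign. The rounding to whole frames in buffer_bytes is an extra precaution, not something the allocation code above does.

```c
/* Hypothetical stand-in for the PCM fields of WAVEFORMATEX used here. */
typedef struct {
    unsigned short nChannels;
    unsigned long  nSamplesPerSec;
    unsigned short wBitsPerSample;
    unsigned short nBlockAlign;      /* bytes per frame (one sample per channel) */
    unsigned long  nAvgBytesPerSec;  /* bytes per second of audio */
} pcm_format;

static void pcm_setformat(pcm_format *f, unsigned short channels,
                          unsigned long rate, unsigned short bits)
{
    f->nChannels = channels;
    f->nSamplesPerSec = rate;
    f->wBitsPerSample = bits;
    f->nBlockAlign = (unsigned short)(channels * bits / 8);
    f->nAvgBytesPerSec = rate * f->nBlockAlign;
}

/* Bytes in a buffer holding 1/divisor of a second, rounded down so the
   buffer always contains whole frames. */
static unsigned long buffer_bytes(const pcm_format *f, unsigned long divisor)
{
    unsigned long bytes = f->nAvgBytesPerSec / divisor;
    return bytes - bytes % f->nBlockAlign;
}
```

For 2 channels at 44100 Hz and 16 bits, nBlockAlign is 4 and nAvgBytesPerSec is 176400, so each buffer is 1764 bytes, or 441 frames (10 ms of audio). At 22050 Hz the same division would give 882 bytes, which is not a multiple of 4, so buffer_bytes rounds it down to 880.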

Get me some Tylenol...geez, can this guy go on...

I know I'm kind of dragging here, but if you're still reading this (and I hope you are), you've gotten to the good part. But here's one last addendum to this whole thing before we hit the code: I'm basing a lot of this off of your own knowledge and ability to create your own code, not just copy and paste from mine. With that said, you'll see that in the code segment to follow, I'm not really doing very much error checking, or setting up the system (i.e. finding devices, selecting between, creating the window, etc). Additionally, don't forget the process order when reading this code.

In my Win32 environments I tend to lean (perhaps a bit too heavily) on dialog boxes, so the following code is mostly geared towards a DlgProc, but really, switching to a WndProc isn't all that different, so I'll leave that up to you. 

/* ... */

  case IDC_STARTRECORD: /* record */
   /* here's where I'd allocate memory if you want to just allocate prior to recording */
   if(waveInOpen(&mic.handle, mic.devnum, &mic.fmt, (DWORD_PTR)hwnd, 0, CALLBACK_WINDOW) == MMSYSERR_NOERROR)
   {
    /* create our headers (dwFlags must be 0 before waveInPrepareHeader) */
    memset(&mic.hdrs, 0, sizeof(mic.hdrs));
    mic.hdrs[0].lpData = mic.buffers[0]; /* the buffer itself, not its address */
    mic.hdrs[0].dwBufferLength = mic.fmt.nAvgBytesPerSec/100;
    mic.hdrs[0].dwUser = 1;

    mic.hdrs[1].lpData = mic.buffers[1];
    mic.hdrs[1].dwBufferLength = mic.fmt.nAvgBytesPerSec/100;
    mic.hdrs[1].dwUser = 2;

    mic.hdrs[2].lpData = mic.buffers[2];
    mic.hdrs[2].dwBufferLength = mic.fmt.nAvgBytesPerSec/100;
    mic.hdrs[2].dwUser = 3;

    /* prepare, queue, then start; each call runs only if the previous one succeeded */
    if(waveInPrepareHeader(mic.handle, &mic.hdrs[0], sizeof(mic.hdrs[0])) == MMSYSERR_NOERROR)
     if(waveInPrepareHeader(mic.handle, &mic.hdrs[1], sizeof(mic.hdrs[1])) == MMSYSERR_NOERROR)
      if(waveInPrepareHeader(mic.handle, &mic.hdrs[2], sizeof(mic.hdrs[2])) == MMSYSERR_NOERROR)
       if(waveInAddBuffer(mic.handle, &mic.hdrs[0], sizeof(mic.hdrs[0])) == MMSYSERR_NOERROR)
        if(waveInAddBuffer(mic.handle, &mic.hdrs[1], sizeof(mic.hdrs[1])) == MMSYSERR_NOERROR)
         if(waveInAddBuffer(mic.handle, &mic.hdrs[2], sizeof(mic.hdrs[2])) == MMSYSERR_NOERROR)
          if(waveInStart(mic.handle) == MMSYSERR_NOERROR)
           recording = TRUE;
   }
   break; /* don't fall through into IDC_STOPRECORD */

  case IDC_STOPRECORD: /* stop recording */
   if(waveInStop(mic.handle) == MMSYSERR_NOERROR)
    if(waveInReset(mic.handle) == MMSYSERR_NOERROR) /* returns every queued buffer as done */
     if(waveInUnprepareHeader(mic.handle, &mic.hdrs[0], sizeof(mic.hdrs[0])) == MMSYSERR_NOERROR)
      if(waveInUnprepareHeader(mic.handle, &mic.hdrs[1], sizeof(mic.hdrs[1])) == MMSYSERR_NOERROR)
       if(waveInUnprepareHeader(mic.handle, &mic.hdrs[2], sizeof(mic.hdrs[2])) == MMSYSERR_NOERROR)
        if(waveInClose(mic.handle) == MMSYSERR_NOERROR)
        {
         /* if you've allocated prior to recording, here's where I would release (note: freeing has to be done after waveInUnprepareHeader) */
         /* clear up the wav variables */
         mic.handle = NULL;
         recording = FALSE;
        }
   break;

/* ... */  

Alright, let's talk about this code a bit. In the IDC_STARTRECORD section we're obviously starting up our recording process. Looking above we see that this follows the list: open, prepare, add, record.

Again, I'm assuming that you've done the work necessary to find your devices and stored the current device in mic.devnum, which we pass to waveInOpen. You'll also see that we pass the variable hwnd to waveInOpen. hwnd is the handle to the dialog procedure's window, and what we're telling waveInOpen is that when the device is finished with one of our buffers, it should let us know by sending a message to this window. There are other ways to get called back (CALLBACK_FUNCTION, CALLBACK_EVENT, and CALLBACK_THREAD), but especially in a Win32 environment, I prefer CALLBACK_WINDOW.

The next thing you'll notice is what we're doing to mic.hdrs. First we set it all to 0, for two reasons: one, to prevent garbage, and two, because the dwFlags member of a WAVEHDR struct must be 0 before it's passed to waveInPrepareHeader. This clears any flags (such as WHDR_DONE) from the WAVEHDR struct so that the functions that follow don't think it's already been used up.
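Since the three header blocks differ only in their index, the same setup can be written as a loop. The sketch below is portable C, so it uses a hypothetical hdr_t that mirrors just the WAVEHDR members used in this article instead of the real WAVEHDR from <mmsystem.h>; with the real struct you would loop over mic.hdrs and mic.buffers the same way and then call waveInPrepareHeader and waveInAddBuffer for each header.

```c
#include <string.h>

#define MAXBUFFERS 3

/* Hypothetical stand-in for the WAVEHDR members used in this article;
   the real struct has more members. */
typedef struct {
    char         *lpData;           /* points at the buffer itself */
    unsigned long dwBufferLength;   /* bytes available at lpData */
    unsigned long dwBytesRecorded;  /* filled in by the driver */
    unsigned long dwUser;           /* free for our own use: 1, 2, 3 */
    unsigned long dwFlags;          /* must be 0 before preparing */
} hdr_t;

/* Zero every header, then point each one at its own buffer. */
static void setup_headers(hdr_t hdrs[MAXBUFFERS], char *buffers[MAXBUFFERS],
                          unsigned long length)
{
    /* hdrs is a pointer here, so size the memset explicitly */
    memset(hdrs, 0, MAXBUFFERS * sizeof(hdr_t));   /* also clears dwFlags */
    for (int i = 0; i < MAXBUFFERS; i++) {
        hdrs[i].lpData = buffers[i];               /* buffers[i], not &buffers[i] */
        hdrs[i].dwBufferLength = length;
        hdrs[i].dwUser = (unsigned long)(i + 1);
    }
}
```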

With the headers sent off to waveInPrepareHeader, we come to waveInAddBuffer. In my opinion, this is the most crucial step, and it's crucial in one other major way too (we'll get to it in the recycling section); forget it and it's a show-stopper! waveInAddBuffer puts our buffers onto the queue. Remember that factory line metaphor I talked about? Think of it this way: the conveyor belt is stopped and waiting to be started. With waveInAddBuffer we add a buffer to the line, where it sits and waits to be filled. Once we call waveInStart, the conveyor belt starts moving and the first buffer we added is taken and starts to fill up.

Let's quickly cover the IDC_STOPRECORD section. First we see waveInStop, which does just that: it stops the conveyor belt. Next we see waveInReset, but what is that? What does that mean? Well, waveInStop only pauses the recording process: the buffer currently being filled is handed back to you, but the empty buffers stay on the queue, so when you call waveInStart again, it picks up where it left off. waveInReset, however, stops the conveyor belt and clears everything off of it, returning every pending buffer marked as done. Perhaps a better name for waveInStop would've been waveInPause, and for waveInReset, waveInStop? Regardless, once recording has been stopped, always call waveInReset before waveInUnprepareHeader: a header that's still on the queue can't be unprepared (waveInUnprepareHeader returns WAVERR_STILLPLAYING), and waveInClose likewise refuses to close the device while buffers are still pending, which would leave your device open and your memory tied up.
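Here's a toy model of that stop/reset/unprepare sequence in portable C. The state names and functions are invented for this sketch (they are not the mmsystem flags or calls); the model only encodes the behaviour described above: waveInStop returns the buffer being filled, waveInReset returns every pending buffer, and a header still on the queue can't be unprepared.

```c
#include <stdbool.h>

#define MAXBUFFERS 3

/* Invented states for this sketch (not the real WHDR_* flags). */
typedef enum { H_UNPREPARED, H_PREPARED, H_INQUEUE, H_DONE } hstate;

typedef struct {
    hstate state[MAXBUFFERS];
    int    current;             /* header being filled, or -1 if none */
} model_t;

/* Prepare and queue every header, then start filling header 0. */
static void model_start(model_t *m)
{
    for (int i = 0; i < MAXBUFFERS; i++)
        m->state[i] = H_INQUEUE;
    m->current = 0;
}

/* Like waveInStop: the header being filled comes back marked done,
   but the empty headers stay on the queue. */
static void model_stop(model_t *m)
{
    if (m->current >= 0)
        m->state[m->current] = H_DONE;
    m->current = -1;
}

/* Like waveInReset: every header still on the queue comes back marked done. */
static void model_reset(model_t *m)
{
    for (int i = 0; i < MAXBUFFERS; i++)
        if (m->state[i] == H_INQUEUE)
            m->state[i] = H_DONE;
    m->current = -1;
}

/* Unpreparing a queued header fails (the real waveInUnprepareHeader
   returns WAVERR_STILLPLAYING in that situation). */
static bool model_unprepare(model_t *m, int i)
{
    if (m->state[i] == H_INQUEUE)
        return false;
    m->state[i] = H_UNPREPARED;
    return true;
}
```

After model_stop, header 0 (the one being filled) can be unprepared but header 1 cannot; after model_reset, all of them can, which is why the IDC_STOPRECORD code resets before it unprepares.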

Let's visit our local recycling center, oh boy!

Last but not least is our recycling center (the heart of the whole beast!): the process that keeps the conveyor belt going without running out of data buffers. Now, I'm hoping many of you have already pieced it together and are just waiting to see which message to handle in your switch, but if not, that's okay too; believe me, I had to sift through MSDN trying to make sense of it all because I couldn't find a good tutorial like this one to help me, so I feel your pain.

Anyway, here we go:

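/* ... */

case MM_WIM_DATA: /* a data buffer has been filled; lp points to its WAVEHDR */
 {
  WAVEHDR *hdr = (WAVEHDR*)lp;

  /* todo - process hdr->dwBytesRecorded bytes at hdr->lpData (i.e. save in file) */

  /* put the buffer back at the end of the line, but only while recording:
     once IDC_STOPRECORD has closed the device, returned buffers must not be requeued */
  if(recording)
   waveInAddBuffer(mic.handle, hdr, sizeof(WAVEHDR));
 }
 break;

/* ... */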

Alright, so slap this sucker straight into the switch in your dialog (or window) procedure and you've pretty much got it. Remember back to waveInOpen, where we passed hwnd along with the flag CALLBACK_WINDOW? Well, waveInOpen remembered that, and now the MM_WIM_DATA message is sent straight to that window to tell us that the buffer we supplied (pointed to by the LPARAM lp) is full. And if you look at the ((WAVEHDR*)lp)->dwUser member, you can see whether it's buffer 1, 2, or 3.

So with this message in hand, we now have access to the very data we wanted in the first place: that oh-so-important low-level data we can use to create a .wav file (the number of valid bytes in the buffer is in ((WAVEHDR*)lp)->dwBytesRecorded). Unfortunately, I'm not showing that in this article; it takes a whole lot more than just writing the data to a file. But at least we have access to it, and with that data in hand, we can do just about anything we want.
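As one example of doing something with the data, here's a sketch that appends each filled buffer to a growing block of memory that a later step could write out as a .wav file. pcm_store and append_recorded are hypothetical names; in the MM_WIM_DATA handler you might call append_recorded(&store, hdr->lpData, hdr->dwBytesRecorded). Using dwBytesRecorded rather than dwBufferLength matters because a buffer returned by waveInStop or waveInReset may be only partly filled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable store for recorded PCM bytes. */
typedef struct {
    char  *data;
    size_t size;       /* bytes stored so far */
    size_t capacity;   /* bytes allocated */
} pcm_store;

/* Appends n bytes; returns 0 on success, -1 if memory runs out
   (the existing data is left untouched in that case). */
static int append_recorded(pcm_store *s, const char *bytes, size_t n)
{
    if (n == 0)
        return 0;
    if (s->size + n > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 4096;
        while (cap < s->size + n)
            cap *= 2;                      /* grow geometrically */
        char *p = realloc(s->data, cap);
        if (!p)
            return -1;
        s->data = p;
        s->capacity = cap;
    }
    memcpy(s->data + s->size, bytes, n);
    s->size += n;
    return 0;
}
```

Start with a zero-initialized store (pcm_store store = { 0 };) and free(store.data) once you're done with it.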

Notice though that the function waveInAddBuffer has cropped up again! I told you it's an important one. Once we've received our recorded data, our header is still prepared (notice we're not touching it at all; once prepared, always prepared until waveInUnprepareHeader is called), and once we've processed our data, we call waveInAddBuffer to put our buffer back at the end of the line to wait to be filled again.
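To see why that one call matters, here's a tiny portable model of the recycling loop. It's a simplification with a made-up function name: the device fills one queued buffer per tick, and handler_requeues says whether our MM_WIM_DATA handler puts the buffer back with waveInAddBuffer.

```c
#include <stdbool.h>

/* Hypothetical model: returns how many buffers get filled in `ticks`,
   given `nbuffers` initially queued buffers. */
static int buffers_filled(int nbuffers, bool handler_requeues, int ticks)
{
    int queued = nbuffers;      /* the initial waveInAddBuffer calls */
    int filled = 0;
    for (int t = 0; t < ticks && queued > 0; t++) {
        queued--;               /* the device takes the front buffer and fills it */
        filled++;
        if (handler_requeues)
            queued++;           /* MM_WIM_DATA handler puts it back in line */
    }
    return filled;              /* loop ends early once the queue runs dry */
}
```

buffers_filled(3, false, 100) stops at 3: once the last supplied buffer is full, recording stops. With handler_requeues set, all 100 ticks are recorded.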

When in Rome, you'll notice a different language...

So a few key things to take away from this tutorial:

  • The process takes a specific order; don't forget that or you could run into problems!
  • A minimum of three buffers is necessary for optimum recording performance
  • MM_WIM_DATA gives us a callback to perform our processing
  • waveInAddBuffer is the bread and butter of recording
  • Without this tutorial you'd be sitting behind the screen for much longer!

In all seriousness though, waveIn is very much like waveOut, and while I admit that the jump from waveOut to waveIn, let alone from DOS to Win32, can look like a very daunting and confusing task, I try to keep things simple and to the point so you get what you need to do what you want with your low-level recorded data. If you feel that I've skipped over something important, or there's something you don't understand, please feel free to comment and ask. I'll be more than happy to clarify.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer
United States United States
I have been programming in C since 2004. Even with what I know now, I find that I am continually learning very rewarding stuff every single day.

Article Copyright 2013 by suendisra