|







Introduction
The Fast Fourier Transform (FFT) allows users to view the spectrum content of an audio signal. The FFT code presented here was written by Don Cross, his homepage appears to have subsequently been taken down. Rather than explain the mathematical theory of the FFT, I will attempt to explain its usefulness as it relates to audio signals.
The FFT allows users to obtain the spectral makeup of an audio signal, obtain the decibels of its various frequencies, or obtain the intensity of its various frequencies. Spectral viewers (shown in the image above), Equalizers, or VU-Meters may all use the FFT in order to display their results. The difference between them then depends upon one of a couple of equations that take the real and imaginary components of the FFT, and return either the intensity or decibel levels to be used in the graphed result. The following code takes both the real and imaginary components of the FFT result, and returns the intensity and decibels. inline double GetFrequencyIntensity(double re, double im)
{
return sqrt((re*re)+(im*im));
}
#define mag_sqrd(re,im) (re*re+im*im)
#define Decibels(re,im) ((re == 0 && im == 0) ? (0) :
10.0 * log10(double(mag_sqrd(re,im))))
#define Amplitude(re,im,len) (GetFrequencyIntensity(re,im)/(len))
#define AmplitudeScaled(re,im,len,scale) ((int)Amplitude(re,im,len)%scale)
The FFT uses the audio signal as its real component, and uses a NULL pointer for its imaginary component indicating that the imaginary data does not exist. Upon its return, the FFT will return both the real and imaginary data components based upon the data given as the real component. The is mirrored with the return samples so that 0-FFT_LEN/2 contains the data, and FFT_LEN/2 to FFT_LEN contains a reverse of the data. This mistake was corrected in my code. The code that performs the FFT follows: DWORD nCount = 0;
for (DWORD dw = 0; dw < FFT_LEN; dw++)
{
{
finleft[nCount] = (double)((short*)pwh->lpData)[dw++];
finright[nCount++] = (double)((short*)pwh->lpData)[dw];
}
}
fft_double(FFT_LEN/2,0,finleft,NULL,fout,foutimg);
float re,im,fmax=-99999.9f,fmin=99999.9f;
for(int i=1;i < FFT_LEN/4-1;i++)
{
re = fout[i];
im = foutimg[i];
fdraw[i]=AmplitudeScaled(re,im,FFT_LEN/2,256);
if (fdraw[i] > fmax)
{
fmax = fdraw[i];
}
if (fdraw[i] < fmin)
{
fmin = fdraw[i];
}
}
int nAvg, nBars=16, nCur = 0;
for(int i=1;i < FFT_LEN/4;i++)
{
nAvg = 0;
for (int n=0; n < nBars; n++)
{
nAvg += (int)fdraw[i];
}
nAvg /= nBars;
TRACE("Average for Bar#%d is %d\n",nCur++,nAvg);
i+=nBars-1;
}
DataHolder* pDataHolder = (DataHolder*)lpData;
CFrequencyGraph* pPeak = (CFrequencyGraph*)pDataHolder->pData;
if (::IsWindow(pPeak->GetSafeHwnd()))
{
pPeak->SetYRange(0,256);
pPeak->Update(FFT_LEN/4,fdraw);
}
fmax=-99999.9f,fmin=99999.9f;
fft_double(FFT_LEN/2,0,finright,NULL,fout,foutimg);
fdraw[0] = fdraw[FFT_LEN/4] = 0;
for(i=1;i < FFT_LEN/4-1;i++)
{
re = fout[i];
im = foutimg[i];
fdraw[i] = Decibels(re,im);
if (fdraw[i] > fmax)
{
fmax = fdraw[i];
}
if (fdraw[i] < fmin)
{
fmin = fdraw[i];
}
}
CFrequencyGraph* pPeak2 = (CFrequencyGraph*)pDataHolder->pData2;
if (::IsWindow(pPeak2->GetSafeHwnd()))
{
pPeak2->SetNumberOfSteps(50);
pPeak2->SetYRange((int)fmin,(int)fmax);
pPeak2->Update(FFT_LEN/4,fdraw);
}
This code is contained in a callback function that is called every time the waveIn functions return with updated audio signal data. The code that actually performs the FFT looks like: void fft_double (unsigned int p_nSamples, bool p_bInverseTransform,
double *p_lpRealIn, double *p_lpImagIn,
double *p_lpRealOut, double *p_lpImagOut)
{
if(!p_lpRealIn || !p_lpRealOut || !p_lpImagOut) return;
unsigned int NumBits;
unsigned int i, j, k, n;
unsigned int BlockSize, BlockEnd;
double angle_numerator = 2.0 * PI;
double tr, ti;
if( !IsPowerOfTwo(p_nSamples) )
{
return;
}
if( p_bInverseTransform ) angle_numerator = -angle_numerator;
NumBits = NumberOfBitsNeeded ( p_nSamples );
for( i=0; i < p_nSamples; i++ )
{
j = ReverseBits ( i, NumBits );
p_lpRealOut[j] = p_lpRealIn[i];
p_lpImagOut[j] = (p_lpImagIn == NULL) ? 0.0 : p_lpImagIn[i];
}
BlockEnd = 1;
for( BlockSize = 2; BlockSize <= p_nSamples; BlockSize <<= 1 )
{
double delta_angle = angle_numerator / (double)BlockSize;
double sm2 = sin ( -2 * delta_angle );
double sm1 = sin ( -delta_angle );
double cm2 = cos ( -2 * delta_angle );
double cm1 = cos ( -delta_angle );
double w = 2 * cm1;
double ar[3], ai[3];
for( i=0; i < p_nSamples; i += BlockSize )
{
ar[2] = cm2;
ar[1] = cm1;
ai[2] = sm2;
ai[1] = sm1;
for ( j=i, n=0; n < BlockEnd; j++, n++ )
{
ar[0] = w*ar[1] - ar[2];
ar[2] = ar[1];
ar[1] = ar[0];
ai[0] = w*ai[1] - ai[2];
ai[2] = ai[1];
ai[1] = ai[0];
k = j + BlockEnd;
tr = ar[0]*p_lpRealOut[k] - ai[0]*p_lpImagOut[k];
ti = ar[0]*p_lpImagOut[k] + ai[0]*p_lpRealOut[k];
p_lpRealOut[k] = p_lpRealOut[j] - tr;
p_lpImagOut[k] = p_lpImagOut[j] - ti;
p_lpRealOut[j] += tr;
p_lpImagOut[j] += ti;
}
}
BlockEnd = BlockSize;
}
if( p_bInverseTransform )
{
double denom = (double)p_nSamples;
for ( i=0; i < p_nSamples; i++ )
{
p_lpRealOut[i] /= denom;
p_lpImagOut[i] /= denom;
}
}
}
And it requires the following supporting functions:
bool IsPowerOfTwo( unsigned int p_nX )
{
if( p_nX < 2 ) return false;
if( p_nX & (p_nX-1) ) return false;
return true;
}
unsigned int NumberOfBitsNeeded( unsigned int p_nSamples )
{
int i;
if( p_nSamples < 2 )
{
return 0;
}
for ( i=0; ; i++ )
{
if( p_nSamples & (1 << i) ) return i;
}
}
unsigned int ReverseBits(unsigned int p_nIndex, unsigned int p_nBits)
{
unsigned int i, rev;
for(i=rev=0; i < p_nBits; i++)
{
rev = (rev << 1) | (p_nIndex & 1);
p_nIndex >>= 1;
}
return rev;
}
double Index_to_frequency(unsigned int p_nBaseFreq,
unsigned int p_nSamples, unsigned int p_nIndex)
{
if(p_nIndex >= p_nSamples)
{
return 0.0;
}
else if(p_nIndex <= p_nSamples/2)
{
return ( (double)p_nIndex /
(double)p_nSamples * p_nBaseFreq );
}
else
{
return ( -(double)(p_nSamples-p_nIndex) /
(double)p_nSamples * p_nBaseFreq );
}
}
The included sample class, CFrequencyGraph, will draw a sample EQ, peak meter, and spectral graph using the frequency intensity. Hopefully, this serves as a decent introduction to the uses and basics of the Fast Fourier Transform for audio signals. Other functions of the FFT include using it in combination with a Beat Detection Algorithm to detect beats in an audio signal. Another page with useful FFT information is located at FFT Spectrum Analyser.
This article uses waveIn* functions to retrieve the soundcard's data from the playing source. Therefore, you must manually open your soundcard properties and change the recording source to use mono/stereo mix or waveIn as the selected recording line. Also, if you set the playback wave option to false, this will override the recording options and no sound will appear. For information about doing an inverse FFT for transforming frequencies into an audio signal, please do a Google search on "Inverse FFT" or "IFFT".
History
- Version 1.3: Fixed Spectrum drawing, Changed Pixelgram color, changed Process routine to match new Recording parameters.
- Version 1.2: Fixed many drawing bugs, changed to use amplitude scaled and decibels.
| You must Sign In to use this message board. |
|
| | Msgs 1 to 25 of 145 (Total in Forum: 145) (Refresh) | FirstPrevNext |
|
 |
|
|
hi, i got a program from codeproject which playes wave file with directsound. and i want to modify it and add spectrum support to it. but i got some questiones about it, can you help me resolve it?
question one: the program open wave file and read wave file information to struct WAVEFORMATEX, the members of it like wFormatTag=1, nChannels=2, nSamplesPerSec=44100, nAvgBytesPerSec=176400, nBlockAlign=4, wBitsPerSample=16, cbSize=0. then program initial DirectSoundBuffer's size as 176400, we must set size to 176400, it can be set any number?
question two: i add code to log DirectSoundBuffer's current play position and current write position to listbox control, the max value displayed in it which each of them may big than 176400, how can i get right start postion what i want to sample the buffer? code like:
DWORD dwCurPlayPos, dwCurWritePos; m_lpDSB->GetCurrentPosition(&dwCurPlayPos, &dwCurWritePos);
question three: if i get right start postion of buffer, what right sample size may be suggest?
code modified by me can download here.
regards, jacky_zz
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Hello,
i don't understand, why the mod-operator % and not the div-operator / is used for scaling the amplitude:
#define AmplitudeScaled(re,im,len,scale) ((int)Amplitude(re,im,len)%scale)
Maybe someone can answer this question?
Best regards, Tabor25
|
| Sign In·View Thread·PermaLink | 1.00/5 (1 vote) |
|
|
|
 |
|
|
 |
|
|
I was wondering if anyone has attempting to integrate the above code into an xcode project? If so, what ammendments have you made? Cheers Eoghan
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Hi....
your article is great.... it's very useful for me. but i have a question for you.. can the source code add by sound from using sound card?? so the result is waveform from the sound by using sound card.
I really appreciate it if you can help me. because it can help finish my task. please.....
thank you....
|
| Sign In·View Thread·PermaLink | 1.00/5 (1 vote) |
|
|
|
 |
|
|
 |
|
|
 |
|
|
Thank you everybody, can someone help me please, i downloaded the code and tried to compile with, visual C++ 6.0 but i get this error, any idea of how fix it:
--------------------Configuration: waveInFFT - Win32 Debug------------------- Compiling... waveInFFTDlg.cpp C:\waveinFFT_demo\waveInFFTDlg.cpp(143) : error C2374: 'i' : redefinition; multiple initialization C:\waveinFFT_demo\waveInFFTDlg.cpp(125) : see declaration of 'i' C:\waveinFFT_demo\waveInFFTDlg.cpp(169) : error C2374: 'i' : redefinition; multiple initialization C:\waveinFFT_demo\waveInFFTDlg.cpp(125) : see declaration of 'i' Error executing cl.exe. waveInFFT.exe - 2 error(s), 0 warning(s)
ED
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
|
Rename one of the variables "i" to another letter, perhaps "j", change the use of "i" within the following "for" loop from i to j, and that should fix the problem. I do not have Visual C++ 6.0 so I can not test the code on that compiler.
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | 4.00/5 (2 votes) |
|
|
|
 |
|
|
Hi really useful programme. Just trying to have a go at some simple DSP. Quick question, if you print the audio sample values "(double)((short*)pwh->lpData)[dw++];" to a text file and load it up into a sound editor, adobe audition can take plain text input, so can matlab, and set the sampling rate to 44100, as defined in the programme's sampling rate, the audio you are doing the FFT on plays back at the wrong speed (speed up) and is of poor, but audible, quality. Any ideas why there is a difference in the sampling rate and why the audio recording is so noisy.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
The noise in the recording, which you probably noticed by the program display when no audio is playing, has to do with the recording properties. Depending upon what sound card a person is using, there may or may not be a Wave/MP3 line in the recording properties. Otherwise the selected recording line, (and even if there is a wave line) is usually set to "Stereo" or "What U Hear" or some such term. In effect what this does is combine all the recording lines into one source. This includes feedback from the Mic Line. If you can try selecting the WAVE/MP3 line in the recording options of your sound card. That should take care of the issue. If you select that line, you will notice the program no longer displays any noise when no audio is playing. If you do not have such a line in your volume control recording options, then I do not know what you can do other than changing the program from waveIn recording, to waveOut playing but that will require extensive changes.
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Hi Fred, thanks for the help although this is not the case, I have tested it to be sure. All are muted apart from the microphone I am trying to record off and the value comes out distorted, not just with added low level noise. This manifests itself audible like distortion as if the sound card was being overdriven (the input was too loud) expect it is not reaching the max values of the buffers 2 bytes per samples for the 16 bit sample resolution and is within the normal recording levels for the device. Any more ideas would be greatfuly recieved. I have ran a sweeping sin wav through and got frequency components addeded that arn't in the original. I also ran through a fixed 100 Hz signal and found that what ever is in each buffer does not produce a continous sine wav. ie. when it starts a new set of 1024 samples it does not follow on from the old ones (for example with max and mins. 1 0 -1 0 ... [up to 1023 then] ... 0 (does not follow and does to something it shouldn't) 0 -1 ....). This sounded like a buffer problem to me, so I upped the number of buffers being used from 16 (as per the original code) to 300 which decreases the frequent occurance of these errors. Any ideas?
-- modified at 19:48 Wednesday 30th May, 2007
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
The very reason is what you stated.
ChrisMitchell_ wrote: the microphone I am trying to record off
Microphones have inherent distortion in them on most systems. Try using a line in or wave file as input instead of microphone.
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Something seems severely wrong in this entire demo. I wouldn't recommend use this as a basis for anyone learning FFT or beat detection.
1) The code below does almost nothing since 'nAvg' is calculating the average of the same number 'fdraw[i]'. 2) 'i' is incremented in the loop statement AND in the body of the loop inexplicably 3) Even if you mute all audio input devices, you'll still get garbage.
for(int i=1;i < FFT_LEN/4;i++) { nAvg = 0; for (int n=0; n < nBars; n++) { nAvg += (int)fdraw[i]; } nAvg /= nBars; //Send data here to something, //nothing to send it to so we print it. TRACE("Average for Bar#%d is %d\n",nCur++,nAvg); i+=nBars-1; }
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I just wanted to say thanks for posting this project. I have only just started with C/C++ , and this project has been a great help .as I said before in my last post I wanted to get this to work with an audio file , its taken me a few weeks but I have now got a small program that runs in a dos box ,it reads and audio file and writes out the FFT data to a text file. this is something I have needed for some time but I had no way of knowing how to do it . but thanks to you and a few late nights I have learnt how to make it work. I still have a few small bits to sort out in my program but its well on its way to being 100% ready . I would not have been able to do it with out your waveinfft project to get me started so Fred thank you very much ...
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Very nice demo!
One question, though. In the code, you specify the number of channels is 1 (in both the PCMFORMAT and WAVEFORMATEX structs). However, you are still reading data from the both the left and right channels. Why did you not specify the number of channels as 2? What would have been the difference (if any) if 2 channels were specified?
Thanks.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
jakeparks wrote: Why did you not specify the number of channels as 2?
This was a minor oversite on my part. I corrected the issue, and made a few other changes, like changing the FFT size and correcting the spectrum display. I have submitted the code and it should be available in a few days.
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | 2.25/5 (3 votes) |
|
|
|
 |
|
|
It seems true, however, that even if you specify the number of channels as 1, that the data from the left and right channels are interleaved. What then, if any, is the result of specifying the number of channels as 1? Does it really not matter at all?
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
The number of channels specified in the WAVEFORMATEX structure makes a difference. The is sent to the waveInOpen function to tell the input how to format the data that we will retrieve. If one channel is specified, the data returned in the WAVEHDR buffer is for one channel only, if two channels are specified, then the data returned is interleaved for two channels. Attempting to extract two channels from one channel data would cause an issue such as:
One channel data 10101010 Left extracted 1111 Right Extracted 0000
instead of: Two channel left 10101010 Two channel right 10101010
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | 2.00/5 (1 vote) |
|
|
|
 |
|
|
Nice app fred, thanks
Here is an active link to a Don Cross page which may be of interest this project's users, http://groovit.disjunkt.com/analog/time-domain/timefilt.html
It contains a link to FFTW.org which is also a good FFT resource.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Hi Fred,
Do you know what frequencies the groupings of the FFT results correspond to?
Or even the frequency range of the entire FFT output?
Thanks. Conor.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Just to elaborate on my question; The range would be from 0Hz to Fmax where Fmax is given by
the sample rate / 2.
The sample rate as set in the constructor of the recorder is 11025Hz which suggests the FFT has a range from 0 to 5.5KHz. I am almost positive this is not the case however after testing the FFT one some isolated recordings of different instruments. A human voice has a frequency range of about 350Hz somewhere between 85Hz to 1.1KHz, yet when the FFT is performed on a human voice, it shows frequencies in the upper bands (about 5KHz). Surely, even the harmonics of the voice couldn't stretch this far.
Hence, I think maybe I'm wrong about the sample rate being 11025. Could you help me out? If you could reply quickly, I'd really appreciate it.
Thanks. Conor.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I am not sure exactly what the human voice response range is, but I do know that the telephone system cuts off part of the voice, and they have a frequency range of 0 to 4Khz. Though I have heard that the voice will range above 5KHz sometimes.
Nothing is impossible, It's merely a matter of finding an answer to the question of HOW? ... And answering that question is usually the most difficult part of the job!!!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
One last question..
When I change the frequency in the Recorder constructor (m_PcmFormat.dwSampleRate = 11025 to say, 44100, this obviously changes the sample rate for the microphone. I'm not actually using the microphone to collect samples though, I've changed the code so it's now taking them from wav files.
It seems to slow the machine down a lot, which suggests that more FFT's are being performed(44100 as opposed to 11025 a second). Is this the case? If so, where in the code does this occur? The fft_double routine is called in Process() but this doesn't seem to be dependant in any way on the sample rate which is why I can't see why it's slowing down so much.
It must have something to do with the dwSampleRate but I just can't find where. I'm guessing maybe the sound device is returning more, 44100 times a second instead of 11025 times (when the sample rate is changed to 44100Hz), and this callback calls the FFT routine, but I'm not sure as I can't find it in the code.
Thanks.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
There are actually a number of things occuring here when you increase the sample rate. 1) FFT is being called more often. It is a relavitely computationally intensive routine increase CPU use. 2) You are reading from the hard drive which is relatively slow to begin with compared to using memory and CPU while reading from a hardware device. 3) Just as the FFT is being called more often, the Drawing routines are being called more often. They are using SetPixel and LineTo functions which are relatively slow methods of drawing. As they are called more often, that means more pixels need to be processed increase CPU usage.
Nothing is impossible, It's merely a matter of finding an ans | | | | | |