Click here to Skip to main content
11,705,763 members (43,615 online)
Click here to Skip to main content

Spectrogram generation in SampleTagger

, 10 Aug 2014 CPOL 8.7K 22
Rate this:
Please Sign up or sign in to vote.
Learn how to generate a spectrogram image from WAV,AIFF and REX files

Introduction

Universal Patch Finder (UPF) is a freeware designed to index various audio files from multiple directories. Every file can be tagged manually or automatically using plug-ins. The purpose of this article is to demonstrate how the plug-in SampleTagger, part of UPF 1.6, generate spectrograms.

How Fast Fourier Transform works

The goal of FFT is to transform a signal from the time domain to the frequency domain. Depending on which implementation you choose you have to do some computations before and after the FFT is performed. The first thing to do is to generate a very simple signal in a WAV file and pass it to the FFT to see what's happen. SampleTagger use the class FFT2 implemented by Gerald T. Beauregard

The input signal

Here a 10khz sine wave generated with Adobe Audition with sampling rate at 44.1Khz and bit depth at 16 (Standard CD quality). The first thing to do is to normalize the input values to the range [-1,1] because the FFT work in this range. SampleReader is responsible to do this normalization.

Figure 1: Test signal 10khz sampled at 44.1Khz in 16 bits

The FFT size

Now we pass the samples (the left channel of the stereo file) to the FFT. We must decide the size of the FFT (SampleTagger use 256 samples). It is very important to understand the following:

  • Half of the FFT output is relevant according to the Nyquist frequency. This mean if the input is 256 samples, the FFT produce only 128 values. If the input sample rate is 44100Hz, the 128 values represent frequencies from 0 to 22050Hz.
  • The FFT produce frequency samples (or spectral bin). A frequency sample is a complex number with real and imaginary part. The imaginary part give the phase and the real part give the amplitude. We have to compute the magnitude in dB from this to produce a nice spectrogram. The magnitude of a spectral bin is simply the amount of energy for the corresponding frequency.
  • Depending of the implementation, the FFT output is often not normalized. We have to normalize the output values.

Here the output using 256 samples. We only have 128 frequency samples. (note: this is the real part of complex numbers)

Figure 2: FFT output with 256 samples
 

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/128 = 172Hz. We have a peak near spectral bin 60 which is 60*172=10320 Hz. That's good ! This is our signal.

Here the output using 512 samples. We only have 256 frequency samples.

Figure 3: FFT output with 512 samples

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/256 = 86Hz. We have a peak near spectral bin 119 which is 119*86=10234 Hz. That's good but...

Did you see the amplitude is not the same ? (Two times bigger because the FFT size is now 512). This is because the output is not normalized. In SampleTagger we just divide the real part with the size of the FFT to do the trick. (Note: imaginary part does not need to be normalized)

// normalize output
fftState.re[i] = fftState.re[i] / fft_size;

Compute magnitudes

Now we can compute the magnitude from complex values. This is done with the good old Pythagorean theorem. Each complex number can be represented in a two-dimensional space. The real part is a, and the imaginary part is b.

Figure 4: Magnitude of a complex number a+ib

Magnitudes are stored in a two dimentional array magnitudes[x,y] where x is the nth FFT performed by SampleTagger and y is the nth magnitude in range [0,fft_size/2[

// compute magnitude with the good old Pythagorean theorem 
double product = re[y] * re[y] + im[y] * im[y];

magnitudes[x, y] = Math.Sqrt(product);

The final step is to convert magnitude in logarithmic scale because it is more readable. It is very important to understand the following:

  • The unit for logarithmic scale is the decibel (dB)
  • The decibel express the ratio between two physical quantity that can be power or intensity.

    If you deal with power, the formula is 10log(P1/P2)

    If you deal with intensity, the formula is 20log(I1/I2)

  • The output of our FFT is a magnitude, and it is an intensity, not a power. So the formula is 20log(magnitude)

So we convert the magnitude in dB with the following:

// convert to dB
magnitudes[x, y] = ((double)20) * Math.Log10(magnitudes[x, y] + Double.Epsilon);

Note the Double.Epsilon to handle the case where magnitude is 0

Here the output in dB for 256 samples

Figure 5: FFT output with 256 samples

 Here the output in dB for 512 samples

Figure 6: FFT output with 512 samples

Dealing with reality

The theory behind the FFT suppose the signal is periodic which is absolutely false in our case. This mean when you feed a FFT with 256 samples, the FFT make the assumption that this signal repeat itself to the infinite on both sides.  

Figure 7: FFT periodicity

Because the signal is not periodic, artifacts will appear on both sides of the FFT. To prevent this, the input signal must pass through a window function to fade the signal to 0 on both sides.

The window functions can be looked at as attempts to retain as much information of the data in the window as possible whilst shaping the influence of the data at the edges. There is tons of windows available. SampleTagger use the Hanning window (not the Hamming window)

// prepare Hanning analysis window 
for (int i = 0; i < fft_size; i++)
{
    window[i] = (4.0 / fft_size) * 0.5 * (1 - Math.Cos(2 * Math.PI * i / fft_size)); // Hanning Vector [1] = 1 / 4595889085.750801
}

Each input sample of the FFT is multiplied by the window:

// apply the Hanning window 
for (int i = 0; i < fft_size; i++)
{
    fftState.re[i] = fftState.input[i] * fftState.window[i];
    fftState.im[i] = 0;
}

Here an example how the input signal will be presented to the FFT. Note how the sides fade to 0.

Figure 8: Windowed input signal

 

This brings the following issue: The FFT accuracy is low on window sides. So if you plan to read an entire WAV file using a buffer of 256 samples, the accuracy of your spectrogram will be bad.

To avoid this, you have to use a sliding window:

Figure 9: Sliding Window Analysis

Sliding window analysis is a technique used to increase temporal precision whilst maintaining accuracy in lower frequencies; remember, when using the FFT, there’s a trade-off between temporal precision (window size) and frequency accuracy (especially in the lower frequencies).

SampleTagger slide the window with half of the FFT size, the following code slide the input signal. (Note: this is not the Hanning windowed signal ! It's the original input signal.)

// overlaping window fft_size / 2
Array.Copy(fftState.input, fft_size / 2, fftState.input, 0, fft_size / 2);
current_sample_count = fft_size/2;

Generating the audio spectrum

Audio spectrum (also named spectrogram) uses colors to plot magnitudes. Horizontal axis is the time, and vertical axis is the frequencies. As we saw, magnitudes are stored in a two dimentional array magnitudes[x,y]. So we have to choose a color for each magnitude to produce the spectrogram.

Figure 10: Typical Audio spectrum

The idea is to clip magnitudes values into a fixed range like [0dB,108dB]. If a magnitude is higher than 108dB, we force it to 108dB. Then we translate this to the range [0,100] as percentage. Then we pick a color in a gradient of 100 pixels. The following gradient is used by SampleTagger:

Figure 11: Color gradient

Because our FFT size is 256 samples, the spectrogram height is 128. The spectrogram width depends on the length of the input signal. We can calculate how many FFT can be performed in a signal if we know how many samples will be read:

nbFFT = (nbSamples / (fft_size / 2)) - 1; // overlaping window with fft_size/2
magnitudes = new double[nbFFT, fft_size/2];

The final step will stretch the spectrogram bitmap to the right size. Universal patch finder use 200x16

private Bitmap RescaleSpectrum(Bitmap spectrum, int width, int height)
{
    Bitmap rescaledSpectrum = new Bitmap(width, height,PixelFormat.Format24bppRgb);
    using (Graphics graph = Graphics.FromImage(rescaledSpectrum))
    {
        graph.InterpolationMode = InterpolationMode.High;
        graph.CompositingQuality = CompositingQuality.HighQuality;
        graph.SmoothingMode = SmoothingMode.AntiAlias;
        graph.DrawImage(spectrum, new Rectangle(0, 0, width, height));
        return rescaledSpectrum;
    }
}

Source code

The source code is provided in Universal Patch Finder SDK 

Documentation is provided in CHM format.

 
 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Hypercube Softwares
Software Developer (Senior)
France France
No Biography provided

You may also be interested in...

Comments and Discussions

 
BugHow about sample code that compiles? Pin
alex2001_ts12-Aug-14 3:14
professionalalex2001_ts12-Aug-14 3:14 
GeneralRe: How about sample code that compiles? Pin
Hypercube Softwares12-Aug-14 4:22
memberHypercube Softwares12-Aug-14 4:22 
GeneralRe: How about sample code that compiles? Pin
Hypercube Softwares12-Aug-14 7:54
memberHypercube Softwares12-Aug-14 7:54 
Questionimages are not display. Pin
jacky_zz11-Aug-14 0:05
memberjacky_zz11-Aug-14 0:05 
AnswerRe: images are not display. Pin
Hypercube Softwares11-Aug-14 1:25
memberHypercube Softwares11-Aug-14 1:25 
GeneralMy vote of 5 Pin
Volynsky Alex10-Aug-14 11:38
professionalVolynsky Alex10-Aug-14 11:38 
GeneralRe: My vote of 5 Pin
Hypercube Softwares10-Aug-14 22:15
memberHypercube Softwares10-Aug-14 22:15 
GeneralRe: My vote of 5 Pin
Volynsky Alex11-Aug-14 10:02
professionalVolynsky Alex11-Aug-14 10:02 
GeneralRe: My vote of 5 Pin
Hypercube Softwares11-Aug-14 10:29
memberHypercube Softwares11-Aug-14 10:29 
GeneralRe: My vote of 5 Pin
Volynsky Alex11-Aug-14 11:19
professionalVolynsky Alex11-Aug-14 11:19 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.150819.1 | Last Updated 10 Aug 2014
Article Copyright 2014 by Hypercube Softwares
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid