Click here to Skip to main content
12,949,601 members (73,717 online)
Click here to Skip to main content
Add your own
alternative version


33 bookmarked
Posted 10 Aug 2014

Spectrogram Generation in SampleTagger

, 12 Dec 2016 CPOL
Rate this:
Please Sign up or sign in to vote.
Learn how to generate a spectrogram image from WAV, AIFF and REX files


Universal Patch Finder (UPF) is a freeware designed to index various audio files from multiple directories. Every file can be tagged manually or automatically using plug-ins. The purpose of this article is to demonstrate how the plug-in SampleTagger, part of UPF 1.6, generates spectrograms.

How Fast Fourier Transform Works

The goal of FFT is to transform a signal from the time domain to the frequency domain. Depending on which implementation you choose, you have to do some computations before and after the FFT is performed. The first thing to do is to generate a very simple signal in a WAV file and pass it to the FFT to see what's happen. SampleTagger use the class FFT2 implemented by Gerald T. Beauregard.

The Input Signal

Here, a 10khz sine wave is generated with Adobe Audition with sampling rate at 44.1Khz and bit depth at 16 (Standard CD quality). The first thing to do is to normalize the input values to the range [-1,1] because the FFT works in this range. <span target="T:SampleTagger.Parsers.SampleReader">SampleReader</span> is responsible to do this normalization.

Figure 1: Test signal 10khz sampled at 44.1Khz in 16 bits

The FFT Size

Now we pass the samples (the left channel of the stereo file) to the FFT. We must decide the size of the FFT (SampleTagger uses 256 samples). It is very important to understand the following:

  • Half of the FFT output is relevant according to the Nyquist frequency. This means if the input is 256 samples, the FFT produce only 128 values. If the input sample rate is 44100Hz, the 128 values represent frequencies from 0 to 22050Hz.
  • The FFT produce frequency samples (or spectral bin). A frequency sample is a complex number with real and imaginary part. The imaginary part gives the phase and the real part gives the amplitude. We have to compute the magnitude in dB from this to produce a nice spectrogram. The magnitude of a spectral bin is simply the amount of energy for the corresponding frequency.
  • Depending on the implementation, the FFT output is often not normalized. We have to normalize the output values.

Here, the output is using 256 samples. We only have 128 frequency samples. (Note: This is the real part of complex numbers.)

Figure 2: FFT output with 256 samples

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/128 = 172Hz. We have a peak near spectral bin 60 which is 60*172=10320 Hz. That's good ! This is our signal.

Here, the output is using 512 samples. We only have 256 frequency samples.

Figure 3: FFT output with 512 samples

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/256 = 86Hz. We have a peak near spectral bin 119 which is 119*86=10234 Hz. That's good but...

Did you see the amplitude is not the same ? (Two times bigger because the FFT size is now 512). This is because the output is not normalized. In SampleTagger, we just divide the real part with the size of the FFT to do the trick. (Note: Imaginary part does not need to be normalized.)

// normalize output[i] =[i] / fft_size;

Compute Magnitudes

Now we can compute the magnitude from complex values. This is done with the good old Pythagorean theorem. Each complex number can be represented in a two-dimensional space. The real part is a, and the imaginary part is b.

Figure 4: Magnitude of a complex number a+ib

Magnitudes are stored in a two dimensional array magnitudes[x,y] where x is the nth FFT performed by SampleTagger and y is the nth magnitude in range [0,fft_size/2].

// compute magnitude with the good old Pythagorean theorem 
double product = re[y] * re[y] + im[y] * im[y];

magnitudes[x, y] = Math.Sqrt(product);

The final step is to convert magnitude in logarithmic scale because it is more readable. It is very important to understand the following:

  • The unit for logarithmic scale is the decibel (dB).
  • The decibel expresses the ratio between two physical quantities that can be power or intensity.

    If you deal with power, the formula is 10log(P1/P2).

    If you deal with intensity, the formula is 20log(I1/I2).

  • The output of our FFT is a magnitude, and it is an intensity, not a power. So the formula is 20log(magnitude).

So we convert the magnitude in dB with the following:

// convert to dB
magnitudes[x, y] = ((double)20) * Math.Log10(magnitudes[x, y] + Double.Epsilon);

Note the Double.Epsilon to handle the case where magnitude is 0.

Here is the output in dB for 256 samples:

Figure 5: FFT output with 256 samples

Here is the output in dB for 512 samples.

Figure 6: FFT output with 512 samples

Dealing with Reality

The theory behind the FFT supposes the signal is periodic which is absolutely false in our case. This means when you feed a FFT with 256 samples, the FFT makes the assumption that this signal repeats itself to the infinite on both sides.

Figure 7: FFT periodicity

Because the signal is not periodic, artifacts will appear on both sides of the FFT. To prevent this, the input signal must pass through a window function to fade the signal to 0 on both sides.

The window functions can be looked at as attempts to retain as much information of the data in the window as possible whilst shaping the influence of the data at the edges. There are tons of windows available. SampleTagger uses the Hanning window (not the Hamming window).

// prepare Hanning analysis window 
for (int i = 0; i < fft_size; i++)
    window[i] = (4.0 / fft_size) * 0.5 * 
     (1 - Math.Cos(2 * Math.PI * i / fft_size)); // Hanning Vector [1] = 1 / 4595889085.750801

Each input sample of the FFT is multiplied by the window:

// apply the Hanning window 
for (int i = 0; i < fft_size; i++)
{[i] = fftState.input[i] * fftState.window[i];[i] = 0;

Here is an example of how the input signal will be presented to the FFT. Note how the sides fade to 0.

Figure 8: Windowed input signal

This brings the following issue: The FFT accuracy is low on window sides. So if you plan to read an entire WAV file using a buffer of 256 samples, the accuracy of your spectrogram will be bad.

To avoid this, you have to use a sliding window:

Figure 9: Sliding Window Analysis

Sliding window analysis is a technique used to increase temporal precision whilst maintaining accuracy in lower frequencies; remember, when using the FFT, there’s a trade-off between temporal precision (window size) and frequency accuracy (especially in the lower frequencies).

SampleTagger slides the window with half of the FFT size, the following code slides the input signal. (Note: This is not the Hanning windowed signal! It's the original input signal.)

// overlapping window fft_size / 2
Array.Copy(fftState.input, fft_size / 2, fftState.input, 0, fft_size / 2);
current_sample_count = fft_size/2;

Generating the Audio Spectrum

Audio spectrum (also named spectrogram) uses colors to plot magnitudes. Horizontal axis is the time, and vertical axis is the frequencies. As we saw, magnitudes are stored in a two dimensional array magnitudes[x,y]. So we have to choose a color for each magnitude to produce the spectrogram.

Figure 10: Typical Audio spectrum

The idea is to clip magnitudes values into a fixed range like [0dB,108dB]. If a magnitude is higher than 108dB, we force it to 108dB. Then we translate this to the range [0,100] as percentage. Then we pick a color in a gradient of 100 pixels. The following gradient is used by SampleTagger:


Figure 11: Color gradient

Because our FFT size is 256 samples, the spectrogram height is 128. The spectrogram width depends on the length of the input signal. We can calculate how many FFT can be performed in a signal if we know how many samples will be read:

nbFFT = (nbSamples / (fft_size / 2)) - 1; // overlapping window with fft_size/2
magnitudes = new double[nbFFT, fft_size/2];
nbFFT = (nbSamples / (fft_size / 2)) - 1; // overlapping window with fft_size/2
magnitudes = new double[nbFFT, fft_size/2];

The final step will stretch the spectrogram bitmap to the right size. Universal patch finder uses 200x16.

private Bitmap RescaleSpectrum(Bitmap spectrum, int width, int height)
    Bitmap rescaledSpectrum = new Bitmap(width, height,PixelFormat.Format24bppRgb);
    using (Graphics graph = Graphics.FromImage(rescaledSpectrum))
        graph.InterpolationMode = InterpolationMode.High;
        graph.CompositingQuality = CompositingQuality.HighQuality;
        graph.SmoothingMode = SmoothingMode.AntiAlias;
        graph.DrawImage(spectrum, new Rectangle(0, 0, width, height));
        return rescaledSpectrum;

Source Code

The source code is provided in Universal Patch Finder SDK.

Documentation is provided in CHM format.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Hypercube Softwares
Software Developer (Senior)
France France
No Biography provided

You may also be interested in...

Comments and Discussions

GeneralMy vote of 5 Pin
0x01AA27-Apr-17 23:25
professional0x01AA27-Apr-17 23:25 
BugHow about sample code that compiles? Pin
alex2001_ts12-Aug-14 3:14
professionalalex2001_ts12-Aug-14 3:14 
GeneralRe: How about sample code that compiles? Pin
Hypercube Softwares12-Aug-14 4:22
memberHypercube Softwares12-Aug-14 4:22 
GeneralRe: How about sample code that compiles? Pin
Hypercube Softwares12-Aug-14 7:54
memberHypercube Softwares12-Aug-14 7:54 
Questionimages are not display. Pin
jacky_zz11-Aug-14 0:05
memberjacky_zz11-Aug-14 0:05 
AnswerRe: images are not display. Pin
Hypercube Softwares11-Aug-14 1:25
memberHypercube Softwares11-Aug-14 1:25 
GeneralMy vote of 5 Pin
Volynsky Alex10-Aug-14 11:38
professionalVolynsky Alex10-Aug-14 11:38 
GeneralRe: My vote of 5 Pin
Hypercube Softwares10-Aug-14 22:15
memberHypercube Softwares10-Aug-14 22:15 
GeneralRe: My vote of 5 Pin
Volynsky Alex11-Aug-14 10:02
professionalVolynsky Alex11-Aug-14 10:02 
GeneralRe: My vote of 5 Pin
Hypercube Softwares11-Aug-14 10:29
memberHypercube Softwares11-Aug-14 10:29 
GeneralRe: My vote of 5 Pin
Volynsky Alex11-Aug-14 11:19
professionalVolynsky Alex11-Aug-14 11:19 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170524.1 | Last Updated 13 Dec 2016
Article Copyright 2014 by Hypercube Softwares
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid