12,949,601 members (73,717 online)
alternative version

#### Stats

16.7K views
33 bookmarked
Posted 10 Aug 2014

# Spectrogram Generation in SampleTagger

, 12 Dec 2016 CPOL
 Rate this:
Learn how to generate a spectrogram image from WAV, AIFF and REX files

## Introduction

Universal Patch Finder (UPF) is a freeware designed to index various audio files from multiple directories. Every file can be tagged manually or automatically using plug-ins. The purpose of this article is to demonstrate how the plug-in SampleTagger, part of UPF 1.6, generates spectrograms.

## How Fast Fourier Transform Works

The goal of FFT is to transform a signal from the time domain to the frequency domain. Depending on which implementation you choose, you have to do some computations before and after the FFT is performed. The first thing to do is to generate a very simple signal in a WAV file and pass it to the FFT to see what's happen. SampleTagger use the class FFT2 implemented by Gerald T. Beauregard.

## The Input Signal

Here, a 10khz sine wave is generated with Adobe Audition with sampling rate at 44.1Khz and bit depth at 16 (Standard CD quality). The first thing to do is to normalize the input values to the range [-1,1] because the FFT works in this range. `<span target="T:SampleTagger.Parsers.SampleReader">SampleReader</span>` is responsible to do this normalization.

Figure 1: Test signal 10khz sampled at 44.1Khz in 16 bits

## The FFT Size

Now we pass the samples (the left channel of the stereo file) to the FFT. We must decide the size of the FFT (`SampleTagger` uses 256 samples). It is very important to understand the following:

• Half of the FFT output is relevant according to the Nyquist frequency. This means if the input is 256 samples, the FFT produce only 128 values. If the input sample rate is 44100Hz, the 128 values represent frequencies from 0 to 22050Hz.
• The FFT produce frequency samples (or spectral bin). A frequency sample is a complex number with real and imaginary part. The imaginary part gives the phase and the real part gives the amplitude. We have to compute the magnitude in dB from this to produce a nice spectrogram. The magnitude of a spectral bin is simply the amount of energy for the corresponding frequency.
• Depending on the implementation, the FFT output is often not normalized. We have to normalize the output values.

Here, the output is using 256 samples. We only have 128 frequency samples. (Note: This is the real part of complex numbers.)

Figure 2: FFT output with 256 samples

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/128 = 172Hz. We have a peak near spectral bin 60 which is 60*172=10320 Hz. That's good ! This is our signal.

Here, the output is using 512 samples. We only have 256 frequency samples.

Figure 3: FFT output with 512 samples

The sampling frequency is 44100Hz, so each spectral bin is (44100/2)/256 = 86Hz. We have a peak near spectral bin 119 which is 119*86=10234 Hz. That's good but...

Did you see the amplitude is not the same ? (Two times bigger because the FFT size is now 512). This is because the output is not normalized. In `SampleTagger`, we just divide the real part with the size of the FFT to do the trick. (Note: Imaginary part does not need to be normalized.)

```// normalize output
fftState.re[i] = fftState.re[i] / fft_size;```

## Compute Magnitudes

Now we can compute the magnitude from complex values. This is done with the good old Pythagorean theorem. Each complex number can be represented in a two-dimensional space. The real part is `a`, and the imaginary part is `b`.

Figure 4: Magnitude of a complex number a+ib

Magnitudes are stored in a two dimensional array `magnitudes[x,y]` where `x` is the nth FFT performed by `SampleTagger` and `y` is the nth magnitude in range `[0,fft_size/2]`.

```// compute magnitude with the good old Pythagorean theorem
double product = re[y] * re[y] + im[y] * im[y];

magnitudes[x, y] = Math.Sqrt(product);```

The final step is to convert magnitude in logarithmic scale because it is more readable. It is very important to understand the following:

• The unit for logarithmic scale is the decibel (dB).
• The decibel expresses the ratio between two physical quantities that can be power or intensity.

If you deal with power, the formula is `10log(P1/P2)`.

If you deal with intensity, the formula is `20log(I1/I2)`.

• The output of our FFT is a magnitude, and it is an intensity, not a power. So the formula is `20log(magnitude)`.

So we convert the magnitude in dB with the following:

```// convert to dB
magnitudes[x, y] = ((double)20) * Math.Log10(magnitudes[x, y] + Double.Epsilon);```

Note the `Double.Epsilon` to handle the case where magnitude is `0`.

Here is the output in dB for 256 samples:

Figure 5: FFT output with 256 samples

Here is the output in dB for 512 samples.

Figure 6: FFT output with 512 samples

## Dealing with Reality

The theory behind the FFT supposes the signal is periodic which is absolutely false in our case. This means when you feed a FFT with 256 samples, the FFT makes the assumption that this signal repeats itself to the infinite on both sides.

Figure 7: FFT periodicity

Because the signal is not periodic, artifacts will appear on both sides of the FFT. To prevent this, the input signal must pass through a window function to fade the signal to 0 on both sides.

The window functions can be looked at as attempts to retain as much information of the data in the window as possible whilst shaping the influence of the data at the edges. There are tons of windows available. `SampleTagger` uses the Hanning window (not the Hamming window).

```// prepare Hanning analysis window
for (int i = 0; i < fft_size; i++)
{
window[i] = (4.0 / fft_size) * 0.5 *
(1 - Math.Cos(2 * Math.PI * i / fft_size)); // Hanning Vector [1] = 1 / 4595889085.750801
}```

Each input sample of the FFT is multiplied by the window:

```// apply the Hanning window
for (int i = 0; i < fft_size; i++)
{
fftState.re[i] = fftState.input[i] * fftState.window[i];
fftState.im[i] = 0;
}```

Here is an example of how the input signal will be presented to the FFT. Note how the sides fade to `0`.

Figure 8: Windowed input signal

This brings the following issue: The FFT accuracy is low on window sides. So if you plan to read an entire WAV file using a buffer of 256 samples, the accuracy of your spectrogram will be bad.

To avoid this, you have to use a sliding window:

Figure 9: Sliding Window Analysis

Sliding window analysis is a technique used to increase temporal precision whilst maintaining accuracy in lower frequencies; remember, when using the FFT, there’s a trade-off between temporal precision (window size) and frequency accuracy (especially in the lower frequencies).

`SampleTagger` slides the window with half of the FFT size, the following code slides the input signal. (Note: This is not the Hanning windowed signal! It's the original input signal.)

```// overlapping window fft_size / 2
Array.Copy(fftState.input, fft_size / 2, fftState.input, 0, fft_size / 2);
current_sample_count = fft_size/2;```

## Generating the Audio Spectrum

Audio spectrum (also named spectrogram) uses colors to plot magnitudes. Horizontal axis is the time, and vertical axis is the frequencies. As we saw, magnitudes are stored in a two dimensional array `magnitudes[x,y]`. So we have to choose a color for each magnitude to produce the spectrogram.

Figure 10: Typical Audio spectrum

The idea is to clip magnitudes values into a fixed range like [0dB,108dB]. If a magnitude is higher than 108dB, we force it to 108dB. Then we translate this to the range [0,100] as percentage. Then we pick a color in a gradient of 100 pixels. The following gradient is used by `SampleTagger`:

Because our FFT size is 256 samples, the spectrogram height is 128. The spectrogram width depends on the length of the input signal. We can calculate how many FFT can be performed in a signal if we know how many samples will be read:

```nbFFT = (nbSamples / (fft_size / 2)) - 1; // overlapping window with fft_size/2
magnitudes = new double[nbFFT, fft_size/2];```
```nbFFT = (nbSamples / (fft_size / 2)) - 1; // overlapping window with fft_size/2
magnitudes = new double[nbFFT, fft_size/2];```

The final step will stretch the spectrogram bitmap to the right size. Universal patch finder uses 200x16.

```private Bitmap RescaleSpectrum(Bitmap spectrum, int width, int height)
{
Bitmap rescaledSpectrum = new Bitmap(width, height,PixelFormat.Format24bppRgb);
using (Graphics graph = Graphics.FromImage(rescaledSpectrum))
{
graph.InterpolationMode = InterpolationMode.High;
graph.CompositingQuality = CompositingQuality.HighQuality;
graph.SmoothingMode = SmoothingMode.AntiAlias;
graph.DrawImage(spectrum, new Rectangle(0, 0, width, height));
return rescaledSpectrum;
}
}```

## Source Code

The source code is provided in Universal Patch Finder SDK.

Documentation is provided in CHM format.

## Share

 Software Developer (Senior) France
No Biography provided

## You may also be interested in...

 First Prev Next
 My vote of 5 0x01AA27-Apr-17 23:25 0x01AA 27-Apr-17 23:25
 How about sample code that compiles? alex2001_ts12-Aug-14 3:14 alex2001_ts 12-Aug-14 3:14
 Re: How about sample code that compiles? Hypercube Softwares12-Aug-14 4:22 Hypercube Softwares 12-Aug-14 4:22
 Re: How about sample code that compiles? Hypercube Softwares12-Aug-14 7:54 Hypercube Softwares 12-Aug-14 7:54
 images are not display. jacky_zz11-Aug-14 0:05 jacky_zz 11-Aug-14 0:05
 Re: images are not display. Hypercube Softwares11-Aug-14 1:25 Hypercube Softwares 11-Aug-14 1:25
 My vote of 5 Volynsky Alex10-Aug-14 11:38 Volynsky Alex 10-Aug-14 11:38
 Re: My vote of 5 Hypercube Softwares10-Aug-14 22:15 Hypercube Softwares 10-Aug-14 22:15
 Re: My vote of 5 Volynsky Alex11-Aug-14 10:02 Volynsky Alex 11-Aug-14 10:02
 Re: My vote of 5 Hypercube Softwares11-Aug-14 10:29 Hypercube Softwares 11-Aug-14 10:29
 Re: My vote of 5 Volynsky Alex11-Aug-14 11:19 Volynsky Alex 11-Aug-14 11:19
 Last Visit: 31-Dec-99 18:00     Last Update: 25-May-17 7:52 Refresh 1