## Introduction

In this short article, we will compare a couple of Fast Fourier Transform (FFT) implementations for the .NET platform. The contestants are:

| Accord | Exocortex | Math.NET | NWaves | NAudio | Lomont | DSPLib | FFTW |

Version: | 3.8.0 | 1.2 | 5.0 | 0.9.6 | 2.1 | 1.1 | (2017) | 3.3.9 |

License: | LGPL | BSD | MIT | MIT | MIT | - | MIT | GPL |

Assemblies: | 3 | 1 | 1 | 1 | 1 | - | - | 1+1 |

Size: | 3.6 MB | - | 1.6 MB | 0.3 MB | 0.2 MB | - | - | 2.3 MB |

Nuget: | yes | no | yes | yes | yes | no | no | no |

### Remarks

- Accord.NET is a machine learning framework combined with audio and image processing. It is no longer actively developed.
- The Exocortex project was created in the early .NET 1.0 days. The copy provided with this article was updated to target .NET standard 2.0 and to use the
`Complex`

type of the `System.Numerics`

namespace. - NAudio uses a custom
`Complex`

type implementation with single precision real and imaginary part. - DSPLib is a concise library for FFTs of real valued input and spectrum analysis. Inverse FFT isn't implemented
- FFTW is a popular, native FFT implementation. It is probably the fastest open source implementation one can find on the internet, so comparison with the managed code is a bit unfair. Still, I thought it might be interesting to see how the code competes.

The FFTW binaries are not distributed with this article. If you want FFTW to be included in the benchmark, *fftw3.dll* and *fftw3f.dll *binaries have to be downloaded manually. For an up-to-date build try Conda or visit the Github project for more options: https://github.com/wo80/SharpFFTW.

## Benchmark

I was particularly interested in 1D FFTs for real valued input (audio processing). The following interface is used for all tests. If you have your own FFT implementation, it should be easy to incorporate it into the benchmark by implementing this interface and instantiate the test in the `Util.LoadTests()`

method.

interface ITest
{
string Name { get; }
int Size { get; }
bool Enabled { get; set; }
void Initialize(double[] data);
void FFT(bool forward);
double[] Spectrum(double[] input, bool scale);
}

Take a look at the different tests in the `fftbench.Benchmark`

namespace of the *fftbench.Common* project to see how to implement the interface properly.

Exocortex, Lomont and FFTW have specialised implementations for real valued data and the code can be expected to be about twice as fast as the standard complex implementation.

Accord.NET, Math.NET and FFTW support input arrays of any size (i.e., the size doesn't have to be a power of 2). Since Exocortex, NAudio, NWaves and Lomont support radix 2 only, the benchmark uses arrays with sizes that are a power of 2.

### Results

Running the *fftbench* console application, the output might look like the following. The first column displays the relative speed compared to `Exocortex (real)`

:

$ ./fftbench 10 200
FFT size: 1024
Repeat: 200
[14/14] Done
FFTWF (real): 0.2 [min: 1.29, max: 1.64, mean: 1.33, stddev: 0.03]
FFTW (real): 0.2 [min: 1.34, max: 1.60, mean: 1.43, stddev: 0.05]
FFTW: 0.5 [min: 2.86, max: 3.13, mean: 2.87, stddev: 0.03]
Exocortex (real): 1.0 [min: 5.72, max: 6.20, mean: 5.76, stddev: 0.05]
Lomont (real): 1.1 [min: 6.12, max: 8.04, mean: 6.26, stddev: 0.17]
NWaves (real): 1.5 [min: 8.44, max: 10.73, mean: 8.52, stddev: 0.24]
NWaves: 1.7 [min: 9.70, max: 11.90, mean: 9.79, stddev: 0.21]
Exocortex: 1.9 [min: 10.56, max: 12.93, mean: 10.71, stddev: 0.22]
Lomont: 1.9 [min: 10.58, max: 15.90, mean: 10.77, stddev: 0.38]
NAudio: 2.1 [min: 11.80, max: 14.17, mean: 12.03, stddev: 0.20]
AForge: 2.6 [min: 14.72, max: 15.90, mean: 14.93, stddev: 0.12]
DSPLib: 2.8 [min: 15.30, max: 22.10, mean: 15.91, stddev: 0.94]
Accord: 3.8 [min: 21.06, max: 29.19, mean: 21.69, stddev: 0.93]
Math.NET: 7.4 [min: 38.26, max: 73.53, mean: 42.74, stddev: 4.60]
Timing in microseconds.

In the benchmark, each FFT is actually called *50 * 200* times (the repeat value taken from the second command line argument (200), multiplied by a default value of 50 inner iterations). The FFT size is *2^10* (first command line argument). The benchmark was run on an AMD Ryzen 3600 processor.

The next chart shows the benchmark result for different FFTs of size 1024, 2048 and 4096. The *fftbench-win* application was used with a repeat value of 200:

## Interpreting FFT Results

The *fftbench-win* application (WinForms project only included in the article download - not on Github) contains a util called *FFT Explorer*. You can open it by clicking on the leftmost icon of the benchmark window.

The *FFT Explorer* lets you select the FFT implementation, an input signal and the FFT size. Three graphs will display the input signal, the spectrum computed by the selected FFT and the signal computed by the inverse FFT.

Let's have a look at an example signal of the `SignalGenerator`

class. The generated signal is a simple sine wave with frequency **1.0 Hz** and amplitude **20.0**:

public static double[] Sine(int n)
{
const int FS = 64;
return MathNet.Numerics.Generate.Sinusoidal(n, FS, 1.0, 20.0);
}

Let the FFT frame size be *n* = **256**. With a sampling rate of **64 Hz**, our periodic signal will be repeated exactly four times over the selected window. Be aware that all values are chosen to have an exact match between signal period, sampling rate and FFT size. This way, we won't have to deal with spectral leakage.

Each bin of the FFT output is spaced by the frequency resolution *(sampling rate) / (FFT size)*, which in our case is **64 / 256 = 0.25**. So we expect a peek corresponding to our **1.0 Hz** signal to be in bin number **4** (since **1.0 = 4 * 0.25**).

Due to the nature of the DFT, the spectrum of the signal will get scaled by *n* = **256**, so if no further scaling is done, we expect the value to be **20.0 * 256 / 2** = **2560**. We divide by **2**, since the amplitude is distributed across two bins. The second bin is located at index *256 - 4* = *252* and will have the same magnitude, because, for real valued input signals, the FFT output is (conjugate) symmetric (across *n/2*, the bin corresponding to the Nyquist frequency).

The actual values of the peek won't be consistent between FFT implementations, since there's no common scaling convention. If the FFT size is *n*, then some implementations scale the FFT by *1/n*, some scale the inverse FFT by *1/n* and some scale both by *1/sqrt(n)*. Some don't scale at all (like FFTW).

The following table shows the amplitudes computed by the different FFTs for the above example:

| Accord.NET | Exocortex.DSP | Math.NET | NAudio | NWaves | Lomont | DSPLib | FFTW |

Value: | 2560 | 2560 | 160 | 10 | 2560 | 160 | 10 | 2560 |

You can see that NAudio and DSPLib scale by *1/n* and that Math.NET and Lomont scale by *1/sqrt(n)* (both Math.NET and Lomont allow the user to change scaling conventions; the values computed above and used in the benchmark represent the default settings).

## Conclusion

Obviously and not completely unexpected, FFTW is the clear winner. So, if using a native, GPL licensed library is an option for you, go for it. Looking at the managed code, NWaves performs pretty well. Both Exocortex and Lomont perform well for smaller sized FFTs, but performance of complex FFT seems to decrease for larger sizes. For real valued signals, both Exocortex and Lomont perform very well - even for larger sizes.

## History

- 2016-05-15 - Initial version
- 2016-06-14 - Add information requested in the comments
- 2018-09-02 - Update libraries, include DSPLib and fix benchmark (thanks to
*I'm Chris*, see comments) - 2022-07-02 - Update libraries, include NWaves, link to SharpFFTW/fftbench Github project

Studied math and computer science at the university of Dortmund.

MCTS .NET Framework 4, Windows Applications