Posted 3 Jun 2013

# Standard Deviation Extension for Enumerable

Jacek Gajek, 7 Jun 2013
Calculation of a standard deviation and filtering of outliers in LINQ style.

## Introduction

The following class provides two extension methods for `IEnumerable<T>`, in the style of the .NET `Enumerable` class:

1. Standard deviation calculation.
2. Outlier removal using a k-sigma filter (which of course becomes a three-sigma rule for k=3).
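
In formula form, the two operations can be sketched as follows. The symbols are introduced here for illustration only: $x_i$ is the value returned by the selector for the $i$-th element, $N$ is the number of elements, $\mu$ is the average and $\sigma$ is the standard deviation. Judging by the source code below, the population form (division by $N$ rather than $N - 1$) is used:

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
$$

An element is kept by the k-sigma filter when

$$
\left| x_i - \mu \right| \le k\,\sigma
$$

and is treated as an outlier otherwise.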

See http://en.wikipedia.org/wiki/Three_sigma_rule for some basics. Please use the message board below to post suggestions or report bugs. Have fun!

## Using the code

Source code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StandardDeviationEnumerableExtensions
{
    /// <summary>
    /// Calculates the population standard deviation of the elements,
    /// using the specified selector.
    /// </summary>
    public static double StandardDeviation<T>(
        this IEnumerable<T> enumerable, Func<T, double> selector)
    {
        double sum = 0;
        // Note: Average throws InvalidOperationException for an empty sequence.
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {
            double diff = selector(item) - average;
            sum += diff * diff;
            N++;
        }
        return N == 0 ? 0 : Math.Sqrt(sum / N);
    }

    /// <summary>
    /// Filters elements to remove outliers. The sequence is enumerated
    /// three times: first to calculate the average, second to calculate
    /// the standard deviation, and third to yield the remaining elements.
    /// Outliers are the elements that lie further from the average than
    /// k * (standard deviation). Set k = 3 for the standard three-sigma rule.
    /// </summary>
    public static IEnumerable<T> SkipOutliers<T>(
        this IEnumerable<T> enumerable, double k, Func<T, double> selector)
    {
        // The standard deviation code is duplicated here
        // to avoid calculating the average twice.
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {
            double diff = selector(item) - average;
            sum += diff * diff;
            N++;
        }
        double SD = N == 0 ? 0 : Math.Sqrt(sum / N);
        double delta = k * SD;

        // Yield only the elements within k standard deviations of the average.
        foreach (T item in enumerable)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}
```

Usage:

```csharp
IEnumerable<double> results = new double[] { 1, 1.1, 1.2, 0.9, 2, 0.8 };
double[] filtered;

// Contains all elements.
filtered = results.SkipOutliers(k: 3, selector: result => result).ToArray();

// Contains all elements except 2.0. That is, filtered = { 1, 1.1, 1.2, 0.9, 0.8 }.
filtered = results.SkipOutliers(k: 2, selector: result => result).ToArray();

// Contains just one element, 1.2, which is the closest to the average.
// That is, filtered = { 1.2 }.
filtered = results.SkipOutliers(k: 0.1, selector: result => result).ToArray();

// A singleton is always equal to its average, so it is yielded even with k == 0.
// That is, filtered = { 1.2 }.
filtered = filtered.SkipOutliers(k: 0, selector: result => result).ToArray();
```
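
To see where these results come from, the arithmetic for the sample data can be worked through by hand. This is only an illustrative calculation with values rounded to four decimal places; $\mu$ and $\sigma$ denote the average and the standard deviation, and $x_i$ the individual values:

$$
\begin{aligned}
\mu &= \frac{1 + 1.1 + 1.2 + 0.9 + 2 + 0.8}{6} = \frac{7}{6} \approx 1.1667 \\
\sigma &= \sqrt{\frac{1}{6}\sum_{i=1}^{6} (x_i - \mu)^2} \approx \sqrt{\frac{0.9333}{6}} \approx 0.3944 \\
k = 3 &: \quad 3\sigma \approx 1.1832 \ge |2 - \mu| \approx 0.8333 \quad\Rightarrow\quad \text{all elements are kept} \\
k = 2 &: \quad 2\sigma \approx 0.7888 < |2 - \mu| \approx 0.8333 \quad\Rightarrow\quad \text{2.0 is removed} \\
k = 0.1 &: \quad 0.1\sigma \approx 0.0394 \ge |1.2 - \mu| \approx 0.0333 \quad\Rightarrow\quad \text{1.2 is kept}
\end{aligned}
$$

For $k = 0.1$ the next closest value, 1.1, already lies about 0.0667 from the average, so 1.2 is the only element that survives. In the last call the input is the singleton $\{1.2\}$, for which $\mu = 1.2$ and $\sigma = 0$, so the element passes the test $|1.2 - \mu| = 0 \le 0$.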

The `k` parameter adjusts how strict the filtering is: the smaller the value, the more elements are removed. With `k == 0`, only elements exactly equal to the average are yielded. Avoid `k == 0` in practice, however, because doubles should not be tested for equality in this way.
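
One possible way around the equality problem is to widen the accepted band by a small absolute tolerance. The following is only a minimal sketch under stated assumptions, not part of the class above: the class name `TolerantOutlierFilterSketch`, the method name `SkipOutliersWithTolerance`, the `tolerance` parameter and its default of `1e-9` are all hypothetical and chosen for illustration. Unlike the original, the sketch also materializes the sequence once and yields nothing for an empty input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the article's class.
public static class TolerantOutlierFilterSketch
{
    // Works like a k-sigma filter, but adds an absolute tolerance to the
    // accepted band so that k == 0 does not rely on exact double equality.
    public static IEnumerable<T> SkipOutliersWithTolerance<T>(
        this IEnumerable<T> source, double k, Func<T, double> selector,
        double tolerance = 1e-9) // assumed default; tune it to the data scale
    {
        // Materialize once so that the source is enumerated a single time.
        List<T> items = source.ToList();
        if (items.Count == 0)
            yield break; // nothing to filter, and Average would throw

        double average = items.Average(selector);
        double sumOfSquares = items.Sum(item =>
        {
            double diff = selector(item) - average;
            return diff * diff;
        });
        double sd = Math.Sqrt(sumOfSquares / items.Count); // population SD
        double delta = k * sd + tolerance;

        foreach (T item in items)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}
```

For example, `new[] { 0.1, 0.2, 0.3 }.SkipOutliersWithTolerance(0, x => x)` yields only `0.2`, because the computed average differs from 0.2 by far less than the tolerance, whatever rounding error the summation introduces. A fixed absolute tolerance is a deliberate simplification; data on a very different scale would need a different value or a relative tolerance.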

## History

• 2013-06-03 -- Original version posted.
• 2013-06-04 -- Bug fix for a possible unwanted division by zero.

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

## About the Author

Software Developer (Junior), Poland
My name is Jacek Gajek. I graduated with a master's degree in computer science from Polibuda in Wrocław. I like C# and Monty Python's sense of humour.

Article Copyright 2013 by Jacek Gajek
