Click here to Skip to main content
11,705,753 members (58,693 online)
Click here to Skip to main content

Standard Deviation Extension for Enumerable

, 7 Jun 2013 CPOL 10.4K 7
Rate this:
Please Sign up or sign in to vote.
Calcution of a standard deviation and filtering outliers in a LINQ-style.

Introduction  

The following class provides two extensions to the .NET Enumerable class:

  1. Standard deviation calculation.
  2. Outlier removal using a k-sigma filter (which of course becomes a three-sigma rule for k=3).

See http://en.wikipedia.org/wiki/Three_sigma_rule for some basics. Please use the message board below to post suggestions or report bugs. Have fun!

Using the code 

Source code:

using System;
using System.Collections.Generic;
using System.Linq;
public static class StandardDeviationEnumerableExtensions
{
    /// <summary>
    /// Calculates a standard deviation of elements, using a specified selector.
    /// </summary>
    public static double StandardDeviation<T>(
      this IEnumerable<T> enumerable, Func<T, double> selector)
    {
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff= selector(item) - average;
            sum += diff*diff;
            N++;
        }
        return N == 0 ? 0 : Math.Sqrt(sum / N);
    }
    /// <summary>
    /// Filters elements to remove outliers. The enumeration will be
    /// selected three times, first to calculate an average, second
    /// for a standard deviation, and third to yield remiaining elements. The outliers are these
    /// elements which are further from an average than k*(standard deviation). Set k=3 for
    /// standard three-sigma rule.
    /// </summary>
    public static IEnumerable<T> SkipOutliers<T>(
       this IEnumerable<T> enumerable, double k, Func<T, double> selector)
    {
        // Duplicating a SD code to avoid calculating an average twice.
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff = selector(item) - average;
            sum += diff*diff;
            N++;
        }
        double SD = N == 0 ? 0 : Math.Sqrt(sum / N);
        double delta = k * SD;
        foreach (T item in enumerable)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}

Usage:

IEnumerable<double> results = new double[] { 1, 1.1, 1.2, 0.9, 2, 0.8 };
double[] filtered;
// contains all elements
filtered = results.SkipOutliers(k: 3, selector: result => result).ToArray();
// contains all elements except 2.0. That is, filtered={ 1, 1.1, 1.2, 0.9, 0.8 }
filtered = results.SkipOutliers(k: 2, selector: result => result).ToArray();
// contains just one element, 1.2, which is closest to an average. That is, filtered={ 1.2 }
filtered = results.SkipOutliers(k: 0.1, selector: result => result).ToArray();
// a singleton is always equal to it's average, so it's yielded even with k==0.
// That is, filtered={ 1.2 }
filtered = filtered.SkipOutliers(k: 0, selector: result => result).ToArray();

So, with k parameter you can adjust how strict the filtering is. If k==0, then only those elements which are equal to an average are yielded. However, do not use k==0 because doubles should not be tested for equality in this way.

History

  • 2013-06-03 -- Original version posted.
  • 2013-06-04 -- Possible unwanted division by zero bug-fix.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Jacek Gajek
Software Developer (Junior)
Poland Poland
My name is Jacek Gajek. I have graduated in computer science with a master's degree from Polibuda in Wrocław. I like C# and Monthy Python's sense of humour.

You may also be interested in...

Comments and Discussions

 
QuestionUsing incremental computation to avoid multiple passes Pin
Philippe Bouteleux6-Jun-13 2:02
memberPhilippe Bouteleux6-Jun-13 2:02 
SuggestionMath.Pow(..., 2) is very inefficient Pin
Matt T Heffron4-Jun-13 16:13
professionalMatt T Heffron4-Jun-13 16:13 
GeneralRe: Math.Pow(..., 2) is very inefficient Pin
Jacek Gajek4-Jun-13 20:53
memberJacek Gajek4-Jun-13 20:53 
NewsUpdate Pin
Jacek Gajek4-Jun-13 0:13
memberJacek Gajek4-Jun-13 0:13 
QuestionPossible divide by zero error? Pin
George Swan3-Jun-13 21:29
memberGeorge Swan3-Jun-13 21:29 
AnswerRe: Possible divide by zero error? Pin
Jacek Gajek3-Jun-13 23:16
memberJacek Gajek3-Jun-13 23:16 
GeneralRe: Possible divide by zero error? Pin
George Swan3-Jun-13 23:42
memberGeorge Swan3-Jun-13 23:42 
GeneralRe: Possible divide by zero error? Pin
Jacek Gajek4-Jun-13 0:05
memberJacek Gajek4-Jun-13 0:05 
SuggestionRe: Possible divide by zero error? Pin
Paul R Benson6-Jun-13 21:01
memberPaul R Benson6-Jun-13 21:01 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.150819.1 | Last Updated 7 Jun 2013
Article Copyright 2013 by Jacek Gajek
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid