Click here to Skip to main content
15,885,007 members
Articles / Programming Languages / C#

Calculate Percentiles in O(1) space and O(n) time

Rate me:
Please Sign up or sign in to vote.
4.43/5 (5 votes)
1 Mar 2009CPOL1 min read 34.4K   9   5
An unprecise algorithm to compute percentiles of a stream of data with one scan and a fixed buffer.

Introduction

In my work, I needed to calculate the percentile of a stream of data. The stream can be very long and the data keeps coming. Each time new data arrives, I need to update my percentile. After the update is done, the data is no more needed. The order of the data is pretty random with no obvious pattern. The result does not need to be always 100% precise, some reasonable error is tolerable.

Based on the above requirement, I tried to Google to see if there is any existent code, but I didn't find the exact solution. Although I found John D. Cook's article very inspiring.

Background

The idea is, instead of storing the whole set of data and sorting it, in most cases, we only need to maintain a sorted list to store the data around the current percentile. When new data arrives, we decide if we should insert the new data into the list, and if so, either the head or the tail of the list would be abandoned. As the data keeps coming, the sorted list slides in the whole data set. And in a not too bad environment, we expect the right percentile will stay in the sliding window.

Occasionally, especially when the incoming data is in some order, the right percentile could be lost from the list.

Using the code

Following is the code. As demonstrated, with the buffer size set to 30, we calculate .05 percentile of 10,000 uniform random numbers and get good results.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;


    class testPercentile
    {
        public void Run()
        {
            Random autoRand = new Random( );
            const double percentage = .05;
            Percentile<double> percentile=new Percentile<double>(percentage,50);
            List<double> list=new List<double>();
            for (int i = 0; i < 10000; i++)
            {
                double tmp = autoRand.NextDouble();
                Debug.WriteLine(tmp);
                percentile.add(tmp);
                int index = list.BinarySearch(tmp);
                if (index < 0)
                    list.Insert(~index, tmp);
                else
                    list.Insert(index, tmp);
                index = list.BinarySearch(percentile.value);
                if (index < 0) index = ~index;
                Console.WriteLine( index*1.0 / (i + 1)-percentage);
            }
                
            Console.WriteLine();
                Console.WriteLine(percentile.value);
        }
    }
    class Percentile<T> where T : IComparable 
    {
        uint position, size,count;
        double percentile;
        List<T> list;
        public Percentile(double percentile, uint buffersize)
        {
            size = buffersize;
            this.percentile = percentile;
            count=position = 0;
            list = new List<T>();
        }
        public void add(T newvalue)
        {
            if (++count > size)
            {
                int newpos = (int)System.Math.Floor(count * percentile - size / 2.0);
                if (newpos < 0) newpos = 0;
                if (newvalue.CompareTo(list.First())>0)
                {
                    if (newpos > position)
                    {
                        list.RemoveAt(0);
                        ++position;
                    }
                    else
                    {
                        if (newvalue.CompareTo ( list.Last())>=0)
                            return;
                        list.RemoveAt(list.Count - 1);
                    }
                }
                else
                {
                    if (newpos > position)
                    {
                        ++position;
                    }
                    else
                    {
                        list.RemoveAt(list.Count - 1);
                        list.Insert(0, newvalue);
                    }
                    return;
                }
            }
            int index = list.BinarySearch(newvalue);
            if (index < 0)
                list.Insert(~index, newvalue);
            else
                list.Insert(index, newvalue);
        }
        public T value
        {
            get
            {
                return list.ElementAt((int)System.Math.Floor(count*percentile-position));
            }
        }
    }

Points of interest

I'm lazy. I found my code seems to have bias because when the error occurs, the calculated percentile is always larger than the real percentile. Can anyone help me reduce the bias?

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionSome similar issue Pin
cmbtobmc6-Nov-14 4:39
cmbtobmc6-Nov-14 4:39 
GeneralBias progress Pin
krn_2k8-Jan-10 2:53
krn_2k8-Jan-10 2:53 
GeneralMy vote of 1 Pin
joker995-Mar-09 3:24
joker995-Mar-09 3:24 
QuestionHow to remove the bias Pin
wamckee2-Mar-09 13:47
wamckee2-Mar-09 13:47 
AnswerAre you sure? Pin
jerron3-Mar-09 12:07
jerron3-Mar-09 12:07 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.