Click here to Skip to main content
Licence CPOL
First Posted 1 Mar 2009
Views 10,314
Bookmarked 9 times

Calculate Percentiles in O(1) space and O(n) time

By | 1 Mar 2009 | Article
An unprecise algorithm to compute percentiles of a stream of data with one scan and a fixed buffer.

Introduction

In my work, I needed to calculate the percentile of a stream of data. The stream can be very long and the data keeps coming. Each time new data arrives, I need to update my percentile. After the update is done, the data is no more needed. The order of the data is pretty random with no obvious pattern. The result does not need to be always 100% precise, some reasonable error is tolerable.

Based on the above requirement, I tried to Google to see if there is any existent code, but I didn't find the exact solution. Although I found John D. Cook's article very inspiring.

Background

The idea is, instead of storing the whole set of data and sorting it, in most cases, we only need to maintain a sorted list to store the data around the current percentile. When new data arrives, we decide if we should insert the new data into the list, and if so, either the head or the tail of the list would be abandoned. As the data keeps coming, the sorted list slides in the whole data set. And in a not too bad environment, we expect the right percentile will stay in the sliding window.

Occasionally, especially when the incoming data is in some order, the right percentile could be lost from the list.

Using the code

Following is the code. As demonstrated, with the buffer size set to 30, we calculate .05 percentile of 10,000 uniform random numbers and get good results.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;


    class testPercentile
    {
        public void Run()
        {
            Random autoRand = new Random( );
            const double percentage = .05;
            Percentile<double> percentile=new Percentile<double>(percentage,50);
            List<double> list=new List<double>();
            for (int i = 0; i < 10000; i++)
            {
                double tmp = autoRand.NextDouble();
                Debug.WriteLine(tmp);
                percentile.add(tmp);
                int index = list.BinarySearch(tmp);
                if (index < 0)
                    list.Insert(~index, tmp);
                else
                    list.Insert(index, tmp);
                index = list.BinarySearch(percentile.value);
                if (index < 0) index = ~index;
                Console.WriteLine( index*1.0 / (i + 1)-percentage);
            }
                
            Console.WriteLine();
                Console.WriteLine(percentile.value);
        }
    }
    class Percentile<T> where T : IComparable 
    {
        uint position, size,count;
        double percentile;
        List<T> list;
        public Percentile(double percentile, uint buffersize)
        {
            size = buffersize;
            this.percentile = percentile;
            count=position = 0;
            list = new List<T>();
        }
        public void add(T newvalue)
        {
            if (++count > size)
            {
                int newpos = (int)System.Math.Floor(count * percentile - size / 2.0);
                if (newpos < 0) newpos = 0;
                if (newvalue.CompareTo(list.First())>0)
                {
                    if (newpos > position)
                    {
                        list.RemoveAt(0);
                        ++position;
                    }
                    else
                    {
                        if (newvalue.CompareTo ( list.Last())>=0)
                            return;
                        list.RemoveAt(list.Count - 1);
                    }
                }
                else
                {
                    if (newpos > position)
                    {
                        ++position;
                    }
                    else
                    {
                        list.RemoveAt(list.Count - 1);
                        list.Insert(0, newvalue);
                    }
                    return;
                }
            }
            int index = list.BinarySearch(newvalue);
            if (index < 0)
                list.Insert(~index, newvalue);
            else
                list.Insert(index, newvalue);
        }
        public T value
        {
            get
            {
                return list.ElementAt((int)System.Math.Floor(count*percentile-position));
            }
        }
    }

Points of interest

I'm lazy. I found my code seems to have bias because when the error occurs, the calculated percentile is always larger than the real percentile. Can anyone help me reduce the bias?

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

jerron



United States United States

Member



Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralBias progress Pinmemberkrn_2k2:53 8 Jan '10  
GeneralMy vote of 1 Pinmemberjoker993:24 5 Mar '09  
don't use "~" to convert -1 to 0.
QuestionHow to remove the bias Pinmemberwamckee13:47 2 Mar '09  
AnswerAre you sure? Pinmemberjerron12:07 3 Mar '09  

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web04 | 2.5.120529.1 | Last Updated 1 Mar 2009
Article Copyright 2009 by jerron
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid