Click here to Skip to main content
Click here to Skip to main content

Calculate Percentiles in O(1) space and O(n) time

, 1 Mar 2009 CPOL
Rate this:
Please Sign up or sign in to vote.
An unprecise algorithm to compute percentiles of a stream of data with one scan and a fixed buffer.

Introduction

In my work, I needed to calculate the percentile of a stream of data. The stream can be very long and the data keeps coming. Each time new data arrives, I need to update my percentile. After the update is done, the data is no more needed. The order of the data is pretty random with no obvious pattern. The result does not need to be always 100% precise, some reasonable error is tolerable.

Based on the above requirement, I tried to Google to see if there is any existent code, but I didn't find the exact solution. Although I found John D. Cook's article very inspiring.

Background

The idea is, instead of storing the whole set of data and sorting it, in most cases, we only need to maintain a sorted list to store the data around the current percentile. When new data arrives, we decide if we should insert the new data into the list, and if so, either the head or the tail of the list would be abandoned. As the data keeps coming, the sorted list slides in the whole data set. And in a not too bad environment, we expect the right percentile will stay in the sliding window.

Occasionally, especially when the incoming data is in some order, the right percentile could be lost from the list.

Using the code

Following is the code. As demonstrated, with the buffer size set to 30, we calculate .05 percentile of 10,000 uniform random numbers and get good results.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;


    class testPercentile
    {
        public void Run()
        {
            Random autoRand = new Random( );
            const double percentage = .05;
            Percentile<double> percentile=new Percentile<double>(percentage,50);
            List<double> list=new List<double>();
            for (int i = 0; i < 10000; i++)
            {
                double tmp = autoRand.NextDouble();
                Debug.WriteLine(tmp);
                percentile.add(tmp);
                int index = list.BinarySearch(tmp);
                if (index < 0)
                    list.Insert(~index, tmp);
                else
                    list.Insert(index, tmp);
                index = list.BinarySearch(percentile.value);
                if (index < 0) index = ~index;
                Console.WriteLine( index*1.0 / (i + 1)-percentage);
            }
                
            Console.WriteLine();
                Console.WriteLine(percentile.value);
        }
    }
    class Percentile<T> where T : IComparable 
    {
        uint position, size,count;
        double percentile;
        List<T> list;
        public Percentile(double percentile, uint buffersize)
        {
            size = buffersize;
            this.percentile = percentile;
            count=position = 0;
            list = new List<T>();
        }
        public void add(T newvalue)
        {
            if (++count > size)
            {
                int newpos = (int)System.Math.Floor(count * percentile - size / 2.0);
                if (newpos < 0) newpos = 0;
                if (newvalue.CompareTo(list.First())>0)
                {
                    if (newpos > position)
                    {
                        list.RemoveAt(0);
                        ++position;
                    }
                    else
                    {
                        if (newvalue.CompareTo ( list.Last())>=0)
                            return;
                        list.RemoveAt(list.Count - 1);
                    }
                }
                else
                {
                    if (newpos > position)
                    {
                        ++position;
                    }
                    else
                    {
                        list.RemoveAt(list.Count - 1);
                        list.Insert(0, newvalue);
                    }
                    return;
                }
            }
            int index = list.BinarySearch(newvalue);
            if (index < 0)
                list.Insert(~index, newvalue);
            else
                list.Insert(index, newvalue);
        }
        public T value
        {
            get
            {
                return list.ElementAt((int)System.Math.Floor(count*percentile-position));
            }
        }
    }

Points of interest

I'm lazy. I found my code seems to have bias because when the error occurs, the calculated percentile is always larger than the real percentile. Can anyone help me reduce the bias?

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

jerron

United States United States
No Biography provided

Comments and Discussions

 
QuestionSome similar issue Pinmembercmbtobmc6-Nov-14 5:39 
GeneralBias progress Pinmemberkrn_2k8-Jan-10 3:53 
GeneralMy vote of 1 Pinmemberjoker995-Mar-09 4:24 
QuestionHow to remove the bias Pinmemberwamckee2-Mar-09 14:47 
AnswerAre you sure? Pinmemberjerron3-Mar-09 13:07 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.141223.1 | Last Updated 1 Mar 2009
Article Copyright 2009 by jerron
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid