Click here to Skip to main content
15,998,056 members
Please Sign up or sign in to vote.
5.00/5 (2 votes)
See more:
We have a problem, we have a series of data, please check the graph below. Basically it's a 24 hours data, we want to automatically split the 24 hours to a few sections according to the similar values.

for example, we can see that from 0am to 7am the values are more or less the same, we mark that period as group1, and between 7am and 17 we have similar (roughly) then we make it group 2, and 17 to 24 as group 3.

if we input a series of data (x,y), and define there should be a numer of partitions(for example, 3), then the result should come like this:

group1: 0am - 7am
group2: 7am - 17
group3: 16 - 24


graph is here:

http://pot9tw.sn2.livefilestore.com/y1pvO0HSvBYGu22JKakg4XJSys_zM1H9qohg7XqDwxxSOgz1orlRNjJhLKCOYyvZ9RZqHyyUHErZFQbgaNvUHuZ4FRIIOTlFuHV/graph.PNG?psid=1[^]
Posted
Updated 26-Jan-11 16:21pm
v3

This is called "Cluster analysis", your problem is probably the most elementary one.
I don't know your chances to find anyone who has some ready-to-use code or algorithm, but you can start with this article, just to get and idea what's involved:

http://en.wikipedia.org/wiki/Cluster_analysis[^].

The article is very clear and provides very good references you may want to follow. It least you'll learn proper terminology which will help you in your search of relevant approaches and algorithms. I actually just took a close look myself: the algorithms are easy enough to code them from scratch; what you need is to compare them and decide which one is the best for your purpose; I think this is more difficult, will require some research.

Good luck,

—SA
 
Share this answer
 
v3
Comments
Sandeep Mewara 27-Jan-11 0:16am    
Deserve a 5!
JF2015 27-Jan-11 0:46am    
Great answer. 5!
Sergey Alexandrovich Kryukov 27-Jan-11 0:50am    
Thanks a lot, Sandeep, JF2015.
--SA
MCY 27-Jan-11 4:01am    
good one
Huisheng Chen 27-Jan-11 4:58am    
let me simplify my question, imagine a series of values:

1,2,1,3,2,4,5, 11,10,12,10,12,13, 25,21,24,20, 4,5,2,3


we can see clearly see that according to the difference, we could split them into 4 groups bassed on my requirement above.

I searched a lot, finally came to k-means algo, but it seems that it only deals with 2-d data, not for 1-d values.
Hi, I implemented the cluster by my own method. the logic is that it will first try to find out top n delta (differences), then split the data according to the delta.

C#
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication612
{
    class Program
    {
        private class ValueSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[1] > y[1])
                    return -1;
                else if (x[1] < y[1])
                    return 1;
                else
                    return 0;
            }
        }
        private class IndexSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[0] > y[0])
                    return 1;
                else if (x[0] < y[0])
                    return -1;
                else
                    return 0;
            }
        }
        static void Main2(string[] args)
        {
            int[] data = new int[] { 1, 2, 1, 3, 2, 4, 5, 11, 10, 12, 10, 12, 13, 25, 21, 24, 20, 4, 5, 2, 3 };
            int[][] delta = new int[data.Length][];
            delta[0] = new int[2];
            delta[0][0] = 0;
            delta[0][1] = 0;
            for (int i = 0; i < data.Length - 1; i++)
            {
                delta[i + 1] = new int[2];
                delta[i + 1][0] = i + 1;
                delta[i + 1][1] = Math.Abs(data[i + 1] - data[i]);
            }
            Array.Sort(delta, new ValueSort());
            int groups = 4;
            int[][] topDelta = new int[groups - 1][];
            for (int i = 0; i < topDelta.Length; i++)
            {
                topDelta[i] = delta[i];
            }
            Array.Sort(topDelta, new IndexSort());
            List<List<int>> output = new List<List<int>>();
            int index = 0;
            for (int i = 0; i < topDelta.Length; i++)
            {
                List<int> group = new List<int>();
                for (int j = index; j < topDelta[i][0]; j++)
                {
                    group.Add(data[j]);
                }
                index = topDelta[i][0] + 1;
                output.Add(group);
            }
            List<int> group2 = new List<int>();
            for (int j = topDelta[topDelta.Length - 1][0]; j < data.Length; j++)
            {
                group2.Add(data[j]);
            }
            output.Add(group2);
            for (int i = 0; i < output.Count; i++)
            {
                Console.Write("Group" + i + ":");
                foreach (int value in output[i])
                {
                    Console.Write(value + ",");
                }
                Console.WriteLine();
            }
        }
    }
}
 
Share this answer
 
v6
Comments
Sergey Alexandrovich Kryukov 27-Jan-11 10:15am    
You need to reformat this.
You can try to paste from Visual Studio.

To do this, take your original C#, in some text editor (not HTML page) replace all angular brackets with HTML entities < and > in "Improve Answer" check "Allow HTML" radio button and paste the code, manually add <pre lang="cs">...</pre>.

Check up the result on the page after you post.

--SA
Huisheng Chen 27-Jan-11 16:39pm    
thank you very much, finally I made the correct format
Here is a thought:
Use Bollinger Bands[^]

Calibrate to fit your needs and monitor the rate of change for the moving average. This should enable you to detect some of the interesting changes to your data. Comparing the input values against +/- the standard deviation multiplied by a calibrated value, indicates other points of interest. While bollinger bands primarily are associated with visual interpretation of financial data - there is no reason not to take advantage of the method computationally :)

Regards
Espen Harlinn
 
Share this answer
 
Comments
Sergey Alexandrovich Kryukov 27-Jan-11 16:59pm    
Interesting method, I voted 5. The problem of those cluster analysis methods I tried to overview very quickly is that they are oriented to unordered data while OP has fully ordered data -- as a time sequence, and the clustered data should preserve order. Bands should be closed to the purpose.
--SA
Espen Harlinn 27-Jan-11 17:21pm    
Thanks SAKryukov, thought you might like it - Fire up Delphi and play a bit with TeeChart, it has this functionality as a "function" :) - at least both the Pro and .Net version has it

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900