Click here to Skip to main content
14,331,586 members
Rate this:
Please Sign up or sign in to vote.
See more:
We have a problem, we have a series of data, please check the graph below. Basically it's a 24 hours data, we want to automatically split the 24 hours to a few sections according to the similar values.

for example, we can see that from 0am to 7am the values are more or less the same, we mark that period as group1, and between 7am and 17 we have similar (roughly) then we make it group 2, and 17 to 24 as group 3.

if we input a series of data (x,y), and define there should be a numer of partitions(for example, 3), then the result should come like this:

group1: 0am - 7am
group2: 7am - 17
group3: 16 - 24


graph is here:

http://pot9tw.sn2.livefilestore.com/y1pvO0HSvBYGu22JKakg4XJSys_zM1H9qohg7XqDwxxSOgz1orlRNjJhLKCOYyvZ9RZqHyyUHErZFQbgaNvUHuZ4FRIIOTlFuHV/graph.PNG?psid=1[^]
Posted
Updated 26-Jan-11 16:21pm
v3
Rate this:
Please Sign up or sign in to vote.

Solution 1

This is called "Cluster analysis", your problem is probably the most elementary one.
I don't know your chances to find anyone who has some ready-to-use code or algorithm, but you can start with this article, just to get and idea what's involved:

http://en.wikipedia.org/wiki/Cluster_analysis[^].

The article is very clear and provides very good references you may want to follow. It least you'll learn proper terminology which will help you in your search of relevant approaches and algorithms. I actually just took a close look myself: the algorithms are easy enough to code them from scratch; what you need is to compare them and decide which one is the best for your purpose; I think this is more difficult, will require some research.

Good luck,

—SA
   
v3
Comments
Sandeep Mewara 27-Jan-11 0:16am
   
Deserve a 5!
JF2015 27-Jan-11 0:46am
   
Great answer. 5!
   
Thanks a lot, Sandeep, JF2015.
--SA
MCY 27-Jan-11 4:01am
   
good one
Huisheng Chen 27-Jan-11 4:58am
   
let me simplify my question, imagine a series of values:

1,2,1,3,2,4,5, 11,10,12,10,12,13, 25,21,24,20, 4,5,2,3


we can see clearly see that according to the difference, we could split them into 4 groups bassed on my requirement above.

I searched a lot, finally came to k-means algo, but it seems that it only deals with 2-d data, not for 1-d values.
Sergey Alexandrovich Kryukov 27-Jan-11 10:10am
   
If 1d vs 2d is your only problem, it is not a problem at all. Let's say you have a 1d points X, Y, Z... Now consider 2d points: (X,0), (Y,0), (Z,0)... and operate with them. (In practice we just assume the additional dimension.) In other words, if some 2d points all sit in a horizontal plane, they are still 2d points. Or, if you well, 1d points _are_ 2d points, a special case of 2d points. :-)
--SA
Sergey Alexandrovich Kryukov 27-Jan-11 10:53am
   
No, the thins is: you incorrectly formulated the controversy: it's not 1D vs 3D. Your data is fully ordered. I guess you need clusters without violation of this order -- that's the difference.
Of course you need different algorithm.
Thank you.
--SA
Rate this:
Please Sign up or sign in to vote.

Solution 2

Hi, I implemented the cluster by my own method. the logic is that it will first try to find out top n delta (differences), then split the data according to the delta.

using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication612
{
    class Program
    {
        private class ValueSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[1] > y[1])
                    return -1;
                else if (x[1] < y[1])
                    return 1;
                else
                    return 0;
            }
        }
        private class IndexSort : IComparer<int[]>
        {
            public int Compare(int[] x, int[] y)
            {
                if (x[0] > y[0])
                    return 1;
                else if (x[0] < y[0])
                    return -1;
                else
                    return 0;
            }
        }
        static void Main2(string[] args)
        {
            int[] data = new int[] { 1, 2, 1, 3, 2, 4, 5, 11, 10, 12, 10, 12, 13, 25, 21, 24, 20, 4, 5, 2, 3 };
            int[][] delta = new int[data.Length][];
            delta[0] = new int[2];
            delta[0][0] = 0;
            delta[0][1] = 0;
            for (int i = 0; i < data.Length - 1; i++)
            {
                delta[i + 1] = new int[2];
                delta[i + 1][0] = i + 1;
                delta[i + 1][1] = Math.Abs(data[i + 1] - data[i]);
            }
            Array.Sort(delta, new ValueSort());
            int groups = 4;
            int[][] topDelta = new int[groups - 1][];
            for (int i = 0; i < topDelta.Length; i++)
            {
                topDelta[i] = delta[i];
            }
            Array.Sort(topDelta, new IndexSort());
            List<List<int>> output = new List<List<int>>();
            int index = 0;
            for (int i = 0; i < topDelta.Length; i++)
            {
                List<int> group = new List<int>();
                for (int j = index; j < topDelta[i][0]; j++)
                {
                    group.Add(data[j]);
                }
                index = topDelta[i][0] + 1;
                output.Add(group);
            }
            List<int> group2 = new List<int>();
            for (int j = topDelta[topDelta.Length - 1][0]; j < data.Length; j++)
            {
                group2.Add(data[j]);
            }
            output.Add(group2);
            for (int i = 0; i < output.Count; i++)
            {
                Console.Write("Group" + i + ":");
                foreach (int value in output[i])
                {
                    Console.Write(value + ",");
                }
                Console.WriteLine();
            }
        }
    }
}
   
v6
Comments
Sergey Alexandrovich Kryukov 27-Jan-11 10:15am
   
You need to reformat this.
You can try to paste from Visual Studio.

To do this, take your original C#, in some text editor (not HTML page) replace all angular brackets with HTML entities < and > in "Improve Answer" check "Allow HTML" radio button and paste the code, manually add <pre lang="cs">...</pre>.

Check up the result on the page after you post.

--SA
Huisheng Chen 27-Jan-11 16:39pm
   
thank you very much, finally I made the correct format
Rate this:
Please Sign up or sign in to vote.

Solution 3

Here is a thought:
Use Bollinger Bands[^]

Calibrate to fit your needs and monitor the rate of change for the moving average. This should enable you to detect some of the interesting changes to your data. Comparing the input values against +/- the standard deviation multiplied by a calibrated value, indicates other points of interest. While bollinger bands primarily are associated with visual interpretation of financial data - there is no reason not to take advantage of the method computationally :)

Regards
Espen Harlinn
   
Comments
Sergey Alexandrovich Kryukov 27-Jan-11 16:59pm
   
Interesting method, I voted 5. The problem of those cluster analysis methods I tried to overview very quickly is that they are oriented to unordered data while OP has fully ordered data -- as a time sequence, and the clustered data should preserve order. Bands should be closed to the purpose.
--SA
Espen Harlinn 27-Jan-11 17:21pm
   
Thanks SAKryukov, thought you might like it - Fire up Delphi and play a bit with TeeChart, it has this functionality as a "function" :) - at least both the Pro and .Net version has it

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100