14,331,586 members
Rate this:
See more:
We have a problem, we have a series of data, please check the graph below. Basically it's a 24 hours data, we want to automatically split the 24 hours to a few sections according to the similar values.

for example, we can see that from 0am to 7am the values are more or less the same, we mark that period as group1, and between 7am and 17 we have similar (roughly) then we make it group 2, and 17 to 24 as group 3.

if we input a series of data (x,y), and define there should be a numer of partitions(for example, 3), then the result should come like this:

group1: 0am - 7am
group2: 7am - 17
group3: 16 - 24

graph is here:

http://pot9tw.sn2.livefilestore.com/y1pvO0HSvBYGu22JKakg4XJSys_zM1H9qohg7XqDwxxSOgz1orlRNjJhLKCOYyvZ9RZqHyyUHErZFQbgaNvUHuZ4FRIIOTlFuHV/graph.PNG?psid=1[^]
Posted
Updated 26-Jan-11 16:21pm
v3

Rate this:

## Solution 1

This is called "Cluster analysis", your problem is probably the most elementary one.

http://en.wikipedia.org/wiki/Cluster_analysis[^].

The article is very clear and provides very good references you may want to follow. It least you'll learn proper terminology which will help you in your search of relevant approaches and algorithms. I actually just took a close look myself: the algorithms are easy enough to code them from scratch; what you need is to compare them and decide which one is the best for your purpose; I think this is more difficult, will require some research.

Good luck,

—SA
v3
Sandeep Mewara 27-Jan-11 0:16am

Deserve a 5!
JF2015 27-Jan-11 0:46am

Thanks a lot, Sandeep, JF2015.
--SA
MCY 27-Jan-11 4:01am

good one
Huisheng Chen 27-Jan-11 4:58am

let me simplify my question, imagine a series of values:

1,2,1,3,2,4,5, 11,10,12,10,12,13, 25,21,24,20, 4,5,2,3

we can see clearly see that according to the difference, we could split them into 4 groups bassed on my requirement above.

I searched a lot, finally came to k-means algo, but it seems that it only deals with 2-d data, not for 1-d values.
Sergey Alexandrovich Kryukov 27-Jan-11 10:10am

If 1d vs 2d is your only problem, it is not a problem at all. Let's say you have a 1d points X, Y, Z... Now consider 2d points: (X,0), (Y,0), (Z,0)... and operate with them. (In practice we just assume the additional dimension.) In other words, if some 2d points all sit in a horizontal plane, they are still 2d points. Or, if you well, 1d points _are_ 2d points, a special case of 2d points. :-)
--SA
Sergey Alexandrovich Kryukov 27-Jan-11 10:53am

No, the thins is: you incorrectly formulated the controversy: it's not 1D vs 3D. Your data is fully ordered. I guess you need clusters without violation of this order -- that's the difference.
Of course you need different algorithm.
Thank you.
--SA
Rate this:

## Solution 2

Hi, I implemented the cluster by my own method. the logic is that it will first try to find out top n delta (differences), then split the data according to the delta.

```using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApplication612
{
class Program
{
private class ValueSort : IComparer<int[]>
{
public int Compare(int[] x, int[] y)
{
if (x[1] > y[1])
return -1;
else if (x[1] < y[1])
return 1;
else
return 0;
}
}
private class IndexSort : IComparer<int[]>
{
public int Compare(int[] x, int[] y)
{
if (x[0] > y[0])
return 1;
else if (x[0] < y[0])
return -1;
else
return 0;
}
}
static void Main2(string[] args)
{
int[] data = new int[] { 1, 2, 1, 3, 2, 4, 5, 11, 10, 12, 10, 12, 13, 25, 21, 24, 20, 4, 5, 2, 3 };
int[][] delta = new int[data.Length][];
delta[0] = new int[2];
delta[0][0] = 0;
delta[0][1] = 0;
for (int i = 0; i < data.Length - 1; i++)
{
delta[i + 1] = new int[2];
delta[i + 1][0] = i + 1;
delta[i + 1][1] = Math.Abs(data[i + 1] - data[i]);
}
Array.Sort(delta, new ValueSort());
int groups = 4;
int[][] topDelta = new int[groups - 1][];
for (int i = 0; i < topDelta.Length; i++)
{
topDelta[i] = delta[i];
}
Array.Sort(topDelta, new IndexSort());
List<List<int>> output = new List<List<int>>();
int index = 0;
for (int i = 0; i < topDelta.Length; i++)
{
List<int> group = new List<int>();
for (int j = index; j < topDelta[i][0]; j++)
{
}
index = topDelta[i][0] + 1;
}
List<int> group2 = new List<int>();
for (int j = topDelta[topDelta.Length - 1][0]; j < data.Length; j++)
{
}
for (int i = 0; i < output.Count; i++)
{
Console.Write("Group" + i + ":");
foreach (int value in output[i])
{
Console.Write(value + ",");
}
Console.WriteLine();
}
}
}
}```
v6
Sergey Alexandrovich Kryukov 27-Jan-11 10:15am

You need to reformat this.
You can try to paste from Visual Studio.

To do this, take your original C#, in some text editor (not HTML page) replace all angular brackets with HTML entities < and > in "Improve Answer" check "Allow HTML" radio button and paste the code, manually add <pre lang="cs">...</pre>.

Check up the result on the page after you post.

--SA
Huisheng Chen 27-Jan-11 16:39pm

thank you very much, finally I made the correct format
Rate this:

## Solution 3

Here is a thought:
Use Bollinger Bands[^]

Calibrate to fit your needs and monitor the rate of change for the moving average. This should enable you to detect some of the interesting changes to your data. Comparing the input values against +/- the standard deviation multiplied by a calibrated value, indicates other points of interest. While bollinger bands primarily are associated with visual interpretation of financial data - there is no reason not to take advantage of the method computationally :)

Regards
Espen Harlinn
Sergey Alexandrovich Kryukov 27-Jan-11 16:59pm

Interesting method, I voted 5. The problem of those cluster analysis methods I tried to overview very quickly is that they are oriented to unordered data while OP has fully ordered data -- as a time sequence, and the clustered data should preserve order. Bands should be closed to the purpose.
--SA
Espen Harlinn 27-Jan-11 17:21pm

Thanks SAKryukov, thought you might like it - Fire up Delphi and play a bit with TeeChart, it has this functionality as a "function" :) - at least both the Pro and .Net version has it