Hi, All

I'm a hobbyist developer, and I've hit a corker of a problem, too much for my brain. I have a data set that gives me a set of data points - one Y value for every X-value. The curve that these values give is jagged, and I need to smooth the curve. The second problem is that for some data sets, there are some X-values missing. How do I fill in the missing values?

Any help or sample code would be greatly appreciated.
twalib 3-Apr-12 6:07am
Which tools do you use?

## Solution 3

Thanks SO MUCH for all the suggestions - I'm going to apply them to the data and will let you know what happens. If I can get my brain to co-operate, it's gone numb!

## Solution 2

Greetings,

As previously stated, the simplest way to fill in the missing data points is to apply "Linear Interpolation". Basically this involves calculating the gradient between the two known data values and then using it to calculate the vertical offset from one of the known data points.

e.g. Given two data points `(X1=1, Y1=10)` and `(X4=4, Y4=25)`, we can calculate the Y values for the missing X2 and X3 as follows:
```
Gradient = (Y4 - Y1) / (X4 - X1) = (25 - 10) / (4 - 1) = 5
Y2 = Y1 + Gradient*(X2-X1) = 10 + 5*(2-1) = 15
Y3 = Y1 + Gradient*(X3-X1) = 10 + 5*(3-1) = 20
```

Automatically filling in the gaps in your data array is simply a matter of locating a block of consecutive missing data points in the array and then applying the above formulas to each point in the block.

e.g.
```vb
Private Sub ArrayInterpolation(ByRef arr() As Double)
    Dim FirstMissingItem As Integer = -1  '-1 means "not inside a block of missing items"
    For Current As Integer = arr.GetLowerBound(0) To arr.GetUpperBound(0)
        If arr(Current) = 0.0 Then
            If FirstMissingItem = -1 Then
                FirstMissingItem = Current  'mark start of block
            End If
        Else
            If FirstMissingItem > 0 Then  'true if preceded by a known value

                'interpolate values for block
                Dim Gradient As Double = (arr(Current) - arr(FirstMissingItem - 1)) _
                                         / (Current - FirstMissingItem + 1)
                For MissingItem As Integer = FirstMissingItem To (Current - 1)
                    arr(MissingItem) = arr(FirstMissingItem - 1) _
                                       + Gradient * (MissingItem - FirstMissingItem + 1)
                Next
            End If
            FirstMissingItem = -1  'clear flag
        End If
    Next
End Sub
```

The outer loop identifies blocks of consecutive missing data items. It sets the `FirstMissingItem` variable to the index of the first missing item in a block; when the next known value is encountered, it verifies that the block was preceded by a known data value and then proceeds to the interpolation section. At this point `Current` holds the index of the first known data item following the block. The gradient is then calculated between the two known data points (at `FirstMissingItem - 1` and `Current`) and is used by the inner loop to interpolate a value for each item within the block.

The above can be tested by running this small test routine:
```vb
Private Sub ArrayTest()
    Dim arr() As Double = {3, 8, 0, 0, 4, 2}
    Call ArrayInterpolation(arr)
    For i As Integer = arr.GetLowerBound(0) To arr.GetUpperBound(0)
        Debug.Print(String.Format("X={0}, Y={1}", i, arr(i)))
    Next i
End Sub
```

Note that the code given assumes that the array index is equal to the X-coordinate. It also treats any value of exactly 0 as missing, so a genuine zero reading would be overwritten. Finally, the routine only interpolates; it does not extrapolate. To see this, replace the array initialization value in the test Sub with `{0, 3, 8, 0, 0, 4, 2, 0}`: the zeroes at the start and end of the array will not be replaced, because there is only one known data value next to each of them.
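If your X values are not simply the array indices, the same gradient formula still applies. Below is a minimal sketch, assuming the known points are held in two parallel arrays sorted by strictly increasing X. The names `InterpolationSketch`, `InterpolateAt`, `xs` and `ys` are made up for illustration and are not part of the routine above.

```vb
Imports System

Module InterpolationSketch
    ' Returns the linearly interpolated Y for the given x.
    ' Assumption: xs is sorted in strictly increasing order and
    ' xs(i) pairs with ys(i). Hypothetical helper, for illustration only.
    Public Function InterpolateAt(ByVal xs() As Double, ByVal ys() As Double, ByVal x As Double) As Double
        If xs.Length <> ys.Length OrElse xs.Length < 2 Then
            Throw New ArgumentException("Need at least two points, with matching X and Y arrays.")
        End If
        If x < xs(0) OrElse x > xs(xs.Length - 1) Then
            ' Like the routine above, this interpolates only; it does not extrapolate.
            Throw New ArgumentOutOfRangeException("x", "Extrapolation is not supported.")
        End If

        ' Find the first known point at or to the right of x
        For i As Integer = 1 To xs.Length - 1
            If x <= xs(i) Then
                Dim gradient As Double = (ys(i) - ys(i - 1)) / (xs(i) - xs(i - 1))
                Return ys(i - 1) + gradient * (x - xs(i - 1))
            End If
        Next

        ' Not reached because of the range check above
        Return ys(ys.Length - 1)
    End Function
End Module
```

With the two points from the worked example, `InterpolateAt(New Double() {1, 4}, New Double() {10, 25}, 2)` gives 15 and the same call with `x = 3` gives 20.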

Hope this helps.

## Solution 1

Hi!

Smoothing a curve can be done by calculating what is called a moving average. Here, each value (data point) in the original data set is replaced by the average of the value itself and its neighbouring values.

An example function would be the following:
```vb
''' <summary>
''' Calculates the moving average with the given range over the given array and returns the result.
''' </summary>
''' <param name="arr">The array to use for calculation.</param>
''' <param name="Range">The number of previous and following values to include in each average.</param>
''' <returns>A new array of the same size holding the averaged values.</returns>
Public Function CalcMovAvg(ByVal arr As Double(), ByVal Range As Byte) As Double()
    'Create an array with the same size as the given one
    Dim ret(UBound(arr)) As Double

    'The indices between which the values will be averaged; change with each iteration
    Dim FromIndex, ToIndex As Integer
    'Buffer for average calculation; changes with each iteration
    Dim TempAvg As Double

    'Iterate through every element in the given array
    For i As Integer = 0 To UBound(arr)
        'Set start and end indices (keep array bounds in mind)
        FromIndex = If(i < Range, 0, i - Range)
        ToIndex = If((UBound(arr) - i) < Range, UBound(arr), i + Range)

        'Clear buffer from previous calculations
        TempAvg = 0

        'Calculate the average from arr(FromIndex) to arr(ToIndex)
        For j As Integer = FromIndex To ToIndex
            TempAvg += arr(j) / (ToIndex + 1 - FromIndex)
        Next

        'Save average in resulting array
        ret(i) = TempAvg
    Next

    'Return result
    Return ret
End Function
```

If, for example, you have a series of values like this:
arr = {3, 8, 1, 7, 4, 2}

and have Range set to 1, the function will calculate as follows:
ret(0) = (3 + 8) / 2
ret(1) = (3 + 8 + 1) / 3
ret(2) = (8 + 1 + 7) / 3
...

The higher the value of Range, the more values are averaged and the smoother the resulting curve becomes. Keep in mind that a large Range also flattens genuine peaks and dips in the data.
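To see the effect of Range for yourself, here is a compact sketch of the same windowed average written with LINQ. It is only an illustration: `MovingAverageSketch` and `Smooth` are hypothetical names, and the window is clipped at the array ends in the same way as described above.

```vb
Imports System
Imports System.Linq

Module MovingAverageSketch
    ' Centered moving average over values(i - range) .. values(i + range).
    ' The window is clipped at both ends of the array.
    ' Hypothetical helper, for illustration only.
    Public Function Smooth(ByVal values() As Double, ByVal range As Integer) As Double()
        Dim result(values.Length - 1) As Double
        For i As Integer = 0 To values.Length - 1
            Dim first As Integer = Math.Max(0, i - range)
            Dim last As Integer = Math.Min(values.Length - 1, i + range)
            ' Average the elements from index first to index last inclusive
            result(i) = values.Skip(first).Take(last - first + 1).Average()
        Next
        Return result
    End Function
End Module
```

For `{3, 8, 1, 7, 4, 2}`, a range of 1 gives 5.5, 4, 5.333, 4, 4.333, 3, while a range of 2 gives 4, 4.75, 4.6, 4.4, 3.5, 4.333. The second series varies much less from point to point.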

For your second problem of X values sometimes missing, I suppose the simplest solution would be to perform a linear interpolation that would look like this (treating 0-values as missing):

arr = {3, 8, 0, 0, 4, 2}
->
ret = {3, 8, 6.666, 5.333, 4, 2}

You take the existing values that are next to those that are missing (in the example, 8 and 4) and imagine a linear increase or decrease between them. Here the value drops by 4 over three steps, so each step subtracts 4/3 (about 1.333). Maybe somebody else can provide you with some code for that.

Hope this helped.
