Hi, All

I'm a hobbyist developer, and I've hit a corker of a problem that is too much for my brain. I have a data set of points, one Y value for each X-value. The curve these values form is jagged, and I need to smooth it. The second problem is that in some data sets, some X-values are missing. How do I fill in the missing values?

Any help or sample code would be greatly appreciated.
twalib 3-Apr-12 6:07am    
Which tools do you use?

Thanks SO MUCH for all the suggestions! I'm going to apply them to the data and will let you know what happens, if I can get my brain to co-operate; it's gone numb!

As previously stated, the simplest way to fill in the missing data points is linear interpolation. This involves calculating the gradient (slope) of the straight line between the two known data points on either side of the gap, and then using it to calculate each missing point's vertical offset from one of those known points.

e.g. Given two known data points (X1=1, Y1=10) and (X4=4, Y4=25), we can calculate the Y values for the missing X2=2 and X3=3 as follows:

       Gradient = (Y4-Y1)/(X4-X1) = (25-10)/(4-1) = 5
       Y2 = Y1 + Gradient*(X2-X1) = 10 + 5*1 = 15
       Y3 = Y1 + Gradient*(X3-X1) = 10 + 5*2 = 20
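The two-point formula above can be wrapped in a small helper. This is only a sketch: `InterpolateY` is a hypothetical name chosen for illustration, and it assumes the two known X-values differ (otherwise the gradient would divide by zero).

```vbnet
Imports System

Module LinearInterpolationDemo
    ''' <summary>
    ''' Hypothetical helper: returns the Y value at x on the straight line
    ''' through (x1, y1) and (x2, y2). Assumes x1 <> x2.
    ''' </summary>
    Function InterpolateY(x1 As Double, y1 As Double,
                          x2 As Double, y2 As Double,
                          x As Double) As Double
        Dim gradient As Double = (y2 - y1) / (x2 - x1)
        Return y1 + gradient * (x - x1)
    End Function

    Sub Main()
        ' Reproduce the worked example: known points (1, 10) and (4, 25)
        Console.WriteLine(InterpolateY(1, 10, 4, 25, 2))  ' 15
        Console.WriteLine(InterpolateY(1, 10, 4, 25, 3))  ' 20
    End Sub
End Module
```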

Automatically filling in the gaps in your data array is simply a matter of locating each block of consecutive missing data points and then applying the above formulas to every point in the block.

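Private Sub ArrayInterpolation(ByVal arr() As Double)
  'A value of 0.0 marks a missing data point; the array index is the X-coordinate.
  'Arrays are reference types, so changes to the elements are visible to the caller.
  Dim FirstMissingItem As Integer = -1  '-1 = not currently inside a block of missing items
  For Current As Integer = arr.GetLowerBound(0) To arr.GetUpperBound(0)
    If arr(Current) = 0.0 Then
      If FirstMissingItem = -1 Then
        FirstMissingItem = Current  'mark start of block
      End If
    Else
      'Current holds a known value; interpolate only if the block
      'was preceded by a known value (i.e. it is not at the start of the array)
      If FirstMissingItem > arr.GetLowerBound(0) Then
        'interpolate values for block
        Dim Gradient As Double = (arr(Current) - arr(FirstMissingItem - 1)) _
                                 / (Current - FirstMissingItem + 1)
        For MissingItem As Integer = FirstMissingItem To (Current - 1)
          arr(MissingItem) = arr(FirstMissingItem - 1) _
                             + Gradient * (MissingItem - FirstMissingItem + 1)
        Next MissingItem
      End If
      FirstMissingItem = -1  'clear flag
    End If
  Next Current
End Sub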

The outer loop identifies blocks of consecutive missing data items. It sets the FirstMissingItem variable to the index of the first missing item in a block; when the next known value is encountered, it checks that the block was preceded by a known data value and, if so, proceeds to the interpolation section. At this point Current holds the index of the first known data item following the block. The gradient is then calculated between the two known data points (FirstMissingItem-1 and Current) and is used by the inner loop to interpolate a value for each item within the block.

The above can be tested by running this small test routine:
Private Sub ArrayTest()
    Dim arr() As Double = {3, 8, 0, 0, 4, 2}
    Call ArrayInterpolation(arr)
    For i As Integer = arr.GetLowerBound(0) To arr.GetUpperBound(0)
        Debug.Print(String.Format("X={0}, Y={1}", i, arr(i)))
    Next i
End Sub

Note that the code assumes that the array index equals the X-coordinate and that a value of 0.0 marks a missing point, so a genuine Y value of zero is treated as missing. Also, the routine only interpolates; it does not extrapolate. To see this, replace the array initialization in the test Sub with {0, 3, 8, 0, 0, 4, 2, 0}: the zeroes at the start and end of the array are not replaced, because each of those gaps has a known data value on one side only.
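If the data arrive as (X, Y) pairs with integer X-values, and a missing X-value simply has no pair, the gaps can be filled without a "missing" marker, so genuine zero readings survive. The sketch below assumes the X-values are integers sorted in ascending order with no duplicates, and that both arrays are non-empty and the same length; `FillGaps` is a hypothetical name, not part of any library. Like the routine above, it only interpolates: the output runs from the first known X to the last.

```vbnet
Imports System

Module FillGapsDemo
    ''' <summary>
    ''' Hypothetical helper: given (X, Y) pairs whose integer X-values are sorted
    ''' ascending without duplicates, returns one Y value for every X from xs(0)
    ''' to the last X inclusive. Element k of the result is the Y value for
    ''' X = xs(0) + k; absent X-values are filled by linear interpolation
    ''' between the nearest known points on either side.
    ''' </summary>
    Function FillGaps(xs As Integer(), ys As Double()) As Double()
        Dim first As Integer = xs(0)
        Dim result(xs(xs.Length - 1) - first) As Double

        ' The first known point is copied as-is
        result(0) = ys(0)

        ' Walk each pair of neighbouring known points
        For k As Integer = 1 To xs.Length - 1
            Dim x1 As Integer = xs(k - 1)
            Dim x2 As Integer = xs(k)
            Dim gradient As Double = (ys(k) - ys(k - 1)) / (x2 - x1)

            ' Interpolate the absent X-values strictly between x1 and x2
            For x As Integer = x1 + 1 To x2 - 1
                result(x - first) = ys(k - 1) + gradient * (x - x1)
            Next x

            ' Copy the known point at x2 unchanged
            result(x2 - first) = ys(k)
        Next k

        Return result
    End Function

    Sub Main()
        ' X = 2 and X = 3 are absent from the input
        Dim xs() As Integer = {0, 1, 4, 5}
        Dim ys() As Double = {3, 8, 4, 2}
        For Each y As Double In FillGaps(xs, ys)
            Console.WriteLine(y)
        Next
    End Sub
End Module
```

With this input the result has six values, one for each X from 0 to 5, and the filled-in values at X = 2 and X = 3 are 20/3 and 16/3, matching the in-place routine's output for {3, 8, 0, 0, 4, 2}.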

Hope this helps.

Smoothing a curve can be done by calculating a moving average: each value (data point) in the original data set is replaced by the average of the value itself and its surrounding values.
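Written as a formula, with y_0 ... y_{n-1} the input values and r the number of values taken on each side (the Range parameter of CalcMovAvg), each smoothed value is the mean of a window that is clipped at the ends of the array:

```latex
\bar{y}_i = \frac{1}{b_i - a_i + 1} \sum_{j=a_i}^{b_i} y_j,
\qquad a_i = \max(0,\ i - r),
\qquad b_i = \min(n - 1,\ i + r)
```

Near the ends the window holds fewer than 2r + 1 values, which is why ret(0) in the example further down averages only two values.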

An example function would be the following:
''' <summary>
''' Calculates the moving average with the given range over the given array and returns the result.
''' </summary>
''' <param name="arr">The array to use for calculation.</param>
''' <param name="Range">The number of values before and after each point to include in its average.</param>
''' <returns>A new array of the same length containing the smoothed values.</returns>
Public Function CalcMovAvg(ByVal arr As Double(), ByVal Range As Byte) As Double()
	'Create an array with the same size as the given one
	Dim ret(UBound(arr)) As Double

	'The indices between which the values will be averaged; change with each iteration
	Dim FromIndex, ToIndex As Integer
	'Buffer for average calculation; changes with each iteration
	Dim TempAvg As Double

	'Iterate through every element in the given array
	For i As Integer = 0 To UBound(arr)
		'Set start and end indices (keep array bounds in mind)
		FromIndex = If(i < Range, 0, i - Range)
		ToIndex = If((UBound(arr) - i) < Range, UBound(arr), i + Range)

		'Clear buffer from previous calculations
		TempAvg = 0

		'Calculate the average from arr(FromIndex) to arr(ToIndex)
		For j As Integer = FromIndex To ToIndex
			TempAvg += arr(j) / (ToIndex + 1 - FromIndex)
		Next j

		'Save average in resulting array
		ret(i) = TempAvg
	Next i

	'Return result
	Return ret
End Function
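The inner loop re-adds up to 2 * Range + 1 values for every output point. One possible alternative, sketched below, first builds a running (prefix) sum so that each window sum becomes a single subtraction. `CalcMovAvgFast` is a hypothetical name; it uses the same clipped-window rule as CalcMovAvg and should give the same results up to floating-point rounding.

```vbnet
Imports System

Module MovingAverageDemo
    ''' <summary>
    ''' Hypothetical variant of CalcMovAvg: centered moving average with the
    ''' window clipped at the array ends, computed from prefix sums.
    ''' </summary>
    Function CalcMovAvgFast(arr As Double(), range As Integer) As Double()
        Dim n As Integer = arr.Length
        Dim ret(n - 1) As Double

        ' prefix(k) holds arr(0) + ... + arr(k - 1); prefix(0) = 0
        Dim prefix(n) As Double
        For k As Integer = 0 To n - 1
            prefix(k + 1) = prefix(k) + arr(k)
        Next k

        For i As Integer = 0 To n - 1
            Dim fromIndex As Integer = Math.Max(0, i - range)
            Dim toIndex As Integer = Math.Min(n - 1, i + range)
            ' Sum of arr(fromIndex..toIndex) divided by the window length
            ret(i) = (prefix(toIndex + 1) - prefix(fromIndex)) / (toIndex - fromIndex + 1)
        Next i

        Return ret
    End Function

    Sub Main()
        Dim smoothed() As Double = CalcMovAvgFast(New Double() {3, 8, 1, 7, 4, 2}, 1)
        For Each v As Double In smoothed
            Console.WriteLine(v)
        Next
    End Sub
End Module
```

The trade-off is one extra array of n + 1 values. For very long series with large values, subtracting two large running totals can lose a little floating-point precision compared with summing each window directly.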

For example, if you have a series of values like this:
arr = {3, 8, 1, 7, 4, 2}

and Range is set to 1, the function calculates:
ret(0) = (3 + 8) / 2 = 5.5
ret(1) = (3 + 8 + 1) / 3 = 4
ret(2) = (8 + 1 + 7) / 3 = 5.333
and so on.

The higher the value of Range, the more values are averaged and the smoother the resulting curve will be.

For your second problem (missing X-values), the simplest solution is probably a linear interpolation, which would look like this (treating 0 values as missing):

arr = {3, 8, 0, 0, 4, 2}
ret = {3, 8, 6.667, 5.333, 4, 2}

You take the known values on either side of the missing ones (in the example, 8 and 4) and assume a straight-line increase or decrease between them: the gap spans three steps, so each step changes the value by (4 - 8) / 3 = -1.333. Maybe somebody else can provide you some code for that.

Hope this helped.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
