Circular Values Math and Statistics with FORTRAN

Andy Allinger

4.95/5 (5 votes)

Dec 12, 2013

Public Domain

10 min read

19558

218

This is an alternative for "Circular Values Math and Statistics with C++11".

Introduction

Circular numbers appear in many applications. Time of day and compass heading the two most important, but there are others: musical intervals, gear trains, and waveforms can be measured in terms of cyclic variables. What sets these measurements apart is that they wrap at some point and an increase past 12:59:59, for example, resets to 1:00:00. This makes operations as familiar as subtraction and averaging treacherous on cyclic quantities.

Lior Kogan's article on the topic was very thorough. Perhaps a less rigorous treatment is in order. This article will show how to represent cyclic numbers, and add, subtract, and average them, in plain honest FORTRAN, without defining any special data types or proving any theorems. A new algorithm with complexity n log n will be introduced for finding the median of a set of cyclic data . The proposed average will preserve not only phase information, but also "lap" information of the input.

Background

My interest in the topic started with music theory. Low C sounds like high C. Musical notes are cyclic with period 12 (in the usual Western 12-tone equally tempered system). One systematic way to represent notes is by MIDI note number. In this system middle C is 60, and naturally B is 59. Their difference is 1. High C, 12 steps higher at 72, also sounds much like middle C. Treating note number as a cyclic quantity with period 12 makes every note equivalent to every note 12 steps apart. Thus, 72 ~= 60, and 60 - 59 = 1

But it's not that easy. The difference between 1 degrees and 359 degrees is 2, not 358. The real difference between the values is measured by the shortest path between them, and with cyclic variables, there are always two paths to consider. (Plus an infinite number of paths obtained by making full laps around the cycle!)

For a full discussion see Kogan's article or Wikipedia. The basic data types and operations will now be described:

Representation

Cyclic variables are adequately represented by default REAL's (C type float). The period of a cyclic variable (eg 24 hours, 360 degrees) can be adequately represented by another default REAL, which shall be a positive number. Let t denote the period, and X the data.

Phase

Various units are used to measure phase, such as degrees, radians, or cycles. Generally, phase may be measured in the same units as the input data. For example 1:00pm on a 24-hour period, has the phase of 13.0 hours. The intrinsic function MOD (C %) is adequate to the task. The phase p, of a measurement x, in period t is:

p = MOD(x, t)      !  that's p = x % t

To make comparisons possible, all phase data must lie in the same range. Let us choose the range 0...T. (-T/2 ... T/2 is another reasonable choice.) Negative data has a negative phase that should be brought into the standard range 0...T. That is:

IF (p < 0.) p = p + t

Addition

Cyclic quantities add just like linear quantities.

x = y + z

If phase is desired, then take

p = MOD(x, t)
IF (p < 0.) p = p + t

Subtraction

Now things start to become more complicated. Two phase differences must be distinguished: a directed difference and an absolute difference. If direction is not important, the absolute difference can be found be subtraction, followed by a choice between the two paths around the circle.

! absolute phase difference between point A and point B
dabs = MOD(ABS(b - a), t)
dabs = MIN(dabs, t - dabs)

To preserve the direction, a little more care must be taken:

! directed phase difference  from point A to point B
ddir = MOD(b - a, t)
IF (ddir < - t/2.) THEN
    ddir = ddir + t
ELSE IF (ddir > t/2.) THEN
    ddir = ddir - t
END IF

The result lies between -t/2 and t/2.

Scalar Multiplication

Cyclic quantities may be multiplied by a scalar in the usual way:

x = x * s

If the purpose is to change the unit of measurement, then it is important to remember to scale the period also:

t = t * s

Cyclic Averages

Vector-sum Average

The following question was asked on a web forum at control.com:

Posted by Ville on 18 May, 2005 - 7:46 pm

Hi! I have a weather station which measures both the direction and speed of the wind. It is connected to a SCADA system (TSX57xxx + Monitor Pro). Now I have a mathematical problem with calculating the average direction of the wind .....

The most common method for finding the average of a cyclic quantity is the vector-sum method. http://en.wikipedia.org/wiki/Mean_of_circular_quantities The wind speeds s and directions h may be broken into vector components by:

i = 1, n
x = x + s(i) * COS(h(i))
y = y + s(i) * SIN(h(i))
END DO

Then the average speed and direction are given by the magnitude and angle of the average of x and y:

x  = x / FLOAT(n)
y = y / FLOAT(n)
sav = SQRT(x**2 + y**2)
hav = ATAN2(y, x)

The vector sum of wind measurements taken at regular intervals yields the overall displacement of air (in the neighborhood of the measuring station). Thus, in a physically meaningful sense, it can be said to be the correct average for some kinds of problems.

More generally, it doesn't have the properties that might be hoped for. For example, the average of {0, 0, 90 degrees} ought to be 30 degrees, but the vector sum method yields ATAN2(1, 2) = 26.56 degrees. Is there a way to represent the wrapping of cyclic variables without imposing an artificial 2-D structure on the problem?

The Cyclic Mean

Yes, in the same thread above, David Radcliffe gave a succinct solution:

I propose the following algorithm for calculating the average of a set of directions, measured from 0 to 360 degrees:

Find the average and standard deviation of the given numbers.

Increase the smallest number by 360.

Repeat steps 1 and 2 until all numbers are greater than 360.

Choose the average that yields the smallest standard deviation.

If the average is greater than 360 then subtract 360.

If the circle is thought of as a line sewn together at a seam, Radcliffe's algorithm considers the intervals between all adjacent data points as a possible location for the seam. Since there are n "seams" one must consider, it is possible to have as many as an n-way tie for the least variance. This will occur if all the data points are equally spaced around the circle.

The algorithm was subsequently rediscovered by Edwin Olson. He observed that by using the short formula for the variance: s² = (1/N) (Σ x² - (Σ x)² / N),

the sums and sums-of-squares need only to be updated at each step, with the formulas:

      m1 = m1 + t
      m2 = m2 + 2*t*X(i) + t**2

The Cyclic Median

A simple extension of the foregoing results in a new, efficient algorithm for the cyclic median. Instead of minimizing the variance, the quantity to seek the minimum of is the mean absolute deviation: s₁ = (1/N) Σ |x_i - x_m|

As with the linear median, the cases of odd and even n must be treated separately. Applying the definition of mean absolute deviation to the sorted data:

n s₁ = (x_m - x₁) + (x_m - x₂) + .... + (x_n-1 - x_m) + (x_n - x_m) ( odd case shown)

n s₁' = (x_m' - x₂) + (x_m' - x₃) + ... + (x_n - x_m') + (x₁ + t - x_m')

where if the median x_m, at one step is x_j, the median at the next step, x_m' is x_j+1leading to the update formulas:

m₁' = m₁ + 2x_i + t - x_(n/2+i)%n - x_(n/2+i+1)%n odd case

m₁' = m₁ + 2x_i + t - 2x_(n/2+i)%n even case

Substitute x_n for x₀ as required.

The limiting step is the n log (n) sorting step. Thus, finding the circular median is perfectly feasible even on large data sets.

The update formula for the cyclic median is an original contribution. Kanti Mardia's 1972 textbook defines the circular median in terms of diameter drawn across the circle that divides the data equally. His definition suggests an algorithm of comparable efficiency to the one described here. The new method has the advantage of conceptual similarity to the algorithm for the mean, and it also computes the mean absolute deviation.

Using the Code

Three relevant subroutines are provided: for vector sum average, for the cyclic mean, and for the cyclic median. The language is "legacy" (anything goes) FORTRAN that compiles fine with g95 on a 32-bit ADM running Linux. The prototypes are:

SUBROUTINE TRIGAV (D, n, t, ave)      ! vector-sum average 
         INTEGER n  
         REAL D, t, ave  
         DIMENSION D(n)

where D is a single-precision floating-point array of length n, and t is the period. This function has no tricks. There are at least three inconsistent definitions for the vector-sum standard deviation. TRIGAV does not compute any of them. To be consistent with the mean and median, TRIGAV does not take vector magnitudes into account, only direction.

SUBROUTINE CYMEAN (X, n, t, AVE, ties, dev) 
         INTEGER n, ties 
         REAL X, t, ave, dev 
         DIMENSION X(n), AVE(n)

X is a single-precision array of length n, t is the period. AVE is another single-precision array of length n, because there can be as many as n values of the cyclic mean. The integer ties tells how many averages are actually returned. Array positions past ties are left undefined. The standard deviation is returned in dev.

To tell if to possible averages are actually a tie, we need to test their variances for almost-equality. One way to compare floats for almost-equality is

(ABS(x1 - x2) < tol)

this can fail when tol is not chosen right. If the data is known to be on the order of 1.0, then a value of tol of around machine epsilon should suffice. But if the data is very small numbers, then it may all appear equal, and large numbers may never test equal. tol must be adjusted to suit the data, but if we knew the data in advance we wouldn't have to test it. This is one case where cyclical math is easier. Since phase always lies in 0...t, a typical size is available in t. Setting:

tol = t * EPSILON(t)

should be a safe choice.

Sums are accumulated in double precision, and some other variables are also double precision to avoid needless type conversions.

CYMEAN calls SSORT by Singleton, Jones, Kahaner, and Wisniewski to do the initial sorting. SSORT is included in the distribution.

Cycle information is preserved. For example, taking the period to be 24 hours, averaging 30 and 32 hours yields 31, not 7. This is done by first computing the linear mean, then choosing the shortest distance to the cyclical (phase only) mean described above. This gives an answer correct both in phase and in magnitude--a potentially very valuable feature. Suppose that a drive pinion turns a much larger gear. Take one turn of the drive pinion as the period. Clearly, it is essential to keep track of the number of revolutions of the pinion to know the position of the gear. An average that keeps only phase discards vital information.

SUBROUTINE CYMED (X, n, t, AVE, ties, dev) 
INTEGER n, ties 
REAL X, t, AVE, dev  
DIMENSION X(n), AVE(n)

X is a single precision array of length n. The period is t. The cyclic median is returned in AVE, another single precision array of length n, because there can be as many as n values of the cyclic median. The integer ties tells how many medians were actually found. Array positions past ties are left undefined. The mean absolute deviation is returned in dev.

CYMED calls SSORT. It also calls MEDIAN by Alan Miller to compute the linear median. MEDIAN is included in the distribution. CYMED preserves cycle information by finding the shortest value from the linear median to the phase-only cyclical median, returning an answer correct both in phase and in magnitude.

The package also includes a random-number routine, code for finding machine constants, and a small test program.

Points of Interest

Sometimes a variable with a latent period can be made amenable to cyclic number treatment by rescaling the input. In music theory, the circle-of-fifths distance between two notes is how many steps of five semi-tones is required to move from one note to the other. Chord and key changes usually have short circle-of-fifths distances. Multiplying the MIDI note number by 5, and leaving the period at 12 makes the phase differences of the rescaled data equal the circle-of-fifths distance.

Originally, the average routines chose a return value at random from the array of possible averages. It makes perfect sense to the applications programmer, who must have an answer. But it's better not to irritate the mathematicians.

References

Kanti V. Mardia, Statistics of Directional Data London: Academic Press, 1972. Cited in: Bennett Sango Otieno, An Alternative Estimate of Preferred Direction for Circular Data, Virginia Polytechnic Institute PhD dissertation, 2002.
Edwin Olson, "On computing the average orientation of vectors and lines," http://april.eecs.umich.edu/pdfs/olson2011orientation.pdf
David Radcliffe, brief posting, http://control.com/thread/1026210133

History

12 December 2013 - Version 1.
16 December 2013 - Version 1.1 - important mistake fixed
20 November 2015 - Version 1.2 - minor changes