I've taken the link provided above (about cumulative average) as an inspiration to calculate the average incrementally, in a way that no intermediate value gets larger than about three times the maximum absolute value being added. Since I wasn't sure about various effects, such as the current average going negative, I added some test code and asserts to verify it actually works as intended. See the code below:
#define TEST_RANGE
#ifdef TEST_RANGE
#include <assert.h>
void minmax(const double newval, double&minv, double&maxv) {
if (newval < minv)
minv = newval;
else if (newval > maxv)
maxv = newval;
}
#endif
template <class basetype, class container_iterator>
basetype average(const container_iterator& start, const container_iterator& end) {
basetype cumulated_average = 0;
basetype cumulated_remainder = 0;
basetype addendum = 0;
#ifdef TEST_RANGE
double real_avg = 0.0;
double val_min = *start;
double val_max = *start;
double avg_min = cumulated_average;
double avg_max = cumulated_average;
double rem_min = cumulated_remainder;
double rem_max = cumulated_remainder;
#endif
long long n_values = 0;
for (auto pvalue = start; pvalue != end; ++pvalue) {
++n_values;
addendum = cumulated_remainder - cumulated_average + *pvalue;
cumulated_average += addendum/n_values;
cumulated_remainder = addendum%n_values;
#ifdef TEST_RANGE
real_avg += *pvalue;
assert((char)(n_values*cumulated_average + cumulated_remainder - real_avg) == 0);
minmax(*pvalue, val_min, val_max);
minmax(cumulated_average, avg_min, avg_max);
minmax(cumulated_remainder, rem_min, rem_max);
#endif
}
#ifdef TEST_RANGE
assert (fabs(n_values*cumulated_average - real_avg) < n_values);
real_avg /= (double)n_values;
#endif
return cumulated_average;
}
void test_average() {
char cvalues[] = { 13,7,-27, 34, -3, 22, 33, -1, 18, 29,
13,7,-27, 34, -3, 22, 33, -1, 18, 29,
13,7,-27, 34, -3, 22, 33, -1, 18, 29,
13,7,-27, 34, -3, 22, 33, -1, 18, 29,
13,7,-27, 34, -3, 22, 33, -1, 18, 29
};
auto cavg = average<char>(cvalues+0, cvalues+50);
}
The last line is how it's used. For passing the values, you can pass any pair of iterators that delimit the range, including simple pointers (as I've done here), provided that these iterators can be incremented and dereferenced. In this example, the cumulative average does go negative a few times, and the total sum adds up to 625, way above the maximum a char can hold.
Note that you may not use a signed type for n_values: I originally used a signed type, but then found that this led to some nasty signed/unsigned conversion effects. You could probably get rid of that with proper casting though.
P.S. these are the formulas I've used:
The general idea is to have two values that correspond to the rounded average, and the remainder of the division, like this:
Cum_avg(n) := sum(x1 ... xn) \ n
Cum_rem(n) := sum(x1 ... xn) % n
These two values then fulfil the follwing equation:
sum(x1...xn) = n*Cum_avg(n) + Cum_rem(n)
The values for the next iteration are then calculated based on the previous values like this:
Cum_avg(n+1) := sum(x1...xn+1) \(n+1)
= (sum(x1...xn)+xn+1) \(n+1)
= (n*Cum_avg(n)+Cum_rem(n)+xn+1) \(n+1)
= ((n+1)*Cum_avg(n)-Cum_avg(n)+Cum_rem(n)+xn+1) \(n+1)
= Cum_avg(n) + (-Cum_avg(n)+Cum_rem(n)+xn+1) \(n+1)
I've used a helper variable called addendum to hold the divisor of the second term. This term is just a sum of three values that are going to be in the normal value range, so the worst that can happen here is an overflow of that term. I still suspect that this can happen under certain circumstances, so the calculation of addendum may need some reworking!
The remainder of the next iteration is then simply the remainder of the division of addendum:
Cum_rem(n+1) := addendum % (n+1)
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
|