Error in recursive averaging
In case of nan
-values (e.g. land areas in ocean datasets), when coarsening the data, the average value of the four input cells is currently computed as nanmean
. This yields correct values for the first coarsening step (e.g. if only 3 cells contain values, the result will be sum(3 cells) / 3
). But errors occur for subsequent coarsening steps:
O1| L1|| L2| L3
---+---++---+---
O2| O3|| L4| L5
---+---++---+--- A1|A2
---+---++---+--- --> --+-- --> B
O4| O5|| O6| L6 A3|A4
---+---++---+---
O7| O8|| O9| L7
Assuming the grid above, with ocean cells O1 ... O9
and land cells L1 ... L7
on the finest level, the current algorithm will correctly compute:
A1 = (O1 + O2 + O3) / 3
A2 = 0 / 0 = NaN
A3 = (O4 + O5 + O7 + O8) / 4
A4 = (O6 + O9) / 2
But it will incorrectly compute:
B = (A1 + A3 + A4) / 3
= ((O1 + O2 + O3) / 3 + (O4 + O5 + O7 + O8) / 4 + (O6 + O9) / 2) / 3
= (O1 + O2 + O3) / 9 + (O4 + O5 + O7 + O8) / 12 + (O6 + O9) / 6
!= (O1 + O2 + O3 + O4 + O5 + O6 + O7 + O8 + O9) / 9
The problem arises, because mean()
is not an associative operation. The classical approach for hierarchical computation of averages would be to compute the sum
and count
though all levels and only compute sum / count
as a final step before output. This might however be challenging, in the current architecture, as there's (not yet) a nice way to pass the count
along the coarsening steps...