Next, we incorporate these downscaled samples in GLM inference. At the core of our framework is a spatial Berkson-error model. Berkson error arises when data on explanatory variables are not obtained or collected, or are measured imperfectly (Carroll et al., 2006). In this model, an explanatory variable at fine resolution, say $X^{(f)}(s_i)$ on the BAU located at $s_i$, is decomposed into two components:
$$X^{(f)}(s_i) = \hat{X}^{(f)}(s_i) + \eta^{(f)}(s_i),$$
where $\hat{X}^{(f)}(s_i)$ is the mean of the downscaled samples at the $i$-th BAU, and $\eta^{(f)}(s_i)$ represents a downscaling error. Importantly, the downscaling error is spatially correlated, reflecting the natural spatial structure and heterogeneity of environmental processes.
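As a minimal sketch of this decomposition (not the downscaling model used in the paper), the Python snippet below assumes a one-dimensional grid of BAUs, Gaussian downscaled samples, and an exponential covariance for the downscaling error. The function names `exponential_cov` and `berkson_decompose`, the grid, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def exponential_cov(coords, variance, spatial_range):
    """Exponential covariance: variance * exp(-distance / spatial_range)."""
    distance = np.abs(coords[:, None] - coords[None, :])
    return variance * np.exp(-distance / spatial_range)

def berkson_decompose(samples):
    """Split downscaled samples (n_samples x n_baus) into the per-BAU mean
    (the X-hat term) and the residual errors (draws of the eta term)."""
    x_hat = samples.mean(axis=0)
    eta = samples - x_hat
    return x_hat, eta

rng = np.random.default_rng(0)
n_baus, n_samples = 50, 2000

# Hypothetical BAU locations on [0, 1] and a hypothetical mean field.
coords = np.linspace(0.0, 1.0, n_baus)
true_mean = np.sin(2.0 * np.pi * coords)

# Assumed spatially correlated downscaling error (illustrative values).
cov = exponential_cov(coords, variance=0.5, spatial_range=0.2)
samples = rng.multivariate_normal(true_mean, cov, size=n_samples)

x_hat, eta = berkson_decompose(samples)

# Empirical correlation between the errors at two neighbouring BAUs.
corr_neighbours = np.corrcoef(eta[:, 0], eta[:, 1])[0, 1]
print("max |x_hat - true_mean|:", np.max(np.abs(x_hat - true_mean)))
print("error correlation between neighbouring BAUs:", corr_neighbours)
```

In this toy setting the errors at neighbouring BAUs are strongly correlated, which is the feature an independent-error model would miss.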
Recall that a GLM with an explanatory variable can be written as, for i=1,…,n,
$$[Z^{(f)}(s_i) \mid Y^{(f)}(s_i)] = \mathrm{EF}(Y^{(f)}(s_i)), \qquad g(Y^{(f)}(s_i)) = \beta_0 + \beta_1 X^{(f)}(s_i),$$
where $Z^{(f)}(s_i)$ is the response data; $[Z^{(f)}(s_i) \mid Y^{(f)}(s_i)]$ denotes the conditional distribution of $Z^{(f)}(s_i)$ given $Y^{(f)}(s_i)$; EF is a distribution that belongs to an exponential family such that $E(Z^{(f)}(s_i) \mid Y^{(f)}(s_i)) = Y^{(f)}(s_i)$; $g(\cdot)$ is a link function; $\beta_0$ and $\beta_1$ are unknown parameters; and $X^{(f)}(s_i)$ is a single covariate. Of particular interest is $\beta_1$, the coefficient of the climate explanatory variable $X^{(f)}(s_i)$.
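To make the GLM notation concrete, the following sketch simulates a response from one member of the exponential family and recovers the two coefficients. It assumes a Poisson response with a log link and fits the model by iteratively reweighted least squares; the helper `fit_poisson_glm`, the sample size, and the coefficient values are hypothetical, and the covariate is treated as known without error.

```python
import numpy as np

def fit_poisson_glm(x, z, n_iter=50, tol=1e-10):
    """Fit log(Y) = b0 + b1 * x for a Poisson response z by iteratively
    reweighted least squares (IRLS). Returns the array [b0, b1]."""
    design = np.column_stack([np.ones_like(x), x])
    # Start from an intercept-only fit: log of the mean response.
    beta = np.array([np.log(z.mean()), 0.0])
    for _ in range(n_iter):
        lin_pred = design @ beta
        mu = np.exp(lin_pred)                  # fitted mean Y
        working = lin_pred + (z - mu) / mu     # working response
        weighted = design * mu[:, None]        # IRLS weights equal mu
        new_beta = np.linalg.solve(design.T @ weighted, weighted.T @ working)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

rng = np.random.default_rng(1)
n = 2000
beta_true = np.array([0.5, 0.8])               # illustrative (beta0, beta1)

x = rng.normal(size=n)                         # covariate X, assumed known here
y_mean = np.exp(beta_true[0] + beta_true[1] * x)   # Y = g^{-1}(b0 + b1 X)
z = rng.poisson(y_mean)                        # response Z with E(Z | Y) = Y

beta_hat = fit_poisson_glm(x, z)
print("estimated (beta0, beta1):", beta_hat)
```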
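To account for the downscaling uncertainty properly, both components, $\hat{X}^{(f)}(s_i)$ and $\eta^{(f)}(s_i)$, are included in the GLM for inference, as follows:
$$g(Y^{(f)}(s_i)) = \beta_0 + \beta_1 X^{(f)}(s_i) = \beta_0 + \beta_1\bigl(\hat{X}^{(f)}(s_i) + \eta^{(f)}(s_i)\bigr) = \beta_0 + \beta_1 \hat{X}^{(f)}(s_i) + \delta^{(f)}(s_i).$$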
Here, the downscaling uncertainty is incorporated through the component $\delta^{(f)}(s_i) \equiv \beta_1 \eta^{(f)}(s_i)$, which is the downscaling error $\eta^{(f)}(s_i)$ scaled by $\beta_1$. The simulation results in Figure 4 show why including the downscaling error matters: inference using CORGI (colour blue) is valid, in contrast to the commonly used ‘plug-in’ approach (colour green), which simply substitutes $\hat{X}^{(f)}(s_i)$ without accounting for the error (i.e., $g(Y^{(f)}(s_i)) = \beta_0 + \beta_1 \hat{X}^{(f)}(s_i)$). Another approach, called ‘ensemble’ (colour red) and also shown in Figure 4, directly includes downscaled samples without using the Berkson-error decomposition (i.e., $g(Y^{(f)}(s_i)) = \beta_0 + \beta_1 \tilde{X}^{(f)}(s_i)$, where $\tilde{X}^{(f)}(s_i)$ is a downscaled sample). While the ‘ensemble’ approach seems intuitively appealing, it can lead to biased and invalid inferences. Recall that CORGI incorporates downscaling uncertainty in GLM inference. Figure 5 shows Monte Carlo downscaled samples of $X^{(f)} = (X^{(f)}(s_1), \ldots, X^{(f)}(s_N))^\top$ and Monte Carlo samples of $Y^{(f)} = (Y^{(f)}(s_1), \ldots, Y^{(f)}(s_N))^\top$.
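One way to see why the ‘ensemble’ approach can be biased is a deliberately simplified toy simulation. The sketch below is not the simulation design behind Figure 4 and does not implement CORGI: it assumes an identity link, Gaussian noise, errors that are independent across locations, and a downscaled sample whose error is drawn independently of the true error. Under those assumptions the sampled covariate behaves like a covariate with classical measurement error, so its slope is attenuated toward zero. The helper `ols_slope` and all parameter values are hypothetical.

```python
import numpy as np

def ols_slope(covariate, response):
    """Least-squares slope of response on covariate (with an intercept)."""
    return np.polyfit(covariate, response, 1)[0]

rng = np.random.default_rng(2)
n = 5000
beta0, beta1 = 1.0, 2.0                      # illustrative true coefficients

x_hat = rng.normal(0.0, 1.0, n)              # downscaled mean (X-hat)
eta = rng.normal(0.0, 1.0, n)                # true downscaling error (eta)
x_true = x_hat + eta                         # Berkson model: X = X-hat + eta

# Identity-link response, an assumption made only to keep the toy example linear.
response = beta0 + beta1 * x_true + rng.normal(0.0, 0.1, n)

# One downscaled sample: same mean, but an independent draw of the error.
eta_tilde = rng.normal(0.0, 1.0, n)
x_tilde = x_hat + eta_tilde

slope_plug_in = ols_slope(x_hat, response)
slope_ensemble = ols_slope(x_tilde, response)

# Attenuation factor expected for the 'ensemble' slope in this toy setting.
attenuation = x_hat.var() / (x_hat.var() + eta_tilde.var())

print("true beta1:", beta1)
print("plug-in slope:", slope_plug_in)
print("ensemble slope:", slope_ensemble)
print("beta1 * attenuation:", beta1 * attenuation)
```

The plug-in slope is printed for comparison only; this linear, independent-error setting says nothing about its interval coverage or its behaviour under a nonlinear link or spatially correlated errors.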
Figure 4: Comparison of CORGI (blue) to other methods, ‘plug-in’ (green) and ‘ensemble’ (red): Bias of estimate $\hat{\beta}_1$ against spatial range $\psi_0$ and spatial variance $\sigma_0^2$, respectively (top row); coverage probability (CP) for $\beta_1$ against spatial range $\psi_0$ and spatial variance $\sigma_0^2$, respectively (bottom row). The black line represents the ‘gold standard’, obtained by fitting a GLM with the ‘ground truth’.


Figure 5: (left panel) 100 Monte Carlo samples of $X^{(f)}$; (right panel) 100 Monte Carlo samples of $Y^{(f)}$ under CORGI.