On-farm experiments

Treatments are applied to adjacent strips to detect spatial variation in treatment response.

Bayesian Statistics and Workflow

Liear mixed model notation

A linear mixed effects model for $\boldsymbol{Y}$ , using the matrix notation, is

Y = X b + Z u + e,

$\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{b} + \boldsymbol{Z}\boldsymbol{u} + \boldsymbol{e},$ and

[\begin{matrix} u \\ e \end{matrix}] \sim N ([\begin{matrix} 0 \\ 0 \end{matrix}], [\begin{matrix} Σ_{u} & 0 \\ 0 & Σ_{e} \end{matrix}]) .

$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{e} \end{bmatrix} \sim \N\left( \begin{bmatrix} 0\\0 \end{bmatrix}, \begin{bmatrix} \Sigma_u & 0 \\ 0 & \Sigma_e \end{bmatrix}\right).$

Bayesian hierarchical model

At location $s_i$ , the former model is re-written as

\begin{aligned} y (s_{i}) = \sum_{m = 1}^{l} b_{m} x_{m} (s_{i}) + \sum_{j = 1}^{h} u_{j} (s_{i}) z_{j} (s_{i}) + e (s_{i}), \\ u_{i} ∣ θ_{u} \sim N (0, V_{u} (θ_{u})), \\ e (s_{i}) ∣ σ_{e} \sim N (0, σ_{e}^{2}) . \end{aligned}

$\begin{split} & y(s_i) = \sum_{m=1}^{l}b_m x_m(s_i) + \sum_{j=1}^{h}u_j(s_i)z_j(s_i)+e(s_i), \\ & \boldsymbol{u}_i \mid \theta_u \sim \N(0,V_u(\theta_u)), \\ & e(s_i) \mid \sigma_e \sim \N(0,\sigma_e^2). \end{split}$ And,

Y \sim N (X b + Z u, e) .

$\boldsymbol{Y} \sim \N\left(\boldsymbol{X}\boldsymbol{b} + \boldsymbol{Z}\boldsymbol{u} , \boldsymbol{e}\right).$

Particular model

A regression model of particular interest is the quadratic response model. The term associated with the global effects would take the form:

b_{1} + b_{2} x (s_{i}) + b_{3} x^{2} (s_{i}), i = 1, \dots, n,

$b_{1} + b_{2} x(s_{i}) + b_{3} x^{2}(s_{i}), \, i=1,\ldots, n,$ where

x (s_{i})

$x(s_{i})$ is the particular level of some controllable treatment applied at location

s_{i}

$s_{i}$ . Local departures from the global treatment effects

b_{2}

$b_{2}$ and

b_{3}

$b_{3}$ take the form:

u_{1} (s_{i}) + u_{2} (s_{i}) x (s_{i}) + u_{3} (s_{i}) x^{2} (s_{i}), i = 1, \dots, n,

$u_{1}(s_{i}) + u_{2}(s_{i}) x(s_{i}) + u_{3}(s_{i}) x^{2}(s_{i}), \, i=1,\ldots, n,$ where

u_{1} (s_{i})

$u_{1}(s_{i})$ ,

u_{2} (s_{i})

$u_{2}(s_{i})$ , and

u_{3} (s_{i})

$u_{3}(s_{i})$ are spatially correlated local effects corresponding to the location

s_{i}

$s_{i}$ .

Spatially correlated random parameters

For all $\bm{u} = \{\bm{u}_1,\ldots,\bm{u}_3\}$ , the variance is

Σ_{u} = I_{n} \otimes V_{u} .

$\Sigma_u = I_n \otimes V_u.$ Or

Σ_{u} = V_{s} \otimes V_{u} .

$\Sigma_u = V_s \otimes V_u.$

$V_s$ can be $\AR\otimes\AR$ for regular grid data or Matérn covariance function for irregular grid data.

Bayesian process

Suppose $\theta \in \Theta$ is the set of all parameters under consideration. For a given $f:\Theta \rightarrow \mathbb{R}$ , the main focus in the Bayesian approach is to estimate $f(\theta)$ , typically by its conditional expectation, which is given by

E [f (θ) ∣ Y] = \int_{Θ} f (θ) p (θ ∣ Y) d θ .

$\E[f(\theta)\mid \bm{Y}] = \int_{\Theta} f(\theta)p(\theta\mid \bm{Y})d\theta.$ Assuming a prior distribution for

θ

$\theta$ and applying the Bayes theorem we obtain the posterior density function

p (θ ∣ Y)

$p(\theta\mid \bm{Y})$ , which, subsequently, leads to the solution in the above equation.

Posterior distribution

In order to estimate $\theta$ conditional on $\bm{Y}$ , we use the Bayes theorem to obtain the joint posterior density of the parameters in terms of the likelihood $p(\bm{Y}\mid\theta)$ and the prior $\pi(\theta)$ as follows:

p (θ ∣ Y) \propto p (Y ∣ θ) π (θ) .

$p(\theta\mid \bm{Y}) \propto p(\bm{Y}\mid\theta)\pi(\theta).$ The distribution

p (θ ∣ Y)

$p(\theta\mid \bm{Y})$ is the key ingredient for “Bayesian inference” of the parameter

θ

$\theta$ . The posterior distribution

p (θ ∣ Y)

$p(\theta\mid \bm{Y})$ provides all information about

θ

$\theta$ conditional on the observed data (Che and Xu (2010)).

Likelihood

We obtain for multivariate Gaussian distribution

\log p (Y ∣ θ) \propto - \frac{1}{2} (Y - X b - Z u)^{⊤} Σ_{e}^{- 1} (Y - X b - Z u) - \frac{1}{2} \ln det Σ_{e},

$\log p(\bm{Y}\mid\theta) \propto -\frac{1}{2} (\bm{Y}-\bm{X}\bm{b}-\bm{Z}\bm{u})^\top \Sigma_e^{-1}(\bm{Y}-\bm{X}\bm{b}-\bm{Z}\bm{u}) -\frac{1}{2}\ln\det\Sigma_e,$ and for multivariate Student-

t

$t$ distribution

\begin{aligned} \log p (Y ∣ θ) \propto & - \frac{ν + n}{2} \ln (1 + \frac{1}{ν} (Y - X b - Z u)^{⊤} Σ_{e}^{- 1} (Y - X b - Z u)) \\ - \frac{n}{2} \ln ν + \ln Γ (\frac{ν + n}{2}) - \ln Γ (\frac{ν}{2}) - \frac{1}{2} \ln det Σ_{e}, \end{aligned}

$\begin{split} \log p(\bm{Y}\mid\theta) \propto &-\frac{\nu+n}{2}\ln \left( 1+\frac{1}{\nu}(\bm{Y}-\bm{X}\bm{b}-\bm{Z}\bm{u})^\top\Sigma_e^{-1}(\bm{Y}-\bm{X}\bm{b}-\bm{Z}\bm{u}) \right) \\ & -\frac{n}{2}\ln\nu + \ln \Gamma(\frac{\nu+n}{2}) - \ln\Gamma(\frac{\nu}{2})-\frac{1}{2}\ln\det \Sigma_e, \end{split}$ where

ν \geq 1

$\nu\geq 1$ is the degrees of freedom.

Prior specification

Usually, if nothing is known from earlier studies, we can use a flat non-informative prior $p(\theta) (\propto \mbox{constant})$ , also called an “improper prior” (Gelman et al. (2006)).
In many circumstances, a Cauchy or Gamma prior is a reasonable candidate for regression coefficients. Inverse Wishart (IW) or inverse Gamma as the prior distribution for the standard deviation parameter of a hierarchical model.
Gelman et al. (2006), Gelman, Simpson, and Betancourt (2017) suggested weakly informative priors for variance parameters for Bayesian analyses of hierarchical linear model

Prior specification

To specify a prior distribution for the parameters associated with the variance-covariance matrix $V_u$ , note that the matrix can be decomposed as follows:

V_{u} = B (σ_{u}) R_{u} B (σ_{u}),

$V_u = B(\sigma_u)R_u B(\sigma_u),$ where

B (σ_{u})

$B(\sigma_u)$ denotes the diagonal matrix with diagonal elements

σ_{u_{1}}, \dots, σ_{u_{h}}

$\sigma_{u_1},\ldots,\sigma_{u_h}$ , the standard deviation of

u_{1}, \dots, u_{h}

$u_1,\ldots,u_h$ , and

R_{u}

$R_u$ is the matrix whose diagonal elements are equal to unity and off-diagonal elements are the correlation coefficients between the random effects.

Prior specification

For the matrix $R_u$ with correlation coefficients, we specify the Lewandowski-Kurowicka-Joe (LKJ) distribution (Lewandowski, Kurowicka, and Joe (2009)) as the prior distribution, and this specification is given by

R_{u} \sim LKJcorr (ϵ),

$R_u \sim \text{LKJcorr}(\epsilon),$ where

LKJcorr (ϵ)

$\text{LKJcorr}(\epsilon)$ is a positive definite correlation matrix sampled from the LKJ distribution that depends on the value of a positive parameter

ϵ

$\epsilon$ . The parameter

ϵ

$\epsilon$ controls the correlations in a way that, as the value of

ϵ

$\epsilon$ increases, the correlations amongst parameters decrease.

Hamiltonian Monte Carlo (HMC)

Hamiltonian Monte Carlo (HMC) (Brooks et al. (2011), Duane et al. (1987)) is an efficient Markov chain Monte Carlo (MCMC) method that overcomes the inefficiency associated with the random walk and with the sensitivity to correlated parameters.
An important step in HMC is the drawing of a set of auxiliary momentum variables $r=\lbrace r_1,\ldots,r_d\rbrace$ , independently from the standard normal distribution for each parameter in the set $\theta=\lbrace\theta_1,\ldots,\theta_d\rbrace$ . The joint density function $f(\theta,r)$ of $\theta$ and $r$ is given by
$f (θ, r) \propto \exp {L (θ) - K (r)} = \exp {- H (θ, r)},$ $f(\theta,r) \propto \exp \lbrace L(\theta)-K(r) \rbrace = \exp \lbrace -H(\theta,r) \rbrace,$ where $H(\theta,r)$ is the Hamiltonian system dynamics (HSD) equation with potential energy $L(\theta)$ and kinetic energy $K(r)$ .

Hamiltonian Monte Carlo (HMC)

The HSD is numerically approximated in discrete time space with the leapfrog method to maintain the total energy when a new sample $(\theta^*,r^*)$ is drawn.
The leapfrog method requires two parameters: (i) a step size $\epsilon$ , representing the distance between two consecutive draws, and (ii) a desired number of steps $L$ , required to complete the process. A new sample is accepted with the probability
$α = min {1, \frac{f (θ^{*}, r^{*})}{f (θ, r)}} .$ $\alpha = \min \left\{ 1, \frac{f(\theta^*,r^*)}{ f(\theta,r)} \right\}.$

No-U-Turn Sampler (NUTS)

No-U-Turn Sampler (NUTS) determines the step size adaptively during the warm-up (burn-in) phase to a target acceptance rate and uses it then for all sampling iterations (Hoffman and Gelman (2014), Monnahan, Thorson, and Branch (2017)).

The NUTS also eliminates the need to specify a value of $L$ by using the criterion

\frac{d}{d t} \frac{(θ^{*} - θ) \cdot (θ^{*} - θ)}{2} = (θ^{*} - θ) \cdot \frac{d}{d t} (θ^{*} - θ) = (θ^{*} - θ) \cdot r^{*} < 0,

$\frac{d}{dt}\frac{(\theta^*-\theta)\cdot(\theta^*-\theta)}{2}=(\theta^*-\theta)\cdot\frac{d}{dt}(\theta^*-\theta)=(\theta^*-\theta)\cdot r^* <0,$ where

r^{*}

$r^*$ is the current momentum and

(θ^{*} - θ)

$(\theta^*-\theta)$ is the distance from the initial position to the current position. The idea is that the trajectory will keep exploring the space until

θ^{*}

$\theta^*$ starts to move back towards

θ

$\theta$ .

No-U-Turn Sampler (NUTS)

The slice sampling generates a finite set of samples of the form $(\theta,r)$ during the doubling procedure and the binary tree building process by randomly taking forward and backward leapfrog steps until

\begin{matrix} (θ^{+} - θ^{-}) \cdot r^{-} < 0 & or & (θ^{+} - θ^{-}) \cdot r^{+} < 0, \end{matrix}

$\begin{matrix} (\theta^+-\theta^-)\cdot r^- <0 & \mbox{or} & (\theta^+-\theta^-)\cdot r^+ <0, \end{matrix}$ where

(θ^{-}, r^{-})

$(\theta^-, r^-)$ and

(θ^{+}, r^{+})

$(\theta^+, r^+)$ are the leftmost and rightmost leaves, respectively, in the subtree. The best candidate

(θ^{*}, r^{*})

$(\theta^*,r^*)$ is uniformly sampled from the subset of all candidate values of

(θ, r)

$(\theta,r)$ .

No-U-Turn Sampler (NUTS)

Example of building a binary tree via repeated doubling (Hoffman and Gelman (2014)).

Posterior predictive checking

The posterior predictive (PP) checking uses the posterior distribution of the model parameters to regenerate the observations.

Let $\bm{Y}^{rep}$ denote a simulated or replicated data set, generated using the posterior predictive distribution

p (Y^{r e p} ∣ Y) = \int p (Y^{r e p} ∣ θ) p (θ ∣ Y) d θ .

$p(\bm{Y}^{rep} \mid \bm{Y}) = \int p(\bm{Y}^{rep} \mid \theta)p(\theta \mid \bm{Y}) d\theta.$ To assess the fitted model, several data sets are simulated from

p (Y^{r e p} ∣ Y)

$p(\bm{Y}^{rep} \mid \bm{Y})$ , and each of them is compared with the observed data

Y

$\bm{Y}$

Leave-one-out (LOO) cross validation

In Bayesian statistics, the expected $\log$ LOO predictive density (ELPD) is used to measure the predictive accuracy :

elpd = \sum_{i = 1}^{n} \log p (y_{i} ∣ y_{- i}),

$\mbox{elpd} = \sum_{i=1}^{n}\log p(y_i\mid y_{-i}),$ where

p (y_{i} ∣ y_{- i}) = \int p (y_{i} ∣ θ) p (θ ∣ y_{- i}) d θ

$p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)p(\theta \mid y_{-i})d\theta$ is the LOO predictive density with the

i

$i$ -th observation omitted from the data set (Vehtari, Gelman, and Gabry (2017)).

Bürkner, Gabry, and Vehtari (2021) proposed approximated LOO CV, which uses only a single model fit and calculating the pointwise $\log$ predictive density as a fast approximation to the exact LOO CV.

Pareto-smoothed importance-sampling (PSIS)

The LOO estimate is improved by using Pareto smoothed importance sampling which applies a smoothing procedure to the importance weights (Vehtari, Gelman, and Gabry (2017)).

The PSIS-LOO-CV estimate is computed taking a weighted sum over all $n$ pointwise $\log$ -likelihood by

\hat{psis} = \sum_{i = 1}^{n} \log (\frac{\sum_{m = 1}^{M} p (y_{i} ∣ θ^{(m)}) w_{i}^{(m)}}{\sum_{m = 1}^{M} w_{i}^{(m)}}),

$\widehat{\mbox{psis}} = \sum_{i=1}^n\log\left( \frac{ \sum_{m=1}^{M} p(y_i\mid \theta^{(m)})w_i^{(m)} }{ \sum_{m=1}^{M}w_i^{(m)} }\right),$ where

w_{i}^{(m)}

$w_i^{(m)}$ are stabilised weights computed during PSIS,

m = 1, \dots, M

$m = 1, \ldots, M$ .

Pareto-smoothed importance-sampling (PSIS)

The estimated shape parameter $\hat{k}$ of the generalized Pareto distribution can be used to assess the reliability of the estimate:

If $k<\frac{1}{2}$ , the variance of the raw importance ratios is finite, the central limit theorem holds, and the estimate converges quickly.
If $\frac{1}{2}<k<1$ , the variance of the raw importance ratios is infinite but the mean exists, the generalized central limit theorem for stable distributions holds, and the convergence of the estimate is slower. The variance of the PSIS estimate is finite but may be large.
If $k>1$ , the variance and the mean of the raw ratios distribution do not exist. The variance of the PSIS estimate is finite but may be large.

Bayesian $R^2$

The Bayesian $R^2$ (Gelman et al. (2019)) is presented as the variance of the predicted values divided by the variance of predicted values plus the expected residual variance

Bayesian R^{2} = \frac{V a r (Y^{p r e d})}{V a r (Y^{p r e d}) + V a r (r e s)} .

$\mbox{Bayesian } R^2 = \frac{\Var(\bm{Y}^{pred})}{\Var (\bm{Y}^{pred})+\Var(\bm{res})}.$

It should not be interpreted solely if the model has a large number of bad Pareto $\hat{k}$ values, i.e., values greater than 0.7 or, even worse, greater than 1.

The proportion of variance explained, classical R2 = V(hat(y))/V(y) is a commonly used measure of model fit, and there is a long literature on interpreting it, adjusting it for degrees of freedom used in fitting the model, and generalizing it to other settings such as hierarchical models; see, for example, Xu (2003) and Gelman and Pardoe (2006).

Two challenges arise in defining R2 in a Bayesian context. The first is the desire to reflect posterior uncertainty in the coefficients, which should remove or at least reduce the overfitting problem of least squares. Second, in the presence of strong prior information and weak data, it is possible for the fitted variance, V(hat(y)) to be higher than total variance V(y) so that the classical formula (1) can yield an R2 greater than 1 (Tjur 2009).

Las Rosas data

To obtain the map of locally varying optimal input rates, we specified a quadratic regression model, in which the corn yield is modelled as a quadratic function of the nitrogen rate. The optimal treatment can be determined by estimating the coefficients of the quadratic regression model at each grid point.

	Model 1	Model 2	Model 3	Model 4
Spatial correlation	No	Yes	No	Yes
$\Var(\bm{u})$	$I_{n\times n}\otimes V_u$	$V_s\otimes V_u$	$I_{n\times n}\otimes V_u$	$V_s\otimes V_u$
Distribution	Gaussian	Gaussian	Student- $t$	Student- $t$

Prior predictive simulations

	Model 1	Model 2	Model 3	Model 4
$b_0$	$\N(80,10)$
$b_1$	$\N(0, 0.01)$
$b_2$	$\N(0, 0.001)$
$\sigma_0$	$\N_+(0, 1)$
$\sigma_1$	$\N_+(0, 0.01)$
$\sigma_2$	$\N_+(0, 0.001)$
$\sigma_e$	$\N_+(0, 1)$
$R_u$	—	LKJcorr(1)	—	LKJcorr(1)
$\rho_c$	—	$U(0,1)$	—	$U(0,1)$
$\rho_r$	—	$U(0,1)$	—	$U(0,1)$
$\nu$	—	—	$\Gamma(2,0.1)$	$\Gamma(2,0.1)$

Posterior checking (LOO PIT)

LOO CV predictive cumulative density plots can also be used to assess the performance of fitted models. A model is well calibrated for continuous responses when the corresponding plot shows asymptotically uniform behaviour . Figure $\ref{fig:pitloo}$ compares the density of the computed (LOO PIT) (the thick dark curve) with the 100 simulated data sets from a standard uniform distribution (the thin light curves). It is evident from Figure $\ref{fig:pitloo}$ that Model 1 and 3 are miscalibrated. Although the Model 2 fit seems good, the frown shape of the curve indicates inferior calibration than Model 4. This implies that Model 2 is either misspecified or too flexible. A flexible model often has the capability of predicting successfully out-of-sample data. However, amongst the four fitted models, Model 4 demonstrates the best fit for the Las Rosas data set.

Pareto $\hat{k}$

	Model 1			Model 2			Model 3			Model 4
	Count	Per	M.Eff	Count	Per	M.Eff	Count	Per	M.Eff	Count	Per	M.Eff
(-Inf, 0.5] (good)	28	1.7%	457	1585	94.7%	432	1474	88.1%	494	1672	99.9%	868
(0.5, 0.7] (ok)(good)	372	22.2%	112	83	5.0%	103	176	10.5%	254	2	0.1%	1733
(0.7, 1] (bad)	1138	68.0%	18	4	0.2%	70	24	1.4%	170	0	0.0%	0
(1, Inf) (very bad)	136	8.1%	8	2	0.1%	4	0	0.0%	0	0	0.0%	0

The reliability and approximate convergence rate of the PSIS-based estimates can be assessed using the estimates for the shape parameter k of the generalized Pareto distribution:

If k<0.5 then the distribution of raw importance ratios has finite variance and the central limit theorem holds. However, as k approaches 0.5 the RMSE of plain importance sampling (IS) increases significantly while PSIS has lower RMSE.

If 0.5≤k<1 then the variance of the raw importance ratios is infinite, but the mean exists. TIS and PSIS estimates have finite variance by accepting some bias. The convergence of the estimate is slower with increasing k. If k is between 0.5 and approximately 0.7 then we observe practically useful convergence rates and Monte Carlo error estimates with PSIS (the bias of TIS increases faster than the bias of PSIS). If k>0.7 we observe impractical convergence rates and unreliable Monte Carlo error estimates.

If k≥1 then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero and bias can be large with practical sample sizes

Model evaluation

	Model 1		Model 2		Model 3		Model 4
	Estimate	SE	Estimate	SE	Estimate	SE	Estimate	SE
elpd	-7236.2	13.4	-4945.2	134.8	-7848.4	17.1	-4734.3	38.3
ploo	1487.1	11.7	341.8	41.3	241.2	6.8	516.1	10.5
looic	14472.5	26.7	9890.4	269.6	15696.8	34.3	9468.7	76.7

	Median	CI	Median	CI	Median	CI	Median	CI
Bayesian $R^2$	0.842	0.563 $\sim$ 0.965	0.974	0.972 $\sim$ 0.977	0.190	0.135 $\sim$ 0.251	0.989	0.987 $\sim$ 0.991

Compare with GWR

	GWR	Bayesian
Inference	with neighbouring data	with all data
Initialisation	bandwidth selection	prior specification
Objective	local log-likelihood	global log-likelihood
Evaluation	$t$ scores	credible intervals
	$p$ -values	PP check and LOO PIT
		Pareto $k$ diagnosis
		Bayesian $R^2$

In GWR, the results crucially depends on the bandwidth of the selected kernel function. Although an appropriate bandwidth can be selected using spatial cross validation, it is computationally challenging for large data sets. To estimate the regression parameters for a query location, the neighbouring observations are given more weight than the distant ones in GWR. On the contrary, the proposed Bayesian approach uses all data in one go to produce estimates for all grid point, based on a spatial variance matrix defined for the entire field. The Bayesian inference is affected by the choice of priors and the likelihood. However, the influence of the prior reduces if the amount of data increases. The Bayesian approach in general is more flexible than GWR, as it can be easily extended and applied broadly to other applications.

5 / 74

On-farm experiments Treatments are applied to adjacent strips to detect spatial variation in treatment response. Large strip trial

A Bayesian Workflow for Spatially Correlated Random Effects in On-farm Experiment
\[ \newcommand{\E}{\mathrm{E}}...
SAGI West Node Research
On-farm experiments
On-farm experiments
Improve profitability and sustainability
Identify optimum nitrogen rates
Example: Argentinian corn field experiment
Example: Argentinian corn field experiment
Example: Argentinian corn field experiment
Inadequacy of a global model
Inadequacy of a global model
Zone-specific models are limiting
Geographically weighted regression
Yield prediction with GWR
Bayesian Statistics and Workflow
Why Bayesian?
Liear mixed model notation
Liear mixed model notation
Bayesian hierarchical model
Particular model
Spatially correlated random parameters
Why use spatially correlated $\Sigma_u$?
Why use spatially correlated $\Sigma_u$?
Bayesian workflow
Bayesian process
Posterior distribution
Likelihood
Prior specification
Prior specification
Prior specification
LKJ Prior
Predictive distribution
Posterior sampling with Markov chains
Posterior sampling with Markov chains
Random walk Metropolis-Hastings
Hamiltonian Monte Carlo (HMC)
Hamiltonian Monte Carlo (HMC)
Hamiltonian Monte Carlo (HMC)
Problems with HMC
No-U-Turn Sampler (NUTS)
No-U-Turn Sampler (NUTS)
No-U-Turn Sampler (NUTS)
No-U-Turn Sampler (NUTS)
No-U-Turn Sampler (NUTS)
Posterior predictive checking
Leave-one-out (LOO) cross validation
Pareto-smoothed importance-sampling (PSIS)
Pareto-smoothed importance-sampling (PSIS)
Bayesian $R^2$
Las Rosas data
Las Rosas data
RStan
RStan
RStan
Prior predictive simulations
Prior predictive simulations
Posterior checking (fitting)
Posterior checking (skewness)
Posterior checking (LOO PIT)
Pareto $\hat{k}$
Model evaluation
Results
Yield
Compare with GWR
Further application of BHM and GWR
Two types of large-strip trial design
Response
Spatial correlation
Results
Results
Take-home message
References
Thank you.