From Principles to Simulation Evaluations
CCDM & EECMS, Curtin University
02/12/2025
Replication
Blocks
Randomisation
(https://www.livingfarm.com.au)
Traditional research limitations
Small-plot trials: Limited relevance to commercial-scale farming
Controlled conditions: Don’t reflect real-world variability
Long translation time: Years between research and practice
Geographic specificity: Results may not transfer across regions
Farmer engagement: Limited involvement in research design
The OFE solution
Commercial-scale validation: Testing at realistic field sizes
Real-world conditions: Incorporating natural variability and farmer practices
Rapid innovation cycles: Faster feedback and adaptation
Local relevance: Site-specific recommendations
Farmer-scientist collaboration: Co-creation of knowledge
Illustration of some OFE trial designs, including strip-based, grid-based, Latin square, and gradient layouts. These designs support different analytical goals, such as treatment comparison, spatial modelling, and rate-response analysis (Li, Mieno, and Bullock 2023).
Strip trial advantages for OFE
Strip dimensions:
Width: 10-30m (machinery width dependent)
Length: 200m-1km (field size dependent)
Area: 0.2-1.5 ha per strip
Number: 2-6 treatments typical
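To make the layout choice concrete, here is a minimal sketch of how a systematic and a randomised strip plan might be generated for a handful of treatments. The treatment labels, the number of replicates, and the function names are all hypothetical and chosen only for illustration.

```python
import random

def systematic_layout(treatments, n_reps):
    """Repeat the treatments in the same fixed order in every replicate."""
    return [list(treatments) for _ in range(n_reps)]

def randomised_layout(treatments, n_reps, seed=0):
    """Shuffle the treatment order independently within each replicate (block)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_reps):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout

treatments = ["A", "B", "C"]          # hypothetical treatment labels
sys_plan = systematic_layout(treatments, n_reps=3)
rand_plan = randomised_layout(treatments, n_reps=3)

for name, plan in [("systematic", sys_plan), ("randomised", rand_plan)]:
    strips = [t for block in plan for t in block]   # strip order across the field
    print(f"{name:>10}: {' '.join(strips)}")
```

In both plans every treatment appears once per replicate; only the order within each replicate differs.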

Traditional paradigm
✓ Replication
✓ Blocks
✓ Randomisation
Clear, controlled, artificial
OFE reality
? Replication (fewer reps)
? Blocks (natural variation)
? Randomisation (farmer constraints)
Practical, realistic, messy
Challenge: How to achieve statistical validity under OFE constraints?


For continuous responses, prior research (Cao et al. 2022, 2024; Rakshit et al. 2020; Piepho et al. 2011; Pringle, Cook, and McBratney 2004) showed that systematic designs outperform randomised designs for spatial analysis.
This study explores statistical strategies for the design and analysis of on-farm experiments (OFE), building on established foundations (Piepho et al. 2011; Pringle, Cook, and McBratney 2004) while advancing methodological innovation.
Do these design principles extend to categorical variables common in modern OFE?
Core concept: Linear Mixed Models (LMM) treat large strip OFEs similarly to Multi-Environment Trials (MET), where pseudo-environments (PE) represent different zones within the field (Stefanova et al. 2023).

Suppose the entire field is partitioned into \(p\) PEs, and the \(j\)th PE consists of \(n_j\) grid points, such that the total number of grid points in the experiment is \(n=\sum_{j=1}^p n_j\). Let \({y}_j\) be the \(n_{j} \times 1\) vector of observed responses corresponding to the \(j\)th PE. The LMM for the combined vector of data \({Y} = \lbrack{y}^\top_{1}, \ldots, {y}^\top_{p}\rbrack^{\top}\) across all PEs is given by \[ {Y} = {X}{\tau} + {Z}{u} + {e}, \] where \({\tau}\) and \({u}\) are \(t\times 1\) and \(b \times 1\) vectors of fixed and random effects, respectively. The matrices \({X}\) and \({Z}\) are \(n\times t\) and \(n\times b\) design matrices corresponding to the fixed and random effects, respectively. The \(n\times 1\) vector \({e}\) provides the combined residual effects from all PEs.
Typically, the random effect \({u} = \lbrack {u}^{\top}_{1}, \ldots, {u}^{\top}_{q}\rbrack^{\top}\) is composed of several model terms, with the corresponding design matrix \({Z}\) partitioned as \(\lbrack {Z}_1, \ldots,{Z}_q \rbrack\).
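A minimal sketch of how the pieces of this model fit together, assuming a single random term (one intercept per PE, so \(b = p\) and \(q = 1\)) and two treatments. The PE sizes, effect values, and variances are made up for illustration and are not taken from the study.

```python
import random

rng = random.Random(1)

n_per_pe = [4, 3, 5]                 # n_j: grid points in each of p = 3 PEs (made up)
treatments = ["control", "treated"]  # t = 2 fixed treatment effects (made up)
p, t = len(n_per_pe), len(treatments)
n = sum(n_per_pe)                    # n = sum_j n_j

# Give each grid point a PE index and an alternating treatment index.
pe_index = [j for j, n_j in enumerate(n_per_pe) for _ in range(n_j)]
trt_index = [i % t for i in range(n)]

# X (n x t): one indicator column per treatment.
X = [[1.0 if trt_index[i] == k else 0.0 for k in range(t)] for i in range(n)]
# Z (n x b): here b = p, one indicator column per PE (an assumed random term).
Z = [[1.0 if pe_index[i] == j else 0.0 for j in range(p)] for i in range(n)]

tau = [3.0, 3.4]                               # fixed effects (illustrative values)
u = [rng.gauss(0.0, 0.3) for _ in range(p)]    # random PE effects
e = [rng.gauss(0.0, 0.2) for _ in range(n)]    # residuals

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in M]

# Y = X tau + Z u + e, stacked over all PEs.
Y = [xt + zu + ei for xt, zu, ei in zip(matvec(X, tau), matvec(Z, u), e)]
print(f"n = {n}, X is {n} x {len(X[0])}, Z is {n} x {len(Z[0])}")
```

With several random terms, the columns of each \({Z}_i\) would simply be placed side by side to form \({Z}\), and the corresponding \({u}_i\) stacked to form \({u}\).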
Why PEs matter:
Fields are not uniform - different zones behave differently
One-size-fits-all recommendations often fail
Zone-specific management improves overall performance
Practical implementation:
Clustering approach using elevation, soil, or other covariates
Systematic partitioning when covariates unavailable
3-6 zones typically optimal for most fields
Statistical modelling within each zone
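The clustering step can be sketched with a plain k-means on standardised covariates. This is only one possible approach, not necessarily the one used in the study. The covariates (elevation and soil electrical conductivity), the synthetic values, and the choice of three zones are assumptions for illustration.

```python
import random

def standardise(points):
    """Centre and scale each covariate so no single one dominates the distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(pt, means, sds)) for pt in points]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    # Deterministic start: k points spread evenly through the sorted data.
    ordered = sorted(points)
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(pt, centres[j])))
            for pt in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Synthetic grid points: (elevation in m, soil EC), 30 per underlying zone.
rng = random.Random(0)
zone_means = [(250.0, 20.0), (260.0, 35.0), (275.0, 50.0)]
covariates = [(rng.gauss(m_el, 1.0), rng.gauss(m_ec, 2.0))
              for m_el, m_ec in zone_means for _ in range(30)]

labels, centres = kmeans(standardise(covariates), k=3)
print({zone: labels.count(zone) for zone in sorted(set(labels))})
```

The resulting labels can then serve as the PE index for each grid point. In practice the number of zones would be chosen by inspecting the field rather than fixed in advance.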

Interpretation
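We simulate baseline yield using unconditional Gaussian geostatistical simulation based on a first-order stationary random field model (Evans et al. 2020): \[ z(s) = \mu + \varepsilon(s), \] where \(z(s)\) is the simulated yield at location \(s = (x, y)\), \(\mu\) is the constant mean, and \(\varepsilon(s)\) is the spatially correlated random component. Fertiliser trials are then simulated under different designs, and the treatment effect is added to the baseline yield: \[ y(s) = z(s) + \beta \times N(s), \] where \(N(s)\) is the treatment applied at \(s\) under the given design and \(\beta\) is its effect on yield.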
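The two simulation equations can be sketched on a small grid as follows. The exponential covariance function, the grid size, and every parameter value are assumptions for illustration, and the Cholesky approach shown here is just one simple way to draw an unconditional Gaussian field.

```python
import math
import random

rng = random.Random(42)

# Grid of locations s = (x, y); spacing and size are illustrative.
nx, ny, spacing = 8, 6, 10.0
sites = [(ix * spacing, iy * spacing) for iy in range(ny) for ix in range(nx)]
n = len(sites)

mu, sigma2, corr_range = 3.0, 0.25, 30.0   # assumed mean (t/ha), variance, range (m)

# Exponential covariance between every pair of sites (an assumed model).
cov = [[sigma2 * math.exp(-math.dist(a, b) / corr_range) for b in sites] for a in sites]

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

L = cholesky(cov)
white = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Correlated noise eps = L * white, so Cov(eps) = L L^T = cov.
eps = [sum(L[i][k] * white[k] for k in range(i + 1)) for i in range(n)]
z = [mu + e_i for e_i in eps]                      # baseline yield z(s) = mu + eps(s)

# Treatment map N(s): systematic strips along x, alternating 0 and 1.
N = [ix % 2 for iy in range(ny) for ix in range(nx)]
beta = 0.4                                         # assumed treatment effect
y = [z_i + beta * n_i for z_i, n_i in zip(z, N)]   # y(s) = z(s) + beta * N(s)

print(f"mean baseline = {sum(z) / n:.2f}, mean observed = {sum(y) / n:.2f}")
```

Swapping the `N` map for a randomised or stacked layout changes the design while leaving the baseline field untouched, so designs can be compared on the same simulated field.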
Primary research question
How do trial length, number of replications, model structure (spatial vs non-spatial), data granularity (averaged vs full), and layout (strip vs stacked) affect the accuracy and statistical power of treatment effect estimation in OFE?
This design is especially useful when:
Field length is limited, but replication is still required.
High-resolution spatial data is available, allowing for detailed modelling.



Boxplots of the relative absolute difference (RAD) of estimated coefficients for strip trials across trial lengths, models, and designs. Each panel represents a combination of trial length and model (M11–M22), with RAD values compared between randomised and systematic strip designs. Lower RAD values indicate more accurate treatment effect estimation. Randomised designs and models incorporating spatial terms (M12, M22) show improved performance.
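The caption does not state how RAD is computed. One plausible reading, assumed here, is the absolute difference between the estimated and the true (simulated) treatment effect, divided by the absolute true effect. The function name and all numbers below are hypothetical.

```python
def relative_absolute_difference(estimate, truth):
    """|estimate - truth| / |truth|; an assumed definition of RAD."""
    if truth == 0:
        raise ValueError("RAD is undefined when the true effect is zero")
    return abs(estimate - truth) / abs(truth)

true_effect = 0.40                       # hypothetical simulated treatment effect
estimates = [0.36, 0.41, 0.47, 0.40]     # hypothetical estimates from four runs
rads = [relative_absolute_difference(est, true_effect) for est in estimates]

for est, rad in zip(estimates, rads):
    print(f"estimate = {est:.2f}, RAD = {rad:.3f}")
```

Under this reading, a RAD of 0 means the estimate matches the simulated effect exactly, and a RAD of 0.1 means it is off by 10% of the true effect.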


Critical Design Thresholds
Minimum 3 replications required for 95%+ power in most scenarios
Strip length matters: Longer strips (\(\ge\) 500m) show consistent power improvement
Spatial models (M12, M22) enhance power by 5-15% over non-spatial models
Full data analysis consistently outperforms averaged data approaches
Design-specific performance
Randomised designs: Superior power for hypothesis testing (0.85-1.0 typical)
Systematic designs: Adequate power (0.70-0.95) with operational advantages
Length effect: 1100m strips achieve 98%+ power with 2 reps vs 70% for 100m strips
Model selection: Spatial correlation models critical for realistic field conditions
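Power figures like these are typically obtained by repeating a simulation many times and counting how often the test detects the effect. The sketch below shows that idea with a deliberately simplified known-variance two-sample z-test on independent strip means. It is not the LMM-based test used in the study, and the effect size, noise level, and function name are assumptions.

```python
import random
from statistics import NormalDist, mean

def simulated_power(n_reps, effect, sigma, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulated trials in which a two-sided z-test rejects 'no effect'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2.0 / n_reps) ** 0.5          # SE of a difference of two means
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n_reps)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_reps)]
        z_stat = (mean(treated) - mean(control)) / se
        rejections += abs(z_stat) > z_crit
    return rejections / n_sims

for reps in (2, 4, 8):
    print(reps, simulated_power(reps, effect=0.4, sigma=0.5))
```

In this simplified setting power rises with the number of replications. Spatial correlation, strip length, and model choice are not represented here and would need the full simulation framework.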
Boxplots of relative absolute difference (RAD) for stacked replicate trials across different trial lengths and models. Each panel represents a combination of trial length and model (M11–M23), with RAD values compared between randomised and systematic designs. While randomised designs generally show slightly lower RAD values, the differences are less pronounced than in strip trials.
Critical performance patterns:
M21 (Full non-spatial): Consistently achieves 100% power across all scenarios
M23 models: Generally poor performance (0.1-0.7 power) - avoid in practice
Length sensitivity: Dramatic improvement from 10m (0.0-0.7) to 110m (0.25-1.0) trials
Replication effects: Optimal performance at 10-12 reps, diminishing returns beyond 6 reps
Model selection insights:
Averaged data models: Highly variable power (0.0-1.0) - unreliable for inference
Full data analysis: More consistent and higher power overall
Spatial vs non-spatial: Non-spatial models (M21) surprisingly outperform spatial models (M22, M23)
Design comparison: Randomised slightly better than systematic, but differences minimal
Practical recommendations: For stacked trials, prioritise M21 models with full data analysis and ensure adequate trial length (\(\ge\) 50m) for reliable statistical inference.
Yield monitor technology
Combine harvesters: Real-time yield measurement
GPS integration: Precise location recording
Data logging: Continuous data capture at 1-second intervals
Quality sensors: Moisture, protein, oil content measurement
Auxiliary spatial data
NDVI or multispectral imagery for vegetation indices
Soil electrical conductivity from dual EM surveys
Gamma radiometric data for soil texture
Digital elevation models (DEM) for topographic variation
Weather station data for environmental context
Soil moisture sensors for irrigation management
Objective-driven design: Select design strategy based on research goals and intended analytical approach
LMM for categorical treatments & zone-based analysis
Categorical comparisons: Varieties, formulations, products
Hypothesis testing: Statistical significance of treatment effects
Zone-specific management: Different areas of the field
Treatment × environment interactions: How treatments perform across zones
GWR (geographically weighted regression) for continuous treatments & spatial optimisation
Continuous treatments: Varying nitrogen rates, seeding rates
Spatial optimisation: Site-specific management
Variable-rate applications: Precision agriculture implementation
Local response mapping: Understanding spatial variability
Adoption as standard operating procedures
Our research outputs are not just academic—they set the national standard for on-farm experimentation, guiding the next generation of agricultural innovation in Australia. – Zhanglong Cao
This research and presentation were made possible through the support and collaboration of:
GRDC (Grains Research & Development Corporation)
Australian Grower Groups (Liebe Group, Facey Group, Grower Group Alliance, Consult Ag, Delta Agribusiness, Riverine Plains, Syngenta, Corteva, NSW DPI, DPIRD, DPI QLD and more)
Special thanks to all collaborating farmers, research staff, and postgraduate students for their contributions to on-farm experimentation and data collection.
Special thanks to EECMS, CBADA, C4AP & CCDM Curtin University

ASC2025 | Z. Cao et al.