This vignette covers models where the response is a function of predictor values, measured at a collection of regular distance increments in two dimensions, which will usually represent distance in space (spatial lags) and in time (time lags).
Such scenarios may arise when the response variable depends on environmental conditions over a neighborhood (the “zone of influence” of these environmental factors), but also depends on what these conditions were in the past (may be termed “memory”, “lagged” or “carry-over” effects, depending on contexts). The spatial zone of influence is allowed to vary with the time lag, and the temporal zone of influence may vary with the spatial lag.
The model assumes that the coefficient (effect) of the predictor varies smoothly with the distances at which it is measured.
Mathematical description
For each observation \(i \in \{1, ..., N\}\), the linear predictor includes the additive effects of a matrix predictor \(X_{i}\) consisting of values \(x_{ijk}\) measured on a grid of regular distances increments: \(l_{ijk}\) in one direction (e.g., time lag) and \(s_{ijk}\) in the other (e.g., spatial lag), forming the \(J \times K\) distance matrices \(L_{i}\) and \(S_{i}\). In typical applications, \(l_{ijk}\) is invariant between observations and distances \(k\), so indices \(i\) and \(k\) may be dropped. Likewise, \(s_{ijk}\) is invariant between observations and distances \(j\), so indices \(i\) and \(k\) may be dropped.
Function \(f(l_{j},s_{k})\) acts as a coefficient for \(x_{ijk}\), that varies smoothly with distances \(( l_{j}, s_{k} )\). The shape of \(f(d_k)\) is typically unknown and needs to be estimated from the data, so the function should have sufficient flexibility to ensure a good fit to the data, while being sufficiently constrained to avoid overfitting. In this tutorial we will represent \(f\) with tensor products of penalized regression splines, since they meet these requirements and are easily implemented using standard software.
Implementation using standard software such as mgcv in R requires all array data to be at most 2D arrays. This requires reformatting the \(N \times J \times K\) 3D arrays \(X\), \(L\) and \(S\) by “unfolding” them into \(N \times (J \times K)\) 2D arrays, with resulting column index \(u \in \{1, ..., J \times K\}\) encoding all unfolded combinations of \(j\) and \(k\) as follows.
As in the mathematical description above, we assume a predictor \(x_{ijk}\) measured for any observation \(i \in \{1, ..., N\}\) at all combinations of distances \(l_j\) and \(s_k\) (e.g., temporal and spatial lags, respectively). These are stored in the \(N \times J \times K\) 3D arrays X, L and S respectively. Arrays L and S encode the actual distances values, with equal intervals in each dimension and constant across rows (if these ,conditions are not met, appropriate integration weights should be applied). The units chosen for expressing L or S are arbitrary and do not affect the predictions of the model (only its interpretation).
The R implementation of the model, using package mgcv, is of the form:
thisModel<- gam(y ~ s(Lu, Su, by= Xu), data= exampleData)
Where y is a dataframe column with \(N\) observations. Lu, Su and Xu are the unfolded 2D versions of 3D arrays L, S and Xall \(N \times U\) matrices, with the number of columns U corresponding to the number of distance class combinations \(JK\) over which predictor X is available.
The summation convention applied in mgcv means that when the data fed to the smooth terms are multi-column matrices, the sum of the function evaluation over all columns (after multiplication by the corresponding entry of any by= argument) is returned.
The input data should have the following structure (illustrative example with 6 temporal lags \(L\), 8 spatial lags \(S\) and 100 observations), a sample of which is shown below:
y (response) is typically a single-column vector (may have two columns for binomial family or an arbitrary number for likelihoods that make use of this, such as multivariate normal).
Lu has one column per combination of temporal and spatial lags (here 6x8=48) and is normally row-invariant. Temporal lag values increase left to right and are recycled when the next spatial lag (Su) starts.
Su has one column per combination of temporal and spatial lags (here 6x8=48) and is normally row-invariant. Spatial lags move to their next value only when the temporal lags (Lu) have gone through a full cycle.
Predictor Xu has same dimension as Lu and Su and is NOT row-invariant. When serial correlation in the predictor values exists either across temporal and/or spatial lags, rows of Xu will tend to display serial correlation patterns, as in the illustrative example above. However patterns may be hard to discern as they will be nested within each (unfolded) distance dimension.
Further notes
This model extends the standard “signal regression” model family (see vignette “2-1_Distributed-linear-Effects-Over-1D-Lags”), by allowing values of predictor \(X\) to be taken two dimensions \(D_1\) and \(D_2\), which may be for example space & time, or spectral & excitation frequencies (e.g., Marx and Eilers 2005).
Under typical circumstances (i.e., the \(d_1\) and \(d_2\) increments are regular and there is no missing value in \(X\)), the model is also a form of “distributed lag model”, where the effect of \(x\) has been constrained to be linear for any combination of \((d_1,d_2)\).
Illustration: analysis of kittiwake breeding success
The motivation for the model, its construction and interpretation are best shown by example. Background information about the kittiwake case study can be found in the introductory vignette (“1_Introduction”).
Specific research question
“What is the spatial and temporal zone of influence of SST on kittiwake yearly breeding success?”
A signal regression model with spatial and temporal lags
A direct answer to the “Specific research question” above can be given by a signal regression using SST values at suitable spatial and temporal lags, as predictors.
Here, we will use weekly SST values and assume that the effect of SST does not extend beyond 30 weeks (~ 6 months) in the past, or beyond 210 km from the colony.
The SST satellite images would allow us to work at different time resolutions for SST, from daily to monthly values. A daily resolution (~ 200 values) wouldn’t be problematic for the analysis, but we wouldn’t expect such a fine resolution to be relevant for the biological processes involved in our context. It would also incur a significant and unnecessary computational cost. At the other end, a monthly resolution for SST values would only give us 6 intervals, which may miss biologically relevant variation and limit the scope for fitting accurate spline functions. A weekly resolution with 30 intervals, seems a reasonable trade-off.
We will use 21 distance bands of width 10 km, centered on the colony. We wouldn’t expect distance-dependent processes to be detectable at much finer resolutions than this with the type of predictor and temporal resolution we use, particularly with ecological processes that ought to generalize across colonies.
Data preparation
# loading the data# depending on the configuration of# your system, one of these should worktry(load("kit_SST_sandeel.RData"), silent= T)try(load("../kit_SST_sandeel.RData"), silent= T)# we'll work with the smaller subset of 7 colonieskit1<- kit2[kit2$Site %in% kit_sub1, ]
Modelling parameters
# number of weeks to use as predictorsnweeks<-30# distance lags (in km from colony)lags_sp<-seq(0, 210, by=10)N<-nrow(kit1)D<-length(lags_sp[-1])T<- nweeks
The pre-processed SST data (SST_mean_ring) come in this 3D array format:
30 rows = weekly lags
21 columns = distance bands (“spatial lags”)
161 slices = colony / year combinations
Preparing the predictor matrix thus requires unfolding the array into a matrix with adequate indices.
The following procedure works for the data format above, but will need careful adaptation for different data formats to ensure that the lag indices match across all matrices feeding into the model.
SSTu<- SST_mean_ring[, , kit1$SiteYear]dim(SSTu)<-c(D * T, N)SSTu<-t(SSTu)# visualising the first 3 rows and 8 columns of the matrix:round(SSTu, 2)[1:3, 1:8]
Now, the corresponding spatial and temporal lag indices matrices. We find convenient to annotate lags as negative week numbers, with -30 corresponding to the farthest week in time (w/c 1st Jan of that year) and -1 corresponding to the most recent. Other annotation systems can be used, without effect on the predictions of the model, as long as intervals remain all equal.
The lag combination unfolding is performed by the function expand.grid()
dt<-expand.grid(D= lags_sp[-1], T=-(nweeks:1))Du<-matrix(dt$D, nrow=1)[rep(1, N), ]Tu<-matrix(dt$T, nrow=1)[rep(1, N), ]# visualising the first 3 rows and 8 columns of the matrices:Du[1:3, 1:8]
Visual inputs check (yellow is lowest value, blue highest):
Fitting the model
An offset offset(log(AON + 1)) is used to standardise the number of chicks by the number of monitored nests (see Introduction vignette)
a site random effect is included to account for unknown variation in the mean breeding success between colonies: s(Site, bs= "re")
Data are assumed to follow a Tweedie distribution with a log link, which helps modelling overdispersion in the data.
SST values at each of 30 time lags and 21 distance lags are contained in matrix SSTu. Their effect on kittywake breeding success is assumed to vary smoothly with both the spatial and temporal lag (encoded in matrices Du and Tu) and captured by the term te(Du, Tu, by= SSTu).
library(mgcv)
Loading required package: nlme
This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
The model summary suggests that there is clear variation in average breeding success between colonies, but the effect of SST on kittiwake breeding success isn’t signifiant at any spatial or temporal lag.
Plotting the estimated effects provides us with a richer interpretation. (below with two alternative presentation options, set by scheme)
par(mfrow=c(1, 2))plot(m_dt, scheme=2, scale= F)
plot(m_dt, scheme=1, scale= F, n2=70)
The plot to the left shows a quantile-quantile plot of the random colony effects. With just 7 colonies, this is not terribly informative: some colonies do better than others (+/- 20% around the mean).
The plot to the right is the spatio-temporal lag function, of greatest interest to us. On the x-axis, is the spatial lag as we encoded it and on the y-axis, the temporal lag. The function of distance (“spatial lag”) and week (“temporal lag”) acts as the lag-varying coefficient for the linear effect of SST at a given combination of lags. We therefore care to know if the function is above or below zero, and so if the effect of SST positive or negative, respectively, at particular lags. The location of the zero plane isn’t easy to visualize in the default representations above, so these are not straightworward to interpret.
A custom 3D plotting function for bivariate smoothers can be used to provide
a red zero reference line, indicating no effect
an option to use transparency for sections of the function surface for which \(\delta {SE}\) overlaps zero, with \(\delta\) set by parameter se.mult
custom view rotations and annotations
## function for 3D plotting of 2D smoothers, with reference plane at z= 0# use within rmarkdown may need chunk option *fig.keep= "last"* to be setplot_2Dspline_0<-function(model, term.n=1, transp=TRUE, se.mult=2, theta=40, phi=20, lab_x="X", lab_y="Y", lab_z="f(X, Y)", ...){# *model* fitted mgcv::gam object# *term.n* index (integer) of the desired spline term in the model formula# *transp* logical where TRUE (default) uses transparency (slower to compute) and FALSE uses plain colors (faster)# *se.mult* multiplier for SE# *theta*, *phi* rotations of the perspective plot (in degrees)# *lab_x*, *lab_y*, *lab_z* axis labels (character)# *...* additional arguments to `persp()`, excluding *x, y, z, col, border, ticktype, xlab, ylab, zlab, theta, phi*. model_fit<-plot(model, select= term.n, scheme=1, scale= F, n2=70)par(mfrow=c(1, 1)) z_array<-array(model_fit[[term.n]]$fit, dim=c(length(model_fit[[term.n]]$x), length(model_fit[[term.n]]$y))) se_array<-array(model_fit[[term.n]]$se, dim=c(length(model_fit[[term.n]]$x), length(model_fit[[term.n]]$y)))# identify grid nodes for which SE of fitted values don't cover zero zero_out<- ((z_array - se_array * se.mult) >0) | ((z_array + se_array * se.mult) <0)# remove 1 row and 1 column, to map with number of facets (x and y coords are off by 0.5 grid cell, with this approach) zero_out<- zero_out[-1, -1]# set colors for plotting facet_col<-c("white", "grey") bd<-1if(transp){ facet_col<-rgb(0.5, 0.5, 0.5, c(0, 0.3)) bd<-rgb(0.5, 0.5, 0.5, 0.2) }# plot fitted valuespersp(x= model_fit[[term.n]]$x,y= model_fit[[term.n]]$y,z= z_array,col= facet_col[zero_out +1],border= bd,ticktype="detailed",xlab= lab_x, ylab= lab_y, zlab= lab_z,theta= theta, phi= phi, ...) -> res# define x and y values for the z = 0 plane x_plane<-range(model_fit[[term.n]]$x)[c(1, 2, 2, 1, 1)] y_plane<-range(model_fit[[term.n]]$y)[c(1, 1, 2, 2, 1)]# convert into coordinate system of perspective plot plane<-trans3d(x_plane, y_plane, 0, res)lines(plane, col =2, lwd =1, lty=2)}
The function at week -30 (left side) is negative near the colony and gradually becomes positive at the extreme of the distance range. If this was representative of the truth, this would suggest that early in the year (week beginning 1st Jan), the effect on kittiwake productivity tends to be negative near the colony, i.e. higher SST tends to correlate with lower productivity, and positive further away, i.e. higher SST tends to correlate with higher productivity. Moving towards spring and then summer, the SST effect generally becomes more positive, but following a rather convoluted, bimodal-looking path.
Critical thoughts on this application of the model
Because the distances are measured in all seaward directions here, they do not distinguish between coastal and offshore areas at a specific distance. If the true gradient is inshore-offshore rather than distance from colony, using the distance from colony as a proxy may bias the effect.
In general, as we look further away from the colony, the effect of distant environmental conditions should eventually become irrelevant and close to zero. The fact that the estimated effect of SST increases linearly with distance, could suggest that we did not consider lagged SST values far enough.
Note that the reasoning for point 2 above is generic.
In the specific context of seabirds,
The overall effect is non-significant, suggesting that confidence intervals (~ \(\pm 2 {SE}\) for a 95% CI) include zero at large distances from the colonies
We cannot exclude the hypothesis that variables that are of direct relevance to the seabirds breeding success (e.g., prey abundance) are driven by SST values at locations further away in space or time, as could be the case for seasonally migrating prey.
Model validation
We should look at traditional GAM diagnostics, using deviance residuals.
par(mfrow=c(2, 2))gam.check(m_dt)
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-0.0003229927,0.0005443086]
(score 837.9664 & scale 6.39907).
Hessian positive definite, eigenvalue range [0.0003218041,163.5966].
Model rank = 108 / 108
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Site) 7.00 4.10 NA NA
te(Du,Tu):SSTu 100.00 6.98 NA NA
abline(0, 1)
These residuals diagnostics are broadly acceptable.
When assuming linear effects of a predictor (here, SST), it’s often worth checking for unwanted trends in the residuals against predictor values. With SST broken down into so many distinct predictors (here, 30 lags) panel plots using a sliding window of lags can an effective way of visualizing the patterns.
No apparent trend in the moving averages in red (good news).
# residuals against time lagcoplot(resid ~ SST | Tlag, data= m_dt_res, panel= panel.smooth, number=12, lwd=3)
Both the mean and variance of SST increase, as we move away from winter (left to right, then bottom to top). The moving averages (red) suggest that there could be some non-linearity in the relationship with SST, which was not considered in this model.
Simulations
Loading a function to generate random 2D smooth functions \(f(x,y)\), which will allow us to simulate the “true”/known effect (coefficient) of predictor vector \(X\) measured over a series of \(J\) distances \(L\) and \(K\) distances \(S\).
k1, k2: spline basis size for dimensions 1 and 2 (x and y)
lambda: a vector of length 3 specifying smoothing parameters ‘lambda’ for the 2D tensor product spline
inter: intercept value
x_l, y_l: number of x and y values a which to evaluate 2D smooth
require(mgcv)require(mvtnorm)
Loading required package: mvtnorm
# k1, k2: spline basis size for dimensions 1 and 2 (x and y)# lambda: a vector of length 3 specifying smoothing parameters 'lambda' for the 2D tensor product spline# inter: intercept value# x_l, y_l: number of x and y values a which to evaluate 2D smoothsim_spline_2D<-function(k1=7, k2=7, lambda=c(0.3, 0.3, 3000), inter=0, x_l=100, y_l=100) {# Define the spatial domain x<-seq(0, 1, length.out= x_l) y<-seq(0, 1, length.out= y_l)# Create a grid grd<-expand.grid(x= x, y= y) grd$resp<-rnorm(x_l * y_l)# Open a null file connection to discard the 'jagam()' text output# Check the operating systemif (.Platform$OS.type =="unix") { null_device<-"/dev/null" } else { # Windows null_device<-"NUL" } con<-file(null_device, open="wt")# generate a template gam model object for JAGS to extract the relevant components jd<-jagam(resp ~te(x, y, bs=c("tp", "tp"), k=c(k1, k2)), data= grd, file= con, diagonalize= F)close(con) S1<- jd$jags.data$S1 zero<- jd$jags.data$zero# penalty matrix K1<- S1[1:(k1*k2-1), 1:(k1*k2-1)] * lambda[1] + S1[1:(k1*k2-1), (k1*k2):(2*k1*k2-2)] * lambda[2] + S1[1:(k1*k2-1), (2*k1*k2-1):(3*k1*k2-3)] * lambda[3]# simulate multivariate normal basis coefficients beta_sim<-c(inter, mvtnorm::rmvnorm(n=1, mean= zero[2:(k1*k2)], sigma= K1, method="chol"))# evaluate simulated function at x and y values fct_sim<- jd$jags.data$X %*% beta_simreturn(matrix(fct_sim, x_l, y_l))}# example#sim_2D<- sim_spline_2D(k1= 7, k2= 7, lambda<- c(0.3, 0.3, 3000), inter= 0, x_l= 100, y_l= 100); image(sim_2D)
Now, a function to simulate data from the model, given the known parameter values (assume the same number of distances / lags in both directions)
Output values:
LP: lagged predictor matrix
Lag: lag encoding matrix
tf: true lagged-effect function
Y: response vector
# function to simulate 2D-lagged processsim_lag_of_2D<-function(N=200, nlags=20, residSD=1){# simulate 2D-lag-varying predictor (unfold), where each column represents a lag combination w.r.t. the observation Xu<-matrix(runif(N*nlags*nlags), N, nlags*nlags)# true coefficient function, unfold tfu<-as.vector(sim_spline_2D(x_l= nlags, y_l= nlags))# observations Y<-rnorm(n= N, mean= Xu %*% tfu, sd= residSD)# index matrix LS<-expand.grid(L=1:nlags, S=1:nlags) Lu<-matrix(LS$L, nrow=1)[rep(1, N), ] Su<-matrix(LS$S, nrow=1)[rep(1, N), ]list(Xu= Xu, Y= Y, tfu= tfu, Lu= Lu, Su= Su)}
Let’s simulate 6 random data sets from 6 different sets of random model parameters, with large observation error (SD= 8), and plot the estimates of \(f(l,s)\) (black) together with the true values (red).
In all cases, \(f(l,s)\), the effect of \(X\) across lags \(L\) and \(S\) is very well recovered by the model. For simplicity we have omitted the 95% CI of the estimates from the plots, so a complementary analysis would be required to establish coverage properties.
Proposed exercises
Level 1: repeat the analysis above (spatial and temporal lags of SST). Parameters you could try and vary include the coding of lag values.
Level 2: fit a similar model for data subset kit2 (use object kit_sub2 for subsetting predictor data to match the rows of data set kit2), trying to reconstruct the data (predictor) preparation procedure without copying/pasting from the vignette.
In all cases, pay attention to the structure of the model inputs (vectors and matrices) and take some time to reflect on the interpretation of the model outputs.
Source Code
---title: "[2-2] Distributed linear effects over 2D Lags"author:- name: Thomas Cornulier thomas.cornulier@bioss.ac.uk affiliation: "Biomathematics & Statistics Scotland (https://www.bioss.ac.uk/)"- name: Dave Miller dave.miller@bioss.ac.uk affiliation: "BioSS & UKCEH"date: "Last run on `r format(Sys.time(), '%d/%m/%Y')`"output: html_document: number_sections: true self_contained: true---# The modelThis vignette covers models where the response is a function of predictor values, measured at a collection of regular distance increments in two dimensions, which will usually represent distance in space (spatial lags) and in time (time lags).Such scenarios may arise when the response variable depends on environmental conditions over a neighborhood (the "zone of influence" of these environmental factors), but also depends on what these conditions were in the past (may be termed "memory", "lagged" or "carry-over" effects, depending on contexts). The spatial zone of influence is allowed to vary with the time lag, and the temporal zone of influence may vary with the spatial lag.The model assumes that the coefficient (effect) of the predictor varies smoothly with the distances at which it is measured.## Mathematical descriptionFor each observation $i \in \{1, ..., N\}$, the linear predictor includes the additive effects of a matrix predictor $X_{i}$ consisting of values $x_{ijk}$ measured on a grid of regular distances increments: $l_{ijk}$ in one direction (e.g., time lag) and $s_{ijk}$ in the other (e.g., spatial lag), forming the $J \times K$ distance matrices $L_{i}$ and $S_{i}$. In typical applications, $l_{ijk}$ is invariant between observations and distances $k$, so indices $i$ and $k$ may be dropped. Likewise, $s_{ijk}$ is invariant between observations and distances $j$, so indices $i$ and $k$ may be dropped.The model is of the form:$$\mathbb{E}(y_i) = g^{-1}\left( \beta_0 + \sum_{j} \sum_{k} f(l_{j},s_{k}) x_{ijk}\right)$$Function $f(l_{j},s_{k})$ acts as a coefficient for $x_{ijk}$, that varies smoothly with distances $( l_{j}, s_{k} )$. The shape of $f(d_k)$ is typically unknown and needs to be estimated from the data, so the function should have sufficient flexibility to ensure a good fit to the data, while being sufficiently constrained to avoid overfitting. In this tutorial we will represent $f$ with tensor products of penalized regression splines, since they meet these requirements and are easily implemented using standard software.Implementation using standard software such as `mgcv` in `R` requires all array data to be at most 2D arrays. This requires reformatting the $N \times J \times K$ 3D arrays $X$, $L$ and $S$ by "unfolding" them into $N \times (J \times K)$ 2D arrays, with resulting column index $u \in \{1, ..., J \times K\}$encoding all unfolded combinations of $j$ and $k$ as follows.$$u = (j - 1) K + k$$The reparameterized model is thus:$$\mathbb{E}(y_i) = g^{-1}\left( \beta_0 + \sum_{u} f(l'_{u},s'_{u}) x'_{iu}\right)$$## R implementationAs in the mathematical description above, we assume a predictor $x_{ijk}$ measured for any observation $i \in \{1, ..., N\}$ at all combinations of distances $l_j$ and $s_k$ (e.g., temporal and spatial lags, respectively). These are stored in the $N \times J \times K$ 3D arrays `X`, `L` and `S` respectively. Arrays `L` and `S` encode the actual distances values, with equal intervals in each dimension and constant across rows (if these ,conditions are not met, appropriate integration weights should be applied). The units chosen for expressing `L` or `S` are arbitrary and do not affect the predictions of the model (only its interpretation).The R implementation of the model, using package `mgcv`, is of the form:`thisModel<- gam(y ~ s(Lu, Su, by= Xu), data= exampleData)`Where `y` is a dataframe column with $N$ observations. `Lu`, `Su` and `Xu` are the unfolded 2D versions of 3D arrays `L`, `S` and `X`all $N \times U$ matrices, with the number of columns `U` corresponding to the number of distance class combinations $JK$ over which predictor `X` is available. The summation convention applied in `mgcv` means that when the data fed to the smooth terms are multi-column matrices, the sum of the function evaluation over all columns (after multiplication by the corresponding entry of any `by=` argument) is returned.The input data should have the following structure (illustrative example with 6 temporal lags $L$, 8 spatial lags $S$ and 100 observations), a sample of which is shown below:```{r, echo= F}library(knitr)nr<-100nl<-6ns<-8set.seed(4)X<-array(NA, , dim=c(nl, ns, nr))for(i in1:nr){ X[, , i]<-scale(arima.sim(n= nl, model=list(ar=0.98))) %*%t(scale(arima.sim(n= ns, model=list(ar=0.96))))}dimnames(X)<-list(l=paste0("L", 1:nl), s=paste0("S", 1:ns), NULL)exampleData<-list(y=rnorm(nr, 2.5, 1),X= X)``````{r}head(exampleData$y, n=3)exampleData$X[, 1:5, 1:3]```Note the non-standard indexing of array `X`, with observations coming last, as the 3rd dimension.Next we unfold the 3D array `X`, by fusing dimensions 1 and 2 into one.```{r}Xu<-array(X, dim=c(nl*ns, nr))Xu<-t(Xu) # transpose```and compute the corresponding distance matrices `L` and `S`.Here we assume 6 time lags coded from 0 (current), going back 5 time units and 8 spatial lags (distance bands) ranging from 10 units up to 80 units.Unfolding the combinations of `L` and `S` is done by the built-in R function `expand.grid()`.```{r}ls<-expand.grid(L=-5:0, S= (1:8)*10)Lu<-matrix(ls$L, nrow=1)[rep(1, nr), ]Su<-matrix(ls$S, nrow=1)[rep(1, nr), ]exampleData_unfold<-list(y=rnorm(nr, 2.5, 1),Lu= Lu,Su= Su,Xu= Xu)str(exampleData_unfold$y)str(exampleData_unfold$Xu)str(exampleData_unfold$Lu)str(exampleData_unfold$Su)```Visually (yellow is lowest value, blue highest):```{r, echo= F, fig.width= 25, fig.height= 12}par(mfrow=c(1, 4), mar=c(1.1, 2.1, 4.1, 2.1), bty="n", xaxt ="n", yaxt ="n", cex.main=4)image(t(cbind(exampleData$y, matrix(NA, nr, nl * ns -1))), col=rev(hcl.colors(64)), main="y")image(t(as.matrix(exampleData_unfold$Lu)), col=rev(hcl.colors(64)), main="Lu")image(t(as.matrix(exampleData_unfold$Su)), col=rev(hcl.colors(64)), main="Su")image(t(as.matrix(exampleData_unfold$Xu)), col=rev(hcl.colors(64)), main="Xu")```Key features:* `y` (response) is typically a single-column vector (may have two columns for binomial family or an arbitrary number for likelihoods that make use of this, such as multivariate normal).* `Lu` has one column per combination of temporal and spatial lags (here 6x8=48) and is normally row-invariant. Temporal lag values increase left to right and are recycled when the next spatial lag (`Su`) starts.* `Su` has one column per combination of temporal and spatial lags (here 6x8=48) and is normally row-invariant. Spatial lags move to their next value only when the temporal lags (`Lu`) have gone through a full cycle.* Predictor `Xu` has same dimension as `Lu` and `Su` and is *NOT* row-invariant. When serial correlation in the predictor values exists either across temporal and/or spatial lags, rows of `Xu` will tend to display serial correlation patterns, as in the illustrative example above. However patterns may be hard to discern as they will be nested within each (unfolded) distance dimension.## Further notesThis model extends the standard "signal regression" model family (see vignette "2-1_Distributed-linear-Effects-Over-1D-Lags"), by allowing values of predictor $X$ to be taken two dimensions $D_1$ and $D_2$, which may be for example space & time, or spectral & excitation frequencies (e.g., [Marx and Eilers 2005](https://doi.org/10.1198/004017004000000626)).Under typical circumstances (i.e., the $d_1$ and $d_2$ increments are regular and there is no missing value in $X$), the model is also a form of "distributed lag model", where the effect of $x$ has been constrained to be linear for any combination of $(d_1,d_2)$.# Illustration: analysis of kittiwake breeding successThe motivation for the model, its construction and interpretation are best shown by example. Background information about the kittiwake case study can be found in the introductory vignette ("1_Introduction").## Specific research question"What is the spatial and temporal zone of influence of SST on kittiwake yearly breeding success?"## A signal regression model with spatial and temporal lagsA direct answer to the "Specific research question" above can be given by a signal regression using SST values at suitable spatial and temporal lags, as predictors.Here, we will use weekly SST values and assume that the effect of SST does not extend beyond 30 weeks (~ 6 months) in the past, or beyond 210 km from the colony.> The SST satellite images would allow us to work at different time resolutions for SST, from daily to monthly values. A daily resolution (~ 200 values) wouldn't be problematic for the analysis, but we wouldn't expect such a fine resolution to be relevant for the biological processes involved in our context. It would also incur a significant and unnecessary computational cost. At the other end, a monthly resolution for SST values would only give us 6 intervals, which may miss biologically relevant variation and limit the scope for fitting accurate spline functions. A weekly resolution with 30 intervals, seems a reasonable trade-off.> We will use 21 distance bands of width 10 km, centered on the colony. We wouldn't expect distance-dependent processes to be detectable at much finer resolutions than this with the type of predictor and temporal resolution we use, particularly with ecological processes that ought to generalize across colonies.### Data preparation```{r, warning= F}# loading the data# depending on the configuration of# your system, one of these should worktry(load("kit_SST_sandeel.RData"), silent= T)try(load("../kit_SST_sandeel.RData"), silent= T)# we'll work with the smaller subset of 7 colonieskit1<- kit2[kit2$Site %in% kit_sub1, ]```Modelling parameters```{r}# number of weeks to use as predictorsnweeks<-30# distance lags (in km from colony)lags_sp<-seq(0, 210, by=10)N<-nrow(kit1)D<-length(lags_sp[-1])T<- nweeks```The pre-processed SST data (`SST_mean_ring`) come in this 3D array format:* `r T` rows = weekly lags* `r D` columns = distance bands ("spatial lags")* `r N` slices = colony / year combinationsPreparing the predictor matrix thus requires unfolding the array into a matrix with adequate indices.The following procedure works for the data format above, but will need careful adaptation for different data formats to ensure that the lag indices match across all matrices feeding into the model.```{r}SSTu<- SST_mean_ring[, , kit1$SiteYear]dim(SSTu)<-c(D * T, N)SSTu<-t(SSTu)# visualising the first 3 rows and 8 columns of the matrix:round(SSTu, 2)[1:3, 1:8]```Now, the corresponding spatial and temporal lag indices matrices. We find convenient to annotate lags as negative week numbers, with -30 corresponding to the farthest week in time (w/c 1st Jan of that year) and -1 corresponding to the most recent. Other annotation systems can be used, without effect on the predictions of the model, as long as intervals remain all equal.The lag combination unfolding is performed by the function `expand.grid()````{r}dt<-expand.grid(D= lags_sp[-1], T=-(nweeks:1))Du<-matrix(dt$D, nrow=1)[rep(1, N), ]Tu<-matrix(dt$T, nrow=1)[rep(1, N), ]# visualising the first 3 rows and 8 columns of the matrices:Du[1:3, 1:8]Tu[1:3, 1:8]```Visual inputs check (yellow is lowest value, blue highest):```{r, echo= F, fig.width= 25, fig.height= 12}par(mfrow=c(1, 4), bty="n", xaxt ="n", yaxt ="n")image(t(cbind(kit1$Fledg, matrix(NA, N, D))), col=rev(hcl.colors(64)), main="y")image(t(Du), col=rev(hcl.colors(64)), main="Du")image(t(Tu), col=rev(hcl.colors(64)), main="Tu")image(t(SSTu), col=rev(hcl.colors(64)), main="SSTu")```### Fitting the model* An offset `offset(log(AON + 1))` is used to standardise the number of chicks by the number of monitored nests (see Introduction vignette)* a site random effect is included to account for unknown variation in the mean breeding success between colonies: `s(Site, bs= "re")`* Data are assumed to follow a Tweedie distribution with a log link, which helps modelling overdispersion in the data.SST values at each of 30 time lags and 21 distance lags are contained in matrix `SSTu`. Their effect on kittywake breeding success is assumed to vary smoothly with both the spatial and temporal lag (encoded in matrices `Du` and `Tu`) and captured by the term `te(Du, Tu, by= SSTu)`.```{r}library(mgcv)m_dt<-gam(Fledg ~offset(log(AON +1)) +s(Site, bs="re") +te(Du, Tu, by= SSTu, k=c(10, 10)),data= kit1,family=tw(), method="REML")```### Model output & interpretation```{r}summary(m_dt)```> The model summary suggests that there is clear variation in average breeding success between colonies, but the effect of SST on kittiwake breeding success isn't signifiant at any spatial or temporal lag.Plotting the estimated effects provides us with a richer interpretation.(below with two alternative presentation options, set by `scheme`)```{r}par(mfrow=c(1, 2))plot(m_dt, scheme=2, scale= F)plot(m_dt, scheme=1, scale= F, n2=70)```> The plot to the left shows a quantile-quantile plot of the random colony effects. With just 7 colonies, this is not terribly informative: some colonies do better than others (+/- 20% around the mean). > The plot to the right is the spatio-temporal lag function, of greatest interest to us. On the x-axis, is the spatial lag as we encoded it and on the y-axis, the temporal lag. The function of distance ("spatial lag") and week ("temporal lag") acts as the lag-varying coefficient for the linear effect of SST at a given combination of lags. We therefore care to know if the function is above or below zero, and so if the effect of SST positive or negative, respectively, at particular lags. The location of the zero plane isn't easy to visualize in the default representations above, so these are not straightworward to interpret. A custom 3D plotting function for bivariate smoothers can be used to provide* a red zero reference line, indicating no effect* an option to use transparency for sections of the function surface for which $\delta {SE}$ overlaps zero, with $\delta$ set by parameter `se.mult`* custom view rotations and annotations```{r}## function for 3D plotting of 2D smoothers, with reference plane at z= 0# use within rmarkdown may need chunk option *fig.keep= "last"* to be setplot_2Dspline_0<-function(model, term.n=1, transp=TRUE, se.mult=2, theta=40, phi=20, lab_x="X", lab_y="Y", lab_z="f(X, Y)", ...){# *model* fitted mgcv::gam object# *term.n* index (integer) of the desired spline term in the model formula# *transp* logical where TRUE (default) uses transparency (slower to compute) and FALSE uses plain colors (faster)# *se.mult* multiplier for SE# *theta*, *phi* rotations of the perspective plot (in degrees)# *lab_x*, *lab_y*, *lab_z* axis labels (character)# *...* additional arguments to `persp()`, excluding *x, y, z, col, border, ticktype, xlab, ylab, zlab, theta, phi*. model_fit<-plot(model, select= term.n, scheme=1, scale= F, n2=70)par(mfrow=c(1, 1)) z_array<-array(model_fit[[term.n]]$fit, dim=c(length(model_fit[[term.n]]$x), length(model_fit[[term.n]]$y))) se_array<-array(model_fit[[term.n]]$se, dim=c(length(model_fit[[term.n]]$x), length(model_fit[[term.n]]$y)))# identify grid nodes for which SE of fitted values don't cover zero zero_out<- ((z_array - se_array * se.mult) >0) | ((z_array + se_array * se.mult) <0)# remove 1 row and 1 column, to map with number of facets (x and y coords are off by 0.5 grid cell, with this approach) zero_out<- zero_out[-1, -1]# set colors for plotting facet_col<-c("white", "grey") bd<-1if(transp){ facet_col<-rgb(0.5, 0.5, 0.5, c(0, 0.3)) bd<-rgb(0.5, 0.5, 0.5, 0.2) }# plot fitted valuespersp(x= model_fit[[term.n]]$x,y= model_fit[[term.n]]$y,z= z_array,col= facet_col[zero_out +1],border= bd,ticktype="detailed",xlab= lab_x, ylab= lab_y, zlab= lab_z,theta= theta, phi= phi, ...) -> res# define x and y values for the z = 0 plane x_plane<-range(model_fit[[term.n]]$x)[c(1, 2, 2, 1, 1)] y_plane<-range(model_fit[[term.n]]$y)[c(1, 1, 2, 2, 1)]# convert into coordinate system of perspective plot plane<-trans3d(x_plane, y_plane, 0, res)lines(plane, col =2, lwd =1, lty=2)}```Applying the function to term 2 of the model:```{r, fig.keep= "last"}par(mfrow=c(1, 1), mar=c(1.1, 4.1, 4.1, 2.1), cex.lab=1.2, cex.axis=0.6)plot_2Dspline_0(m_dt, term.n=2, transp= T, se.mult=1,lab_x="Distance (km)", lab_y="Weeks",lab_z="f(Distance, Weeks)")```> The function at week -30 (left side) is negative near the colony and gradually becomes positive at the extreme of the distance range. If this was representative of the truth, this would suggest that early in the year (week beginning 1st Jan), the effect on kittiwake productivity tends to be negative near the colony, i.e. higher SST tends to correlate with lower productivity, and positive further away, i.e. higher SST tends to correlate with higher productivity. Moving towards spring and then summer, the SST effect generally becomes more positive, but following a rather convoluted, bimodal-looking path.### Critical thoughts on this application of the model1. Because the distances are measured in all seaward directions here, they do not distinguish between coastal and offshore areas at a specific distance. If the true gradient is inshore-offshore rather than distance from colony, using the distance from colony as a proxy may bias the effect.2. In general, as we look further away from the colony, the effect of distant environmental conditions should eventually become irrelevant and close to zero. The fact that the estimated effect of SST increases linearly with distance, could suggest that we did not consider lagged SST values far enough.Note that the reasoning for point 2 above is generic. In the specific context of seabirds,* The overall effect is non-significant, suggesting that confidence intervals (~ $\pm 2 {SE}$ for a 95% CI) include zero at large distances from the colonies* We cannot exclude the hypothesis that variables that are of direct relevance to the seabirds breeding success (e.g., prey abundance) are driven by SST values at locations further away in space or time, as could be the case for seasonally migrating prey.### Model validationWe should look at traditional GAM diagnostics, using deviance residuals.```{r}par(mfrow=c(2, 2))gam.check(m_dt)abline(0, 1)```> These residuals diagnostics are broadly acceptable.When assuming linear effects of a predictor (here, SST), it's often worth checking for unwanted trends in the residuals against predictor values. With SST broken down into so many distinct predictors (here, 30 lags) panel plots using a sliding window of lags can an effective way of visualizing the patterns.```{r}m_dt_res<-data.frame(resid=residuals(m_dt), SST=as.vector(SSTu[-m_dt$na.action, ]),Dlag=as.vector(Du[-m_dt$na.action, ]),Tlag=as.vector(Tu[-m_dt$na.action, ]))# residuals against spatial lagcoplot(resid ~ SST | Dlag, data= m_dt_res, panel= panel.smooth, number=12, lwd=3)```> No apparent trend in the moving averages in red (good news).```{r}# residuals against time lagcoplot(resid ~ SST | Tlag, data= m_dt_res, panel= panel.smooth, number=12, lwd=3)```> Both the mean and variance of SST increase, as we move away from winter (left to right, then bottom to top). The moving averages (red) suggest that there could be some non-linearity in the relationship with SST, which was not considered in this model.# SimulationsLoading a function to generate random 2D smooth functions $f(x,y)$, which will allow us to simulate the "true"/known effect (coefficient) of predictor vector $X$ measured over a series of $J$ distances $L$ and $K$ distances $S$.Function name: `sim_spline_2D(k1= 7, k2= 7, lambda= c(0.3, 0.3, 3000), inter= 0, x_l= 100, y_l= 100)`Arguments:* `k1`, `k2`: spline basis size for dimensions 1 and 2 (x and y)* `lambda`: a vector of length 3 specifying smoothing parameters 'lambda' for the 2D tensor product spline* `inter`: intercept value* `x_l`, `y_l`: number of x and y values a which to evaluate 2D smooth```{r, warning= F, fig.keep= "none"}require(mgcv)require(mvtnorm)# k1, k2: spline basis size for dimensions 1 and 2 (x and y)# lambda: a vector of length 3 specifying smoothing parameters 'lambda' for the 2D tensor product spline# inter: intercept value# x_l, y_l: number of x and y values a which to evaluate 2D smoothsim_spline_2D<-function(k1=7, k2=7, lambda=c(0.3, 0.3, 3000), inter=0, x_l=100, y_l=100) {# Define the spatial domain x<-seq(0, 1, length.out= x_l) y<-seq(0, 1, length.out= y_l)# Create a grid grd<-expand.grid(x= x, y= y) grd$resp<-rnorm(x_l * y_l)# Open a null file connection to discard the 'jagam()' text output# Check the operating systemif (.Platform$OS.type =="unix") { null_device<-"/dev/null" } else { # Windows null_device<-"NUL" } con<-file(null_device, open="wt")# generate a template gam model object for JAGS to extract the relevant components jd<-jagam(resp ~te(x, y, bs=c("tp", "tp"), k=c(k1, k2)), data= grd, file= con, diagonalize= F)close(con) S1<- jd$jags.data$S1 zero<- jd$jags.data$zero# penalty matrix K1<- S1[1:(k1*k2-1), 1:(k1*k2-1)] * lambda[1] + S1[1:(k1*k2-1), (k1*k2):(2*k1*k2-2)] * lambda[2] + S1[1:(k1*k2-1), (2*k1*k2-1):(3*k1*k2-3)] * lambda[3]# simulate multivariate normal basis coefficients beta_sim<-c(inter, mvtnorm::rmvnorm(n=1, mean= zero[2:(k1*k2)], sigma= K1, method="chol"))# evaluate simulated function at x and y values fct_sim<- jd$jags.data$X %*% beta_simreturn(matrix(fct_sim, x_l, y_l))}# example#sim_2D<- sim_spline_2D(k1= 7, k2= 7, lambda<- c(0.3, 0.3, 3000), inter= 0, x_l= 100, y_l= 100); image(sim_2D)```Now, a function to simulate data from the model, given the known parameter values (assume the same number of distances / lags in both directions)Output values:* `LP`: lagged predictor matrix* `Lag`: lag encoding matrix* `tf`: true lagged-effect function* `Y`: response vector```{r}# function to simulate 2D-lagged processsim_lag_of_2D<-function(N=200, nlags=20, residSD=1){# simulate 2D-lag-varying predictor (unfold), where each column represents a lag combination w.r.t. the observation Xu<-matrix(runif(N*nlags*nlags), N, nlags*nlags)# true coefficient function, unfold tfu<-as.vector(sim_spline_2D(x_l= nlags, y_l= nlags))# observations Y<-rnorm(n= N, mean= Xu %*% tfu, sd= residSD)# index matrix LS<-expand.grid(L=1:nlags, S=1:nlags) Lu<-matrix(LS$L, nrow=1)[rep(1, N), ] Su<-matrix(LS$S, nrow=1)[rep(1, N), ]list(Xu= Xu, Y= Y, tfu= tfu, Lu= Lu, Su= Su)}```Let's simulate 6 random data sets from 6 different sets of random model parameters, with large observation error (SD= 8), and plot the estimates of $f(l,s)$ (black) together with the true values (red).```{r, fig.fullwidth = T}par(mfrow=c(2, 3), mar=c(1.1, 4.1, 4.1, 1.1))set.seed(75) # for reproducibilityfor(i in1:6){ nl<-20 dat<-sim_lag_of_2D(N=200, nlags= nl, residSD=8) mod<-gam(Y ~te(Lu, Su, by= Xu, k=rep(nl-1, 2)), data= dat, method="REML") rng<-range(dat$tfu) -mean(dat$tfu) model_fit<-plot(mod, scheme=1, axes= F,zlim= rng +c(-1, 1) *0.1*diff(rng)) rr<-round(cor(as.vector(matrix(model_fit[[1]]$fit, 40, 40)[c(TRUE, FALSE), c(TRUE, FALSE)]), dat$tfu)^2, 4)par(new= T)persp(x=1:nl,y=1:nl,z=matrix(dat$tfu, nl, nl),main=paste("Simulation", i, ",\nR2=", rr),xlab="l", ylab="s", zlab="f(l,s)",theta=30, phi=30,col=NA, border=NA, ticktype="detailed",zlim= rng +c(-1, 1) *0.1*diff(rng)) -> res# convert true function coordinates into coordinate system of perspective plot LS<-expand.grid(L=1:nl, S=1:nl)points(tf_transf<-trans3d(LS$L, LS$S, dat$tfu, res), pch=16, col=2, cex=0.5)}```> In all cases, $f(l,s)$, the effect of $X$ across lags $L$ and $S$ is very well recovered by the model. For simplicity we have omitted the 95% CI of the estimates from the plots, so a complementary analysis would be required to establish coverage properties.# Proposed exercises* Level 1: repeat the analysis above (spatial and temporal lags of SST). Parameters you could try and vary include the coding of lag values.* Level 2: fit a similar model for data subset `kit2` (use object `kit_sub2` for subsetting predictor data to match the rows of data set `kit2`), trying to reconstruct the data (predictor) preparation procedure without copying/pasting from the vignette.In all cases, pay attention to the structure of the model inputs (vectors and matrices) and take some time to reflect on the interpretation of the model outputs.