Predictions and variance

David L Miller

Biomathematics and Statistics Scotland

UK Centre for Ecology and Hydrology

What can we do?

  • Use gam() to fit a model
  • Use summary() to assess it
  • Use plot() to visualise it

Going back to our model

library(mgcv)
load("cosmos-processed.RData")

# just the data from the Wytham Woods site
wwoods <- subset(all_sites, SITE_NAME=="Wytham Woods")

# smooth of date (as a number), smoothness selected by REML
b <- gam(COSMOS_VWC ~ s(ndate, k=20), data=wwoods, method="REML")

plot(b)

Now we can fit a model,
now we are dangerous

What do we want to do with our model?

  • For linear models, we often interpret coefficients
  • We don’t have that here
  • So we usually make predictions and plots

coef(b)
(Intercept)  s(ndate).1  s(ndate).2  s(ndate).3  s(ndate).4  s(ndate).5 
 47.3525000   7.5801107  13.8527308  -4.1595358 -12.1449614  -3.0512930 
 s(ndate).6  s(ndate).7  s(ndate).8  s(ndate).9 s(ndate).10 s(ndate).11 
 -8.8018184   0.8292287  -6.1265129  -7.6962263  12.2316442   7.0850161 
s(ndate).12 s(ndate).13 s(ndate).14 s(ndate).15 s(ndate).16 s(ndate).17 
 -5.2282569   3.6549750  -0.2188604  -0.3257381   9.7677258   7.0620897 
s(ndate).18 s(ndate).19 
 18.6422253 -12.5062881 

Wait, what are we seeing in that plot?

What do we have?

  • The black line: s(ndate)
    • (what about \(\beta_0\)?)
  • Dashed line: s(ndate) \(\pm 2\) standard errors
    • (how do we get to that?)
    • (what uncertainty is included?)

Shifting to include the intercept

plot(b, shift=coef(b)[1])

Including uncertainty in the intercept

plot(b, shift=coef(b)[1], seWithMean=TRUE)

How do we make these plots?

  • Black line is a prediction
  • \(\beta_0 + s(t)\) over values of \(t\)

Recipe:

  • Make a grid
  • Use predict() to make predictions over the grid

Doing that in R

# a grid of 200 evenly-spaced dates covering the observed range
tg <- data.frame(ndate=seq(min(wwoods$ndate),
                           max(wwoods$ndate),
                           length.out=200))

# predict at each grid point
tg$pred <- predict(b, newdata=tg)

head(tg)
     ndate     pred
1 16035.00 48.21948
2 16037.21 48.11878
3 16039.41 48.02307
4 16041.62 47.94130
5 16043.82 47.88511
6 16046.03 47.86695

Doing that in R (extra bits)

  • Note that our newdata= needs to have all the covariates in it
  • Calling predict(b) just gives us fitted values
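
A quick check of that last point (a minimal sketch, assuming the b we fitted above):

# no newdata= means predictions at the observed data, i.e., the fitted values
# (they match exactly here because the Gaussian model has an identity link)
all.equal(as.numeric(predict(b)), as.numeric(fitted(b)))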

And we can plot that…

plot(tg$ndate, tg$pred, type="l")

What about uncertainty?

predict(..., se.fit=TRUE) gives standard errors

tg_se <- predict(b, tg, se.fit=TRUE)

str(tg_se)
List of 2
 $ fit   : num [1:200(1d)] 48.2 48.1 48 47.9 47.9 ...
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:200] "1" "2" "3" "4" ...
 $ se.fit: num [1:200(1d)] 1.046 0.899 0.767 0.657 0.579 ...
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr [1:200] "1" "2" "3" "4" ...

Building that plot

plot(tg$ndate, tg$pred, type="l")
# approximate 95% interval: prediction plus/minus 2 standard errors
lines(tg$ndate, tg$pred + 2*tg_se$se.fit, lty=2)
lines(tg$ndate, tg$pred - 2*tg_se$se.fit, lty=2)

What do these intervals mean?

  • These are really credible intervals
    • … but they have good 🎲frequentist🎲 properties
    • (so we can call them confidence intervals)


  • What do they say?
    • they are “across-the-function” intervals
    • thinking about whether \(s(x) = 0 \quad \forall x\)
      • (a bit more complicated because of centering)
    • see summary()
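
To see that test in action (using the b from above):

summary(b)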

Thinking about intervals…

  • Why are they credible intervals?
  • Because we can think of the penalty as a prior
  • Then we have a posterior for our model
  • … and we can do 🔮posterior sampling🔮
    • (more on that in a minute)

Where did that uncertainty come from?

  • So far we’re only looking at uncertainty from \(\hat{\boldsymbol{\beta}}\)
  • What about smoothing parameter (\(\lambda\)) uncertainty?
    • We estimate a parameter, we need some uncertainty!
  • For predict() and plot() we can use unconditional=TRUE (see the sketch below)
    • (doesn’t make a difference for this data)
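
A minimal sketch of that, assuming the b and tg from earlier:

# standard errors that also account for smoothing parameter uncertainty
tg_unc <- predict(b, tg, se.fit=TRUE, unconditional=TRUE)

# the same option works for plot()
plot(b, seWithMean=TRUE, unconditional=TRUE)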

Posterior sampling

Posterior sampling background

  • From ⚡theory⚡ \(\boldsymbol{\beta} \sim \text{MVN}(\hat{\boldsymbol{\beta}}, \Sigma)\)

  • So, we can 🔮simulate🔮 from that distribution

  • Make lots of predictions

    • make a prediction matrix and multiply by the simulated parameters (see the sketch below)
  • This generalizes very well, so we can get uncertainty for more than just smooths

  • Miller et al (2022) and Miller (2021) have more details
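
A minimal sketch of that recipe, assuming the b and tg from earlier:

# prediction matrix: Xp %*% coef(b) gives the predictions over the grid
Xp <- predict(b, newdata=tg, type="lpmatrix")

# 100 draws from the approximate posterior of the coefficients
betas <- rmvn(100, coef(b), vcov(b))

# each column of sims is one plausible prediction curve
sims <- Xp %*% t(betas)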

More on predict()

  • We’ll use more features of predict() later
  • type= argument changes the type of prediction
    • By default on the link scale (fine here for normal)
    • "response" to put on the response scale
    • "iterms" to give per term predictions
    • "lpmatrix" for a prediction matrix
    • Don’t worry about this for now! (but there’s a quick taste below)
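
For the curious, a small sketch of per-term predictions (again assuming b and tg from earlier):

# per-term predictions, with a standard error for each term
pt <- predict(b, tg, type="iterms", se.fit=TRUE)
head(pt$fit)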

Recap

  • plot shows us basic plots
    • we can make them ourselves if we want!
  • Make predictions using predict()
    • Need to fiddle to make the data
  • Use se.fit=TRUE to get uncertainty
    • unconditional=TRUE gives us smoothing parameter uncertainty
  • Uncertainty comes from smooth parameters and smoothing parameters
  • Intervals can be used to check if the smooth is zero