flowchart LR A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont B --> C{Data}:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#3c78d8 style C color:#f1c232
BioSS
BioSS
BioSS
UKCEH
BioSS
BioSS
flowchart LR A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont B --> C{Data}:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#3c78d8 style C color:#f1c232
flowchart LR A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont B --> C{Data}:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#3c78d8 style C color:#f1c232
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#cc0000 style C color:#674ea7 style D color:#3c78d8 style E color:#f1c232
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style A color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style B color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style C color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style D color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)
flowchart LR A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont B[Spatial<br>Pattern]:::FixFont --> C C --> D(Sample):::FixFont D --> E{Realistic<br>Dataset}:::FixFont; classDef FixFont font-size:15px style E color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)
flowchart LR A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont B --> C{Data}:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#3c78d8 style C color:#f1c232
flowchart LR C{Data}:::FixFont --> B(Observation<br>Process):::FixFont B --> A[Process<br>of Interest]:::FixFont; classDef FixFont font-size:15px style A color:#cc0000 style B color:#3c78d8 style C color:#f1c232
flowchart LR C1{Dataset<br>1}:::FixFont --> B1(Observation<br>Process<br>1):::FixFont B1 --> A[Process<br>of Interest]:::FixFont C2{Dataset<br>2}:::FixFont --> B2(Observation<br>Process<br>2):::FixFont B2 --> A; classDef FixFont font-size:15px style A color:#cc0000 style B1 color:#3c78d8 style B2 color:#3c78d8 style C1 color:#f1c232 style C2 color:#f1c232
Data complementarity is the heart of data fusion.
We want to allow data to be mutually informative…
But we also want to acknowledge their differences.
The answer: hierarchical GAMS
What our best model tells us if it has a shared trend…
What our best model tells us if it does not have a shared trend…
Let’s build things up slowly with a quick refresher on building a non-hierarchical GAM.
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
head(sample_reg)
# A tibble: 6 × 6
# Groups: date [6]
x y date value loc_id point_sample
<dbl> <dbl> <date> <dbl> <fct> <dbl>
1 1 1 2001-01-01 0.812 1_1 0.575
2 1 1 2001-01-02 0.271 1_1 0.518
3 1 1 2001-01-03 -0.597 1_1 -0.396
4 1 1 2001-01-04 -1.14 1_1 -1.27
5 1 1 2001-01-05 -0.945 1_1 -0.925
6 1 1 2001-01-06 -0.163 1_1 -0.0446
value
is the Truth from our simulation.point_sample
is what we observed.loc_id
is the combination of the x
and y
coordinates to help with plotting.Remember, we might be interested in deconstructing the different components of the date to pick out weekly, quarterly and annual patterns.
Does this look familiar?
Rather than calling mgcv
twice, can we specify these two models as a single joint model (like Model I)
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam)
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam)
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam)
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam)
fit_fam_I <- mgcv::gam(point_sample ~ t2(x, y, date_dec,
by = fam_fac),
data = sample_com,
family = "gaussian",
method = "REML")
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
mgcv
Tricks: Do Not Be Afraidmgcv
Tricks: Basis Specificationmgcv
Tricks: Grouped Families# Need to create combined response variable
sample_com$resp <- cbind(sample_com$point_sample, as.numeric(sample_com$fam))
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
mgcv
Tricks: Grouped Families# Need to create combined response variable
sample_com$resp <- cbind(sample_com$point_sample, as.numeric(sample_com$fam))
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
Now we can have observation processes with different response distribution s!
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")
# Independent trend and smooths
fit_fam_I <- mgcv::gam(point_sample ~ t2(x, y, date_dec,
by = fam_fac),
data = sample_com,
family = "gaussian",
method = "REML")
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")
Model I
Model GI
Model GS
Model G
[1] 4571.037
[1] 4480.373
[1] 4480.303
[1] 4688.921
mgcv
makes fitting really rather complex models relatively simple.On Hierarchical GAMs
On Observation Processes
On Methods Underpinning gfam()
On The by=
Argument