flowchart LR
A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont
B --> C{Data}:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#3c78d8
style C color:#f1c232
BioSS
BioSS
BioSS
UKCEH
BioSS
BioSS
flowchart LR
A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont
B --> C{Data}:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#3c78d8
style C color:#f1c232
flowchart LR
A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont
B --> C{Data}:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#3c78d8
style C color:#f1c232
flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#cc0000
style C color:#674ea7
style D color:#3c78d8
style E color:#f1c232
flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style A color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style B color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style C color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style D color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)flowchart LR
A[Time<br>Pattern]:::FixFont --> C(Simulate):::FixFont
B[Spatial<br>Pattern]:::FixFont --> C
C --> D(Sample):::FixFont
D --> E{Realistic<br>Dataset}:::FixFont;
classDef FixFont font-size:15px
style E color:#6aa84f
# Temporal Pattern
timeline <-set_timeline(1, "month")
time_series <- define_pattern_t(timeline, weekly_amplitude = 1, ...)
# Spatial Pattern
spatial_field <- define_pattern_s(n_x = 10, n_y = 10, ...)
# Simulate
truth <- simulate_spacetime(temporal_pattern = time_series, field = spatial_field, ...)
# Sample
point_samples <- sample_area_pointily(field = truth, n_samples = 10,
sampling_error = 0.1, ...)flowchart LR
A[Process<br>of Interest]:::FixFont --> B(Observation<br>Process):::FixFont
B --> C{Data}:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#3c78d8
style C color:#f1c232
flowchart LR
C{Data}:::FixFont --> B(Observation<br>Process):::FixFont
B --> A[Process<br>of Interest]:::FixFont;
classDef FixFont font-size:15px
style A color:#cc0000
style B color:#3c78d8
style C color:#f1c232
flowchart LR
C1{Dataset<br>1}:::FixFont --> B1(Observation<br>Process<br>1):::FixFont
B1 --> A[Process<br>of Interest]:::FixFont
C2{Dataset<br>2}:::FixFont --> B2(Observation<br>Process<br>2):::FixFont
B2 --> A;
classDef FixFont font-size:15px
style A color:#cc0000
style B1 color:#3c78d8
style B2 color:#3c78d8
style C1 color:#f1c232
style C2 color:#f1c232
Data complementarity is the heart of data fusion.
We want to allow data to be mutually informative…
But we also want to acknowledge their differences.
The answer: hierarchical GAMS
What our best model tells us if it has a shared trend…
What our best model tells us if it does not have a shared trend…
Let’s build things up slowly with a quick refresher on building a non-hierarchical GAM.
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
head(sample_reg)# A tibble: 6 × 6
# Groups: date [6]
x y date value loc_id point_sample
<dbl> <dbl> <date> <dbl> <fct> <dbl>
1 1 1 2001-01-01 0.812 1_1 0.575
2 1 1 2001-01-02 0.271 1_1 0.518
3 1 1 2001-01-03 -0.597 1_1 -0.396
4 1 1 2001-01-04 -1.14 1_1 -1.27
5 1 1 2001-01-05 -0.945 1_1 -0.925
6 1 1 2001-01-06 -0.163 1_1 -0.0446
value is the Truth from our simulation.point_sample is what we observed.loc_id is the combination of the x and y coordinates to help with plotting.Remember, we might be interested in deconstructing the different components of the date to pick out weekly, quarterly and annual patterns.
Does this look familiar?
Rather than calling mgcv twice, can we specify these two models as a single joint model (like Model I)
set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam) set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam) set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam) set.seed(1868)
sample_reg <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 0.1, regular = TRUE)
set.seed(1871)
sample_ran <- sample_area_pointily(truth, n_samples = 25,
sampling_error = 1, regular = FALSE)
# Make mgcv see date as a date
sample_reg$date_dec <- lubridate::decimal_date(sample_reg$date)
sample_ran$date_dec <- lubridate::decimal_date(sample_ran$date)
# Add family column
sample_reg$fam <- 1 # Observation source 1, the regular samples
sample_ran$fam <- 2 # Observation source 2, the random samples
# Combine into one dataframe
sample_com <- rbind(sample_reg, sample_ran)
# Note, we need *both* fam and fam_fac for the factor smoothing later
sample_com$fam_fac <- factor(sample_com$fam) fit_fam_I <- mgcv::gam(point_sample ~ t2(x, y, date_dec,
by = fam_fac),
data = sample_com,
family = "gaussian",
method = "REML")
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")mgcv Tricks: Do Not Be Afraidmgcv Tricks: Basis Specificationmgcv Tricks: Grouped Families# Need to create combined response variable
sample_com$resp <- cbind(sample_com$point_sample, as.numeric(sample_com$fam))
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")mgcv Tricks: Grouped Families# Need to create combined response variable
sample_com$resp <- cbind(sample_com$point_sample, as.numeric(sample_com$fam))
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")Now we can have observation processes with different response distribution s!
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")# Independent trend and smooths
fit_fam_I <- mgcv::gam(point_sample ~ t2(x, y, date_dec,
by = fam_fac),
data = sample_com,
family = "gaussian",
method = "REML")
# Shared trend and different smoothness
fit_fam_GI <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec,
d=c(2, 1),
bs=c("ts", "ts"),
by = fam_fac),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Shared trend and shared smoothness
fit_fam_GS <- mgcv::gam(resp ~ t2(x, y, date_dec) +
t2(x, y, date_dec, fam_fac,
d=c(2, 1, 1),
bs=c("ts", "ts", "re")),
data = sample_com,
family = gfam(list(gaussian, gaussian)),
method = "REML")
# Identical trend and smoothness
fit_fam_G <- mgcv::gam(point_sample ~ t2(x,y, date_dec),
data = sample_com,
family = "gaussian",
method = "REML")Model I
Model GI
Model GS
Model G
[1] 4571.037
[1] 4480.373
[1] 4480.303
[1] 4688.921
mgcv makes fitting really rather complex models relatively simple.On Hierarchical GAMs
On Observation Processes
On Methods Underpinning gfam()
On The by= Argument