Empirical Dynamic Models for Forecasting

A tutorial on forcasting with stationary and non-stationary time series

Last updated on Oct 11, 2021 8 min read r tutorials

Introduction to EDMs for Forecasting Non-stationary data

EDMs are a data-driven solution for uncovering hidden dynamic behavior in natural systems, which are often complex and dynamic (referred to as “non-stationarity” or “non-linearity”). This non-linearity means that the sign and magnitude of relationships within a system change with time, and therefore linear statistical approaches fail to properly represent such changes. Rather than assuming that the system is governed by any set of equations (i.e. unlike meteorological systems), EDMs reconstruct the dynamics of the system from time series data (hence “data-driven”) and provide a mechanistic understanding of the system. Under EDMs, the dynamics of a system are encoded in the temporal ordering of the time series, and the behavior of such a system can be explained by relating various states of a system using time lags (i.e. estimating the mathematical relationship of one variable at time $X (t)$ , to the same variable at other times: $X (t + 1)$ and $X (t + 2)$ . By relating states of a system using such lags, causal relationships between variables in the original system may be uncovered–providing a number of ecologically relevant applications, including forecasting.

To reiterate, EDMs are driven by non-linear dynamics in a system (the relationship of a variable, or state, at various time lags vary in sign and magnitude). Taken’s theorem–the basis of EDM–states that an original system’s dynamics can be reconstructed by exploiting the mathematical relationships between historical records of a single variable. These relationships can be mapped 1-to-1 using the Lorenz Attractor (also known as the Butterfly attractor).

Tutorial on forecasting with stationary and non-stationary time series

Load libraries

library(astsa)
library(rEDM)
library(tidyverse)
library(forecast)
library(ggpubr)

Set time series parameters, where time = hrs and the temporal range is 4 days

set.seed(1)

time = 1:96

Stationary time series

Simulate autocorrelated timeseries data with stationarity (linear data, with cyclical autocorrelation) using `arima.sim`

Arima, or AutoRegressive Integrated Moving Average, models necessarily assume linearity, because they rely on a linear relationship to predict values from one time step to another.

stationary_y_arima <- arima.sim(n = length(time), list(ar = c(0.9, -0.8), ma = c(-0.41, 0.2)),
                                sd = sqrt(0.1))

df_ts <- data.frame(x = time, y = stationary_y_arima)

autoplot(stationary_y_arima) + ylab("Stationary Time Series")

Visualize autocorrelation structures using the Parial Autocorrelation Function Estimation feature in the `forecast` package (function `acf()`)

acf(stationary_y_arima)

pacf(stationary_y_arima)

Partition data into training and predicting subsets:

train <- 1:(length(time)/2)             # indices for the first 2/3 of the time series

Arima models for forecasting:

Run a standard Arima model, with no lag dependencies

This model is mathematically identical to a intercept only linear model:

${\hat{y}}_{t} = μ + ϵ_{t}$

Where, the intercept is equal to the mean of the response variable:

$μ = \frac{1}{n} \sum_{t = 1}^{n} y_{t}$

a <- Arima(stationary_y_arima[train])

#plot the fitted values from Arima model
autoplot(fitted(a), col = "blue") + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

Perform forecast of prediction data using a no-lag Arima model

autoplot(forecast(a, h = 48)) + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

Autoregressive model, with one time dependency–an hourly lag term:

${\hat{y}}_{t} = μ + ϕ_{1} y_{t - 1} + ϵ_{t}$

Where, $ϕ_{1}$ is a coefficient of lag

a1 <- Arima(stationary_y_arima[train], c(1,0,0))

#plot the fitted values from Arima model
autoplot(fitted(a1), col = "blue") + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

#plot the forecasted values from Arima model
autoplot(forecast(a1, h = 48)) + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

Autoregressive model, with two hourly lags:

${\hat{y}}_{t} = μ + ϕ_{1} y_{t - 1} + ϕ_{2} y_{t - 2} + ϵ_{t}$

a2 <- Arima(stationary_y_arima[train], c(1,0,0))

#plot the fitted values from Arima model
autoplot(fitted(a2), col = "blue") + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

#plot the forecasted values from Arima model
autoplot(forecast(a2, h = 48)) + geom_path(data = df_ts, aes(x = x, y = y)) + ylab("Stationary Time Series")

Autoregressive models, with up to 5 hourly lags:

${\hat{y}}_{t} = μ + ϕ_{1} y_{t - 1} + [\dots] + ϕ_{5} y_{t - 5} + ϵ_{t}$

a3 <- Arima(stationary_y_arima[train], c(3,0,0))
a4 <- Arima(stationary_y_arima[train], c(4,0,0))
a5 <- Arima(stationary_y_arima[train], c(5,0,0))

a1_gg <- autoplot(forecast(a3, h = 48)) + ggtitle("Arima Model Forecast: 3 hourly lags") +
  geom_path(data = df_ts, aes(x = x, y = y)) + 
  geom_path(aes(x = time[train], y = fitted(a3)[train]), col = "blue") + 
   ylab(" ")

a2_gg <- autoplot(forecast(a4, h = 48)) + ggtitle("Arima Model Forecast: 4 hourly lags") +
  geom_path(data = df_ts, aes(x = x, y = y)) + 
  geom_path(aes(x = time[train], y = fitted(a4)[train]), col = "blue") + 
   ylab("Stationary Time Series")

a3_gg <- autoplot(forecast(a5, h = 48)) + ggtitle("Arima Model Forecast: 5 hourly lags") +
  geom_path(data = df_ts, aes(x = x, y = y)) + 
  geom_path(aes(x = time[train], y = fitted(a5)[train]), col = "blue") + 
   ylab(" ")

ggarrange(a1_gg, a2_gg, a3_gg, ncol = 1)

Now, we can move into models with different cycle structures. For this, we will consider half day lags (12 hr periods)

Autoregressive models, with an hourly- and half-day-time dependency:

${\hat{y}}_{t} = μ + ϕ_{1} y_{t - 1} + ϕ_{2} y_{t - 2} + ϕ_{3} y_{t - 3} + ϕ_{4} y_{t - 4} + ϕ_{5} y_{t - 12} + ϵ_{t}$

a41 <- Arima(stationary_y_arima[train], c(4,0,0), c(1,0,0))

autoplot(forecast(a41, h = 48)) + ggtitle("Arima Model Forecast: 4 hourly cycle lag") +
  geom_path(data = df_ts, aes(x = x, y = y)) + 
  geom_path(aes(x = time[train], y = fitted(a41)[train]), col = "blue") +
  ylab("Stationary Time Series")

Now, we will let the Arima algorithm choose the time lag parameters, using `auto.arima`:

aa <- auto.arima(stationary_y_arima[train])
summary(aa)

## Series: stationary_y_arima[train] 
## ARIMA(3,0,0) with zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3
##       0.4728  -0.1068  -0.5655
## s.e.  0.1272   0.1513   0.1384
## 
## sigma^2 estimated as 0.08692:  log likelihood=-9.02
## AIC=26.04   AICc=26.97   BIC=33.52
## 
## Training set error measures:
##                      ME      RMSE       MAE      MPE     MAPE      MASE
## Training set 0.02152847 0.2854554 0.2289932 187.4472 335.9332 0.6855497
##                     ACF1
## Training set -0.06089878

# Auto-arima chose a 3-hour lag structure, with no half-day effects

autoplot(forecast(aa, h = 48)) + geom_path(data = df_ts, aes(x = x, y = y)) + 
  ylab("Stationary Time Series")

Non-stationary time series

Now we will simulate non-linear (a.k.a. non-stationary) data, where relationships change through time, using `diffinv`:

## non-stationary data
set.seed(44)
nonstationary_y <- diffinv(rnorm(length(time))) %>% ts()

autoplot(nonstationary_y) + ylab("Non-stationary Time Series")

Let’s see what the auto Arima algorithm estimates with non-stationary data:

aa_ns <- auto.arima(nonstationary_y[train])

summary(aa_ns)

## Series: nonstationary_y[train] 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 1.137:  log likelihood=-69.71
## AIC=141.42   AICc=141.51   BIC=143.27
## 
## Training set error measures:
##                      ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.01182676 1.055224 0.7741009 0.9130602 36.00029 0.9791667
##                    ACF1
## Training set 0.08409507

Now, visualize forecast of a linear model with non-linear data!

df_ts_st <- data.frame(x = time, y = nonstationary_y[1:96])

aa_ns <- autoplot(forecast(aa_ns, h = 48)) + 
  geom_path(data = df_ts_st, aes(x = x, y = y)) + 
  ylab("Non-stationary Time Series"); aa_ns

Not a very good prediction… Let’s try empirical dynamic models!

Empirical Dynamic Models for forecasting:

The model is a system of three ordinary differential equations now known as the Lorenz equations:

$\frac{d x}{d t} = σ (y - x)$ $\frac{d y}{d t} = x (p - x) - y$ $\frac{d z}{d t} = x y - β z$

We will use the `simplex` function to determine how many dimensions (time lags) are needed to effectively develope a data-driven mechanistic formulation of the time series

# set data for historical record (library) and prediction
lib <- c(1, 48)
pred <- c(49, 96)

simplex_output <- simplex(nonstationary_y, lib, pred)
str(simplex_output)

## 'data.frame':    10 obs. of  16 variables:
##  $ E                  : int  1 2 3 4 5 6 7 8 9 10
##  $ tau                : num  1 1 1 1 1 1 1 1 1 1
##  $ tp                 : num  1 1 1 1 1 1 1 1 1 1
##  $ nn                 : num  2 3 4 5 6 7 8 9 10 11
##  $ num_pred           : num  47 46 45 44 43 42 41 40 39 38
##  $ rho                : num  0.768 0.796 0.682 0.716 0.515 ...
##  $ mae                : num  2.81 2.76 3.03 3.1 3.38 ...
##  $ rmse               : num  3.55 3.46 3.89 3.88 4.21 ...
##  $ perc               : num  0.979 0.978 1 1 1 ...
##  $ p_val              : num  7.73e-12 5.15e-13 3.37e-08 4.22e-09 1.56e-04 ...
##  $ const_pred_num_pred: num  47 46 45 44 43 42 41 40 39 38
##  $ const_pred_rho     : num  0.954 0.954 0.947 0.944 0.939 ...
##  $ const_pred_mae     : num  1.008 0.988 0.989 0.951 0.966 ...
##  $ const_pred_rmse    : num  1.23 1.21 1.22 1.17 1.18 ...
##  $ const_pred_perc    : num  0.979 0.978 0.978 1 1 ...
##  $ const_p_val        : num  8.26e-36 6.46e-35 1.10e-31 2.88e-30 3.02e-28 ...

Let’s visualize the forecasting skill (rho)

par(mar = c(4, 4, 1, 1), mgp = c(2.5, 1, 0))  # set margins for plotting
plot(simplex_output$E, simplex_output$rho, type = "l", lwd = 5, col = "light blue", xlab = "Embedding Dimension (E)", 
     ylab = "Forecast Skill (rho)")

simplex_output <- simplex(nonstationary_y, lib, pred, E = 2, tp = 1:10)
plot(simplex_output$tp, simplex_output$rho, type = "l", lwd = 5, col = "light blue", xlab = "Time to Prediction (tp)", 
     ylab = "Forecast Skill (rho)")

Run `simplex` to create EDM model for forecasting

smap_output <- simplex(nonstationary_y, lib, pred, E = 2, stats_only = FALSE)

predictions <- na.omit(smap_output$model_output[[1]])

df_ts_st_pred <- data.frame(x = time[51:96], y = nonstationary_y[51:96], predictions)

plot(df_ts_st$y~df_ts_st$x, type = "l")

edm <- ggplot(data = df_ts_st_pred) + ggtitle("Forecasts from EDM") + xlab("Time") + ylab(" ") + 
  geom_ribbon(aes(x = x, y = y, ymin = y - 1.96*sqrt(pred_var), ymax = y +.96*sqrt(pred_var)), fill = "blue", alpha = 0.2) +
  geom_ribbon(aes(x = x, y = y, ymin = y-sqrt(pred_var), ymax = y+sqrt(pred_var)), fill = "blue", alpha = 0.4) + 
  geom_path(aes(x = x, y = y)) + 
  geom_path(data = df_ts_st, aes(x = x, y = y)) + 
  ylab("Non-stationary Time Series"); edm

ggarrange(aa_ns + coord_cartesian(ylim = c(-20,8)) + ggtitle("Forecast with ARIMA"),
          edm + coord_cartesian(ylim = c(-20,8)) + ggtitle("Forecast with EDM")) + theme_bw()

ggsave("forecasts.jpeg", dpi = 300)

## Saving 7 x 5 in image

Empirical Dynamic Models Machine Learning Forecasting Time Series

Empirical Dynamic Models for Forecasting

Introduction to EDMs for Forecasting Non-stationary data

Load libraries

Set time series parameters, where time = hrs and the temporal range is 4 days

Stationary time series

Simulate autocorrelated timeseries data with stationarity (linear data, with cyclical autocorrelation) using `arima.sim`

Arima, or AutoRegressive Integrated Moving Average, models necessarily assume linearity, because they rely on a linear relationship to predict values from one time step to another.

Visualize autocorrelation structures using the Parial Autocorrelation Function Estimation feature in the `forecast` package (function `acf()`)

Partition data into training and predicting subsets:

Arima models for forecasting:

Run a standard Arima model, with no lag dependencies

This model is mathematically identical to a intercept only linear model:

Where, the intercept is equal to the mean of the response variable:

Perform forecast of prediction data using a no-lag Arima model

Autoregressive model, with one time dependency–an hourly lag term:

Autoregressive model, with two hourly lags:

Autoregressive models, with up to 5 hourly lags:

Now, we can move into models with different cycle structures. For this, we will consider half day lags (12 hr periods)

Autoregressive models, with an hourly- and half-day-time dependency:

Now, we will let the Arima algorithm choose the time lag parameters, using `auto.arima`:

Non-stationary time series

Now we will simulate non-linear (a.k.a. non-stationary) data, where relationships change through time, using `diffinv`:

Let’s see what the auto Arima algorithm estimates with non-stationary data:

Now, visualize forecast of a linear model with non-linear data!

Not a very good prediction… Let’s try empirical dynamic models!

Empirical Dynamic Models for forecasting:

The model is a system of three ordinary differential equations now known as the Lorenz equations:

We will use the `simplex` function to determine how many dimensions (time lags) are needed to effectively develope a data-driven mechanistic formulation of the time series

Let’s visualize the forecasting skill (rho)

Run `simplex` to create EDM model for forecasting

J Alex Baecher

Postdoctoral Researcher

Related

Empirical Dynamic Models for Forecasting

Introduction to EDMs for Forecasting Non-stationary data

Load libraries

Set time series parameters, where time = hrs and the temporal range is 4 days

Stationary time series

Simulate autocorrelated timeseries data with stationarity (linear data, with cyclical autocorrelation) using arima.sim

Arima, or AutoRegressive Integrated Moving Average, models necessarily assume linearity, because they rely on a linear relationship to predict values from one time step to another.

Visualize autocorrelation structures using the Parial Autocorrelation Function Estimation feature in the forecast package (function acf())

Partition data into training and predicting subsets:

Arima models for forecasting:

Run a standard Arima model, with no lag dependencies

This model is mathematically identical to a intercept only linear model:

Where, the intercept is equal to the mean of the response variable:

Perform forecast of prediction data using a no-lag Arima model

Autoregressive model, with one time dependency–an hourly lag term:

Autoregressive model, with two hourly lags:

Autoregressive models, with up to 5 hourly lags:

Now, we can move into models with different cycle structures. For this, we will consider half day lags (12 hr periods)

Autoregressive models, with an hourly- and half-day-time dependency:

Now, we will let the Arima algorithm choose the time lag parameters, using auto.arima:

Non-stationary time series

Now we will simulate non-linear (a.k.a. non-stationary) data, where relationships change through time, using diffinv:

Let’s see what the auto Arima algorithm estimates with non-stationary data:

Now, visualize forecast of a linear model with non-linear data!

Not a very good prediction… Let’s try empirical dynamic models!

Empirical Dynamic Models for forecasting:

The model is a system of three ordinary differential equations now known as the Lorenz equations:

We will use the simplex function to determine how many dimensions (time lags) are needed to effectively develope a data-driven mechanistic formulation of the time series

Let’s visualize the forecasting skill (rho)

Run simplex to create EDM model for forecasting

J Alex Baecher

Postdoctoral Researcher

Related

Simulate autocorrelated timeseries data with stationarity (linear data, with cyclical autocorrelation) using `arima.sim`

Visualize autocorrelation structures using the Parial Autocorrelation Function Estimation feature in the `forecast` package (function `acf()`)

Now, we will let the Arima algorithm choose the time lag parameters, using `auto.arima`:

Now we will simulate non-linear (a.k.a. non-stationary) data, where relationships change through time, using `diffinv`:

We will use the `simplex` function to determine how many dimensions (time lags) are needed to effectively develope a data-driven mechanistic formulation of the time series

Run `simplex` to create EDM model for forecasting