Linear regression with gradient descent

A machine learning approach to standard linear regression

Introduction: linear regression with gradient descent

This tutorial is a rough introduction to using gradient descent to estimate the parameters (slope and intercept) of a standard linear regression, as an alternative to ordinary least squares (OLS) regression (equivalently, maximum likelihood estimation under normal errors). To begin, I simulate data and fit a standard OLS regression using sums of squares. Once that is explained, I demonstrate how to substitute gradient descent for the closed-form calculation and how to interpret the results.

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.3     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## Warning: package 'readr' was built under R version 4.1.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Ordinary Least Squares Regression

Simulate data

Generate random data in which y is a noisy function of x

set.seed(123)

x <- runif(1000, -5, 5)
y <- x + rnorm(1000) + 3

Fit a linear model

mod <- lm(y ~ x) # Ordinary Least Squares regression via the general linear model
mod

## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      3.0118       0.9942


Plot the data and the model

plot(x,y, col = "grey80", main='Regression using lm()', xlim = c(-2, 5), ylim = c(0,10)); 
text(0, 8, paste("Intercept = ", round(mod$coefficients[1], 2), sep = ""));
text(4, 2, paste("Slope = ", round(mod$coefficients[2], 2), sep = ""));
abline(v = 0, col = "grey80"); # line for y-intercept
abline(h = mod$coefficients[1], col = "grey80") # plot horizontal line at intercept value
abline(a = mod$coefficients[1], b = mod$coefficients[2], col='blue', lwd=2) # use slope and intercept to plot best fit line

Calculate intercept and slope using sum of squares

x_bar <- mean(x) # calculate mean of independent variable
y_bar <- mean(y) # calculate mean of dependent variable

slope <- sum((x - x_bar)*(y - y_bar))/sum((x - x_bar)^2) # sum of cross-products of deviations of x and y, divided by the sum of squared deviations of x
slope

## [1] 0.9941662

intercept <- y_bar - (slope * x_bar) # intercept is the mean of y minus the slope times the mean of x
intercept

## [1] 3.011774
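
As a quick sanity check (a minimal sketch, assuming the mod object fitted above is still in the workspace), the manual estimates should agree with the coefficients returned by lm():

coef(mod)                                          # (Intercept) and x as estimated by lm()
c(intercept = intercept, slope = slope)            # manual sums-of-squares estimates
all.equal(unname(coef(mod)), c(intercept, slope))  # should be TRUE (identical up to numerical tolerance)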

Plot data using manually calculated parameters

plot(x,y, col = "grey80", main='Regression with manual calculations', xlim = c(-2, 5), ylim = c(0,10)); 
abline(a = intercept, b = slope, col='blue', lwd=2)

Gradient Descent:

Using the same simulated data as before, we will estimate parameters using a machine learning algorithm

Here are some figures I found helpful while trying to understand how gradient descent works:

To measure the goodness of fit for a given set of parameters, we will implement a squared error cost function (a way to quantify how far a guess for the slope and intercept is from the data).
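
Written out, the cost implemented below is the halved mean of the squared residuals, and each gradient descent iteration moves the parameters a small step against the gradient of that cost (here θ holds the intercept and slope, X is the design matrix with a leading column of 1’s, m is the number of observations, and α is the learning rate set further down):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(x_i^{\top}\theta - y_i\right)^2, \qquad \theta \leftarrow \theta - \alpha\,\nabla J(\theta), \qquad \nabla J(\theta) = \frac{1}{m}X^{\top}(X\theta - y)$$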

cost <- function(X, y, theta) {
  sum( (X %*% theta - y)^2 ) / (2*length(y))  # halved mean of the squared residuals
}
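
For intuition, here is a quick check (a minimal sketch; X_demo is a name introduced here, and it assumes the simulated x and y and the manual intercept and slope from above are still defined): the cost should be large for a naive guess and much smaller near the fitted parameters.

X_demo <- cbind(1, x)                                    # design matrix with a leading column of 1's
cost(X_demo, y, matrix(c(0, 0), nrow = 2))               # cost at a naive guess (intercept = 0, slope = 0)
cost(X_demo, y, matrix(c(intercept, slope), nrow = 2))   # cost near the fitted parameters (much smaller)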

We must also set two additional parameters: learning rate and iteration limit

alpha <- 0.01
num_iters <- 1000

# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # pre-allocate a list to hold theta at each iteration

# initialize coefficients
theta <- matrix(c(0,0), nrow=2)

# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)            # residuals for the current guess of theta
  delta <- t(X) %*% error / length(y)   # gradient of the cost with respect to theta
  theta <- theta - alpha * delta        # step against the gradient, scaled by the learning rate
  cost_history[i] <- cost(X, y, theta)  # record the cost at this iteration
  theta_history[[i]] <- theta           # record the parameter estimates at this iteration
}

print(theta)

##           [,1]
## [1,] 3.0116439
## [2,] 0.9941657
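
One way to check the result (a sketch assuming the mod object fitted with lm() earlier is still in the workspace) is to put the gradient descent estimates next to the lm() coefficients; after 1,000 iterations the differences should be tiny:

round(cbind(gradient_descent = as.vector(theta), lm = unname(coef(mod))), 4)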

Plot data and converging fit

iters <- c((1:31)^2, 1000)
cols <- rev(terrain.colors(num_iters))
library(gifski)
png("frame%03d.png")
par(ask = FALSE)

for (i in iters) {
  plot(x,y, col="grey80", main='Linear regression using Gradient Descent')
  text(x = -3, y = 10, paste("slope = ", round(theta_history[[i]][2], 3), sep = " "), adj = 0)
  text(x = -3, y = 8, paste("intercept = ", round(theta_history[[i]][1], 3), sep = " "), adj = 0)
  abline(coef=theta_history[[i]], col=cols[i], lwd = 2)
}

dev.off()

## png 
##   2

png_files <- sprintf("frame%03d.png", 1:32)
gif_file <- gifski(png_files, delay = 0.1)
unlink(png_files)
utils::browseURL(gif_file)

Plot the cost function across iterations:

plot(cost_history, type='l', col='blue', lwd=2, main='Cost function', ylab='cost', xlab='Iterations')

Using gradient descent with real data

I’ll demonstrate the same workflow on real data, using Brunno Oliveira’s "AmphiBIO" dataset:
• Link to publication: https://www.nature.com/articles/sdata2017123
• Link to data: https://ndownloader.figstatic.com/files/8828578

Load the AmphiBIO data!

install.packages("downloader")
library(downloader)

url <- "https://ndownloader.figstatic.com/files/8828578"
dir.create("lrgb", showWarnings = FALSE)        # make sure the destination folder exists
download(url, dest = "lrgb/amphibio.zip", mode = "wb")
unzip("lrgb/amphibio.zip", exdir = "./lrgb")

df <- read_csv("lrgb/AmphiBIO_v1.csv") %>%
  select("Order",
         "Body_mass_g",
         "Body_size_mm",
         "Size_at_maturity_min_mm",
         "Size_at_maturity_max_mm",
         "Litter_size_min_n",
         "Litter_size_max_n",
         "Reproductive_output_y") %>%
  na.omit %>%
  mutate_if(is.numeric, ~ log(.))

## Rows: 6776 Columns: 38

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (6): id, Order, Family, Genus, Species, OBS
## dbl (31): Fos, Ter, Aqu, Arb, Leaves, Flowers, Seeds, Arthro, Vert, Diu, Noc...
## lgl  (1): Fruits

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
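
Before modelling, it can be worth a quick look at what survived the filtering and the log transform (a minimal check; nothing here is required for the regression itself):

nrow(df)     # rows remaining after na.omit
glimpse(df)  # the selected trait columns, numeric ones log-transformed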

plot(df$Body_size_mm, df$Size_at_maturity_max_mm, col = "grey80", main='Correlation of amphibian traits', xlab = "Body size (mm)", ylab = "Max size at maturity (mm)"); 

Fit a linear model

mod <- lm(Size_at_maturity_max_mm ~ Body_size_mm, data = df) # Ordinary Least Squares regression via the general linear model
mod

## 
## Call:
## lm(formula = Size_at_maturity_max_mm ~ Body_size_mm, data = df)
## 
## Coefficients:
##  (Intercept)  Body_size_mm  
##       0.6237        0.7265


Plot the data and the model

plot(df$Body_size_mm, df$Size_at_maturity_max_mm, col = "grey80", main='Linear Regression using Sum of Squares', xlab = "Body size (mm)", ylab = "Max size at maturity (mm)"); 
text(4, 5, paste("Intercept = ", round(mod$coefficients[1], 2), sep = ""));
text(6, 3, paste("Slope = ", round(mod$coefficients[2], 2), sep = ""));
abline(a = mod$coefficients[1], b = mod$coefficients[2], col='blue', lwd=2) # use slope and intercept to plot best fit line

Calculate intercept and slope using sum of squares

x <- df$Body_size_mm
y <- df$Size_at_maturity_max_mm
x_bar <- mean(x) # calculate mean of independent variable
y_bar <- mean(y) # calculate mean of dependent variable

slope <- sum((x - x_bar)*(y - y_bar))/sum((x - x_bar)^2) # sum of cross-products of deviations of x and y, divided by the sum of squared deviations of x
slope

## [1] 0.7264703

intercept <- y_bar - (slope * x_bar) # intercept is the mean of y minus the slope times the mean of x
intercept

## [1] 0.6237047

### plot data using manually calculated parameters
plot(x,y, col = "grey80", main='Linear Regression using Ordinary Least Squares', xlab = "Body size (mm)", ylab = "Max size at maturity (mm)"); 
abline(a = intercept, b = slope, col='blue', lwd=2)

Calculate intercept and slope using gradient descent (Machine Learning)

We reuse the squared error cost function defined earlier to score each guess for the slope and intercept.

### learning rate and iteration limit
alpha <- 0.001
num_iters <- 1000

### keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)

### initialize coefficients
theta <- matrix(c(0,0), nrow=2)

### add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))

# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- t(X) %*% error / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}

print(theta)

##           [,1]
## [1,] 0.1816407
## [2,] 0.8175962

Plot data and converging fit

plot(x,y, col="grey80", main='Linear regression using Gradient Descent', xlab = "Body size (mm)", ylab = "Max size at maturity (mm)")
for (i in c((1:31)^2, 1000)) {
  abline(coef=theta_history[[i]], col="red")
}
abline(coef=theta, col="blue", lwd = 2)
plot(cost_history, type='l', col='blue', lwd=2, main='Cost function', ylab='cost', xlab='Iterations')
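
Note that after 1,000 iterations at alpha = 0.001 the gradient descent estimates are still some way from the OLS values (intercept ≈ 0.62, slope ≈ 0.73). A minimal sketch of how to close the gap, reusing the objects defined above (extra_iters is a name introduced here just for illustration), is to continue the descent from the current theta and compare against lm() again:

extra_iters <- 10000                   # illustrative name, not from the original post
for (i in 1:extra_iters) {
  error <- (X %*% theta - y)           # residuals for the current guess
  delta <- t(X) %*% error / length(y)  # gradient of the cost
  theta <- theta - alpha * delta       # keep stepping against the gradient
}
cbind(gradient_descent = as.vector(theta), lm = unname(coef(mod)))  # the gap should shrink as iterations accumulate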