One of the best known examples of using Gaussian Process Modelling for forward prediction is the application described in Rasmussen & Williams (2006), which predicts the future of atmospheric CO2 levels.

When the book was written, the prediction showed the increase in CO2 concentration continuing at roughly the same rate. However, things haven’t gone so well for CO2 concentrations since then.

In 2016, the carbon dioxide content of the Earth’s atmosphere hit 400 parts per million, even faster than the R&W model predicted.
Here I’m going to step through how to repeat R&W’s CO2 prediction in Python.
[Note: this post follows on from a previous post and an even more previous post.]
GPM with George
Some basic libraries…
import numpy as np
import pylab as pl
import scipy.optimize as op
In a previous post on Gaussian Process Modelling we coded up covariance kernels and a covariance matrix from scratch. This time we’re going to need more than one kernel. We could write a little library that defines a whole load of different covariance kernels… but in fact somebody has already done it for us 🙂
The george Gaussian Process Modelling library is pip installable:
pip install george
We need to import the library and the covariance kernels:
import george
from george import kernels
We’re also going to need some data. I downloaded these data from the NOAA website.
There is also a reduced version (up to 2001) of this dataset available directly through the Python Statsmodels Datasets package.
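As an aside, here is a minimal sketch of loading that reduced dataset (I’m assuming the statsmodels co2 dataset interface here; we won’t use it in the rest of this post, which sticks with the NOAA file):

# load the reduced (weekly, 1958-2001) Mauna Loa dataset via statsmodels:
import statsmodels.api as sm
co2_dataset = sm.datasets.co2.load_pandas()
co2_weekly = co2_dataset.data.dropna()   # drop weeks with missing values
print(co2_weekly.head())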
# this is a function to read the Mauna Loa data from file
def read_co2(filename):
    time = []
    co2 = []
    with open(filename, 'r') as co2file:
        for line in co2file:
            # skip comment lines:
            if line.startswith('#'):
                continue
            items = line.split()
            # skip missing (-99.99) values:
            if items[3] != '-99.99':
                time.append(float(items[2]))   # decimal date
                co2.append(float(items[3]))    # monthly mean CO2 [ppm]
    return np.array(time), np.array(co2)

t, y = read_co2("mauna_loa.dat")
I’m going to pick out the data up to 2003 to use as training data points. Then I’m going to use the training data to predict the future at test data points from 2003 to 2025.
First off, here are the training data:
pl.subplot(111)
pl.plot(t[np.where(t < 2003.)], y[np.where(t < 2003.)], ls=':')
pl.ylabel(r"CO$_2$ concentration, ppm", fontsize=20)
pl.xlabel("year", fontsize=20)
pl.title(r"Mauna Loa CO$_2$ data: 1958 to 2003")
pl.axis([1958., 2025., 310., 420.])
pl.grid()
pl.show()
I’m going to split out my training data as a separate array:
y_to_2003 = y[np.where(t < 2003.)]
t_to_2003 = t[np.where(t < 2003.)]
To select appropriate kernels to describe the behaviour of the data in the covariance matrix we need to look at the trends that are present.
Firstly, there is a long term increase.
We could include a mean function to model this long term rise, but we can also just use a covariance kernel with a large width:
# Squared exponential kernel
# h = 66; lambda = 67
k1 = 66.0**2 * kernels.ExpSquaredKernel(67.0**2)
Note that I’m including the values of the hyper-parameters for each kernel in the code as they are defined in R&W.
Secondly, there is that periodic behaviour. So we need a periodic kernel.
However, we don’t know if the behaviour is exactly periodic, so we should allow for some decay away from exact periodicity.
# periodic covariance kernel with exponential component to
# allow decay away from periodicity:
# h = 2.4; lambda = 90; gamma = 1.3; P = 1
k2 = 2.4**2 * kernels.ExpSquaredKernel(90**2) * kernels.ExpSine2Kernel(2.0 / 1.3**2, 1.0)
Thirdly, there are the medium term irregularities in the long term behaviour. Technically I believe these are known as a wibble.
# rational quadratic kernel for medium term irregularities
# h = 0.66; alpha = 0.78; beta = 1.2
k3 = 0.66**2 * kernels.RationalQuadraticKernel(0.78, 1.2**2)
And… finally, there is the noise. These are empirical data, so they will always have some noise component from the measurement equipment, which can introduce both uncorrelated and short range correlated contributions:
# noise kernel: includes correlated noise & uncorrelated noise
# h = 0.18; lambda = 1.6; sigma = 0.19
k4 = 0.18**2 * kernels.ExpSquaredKernel(1.6**2) + kernels.WhiteKernel(0.19)
Let’s put all these components together to make our final combined kernel:
kernel = k1 + k2 + k3 + k4
Now we need to feed our combined kernel to the george library:
gp = george.GP(kernel, mean=0.0)
Then we compute the covariance matrix:
gp.compute(t_to_2003)
and… that’s it.
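As an aside, if we had per-measurement uncertainties we could fold them into the diagonal of the covariance matrix at this step: the compute method also accepts an optional yerr argument. A minimal sketch of that alternative call, using a purely hypothetical uncertainty of 0.2 ppm per point (the NOAA file we read above doesn’t include uncertainties), would be:

# hypothetical per-point measurement uncertainties (illustration only):
sigma_y = 0.2 * np.ones_like(t_to_2003)
gp.compute(t_to_2003, yerr=sigma_y)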
Predicting the Future
We can now predict the posterior mean and variance at all our test data points:
# range of times for prediction:
x = np.linspace(max(t_to_2003), 2025, 2000)
# calculate expectation and variance at each point:
mu, cov = gp.predict(y_to_2003, x)
std = np.sqrt(np.diag(cov))
So how does it look?
ax = pl.subplot(111)
# plot the original values
pl.plot(t_to_2003, y_to_2003, ls=':')
# shade in the area inside a one standard deviation bound:
ax.fill_between(x, mu - std, mu + std, facecolor='lightgrey', lw=0, interpolate=True)
pl.ylabel(r"CO$_2$ concentration, ppm")
pl.xlabel("year")
pl.title(r"Mauna Loa CO$_2$ data - Initial Prediction")
pl.axis([1958., 2025., 310., 420.])
pl.grid()
# display the figure:
pl.show()
Optimizing the Hyper-parameters
But… what if the values of the hyper-parameters from R&W weren’t exactly right? We should probably optimise them for the data. In simple cases we can do that using the optimization function in the scipy library.
To use the scipy optimization function we need to provide:
(1) some objective function to optimize, and
(2) the gradient of that function.
I’m going to define the objective function for the optimization as a negative log-likelihood. We could write this function ourselves, but in fact george has a built-in log-likelihood that we can call directly.
The log-likelihood is computed as:

$$\ln \mathcal{L} \;=\; -\tfrac{1}{2}\,(\mathbf{y} - X)^{\mathrm{T}} K^{-1} (\mathbf{y} - X) \;-\; \tfrac{1}{2}\ln |K| \;-\; \tfrac{N}{2}\ln (2\pi),$$

where y is the variable and x are the points at which it is measured; K is the covariance matrix, X is the operator that maps x onto y (here just the mean model, which we set to zero above), and N is the number of data points.
def nll(p):
    # Update the kernel parameters and compute the likelihood.
    gp.kernel[:] = p
    ll = gp.lnlikelihood(y_to_2003, quiet=True)
    # The scipy optimizer doesn't play well with infinities.
    return -ll if np.isfinite(ll) else 1e25
def grad_nll(p):
    # Update the kernel parameters and compute the likelihood gradient.
    gp.kernel[:] = p
    gll = gp.grad_lnlikelihood(y_to_2003, quiet=True)
    return -gll
gp.compute(t_to_2003)
We can then run the optimization routine:
# initial guess at parameters (see above):
p0 = gp.kernel.vector

# if you want to view your initial guess values uncomment the line below
# print(p0)

# run optimization:
results = op.minimize(nll, p0, jac=grad_nll)
We then need to update the kernel with the optimised values of the hyper-parameters:
gp.kernel[:] = results.x
Re-run the prediction with the updated parameters:
# range of times for prediction:
x = np.linspace(max(t_to_2003), 2025, 2000)
# calculate expectation and variance at each point:
mu, cov = gp.predict(y_to_2003, x)
std = np.sqrt(np.diag(cov))
And see what we get:
ax = pl.subplot(111)
# plot the original values
pl.plot(t_to_2003, y_to_2003, ls=':')
# shade in the area inside a one standard deviation bound:
ax.fill_between(x, mu - std, mu + std, facecolor='lightgrey', lw=0, interpolate=True)
pl.ylabel(r"CO$_2$ concentration, ppm")
pl.xlabel("year")
pl.title(r"Mauna Loa CO$_2$ data - Optimised Prediction")
pl.axis([1958., 2025., 310., 420.])
pl.grid()
# display the figure:
pl.show()
Well, the original parameters were basically optimal so things haven’t changed much.
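If you want to check that for yourself, you can print the hyper-parameter vectors before and after optimisation and compare them (george stores the kernel parameters internally as a single vector; in the version I’m using here these are the logarithms of the parameter values):

# compare the initial (R&W) and optimised hyper-parameter vectors:
print(p0)
print(results.x)
print(results.x - p0)   # change in each (log) hyper-parameter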
Let’s now compare the prediction with the actual measured data from Mauna Loa after 2003:
ax = pl.subplot(111)
# plot the training values
pl.plot(t_to_2003, y_to_2003, ls=':', c='b')
# shade in the area inside a one standard deviation bound:
ax.fill_between(x, mu - std, mu + std, facecolor='lightgrey', lw=0, interpolate=True)
# plot the full dataset
pl.plot(t[np.where(t > 2003)], y[np.where(t > 2003)], ls='-', c='r')
pl.ylabel(r"CO$_2$ concentration, ppm")
pl.xlabel("year")
pl.title(r"Mauna Loa CO$_2$ data - Comparison")
pl.axis([1958., 2025., 310., 420.])
pl.grid()
# display the figure:
pl.show()
It’s not quite as much of a difference from the prediction as the news seems to think.
But it does tell us that our model is not perfect and that perhaps:
(1) Accelerated behaviour has appeared in the data after 2003; or
(2) We need another kernel to account for acceleration in the increase; or
(3) Perhaps we should introduce a non-zero mean [my personal guess] (see the sketch below).
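To give a flavour of option (3), here is a minimal sketch that simply swaps the zero mean we used above for the mean of the training data when constructing the GP. A constant mean is still too simple (a rising mean function would be closer to reality), but it shows where such a change would plug in:

# rebuild the GP with a constant, non-zero mean instead of mean=0.0:
gp_mean = george.GP(kernel, mean=np.mean(y_to_2003))
gp_mean.compute(t_to_2003)

# re-run the prediction out to 2025 with the new mean:
mu_mean, cov_mean = gp_mean.predict(y_to_2003, x)
std_mean = np.sqrt(np.diag(cov_mean))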