Beetroots illustration

In this post I will test the estimated noise sequence from the difference between the forecast and the actual time series. Taking the 2020’s electricity consumption in France and comparing it to the forecasts. First I’m going to test the raw residuals coming from the difference between the forecast and the consumption. In a second part I’ll fit a SARIMA model to the preceding residuals to further perform the same tests and compare them.

Table of contents

Electric energy consumption in France
J-1 Forecast Evaluation
SARIMA Residuals J-1 Forecast Evaluation
Final Comparison
Sources

First, let’s make a brief scan of the dataset, mostly by visual checking, without getting to much into the details.

Electric energy consumption in France

Let’s see the shape of the electricity consumption in France for year 2020. We also plot the forecast and the forecast error. The forecast error is the consumption minus the forecast.

Plot showing power electric energy consumption in France for year 2020

Data integrity

Firstly I performed a brief sanity check. Since the dataset sample rate is initially 15 minutes but only half hour data are provided, I removed the null values. Then I checked that the time steps are constant and continue over time.

Behavior shifts

Looking at the time serie of the consumption of electricity, we can easily see two different regimes. One in winter with a high consumption of electric energy and another in summer when people tend to consume less electricity (in France at least).

Extreme values

No extreme values seem to be present neither in the forecast nor in the prediction. However, the forecast error do seem to have a few extreme values.

There are about 70 values where the forecast error is greater than 5 GigaWatts in absolute value:

\[\lvert {FORECAST}\_{ERROR} \rvert > 5 GW\]

Note that 5 GigaWatts is a quite arbitrary value. Giving that 1 MegaWatts is equivalent to the consumption of about 600 homes, then 5GW would be of about 3 millions homes, which is quite a number in comparison with the France population, which has about 30 millions homes, knowing than more that one third of the total electric power consumption is consumed by private homes.

First lock down due to Covid

The spread of the coronavirus among the population leads to a national lock down in France. Looking at the time series, the unusual situation cause a higher variance during a few days. As a result, forecast the electricity consumption one day ahead had to be trickier than usual and over or underestimations of the consumption happened. For instance, the 17th of March might be the day with the higher forecast error, with an error peak at almost 10 GigaWatts.

First covid lock down

Thursday, March 12	Childcare centers are closed, as well as elementary schools and universities
Saturday, March 14	Third and last stage of the crisis plan announced by the prime minister
Tuesday, March 17	Lock down, movement restriction in all European Union, frontiers of the Schengen Area are closed

Second lock down

During the second lock down, the perturbations in the forecast error were less important. Sunday 25th of October was the day is the higher forecast error on this period. The error peaked at 6 GW and lasted for 1/2 hour, underestimating the consumption.

Second covid lock down

Saturday, October 17	Mandatory curfew in certain regions for at least 4 weeks (Ile de France, and eight other cities. Meetings are limited to 6 people.
Thursday, October 22	Curfew extension to 38 new french departements, in addition to the 16 departements already curfewed
Friday, October 30	Generalized lock down. Mandatory lock down of non-essential stores. Movement restriction. Unlike the first lock down childcare centers and schools stay opened

Christmas 2020

As for christams 2020, there is an unusual error in the forecast. This time it’s an overestimation of the consumption. Electric Energy Consumption during 2020's Christmas

Seasonalities

As one can expect, there might be at least 2 kind of seasonality in this time series: daily and weekly.

Daily seasonality box plot

Weekly seasonality box plot

J-1 Forecast Evaluation

The residuals are the prediction errors

\[w_t = {Consumption}_t - {Forecast}_t= x_t - \hat{x_t}\]

If \(w_t\) is positive, electric power consumption was greater than expected, otherwise it was underestimated.

Basic informations

	count	mean	std	variance	skewness	kurtosis
PREVISION_ERROR	17568	0.115271	1.11063	1.23351	0.88854	7.09978

	count	min	25%	median	75%	max
PREVISION_ERROR	17568	-5.819	-0.519	0.102	0.695	10.299

24-hours window rolling statistics

Rolling mean and standard deviation of the residuals

Autocorrelation tests

Correlogram

Autocorrelations of the residuals

From the correlograms we can see that:

Autocorrelation slowly decay
Looking at the PACF, it would be possible to fit and AR(1) or AR(2) model
Peaks are present at around 24 hours (\(48 \times {1 \over 2}\) hours), this shows the presence of a seasonal process every 24 hours

As we we test the randomness of residuals assuming they should be white noise, the confidence intervals of ACF values are at 2 standard errors around 0.

Looking at a correlogram with a maximum lag at more than a week, it seems that the main seasonal effect is at 24 hour interval. Indeed the 24 hour peaks decay every 24h, and no 7 days seasonal effect seems to come out. Weekly autocorrelations

Ljung-Box test

Ljung-Box test is a portmanteau test for testing the autocorrelation in either raw data or model residuals.

Lags	\(Q_{LB}\) statistic	p-value
96	142969	0
144	143366	0

Since the test statistic is high 143366 and the p-value is \(<<\) 0.05 we reject the iid hypothesis at level 0.05. It suggests that the sample autocorrelations are too large to be a sample from an iid sequence. We reject the null hypothesis of the test and conclude that the residuals are not independent.

McLeod-Li test

McLeod-Li test is a also a portmanteau test similar to Ljung-Box test. Since p-values for lags [0,…,144] are \(<<\) 0.05 we reject the iid hypothesis at level 0.05. We reject the null hypothesis of the test and conclude once again that the residuals are not independent.

Independence tests

Turning Point Test

Turning Point test is a test of statistical independence of a data series. It can detect cyclic trends. Under the hood, the turning point test compares the number of turning points present in the series with the number of turning points expected to be present in an i.i.d. series.

\(T\) statistic	p-value	n	\(\mu\)	\(\sigma\)
-41.5094	0	17558	11704	55.8668

Since \(T - \mu\) is way below zero, it indicates that there might be a positive correlation between neighboring observations. The p-value is \(<<\) 0.05, we reject the null hypothesis that the data series is i.i.d., then the data serie probably has a remaining periodic trend.

Rank Test

The rank test is useful for detecting linear trends in data series by comparing the number of increasing pairs in the series with the number expected in an i.i.d. series.

\(P\) statistic	p-value	pairs	n	\(\mu\)	\(\sigma\)
-4.56648	4.95979e-06	7.52957e+07	17558	7.70665e+07	387775

Since the \(P - \mu\) is large negative -7e+07, it indicates the presence of an decreasing trend in the data. The assumption that \({w_t}\) is a sample from an iid sequence is therefore rejected at level 0.05, giving the p-value is \(<<\) 0.05.

Stationarity tests

KPSS Test

KPSS test Statistic	p-Value	# Lags Used	Critical Value (10%)	Critical Value (5%)	Critical Value (2.5%)	Critical Value (1%)
0.246049	0.1	144	0.347	0.463	0.574	0.739

The null hypothesis \(H_0\) is that the data is stationary around a determinist trend. Since \(p>0.05\), we cannot reject the null hypothesis, thus the time series has no unit root, and seems stationary.

Augmented Dickey-Fuller Test

ADF Test Statistic	p-Value	# Lags Used	# Observations Used	Critical Value (1%)	Critical Value (5%)	Critical Value (10%)
-10.0775	1.21258e-17	144	17423	-3.43073	-2.86171	-2.56686

Since \(p<0.05\) we can reject the null hypothesis that the time series has a unit root. Then the alternative hypothesis \(H_a\) the time series hasn’t a unit root, seems to be stationary.

Normality tests

Here we plot the sample distribution of the forecast error in blue, and the ideal gaussian density distribution in orange.

Distribution of the forecasted error

The sample distribution calculated with kernel density estimation seems higher, and seems to have a fatter tail than the gaussian, maybe due to outliers.

Q-Q plot

Q-Q plot of the forecasted error

Both the ends of the Q-Q plot deviate from the straight line and its center follows a straight line. Kurtosis is 8, confirm that the tail is larger than one drawn from a normal distribution.

Jarque-Bera Tests

Since sample size is quite large (> 10000), we can use the Jarque-Bera test.

J-Q Test Statistic	p-value
39184.568	0.0

The test statistic is 39184 and the corresponding p-value is \(<<\) 0.05. Since this p-value is less than 0.05, we reject the null hypothesis. This data skewness and kurtosis is significantly different from a normal distribution.

SARIMA Residuals J-1 Forecast Evaluation

\[\Phi_p(L) \Phi_P(L^m) (1-L)^d (1 - L^m)^D X_t = \Theta_q(L) \Theta_Q(L^m) \epsilon_t\]

(p, d, q): order of the model for the autoregressive, differences, moving average components
(P, D, Q, m): order of the seasonal component of the model for the

Daily data with periodicity of 24h, so m = 48 in our case, and we take the following parameters:

order: (p=2, d=0, q=3),
seasonal order: (P=1, D=0, Q=0, m=48)

For some simple ARIMA models see my previous post Four simple instantiations of ARIMA processes as a cheat sheet

Basic informations

count	mean	std	variance	skewness	kurtosis
17568	-4.26557e-06	0.371329	0.137885	0.491632	8.19931

The mean of the SARIMA residuals are closer than 0, and the variance is smaller than the other residuals.

count	min	25%	median	75%	max
17568	-3.8873	-0.209651	-0.00853736	0.203478	5.05434

24-hours window rolling statistics

Rolling mean and standard deviation of the SARIMA residuals