Tuesday, October 16, 2018

Case Study - India Monthly Car Sales Data & Forecast 2018-2020 – A Comparative Analysis using Facebook Prophet & Auto Arima using R – Time Series Forecasting using R


The data used in this case study was gathered from online data sources and covers April 2011 to September 2018 for all the car companies in India. The Microsoft Excel Object is placed here:

India Car Sales in H1 2018-19
After passenger vehicle (PV) sales contracted by 5.61 per cent in September 2018, the automakers' body, the Society of Indian Automobile Manufacturers (SIAM), signalled a conservative outlook and a slowdown in the sector for fiscal 2018-19. Overall, the sector registered moderate growth of 6.88 per cent in the first half of 2018-19. Growth was held back mainly by the fall in September 2018 sales, the biggest monthly slump this fiscal and the third straight month of decline.

PV sales grew 6.88 per cent in the first half of the fiscal. The outlook for the second half is also positive, and sales will be "slightly" higher or similar, SIAM president Rajan Wadhera told the media. "For the entire fiscal, we had a forecast of 9-11 per cent growth but we are now looking more at the lower end of around 9 per cent," he added.

PV sales during the first half of the fiscal were mainly driven by rural demand and new model launches, Wadhera said. He hoped that sales growth would improve further with the onset of the festive season. "In the next six months, the growth story of automotive industry will continue to be on the positive side based on strong rural demand and festive season," Wadhera said.
Monthly Car Sales Forecast (October 2018 till September 2020)
The monthly car sales data for all the car manufacturers in India was collected for April 2011 to September 2018 from newspaper articles, as reported by SIAM and the car manufacturers in the media. Two techniques are used for forecasting: Facebook Prophet and the ARIMA-based auto.arima function in the forecast package of R.
Facebook Prophet
Prophet is optimized for the business forecasting tasks Facebook encountered internally, which typically have the following characteristics:
  • Hourly, daily, or weekly observations with at least a few months (preferably a year) of history.
  • Strong multiple “human-scale” seasonalities.
  • Important holidays that occur at irregular intervals that are known in advance (e.g. the Super Bowl).
  • A reasonable number of missing observations or large outliers.
  • Historical trend changes, for instance due to product launches or logging changes.
  • Trends that are non-linear growth curves, where a trend hits a natural limit or saturates.
How Prophet works
At its core, the Prophet procedure is an additive regression model with four main components:
  • A piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting changepoints from the data.
  • A yearly seasonal component modeled using Fourier series.
  • A weekly seasonal component using dummy variables.
  • A user-provided list of important holidays.
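
To make the additive structure concrete, here is a minimal sketch in base R that fits a piecewise linear trend plus yearly Fourier terms with lm(). It is not Prophet's actual estimation procedure: the series is synthetic, the changepoint cp is hand-picked rather than detected, and the names idx, cp, K and fit_additive are illustrative assumptions. The weekly and holiday components are left out because the example is monthly.

```r
# Illustrative additive model: trend + yearly seasonality, fitted with lm().
# This is NOT how Prophet estimates its model; it only shows the additive idea.
set.seed(1)
n   <- 96                       # 8 years of synthetic monthly data
idx <- seq_len(n)               # month index 1..96
cp  <- 48                       # hypothetical, hand-picked changepoint

# Synthetic series: the slope drops by 1.5 after cp, plus a yearly cycle and noise
y <- 100 + 2 * idx - 1.5 * pmax(idx - cp, 0) +
  10 * sin(2 * pi * idx / 12) + rnorm(n, sd = 2)

# Regressors: linear trend, hinge term for the slope change, K Fourier pairs
K <- 2                          # number of yearly harmonics (assumption)
d <- data.frame(y = y, idx = idx, hinge = pmax(idx - cp, 0))
for (k in seq_len(K)) {
  d[[paste0("sin", k)]] <- sin(2 * pi * k * idx / 12)
  d[[paste0("cos", k)]] <- cos(2 * pi * k * idx / 12)
}
fit_additive <- lm(y ~ ., data = d)

# Split the fitted values into the two additive components
b <- coef(fit_additive)
fourier_cols <- setdiff(names(d), c("y", "idx", "hinge"))
trend    <- as.numeric(b["(Intercept)"] + b["idx"] * d$idx + b["hinge"] * d$hinge)
seasonal <- as.numeric(as.matrix(d[, fourier_cols]) %*% b[fourier_cols])
```

Adding trend and seasonal reproduces fitted(fit_additive), which is what "additive" means here. The coefficient on hinge is the estimated change in slope at the changepoint.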


ARIMA Modelling
ARIMA stands for Auto-Regressive Integrated Moving Average, a popular time series forecasting model. ARIMA models rest on two assumptions: stationarity of the data (after any differencing), i.e. the mean and variance should not vary with time, and a univariate time series as input, i.e. only the Date and Sales in Units columns.
ARIMA has three components: AR (autoregressive term), I (differencing term) and MA (moving average term), written as ARIMA(p,d,q).
  • The AR term refers to the past values used for forecasting the next value. It is defined by the parameter ‘p’, whose value is determined using the PACF plot.
  • The MA term defines the number of past forecast errors used to predict future values. It is defined by the parameter ‘q’, whose value is identified using the ACF plot.
  • The order of differencing, ‘d’, specifies the number of times the differencing operation is performed on the series to make it stationary. Tests like ADF and KPSS can be used to determine whether the series is stationary and help in identifying the d value.
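
The identification steps above can be sketched in base R as follows. The series is simulated with arima.sim() purely for illustration, so x, dx and fit_manual are assumed names and not part of the car sales script. The ADF and KPSS tests are available as adf.test() and kpss.test() in the tseries package; they are left out here to keep the sketch free of extra dependencies.

```r
# Sketch: choosing p, d, q by hand on a simulated series (not the car sales data)
set.seed(42)
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 200)

# d: difference once and work with the differenced series
dx <- diff(x, differences = 1)

# ACF suggests q, PACF suggests p; plot = FALSE returns the values without drawing
acf_vals  <- acf(dx, lag.max = 12, plot = FALSE)
pacf_vals <- pacf(dx, lag.max = 12, plot = FALSE)

# Approximate 95% significance bound for the sample (partial) autocorrelations
bound <- 1.96 / sqrt(length(dx))
sig_pacf_lags <- which(abs(as.numeric(pacf_vals$acf)) > bound)
print(sig_pacf_lags)   # lags where the PACF exceeds the bound

# Fit the order suggested by the plots; here ARIMA(1,1,0) by construction
fit_manual <- arima(x, order = c(1, 1, 0))
print(coef(fit_manual))
```

auto.arima automates this kind of search by comparing candidate orders on an information criterion, which is why the script below never sets p, d and q explicitly.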

Implementation in R
library("prophet") # This is Facebook Prophet Package
library("forecast") # Forecast package for Auto Arima
library("ggplot2") # DATA VISUALIZATION PACKAGE
indiacarsales=read.csv(file.choose())
# file.choose() opens a file picker, so the CSV can be imported from anywhere on the laptop or desktop
head(indiacarsales) # First 6 rows of data
tail(indiacarsales) # last 6 rows of data
summary(indiacarsales) # Basic Descriptive Statistics
str(indiacarsales)
# str - data structure and individual variable types. Note that Date is not in the correct format: it shows as a factor (character in R 4.0 and later) and needs to be converted to a proper date using as.Date
indiacarsales$Date=as.Date(indiacarsales$Date,format="%d/%m/%y")
str(indiacarsales) # Now Date is shown as Date format
# Data frame has 3 columns: Date, Company, Sales (units). We need only Date and monthly Sales (units)
# Aggregate the data into total monthly sales of all brands and create a new data frame of the monthly sales
monthlysales=aggregate(Sales.in.Units ~ Date, data=indiacarsales, FUN=sum, na.rm=TRUE)
monthlysalesdf=monthlysales
# Summing only Sales.in.Units leaves out the Company column, which cannot be summed
summary(monthlysalesdf) # Basic Descriptive Statistics
plot(monthlysalesdf$Sales.in.Units,type="l")
# line plot of monthly sales of cars in India
colnames(monthlysalesdf)=c("ds","y")
# Rename the columns: Date becomes "ds" and Sales (units) becomes "y", as required by the Facebook Prophet package
monthlyforecast=prophet(monthlysalesdf, seasonality.mode = "multiplicative")
# This time series has a clear yearly cycle, but with Prophet's default additive seasonality the seasonal swing
# in the forecast is too large at the start of the series and too small at the end.
# Here the seasonality is not a constant additive factor, as Prophet assumes by default; it grows with the trend.
# This is multiplicative seasonality, which Prophet models when seasonality.mode = "multiplicative" is passed to prophet().
future=make_future_dataframe(monthlyforecast,periods=24, freq="month")
# Creating a data frame of future dates for the next 2 years (24 monthly periods); the frequency must be specified as "month"
forecast2year=predict(monthlyforecast,future)
# Using the model and the prediction time frame, execute predict
plot(monthlyforecast,forecast2year)
# plot the car sales data with the forecast for the next 24 months
prophet_plot_components(monthlyforecast,forecast2year)
# See the decomposition of the forecast elements
######## ARIMA FORECAST using the forecast Package #####
carsales=monthlysalesdf$y  # SELECT ONLY SINGLE COLUMN SALES in UNITS
carsalests=ts(carsales,start=c(2011,4),frequency = 12)
# CONVERT DATA INTO UNIVARIATE TIME SERIES USING THE TS
# FUNCTION; WITHOUT THIS CONVERSION THE MODEL DOES NOT WORK
fit=auto.arima(carsalests)
# FITTING THE MODEL USING THE AUTO.ARIMA FUNCTION
summary(fit)
# SUMMARY OF THE ARIMA MODEL INCLUDING ACCURACY DIAGNOSTICS:
# ROOT MEAN SQUARED ERROR, MEAN ABSOLUTE ERROR, AIC & BIC
fitforecast=forecast(fit,h=24)
# FORECASTING FOR THE NEXT 24 MONTHS
plot(fitforecast)
# LINE PLOT OF THE FORECAST
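arforecast=ts.union(fitforecast$fitted,fitforecast$mean)
# fitted COVERS THE HISTORY AND mean COVERS THE FORECAST HORIZON; EACH IS NA WHERE THE OTHER HAS VALUES
arima=pmin(arforecast[,1], arforecast[,2], na.rm = TRUE)
# pmin WITH na.rm = TRUE KEEPS THE ONE NON-MISSING VALUE PER MONTH, JOINING BOTH INTO A SINGLE SERIES
predictcompare=data.frame(Date=as.numeric(time(arima)), arima=as.numeric(arima), prophet=forecast2year$yhat)
# CONVERTING THE FORECASTS INTO A DATAFRAME FROM TIME SERIES OBJECT
# THE COMPARATIVE DATAFRAME OF BOTH THE PROPHET FORECAST & ARIMA FORECAST
# USING GGPLOT TO PLOT BOTH FOR COMPARATIVE ANALYSIS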
ggplot(predictcompare, aes(Date)) +
  geom_line(aes(y = arima, colour = "arima")) +
  geom_line(aes(y = prophet, colour = "prophet"))
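# PLOTTING WITH BASE FUNCTIONS INSTEAD OF THE GGPLOT PACKAGE
# lines() DRAWS ON THE EXISTING AXES, SO ALL THREE SERIES SHARE ONE SCALE
yrange=range(predictcompare$arima,predictcompare$prophet,monthlysalesdf$y,na.rm=TRUE)
plot(predictcompare$arima,type="l",col="blue",ylim=yrange,xlab="Month index",ylab="Sales in units")
lines(predictcompare$prophet,col="yellow")
lines(monthlysalesdf$y,col="black")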
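
The plots above compare the two forecasts only by eye. One possible way to put a number on the comparison is a hold-out check: fit on everything except the last 12 months, forecast those 12 months, and measure the error against what actually happened. The sketch below uses a synthetic series in place of carsalests, base R's arima() with the order reported in the ARIMA output, and a seasonal naive baseline as the second candidate. The names sales, train, test, fit_holdout and mae are assumptions for illustration. In the real script, Prophet's yhat for the same 12 months would take the place of the baseline.

```r
# Hold-out sketch on a synthetic monthly series (a stand-in for carsalests)
set.seed(7)
n   <- 90
idx <- seq_len(n)
sales <- ts(200000 + 1000 * idx + 20000 * sin(2 * pi * idx / 12) +
              rnorm(n, sd = 5000),
            start = c(2011, 4), frequency = 12)   # April 2011 to September 2018

# Keep the last 12 months aside for testing
train <- window(sales, end = c(2017, 9))
test  <- window(sales, start = c(2017, 10))

# Candidate 1: ARIMA(1,1,1)(1,0,0)[12], fitted on the training months only
fit_holdout <- arima(train, order = c(1, 1, 1),
                     seasonal = list(order = c(1, 0, 0), period = 12),
                     method = "ML")
arima_fc <- predict(fit_holdout, n.ahead = length(test))$pred

# Candidate 2: seasonal naive baseline, i.e. repeat the last observed year
snaive_fc <- tail(train, 12)

# Mean absolute error on the held-out months
mae <- function(actual, forecast) {
  mean(abs(as.numeric(actual) - as.numeric(forecast)))
}
holdout_mae <- c(arima = mae(test, arima_fc),
                 seasonal_naive = mae(test, snaive_fc))
print(holdout_mae)
```

The model with the lower hold-out error is the better candidate on this particular split. A single split is a rough guide, so repeating the check over several cut-off dates would give a steadier picture.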

ARIMA OUTPUT
Series: carsalests
ARIMA(1,1,1)(1,0,0)[12]

Coefficients:
         ar1      ma1    sar1
      0.3503  -0.8838  0.6281
s.e.  0.1238   0.0572  0.0857

sigma^2 estimated as 314686491:  log likelihood=-998.81
AIC=2005.61   AICc=2006.09   BIC=2015.57

Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE      MASE         ACF1
Training set 2497.539 17340.72 13364.85 0.5996624 5.786336 0.6909754 -0.006334485
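
For readers unfamiliar with the training set error measures, the sketch below computes the first five of them by hand on three made-up observations. The vectors actual and fitted_vals are hypothetical numbers chosen to keep the arithmetic easy to follow; they are not taken from the car sales data. MASE and ACF1 are not reproduced here.

```r
# Hand computation of ME, RMSE, MAE, MPE and MAPE on made-up numbers
actual      <- c(100, 200, 400)
fitted_vals <- c(110, 190, 380)

e  <- actual - fitted_vals     # errors: -10, 10, 20
pe <- 100 * e / actual         # percentage errors: -10, 5, 5

measures <- c(
  ME   = mean(e),              # mean error (bias)
  RMSE = sqrt(mean(e^2)),      # root mean squared error
  MAE  = mean(abs(e)),         # mean absolute error
  MPE  = mean(pe),             # mean percentage error
  MAPE = mean(abs(pe))         # mean absolute percentage error
)
print(round(measures, 3))
```

Read this way, the MAPE of 5.79 in the output above says that the fitted values miss actual monthly sales by about 5.8 per cent on average. These are in-sample figures, so they describe how well the model fits the history rather than how well it will forecast.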
