Exploratory Data Analysis
First, we create a time series plot of weekly sales for type A stores from the Walmart dataset mentioned in the Data Visualization tab.
Code
ggplot(A_sales, aes(x=Date, y=avg))+
  geom_line()+
  labs(
    x = "Date",
    y = "Weekly sales"
  )+
  ggtitle("Weekly sales of type A stores")
The plot shows clear periodic fluctuations every year: at the end of each year, weekly sales reach two consecutive peaks. Neither a trend nor monthly seasonality is obvious. Since the variation of the data does not change significantly from year to year, the components of this time series are more likely to be additive than multiplicative.
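The additive-versus-multiplicative distinction can be written out explicitly. As a sketch, let $y_t$ be the weekly sales and $T_t$, $S_t$, $R_t$ the trend, seasonal, and remainder components (these symbols are introduced here for illustration and are not part of the original analysis):

```latex
\begin{aligned}
\text{additive:} \quad & y_t = T_t + S_t + R_t \\
\text{multiplicative:} \quad & y_t = T_t \times S_t \times R_t
\end{aligned}
```

In the additive form the size of the seasonal swing does not depend on the level of the series, which matches the roughly constant year-to-year variation seen in the plot.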
1 Lag Plot
Code
# decimal_date() is from lubridate; gglagplot() is from forecast
temp.ts = ts(A_sales$avg, start=decimal_date(as.Date("2010-02-05")), frequency = 365.25/7)
gglagplot(temp.ts, do.lines=FALSE, lags=1)+xlab("Lag 1")+ylab("Yi")+ggtitle("Lag Plot for Weekly Sales")
This lag plot is consistent with what we discussed above. Weekly sales near the end of each year are much higher than sales at other times. Apart from these extreme values, weekly sales appear to be stationary.
2 Decomposition
Now, let’s decompose this time series:
Code
ts_decomp = decompose(temp.ts, "additive")
autoplot(ts_decomp)
The decomposition shows a clear yearly seasonal component. A trend does exist, but its range on the y-axis is small, so it is not very pronounced.
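To make the additive decomposition concrete, here is a minimal sketch in base R on a hypothetical synthetic series (not the Walmart data). The variable names and the integer frequency of 52 are assumptions chosen for illustration. It checks that the trend, seasonal, and random components returned by decompose() sum back to the original series wherever the trend estimate is defined.

```r
# Hypothetical synthetic weekly series (not the Walmart data):
# linear trend + yearly seasonal pattern + noise, 4 years of 52 weeks.
set.seed(1)
n <- 52 * 4
t_idx <- seq_len(n)
y <- 100 + 0.05 * t_idx + 10 * sin(2 * pi * t_idx / 52) + rnorm(n)
y_ts <- ts(y, start = c(2010, 1), frequency = 52)

# Classical additive decomposition from base R (stats package)
dec <- decompose(y_ts, type = "additive")

# In the additive model the three components sum back to the series
# wherever the trend estimate is defined (it is NA at both ends).
recon <- as.numeric(dec$trend + dec$seasonal + dec$random)
y_num <- as.numeric(y_ts)
ok <- !is.na(recon)
max_abs_error <- max(abs(recon[ok] - y_num[ok]))
print(max_abs_error)
```

The trend is estimated with a centered moving average, which is why it is undefined for the first and last half-year of the series.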
3 ACF and PACF
Next, we plot the ACF and PACF to examine autocorrelation and stationarity:
Code
ggAcf(temp.ts,150)
The ACF plot shows one pronounced spike around lag 50, roughly one year apart, which again reflects the year-end peaks. Apart from this spike, the autocorrelations are not significant, so this time series is likely to be stationary.
Code
ggPacf(temp.ts,150)
The PACF plot is also consistent with a stationary series.
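Whether an autocorrelation counts as "significant" is usually judged against approximate 95% bounds of $\pm 1.96/\sqrt{n}$, which hold under a white-noise assumption. The sketch below illustrates this in base R on a hypothetical synthetic series (not the Walmart data) with a spike every 52 observations; all names and values are assumptions for illustration.

```r
# Hypothetical synthetic series (not the Walmart data): white noise
# plus a spike every 52 observations, mimicking a yearly peak.
set.seed(42)
n <- 52 * 4
x <- rnorm(n)
peak_idx <- seq(52, n, by = 52)
x[peak_idx] <- x[peak_idx] + 15

# Sample autocorrelations from base R; drop lag 0, which is always 1
r <- acf(x, lag.max = 60, plot = FALSE)$acf[-1]

# Approximate 95% bounds under the white-noise assumption
bound <- qnorm(0.975) / sqrt(n)

# Lags whose autocorrelation falls outside the bounds
significant_lags <- which(abs(r) > bound)
print(bound)
print(significant_lags)
```

A single isolated spike at the seasonal lag, with the remaining lags inside the bounds, is the pattern described for the weekly sales series above.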
4 Dickey-Fuller Test
Finally, use the Augmented Dickey-Fuller Test to further check the stationarity:
Code
adf.test(temp.ts)
Augmented Dickey-Fuller Test
data: temp.ts
Dickey-Fuller = -5.3171, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
The p-value from the Augmented Dickey-Fuller Test (0.01) is smaller than 0.05, so we have enough evidence to reject the null hypothesis of a unit root and conclude that this time series is stationary.
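For reference, the hypotheses behind this test can be written as a regression. The form below is the standard ADF regression with a constant and a linear trend, which is, as far as I know, the variant fitted by adf.test in the tseries package; the symbols are introduced here for illustration, and $k$ corresponds to the reported lag order of 5.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0 : \gamma = 0 \ \text{(unit root)}
\quad \text{vs.} \quad
H_1 : \gamma < 0 \ \text{(stationary)}
```

The reported Dickey-Fuller statistic is the t-ratio of the estimate of $\gamma$; a strongly negative value such as $-5.3171$ is evidence against the unit root.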
5 Moving Average Smoothing
In this part, we use moving average smoothing to identify the underlying pattern of the data.
Code
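# SMA() is from TTR; melt() and index() are assumed to come from reshape2 and zoo
sma52=SMA(temp.ts,n=52)
sma13=SMA(temp.ts,n=13)
sma5=SMA(temp.ts,n=5)
df=data.frame(
  year=index(temp.ts),
  original=temp.ts,
  ma_5=sma5,
  ma_13=sma13,
  ma_52=sma52
)
# Reshape to long format: one row per (date, series) pair
df=melt(df,id=c("year"))
names(df)=c("Date", "Type", "Sales")
Palette <- c("black", "red", "green", "orange")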
ggplot(df, aes(x=Date,y=Sales,color = Type))+
geom_line()+
scale_colour_manual(values=Palette)+
ggtitle("Weekly sales of type A stores for different moving average windows")
In this plot, the black line represents the original data and the other colors represent three different moving average windows. With a window of 5, the trend and seasonality are still as obvious as in the original data. With a window of 13, the year-end peaks are still visible, but the peak values are much lower than in the original data. With a window of 52, which is the number of weeks in a year, the seasonality disappears and only a slightly increasing trend remains.
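Why the 52-week window removes the seasonality can be checked on a small example. The sketch below uses base R's stats::filter() as a stand-in for SMA() (a trailing mean, which is my understanding of what SMA() computes) on a hypothetical synthetic series, not the Walmart data; the helper trailing_ma() and all values are assumptions for illustration.

```r
# Hypothetical synthetic weekly series (not the Walmart data):
# linear trend plus a seasonal pattern that repeats every 52 weeks.
n <- 52 * 4
t_idx <- seq_len(n)
y <- 100 + 0.5 * t_idx + 10 * sin(2 * pi * t_idx / 52)

# Trailing moving average: mean of the current and previous (k - 1) values.
# stats::filter() with sides = 1 uses past values only; the first k - 1
# results are NA because the window is incomplete there.
trailing_ma <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

ma_5 <- trailing_ma(y, 5)
ma_52 <- trailing_ma(y, 52)

# A 52-week window spans one full seasonal cycle, so the seasonal term
# averages out and only the linear trend remains: successive differences
# of ma_52 are constant and equal to the trend slope.
slope_52 <- diff(ma_52[!is.na(ma_52)])
slope_5 <- diff(ma_5[!is.na(ma_5)])
print(range(slope_52))
print(range(slope_5))
```

The week-to-week changes of the 5-week average still vary with the season, while those of the 52-week average stay constant at the trend slope, mirroring what the plot shows for the real data.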