
Stationarity of Financial Time Series

이현봉 2015. 12. 31. 16:41

Title: A Talk on the Stationarity of Financial Time Series

Author: “Hyun Bong Lee”

Get Financial Time Series

library(quantmod)
Posco <- getSymbols("005490.KS", src = "yahoo", auto.assign = FALSE)  # POSCO (Pohang Iron and Steel)
SPY <- getSymbols("SPY", auto.assign = FALSE)    # S&P 500 ETF

invisible(Sys.setlocale("LC_TIME", "C"))  # needed when calling getSymbols with src = "google"
KOSPI200 <- getSymbols('KRX:KOSPI200', src = "google", auto.assign = FALSE,
                       from = "2007-01-01")  # KOSPI 200 index

A temporal random process (time series) is called (strictly) stationary if all of its probabilistic
behavior and statistical properties do not change over time (i.e., they are time invariant).
Strict stationarity is very hard to verify.

Thus, a milder form of stationarity is widely used. We say a time series X is weakly stationary if its
first two moments (mean and autocovariance) are time invariant. That is, E[X_t] = E[X_s] and cov(X_t, X_s) = cov(X_{t+h}, X_{s+h}) for all t, s and any shift h, so the autocovariance depends only on the lag |t - s|.

If a time series is stationary, the probabilities and statistics we compute from it are the same no matter at what time we observe it.

When we call a stochastic process stationary, we usually mean weakly stationary.

Why do we care about stationarity? Because we must understand and analyze the past to be able to predict the future.
Saying that a stochastic process is non-stationary means that its probabilistic
behavior is hard to understand. So we want the inputs to our stochastic prediction
model to be stationary. If we cannot find a stationary process, we derive one from the
original process that is as close to stationary as possible.
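
As a rough illustration of deriving a stationary series from a non-stationary one, the sketch below uses simulated data only (the names `eps`, `rw` and `d_rw` are made up for this example and are not part of the analysis in this post). It builds a random walk, which is non-stationary, and shows that first differencing recovers the iid shocks that drive it.

```r
# Simulated example: a random walk is non-stationary, but its first difference is iid noise.
set.seed(1)
eps  <- rnorm(500)        # iid N(0, 1) shocks
rw   <- cumsum(eps)       # random walk: rw[t] = rw[t-1] + eps[t]
d_rw <- diff(rw)          # first difference, length 499

# Differencing recovers the shocks (all but the first one)
all.equal(d_rw, eps[-1])

# Variance over the two halves: compare the levels with the differences
c(first_half = var(rw[1:250]),   second_half = var(rw[251:500]))
c(first_half = var(d_rw[1:250]), second_half = var(d_rw[251:499]))
```

Taking log returns of a price series, as done later in this post, is the same idea applied to log prices.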

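White noise is the representative example of a stationary process. Deciding whether a random process met in practice is
stationary is not easy. Things are somewhat better when the process can be observed many times from the same system (an ensemble of processes), but it is especially hard for something like a financial process, where only a single realization is available. It is also
unclear how closely the (ideal) definition of stationarity must be satisfied before a process can be called stationary, which is why
several methods for testing stationarity have been developed. We therefore check stationarity by applying whichever method suits the situation.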


# White noise built from a series of 1000 random variables, each N(0, 4) (sd = 2),
# independent and identically distributed (iid)
set.seed(101)
wn = ts(rnorm(1000, mean=0, sd=2))
plot(wn)  # white noise

# Check whether the mean is the same over every block of 100 observations
means = numeric(10)
start = 1
for (i in 1:10) {
  means[i] <- mean(wn[start: (100*i)])
  start = start+100
}

means # whichever block the mean is taken over, every value is close to 0 (relative to sd = 2)
##  [1] -0.074382200 -0.084004744  0.004140797 -0.050933026 -0.422634356
##  [6]  0.099202711 -0.239808195 -0.272373935  0.104018544  0.239533281
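# The plots below do not show conclusively that the autocovariance of wn depends
# only on the absolute lag |s-t|. However, in every window the |autocovariance|
# is small at almost every lag (lag 0, which is the variance, is the exception).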

acf(wn[1:500], type="covariance")

acf(wn[201:700], type="covariance")

acf(wn[401:900], type="covariance")
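
For contrast, here is a small sketch of the same block-mean check applied to a simulated random walk. This comparison is an addition for illustration, not part of the original analysis; `wn_sim`, `rw_sim` and `block` are hypothetical names. `tapply()` replaces the explicit loop used above.

```r
# Block means of white noise versus a random walk built from the same shocks.
set.seed(101)
wn_sim <- rnorm(1000, mean = 0, sd = 2)   # iid N(0, 4) white noise
rw_sim <- cumsum(wn_sim)                  # random walk: cumulative sum of the shocks

block <- rep(1:10, each = 100)            # block label for each observation
wn_means <- tapply(wn_sim, block, mean)   # mean of each block of 100
rw_means <- tapply(rw_sim, block, mean)

round(rbind(white_noise = wn_means, random_walk = rw_means), 2)

# Spread of the block means: small for white noise, large for the random walk
c(white_noise = sd(wn_means), random_walk = sd(rw_means))
```

The white-noise block means stay near 0, while the random-walk block means drift with time, which is what a time-varying mean looks like.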

Is the POSCO stock price movement stationary? Judging by eye…

plot(Posco$"005490.KS.Close")  # the mean differs depending on the time window, so non-stationary

# Just by looking, it is clear that the mean depends on where in time we look.

# First, check graphically whether it looks stationary...
Posco_Close_xts = as.xts(Posco$"005490.KS.Close")  # change to xts object 

test_ranges <- function(data, start_date, end_date, title) {
  no_ranges = length(start_date)
  par(mfrow = c(ceiling(no_ranges/2), 2))
  for(i in 1:no_ranges) {
    # Set the date range
    range <- paste0(start_date[i], "::", end_date[i])
    time_series <- data[range]
    mean_data <- round(mean(time_series, na.rm = TRUE), 3)
    sd_data <- round(sd(time_series, na.rm = TRUE), 3)

    # Plot the histogram along with a legend
    hist_title <- paste(title, range)
    hist(time_series, breaks = 100, prob=TRUE, xlab = "", 
         xlim = range(data, na.rm = TRUE), main = hist_title, cex.main = 1.1)
    legend("topright", cex = 0.9, bty = 'n', paste("mean=", mean_data, "; sd=", sd_data))
    abline(v=mean_data, lwd=2, col="red", lty=2)
  }
  par(mfrow = c(1, 1))
}

begin_dates <- c("2007-01-01", "2009-01-01", "2011-01-01", "2013-01-01", "2015-01-01")
end_dates <-   c("2008-12-31", "2010-12-31", "2012-12-31", "2014-12-31", "2015-12-31")

# Create plots
test_ranges(Posco_Close_xts, begin_dates, end_dates, "POSCO stock price:")

# The 5 ranges have different means, so the POSCO stock price is non-stationary w.r.t. the mean


Most financial price data are non-stationary.
Let’s make a stationary process from the original non-stationary data.

Log Returns : Stationary?

returns_log <- Delt(Posco_Close_xts, k = 1, type = "log")  # same as diff(log(Posco_Close_xts))
plot(returns_log)

test_ranges(returns_log, begin_dates, end_dates, "POSCO log returns for:")

Looking at the plots, the means of the five ranges are all similar (about 0), the sds are all similar, and the
distributions all look roughly Gaussian, so the series looks “roughly” (weakly) stationary.
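
The log returns used here are first differences of log prices. A minimal sketch with a made-up price vector (the numbers are hypothetical, not POSCO data) shows the two identities that make log returns convenient: they equal log(1 + simple return), and they add up over time.

```r
# Hypothetical closing prices, for illustration only
prices <- c(100, 102, 99, 105)

log_ret    <- diff(log(prices))                   # r_t = log(P_t) - log(P_{t-1})
simple_ret <- diff(prices) / head(prices, -1)     # (P_t - P_{t-1}) / P_{t-1}

# Log return equals log(1 + simple return)
all.equal(log_ret, log(1 + simple_ret))

# Log returns are additive: their sum is the log return over the whole period
all.equal(sum(log_ret), log(prices[4] / prices[1]))
```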


Formal Stationarity Tests: widely used stationarity tests in R

  • Augmented Dickey–Fuller (ADF) test
  • Elliott–Rothenberg–Stock test
  • KPSS unit root test
  • Phillips–Perron test

* Issue: Do the notions of trend stationarity and seasonal stationarity even make sense?

ADF Test using tseries adf.test()
The Augmented Dickey–Fuller (ADF) test tests the null hypothesis of non-stationarity (a unit root):
- the more negative the Dickey-Fuller statistic, the stronger the evidence against the null (non-stationarity)
- a small p-value suggests the data are stationary
- https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test
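
To make the statistic less of a black box, here is a minimal sketch of the simplest Dickey-Fuller regression (constant only, no lagged differences) computed by hand with `lm()` on simulated data. The helper `df_stat()` is a hypothetical name introduced for this example; it is not how `adf.test()` or `adfTest()` are implemented, and it returns only the statistic, not a p-value.

```r
# Dickey-Fuller statistic by hand: regress diff(y) on lagged y with a constant,
# and take the t value of the lagged-level coefficient.
df_stat <- function(y) {
  dy   <- diff(y)              # y[t] - y[t-1]
  ylag <- y[-length(y)]        # y[t-1]
  fit  <- lm(dy ~ ylag)        # dy = alpha + gamma * ylag + error
  coef(summary(fit))["ylag", "t value"]
}

set.seed(42)
eps <- rnorm(500)

df_stat(cumsum(eps))   # random walk (unit root): statistic is not strongly negative
df_stat(eps)           # white noise (stationary): statistic is very negative
```

The statistic must be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level for the constant-only case in large samples), not with the usual t table, because under the null it does not follow a t distribution.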

# From the Posco_Close_xts plot earlier, we expect the series to be non-stationary
library(tseries)
adf.test(Posco_Close_xts, alternative = "stationary")  # test for null hypothesis of non-stationarity 
## 
##  Augmented Dickey-Fuller Test
## 
## data:  Posco_Close_xts
## Dickey-Fuller = -3.6656, Lag order = 13, p-value = 0.02635
## alternative hypothesis: stationary
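# The p-value is smaller than expected, so there is some room to regard the series as stationary.
# adf.test() in the tseries package includes a constant and a linear trend in its test
# regression, so its alternative hypothesis also covers trend-stationary series. For a series that
# shows trend (or seasonal) stationary behavior it therefore tends to reject the non-stationarity
# null hypothesis, which leads us to regard that series as stationary. adf.test() is not an
# appropriate tool for distinguishing trend stationarity from ordinary constant (level) stationarity.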

adf.test(na.remove(returns_log), alternative = "stationary")   
## Warning in adf.test(na.remove(returns_log), alternative = "stationary"): p-
## value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.remove(returns_log)
## Dickey-Fuller = -14.897, Lag order = 13, p-value = 0.01
## alternative hypothesis: stationary


ADF Test using adfTest() of fUnitRoots
Tests the null hypothesis of non-stationarity. The more negative the “Dickey-Fuller” statistic and the smaller the p-value,
the more confidently we can reject the null hypothesis (of non-stationarity), and thus the more confidence we have in stationarity.

library(fUnitRoots)

adfTest(Posco_Close_xts, lags = 0, type = "c")  # hard to reject the null (non-stationarity)
## Warning in if (class(x) == "timeSeries") x = series(x): the condition has
## length > 1 and only the first element will be used
## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 0
##   STATISTIC:
##     Dickey-Fuller: -1.5288
##   P VALUE:
##     0.4893 
## 
## Description:
##  Thu Dec 31 16:19:17 2015 by user: Administrator
adfTest(returns_log, lags = 0, type = "c")   # suggests stationarity 
## Warning in if (class(x) == "timeSeries") x = series(x): the condition has
## length > 1 and only the first element will be used
## Warning in adfTest(returns_log, lags = 0, type = "c"): p-value smaller than
## printed p-value
## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 0
##   STATISTIC:
##     Dickey-Fuller: -45.7847
##   P VALUE:
##     0.01 
## 
## Description:
##  Thu Dec 31 16:19:17 2015 by user: Administrator
# Dickey-Fuller: -45.7847 is large and negative, and the p-value 0.01 is small -> reject the null (non-stationarity).


Determine stationarity with the 'urca' package
ur.kpss() performs the KPSS test, where, unlike in the ADF test, the null hypothesis is stationarity

library(urca)
test = ur.kpss(as.numeric(Posco_Close_xts))  # default type="mu": level/constant stationarity;
                                             # use type="tau" for trend stationarity
test@teststat  # 16.0658; test is an S4 object, so its slots are accessed with @
## [1] 16.0658
test@cval      # 0.347 0.463  0.574 0.739
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
# At the 0.05 level, since 16.0658 > 0.463, the null hypothesis (level stationarity) is
# rejected. So the series is likely non-stationary.

test = ur.kpss(as.numeric(Posco_Close_xts), type="tau" )
test@teststat   # 1.750902
## [1] 1.750902
test@cval
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216
# At the 0.05 level, since 1.750902 > 0.146, the null hypothesis of trend stationarity is rejected.
# So the series is likely not trend stationary.

test = ur.kpss(as.numeric(returns_log) )  
test@teststat  # 0.3034485
## [1] 0.3034485
test@cval      # 0.347 0.463  0.574 0.739
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
# At the 0.05 level, since 0.3034485 < 0.463, the null hypothesis of level stationarity cannot be
# rejected (the statistic is also below the 10% critical value 0.347, though not by much).
# So the series can be called "level stationary" (with caution).

test = ur.kpss(as.numeric(returns_log),  type="tau")  
test@teststat  # 0.03532884
## [1] 0.03532884
test@cval    # 0.119 0.146  0.176 0.216
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216
# Trend stationarity cannot be rejected, so the series is likely "trend stationary".
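
The decision rule used in the comments above can be written out as a small sketch. The helper `kpss_reject()` is a hypothetical name for this example; the statistics and critical values are copied from the ur.kpss() output shown above, so the snippet runs without the urca package.

```r
# KPSS decision rule: reject the null of stationarity when the statistic
# exceeds the critical value at the chosen level.
kpss_reject <- function(stat, cvals) stat > cvals

# Critical values for type = "mu" (level stationarity), copied from test@cval above
cvals_mu <- c("10pct" = 0.347, "5pct" = 0.463, "2.5pct" = 0.574, "1pct" = 0.739)

kpss_reject(16.0658,   cvals_mu)   # POSCO closing price: rejected at every level
kpss_reject(0.3034485, cvals_mu)   # POSCO log returns: not rejected at any level
```

With a real ur.kpss result, the same comparison would presumably be `test@teststat > test@cval`.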


Determine stationarity with kpss.test in the 'tseries' package

kpss.test(Posco_Close_xts, null="Level")
## Warning in kpss.test(Posco_Close_xts, null = "Level"): p-value smaller than
## printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  Posco_Close_xts
## KPSS Level = 12.103, Truncation lag parameter = 11, p-value = 0.01
# KPSS Level = 12.103, Truncation lag parameter = 11, p-value = 0.01
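# The p-value (0.01, and in fact smaller according to the warning) is below 0.05, so the null
# hypothesis of level stationarity is rejected. Thus non-stationary. The decision compares the
# p-value with the significance level, not the KPSS statistic (12.103) with the p-value.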

kpss.test(returns_log, null="Trend")
## Warning in kpss.test(returns_log, null = "Trend"): p-value greater than
## printed p-value
## 
##  KPSS Test for Trend Stationarity
## 
## data:  returns_log
## KPSS Trend = 0.037189, Truncation lag parameter = 11, p-value =
## 0.1
# KPSS Trend = 0.037189, Truncation lag parameter = 11, p-value = 0.1
# The p-value (0.1 or greater, according to the warning) is above 0.05, so the null hypothesis
# is not rejected. Thus consistent with trend stationarity.

kpss.test(returns_log, null="Level")
## Warning in kpss.test(returns_log, null = "Level"): p-value greater than
## printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  returns_log
## KPSS Level = 0.3178, Truncation lag parameter = 11, p-value = 0.1
# KPSS Level = 0.3178, Truncation lag parameter = 11, p-value = 0.1
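# The p-value (0.1 or greater, according to the warning) is above 0.05, so the null hypothesis
# of level stationarity cannot be rejected. The log returns are consistent with "level stationary",
# in agreement with the ur.kpss() result earlier.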
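
Because the ADF and KPSS tests have opposite null hypotheses, reading them together can be sketched as below. The helper `classify()` and its wording are hypothetical, introduced only for this example; the p-values are the ones printed by adf.test() and kpss.test(null="Level") above (truncated at 0.01 and 0.1 by the tseries functions).

```r
# Combine an ADF p-value (null: unit root) with a KPSS p-value (null: stationarity).
classify <- function(adf_p, kpss_p, alpha = 0.05) {
  adf_rejects  <- adf_p  < alpha   # evidence against a unit root
  kpss_rejects <- kpss_p < alpha   # evidence against stationarity
  if (adf_rejects && !kpss_rejects) {
    "both tests point to stationarity"
  } else if (!adf_rejects && kpss_rejects) {
    "both tests point to a unit root"
  } else if (adf_rejects && kpss_rejects) {
    "tests disagree: both nulls rejected"
  } else {
    "inconclusive: neither null rejected"
  }
}

classify(adf_p = 0.02635, kpss_p = 0.01)   # POSCO closing price
classify(adf_p = 0.01,    kpss_p = 0.1)    # POSCO log returns
```

For the closing price the two tests disagree, which matches the earlier remark that adf.test() is not well suited to separating trend stationarity from level stationarity; for the log returns both tests agree on stationarity.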