AdaBoost Ensemble ML Algorithm
We build this much like the Random Forest model.
First, create an environment to hold the AdaBoost objects:
> weatherADA <- new.env()
> evalq({
+ data <- na.omit(weather) # drop observations/rows in the weather dataset that contain NAs
+ nobs <- nrow(data)
+ form <- formula(RainTomorrow ~ .)
+ target <- all.vars(form)[1]
+ vars <- -grep('^(Date|Location|RISK_)', names(data)) # exclude Date, Location, and RISK_MM (RISK_MM leaks tomorrow's rainfall)
+ set.seed(42)
+ train <- sample(nobs, 0.7*nobs) # randomly sample 70% of the indices 1..nobs as the training set
+ validate <- sample(setdiff(seq_len(nobs), train), 0.15*nobs) # from the indices not in train, sample 0.15*nobs for validation
+ test <- setdiff(setdiff(seq_len(nobs), train), validate) # the remaining indices, in neither train nor validate, form the test set
+ }, weatherADA)
> dim(weatherADA$data)
[1] 328 24
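As a quick sanity check (a minimal sketch using only the objects defined above), we can confirm that the three index sets partition the 328 complete rows in roughly 70/15/15 proportions:

# lengths of the train/validate/test index vectors; they should sum to nobs (328)
evalq(sapply(list(train = train, validate = validate, test = test), length), weatherADA)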
> install.packages("ada") # install the ada package
> library(ada) # load the ada package
Loading required package: rpart
> evalq({
+ control <- rpart.control(maxdepth=1, # use decision stumps (depth-1 trees)
+ cp=0.010000,
+ minsplit=20,
+ xval=10)
+ modelADA <- ada(formula=form,
+ data=data[train, vars],
+ control=control,
+ iter=300) # sequentially build 300 decision stumps
+ }, weatherADA)
> weatherADA$modelADA
Call:
ada(form, data = data[train, vars], control = control, iter = 300)
Loss: exponential Method: discrete Iteration: 300
Final Confusion Matrix for Data:
           Final Prediction
True value   No Yes
       No   184   5
       Yes   12  28
Train Error: 0.074
Out-Of-Bag Error: 0.079 iteration= 159
Additional Estimates of number of iterations:
train.err1 train.kap1
       249        249
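The out-of-bag estimate above suggests stopping around iteration 159 (the train-error and kappa estimates suggest 249). As a hedged sketch, predict.ada accepts an n.iter argument, so we could score data using only the first 159 stumps instead of all 300:

# sketch: predict with only the first 159 boosting iterations,
# the OOB-suggested stopping point, via predict.ada's n.iter argument
predOOB <- predict(weatherADA$modelADA,
                   weatherADA$data[weatherADA$test, ],
                   n.iter = 159)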
> varplot(weatherADA$modelADA) # variable importance plot
> plot(weatherADA$modelADA) # Training Error vs. Iteration
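varplot can also return the importance scores instead of only drawing them; a small sketch, assuming the plot.it and type arguments of varplot in the ada package:

# sketch: get the variable-importance scores without drawing the plot
scores <- varplot(weatherADA$modelADA, plot.it = FALSE, type = "scores")
head(scores) # the most influential predictors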
> predictADA <- predict(weatherADA$modelADA, weatherADA$data[weatherADA$test, ]) # index weatherADA$data, not weather: the test indices refer to the NA-omitted rows
> tt <- data.frame(weatherADA$test, predictADA)
> tt
weatherADA.test predictADA
1 4 No
2 7 No
3 9 No
4 13 No
5 29 Yes
6 34 No
7 35 No
8 48 No
9 54 No
10 62 No
11 67 No
12 68 No
13 82 No
14 83 No
15 91 No
16 96 Yes
17 102 No
18 104 Yes
19 105 Yes
20 117 No
21 125 No
22 130 No
23 152 No
24 153 No
25 164 No
26 167 No
27 176 No
28 187 No
29 189 No
30 191 No
31 196 No
32 207 No
33 224 No
34 226 No
35 227 No
36 232 No
37 239 No
38 246 No
39 252 No
40 258 No
41 259 No
42 262 No
43 266 No
44 276 No
45 280 No
46 285 No
47 322 No
48 326 No
49 327 Yes
50 328 No
> cm <- table(Observation = weatherADA$data[weatherADA$test, "RainTomorrow"], Prediction = predictADA)
> cm
           Prediction
Observation  No Yes
        No   38   2
        Yes   7   3

There are many false negatives, so the model needs tuning. Would increasing the decision tree depth (maxdepth) help?
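To put numbers on that, the overall error rate and the recall on rainy days follow directly from the confusion matrix, and a retraining sketch with deeper trees (maxdepth = 3 is an illustrative guess, not a tuned value) might look like this:

# overall error rate: (2 + 7) / 50 = 0.18
1 - sum(diag(cm)) / sum(cm)
# recall on the "Yes" (rain) class: 3 / 10 = 0.30
cm["Yes", "Yes"] / sum(cm["Yes", ])

# sketch: retrain with deeper trees instead of stumps;
# control3 and modelADA3 are hypothetical names for this example
evalq({
  control3 <- rpart.control(maxdepth = 3, cp = 0.01, minsplit = 20, xval = 10)
  modelADA3 <- ada(formula = form, data = data[train, vars],
                   control = control3, iter = 300)
}, weatherADA)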
'Learning & Reasoning > R ' 카테고리의 다른 글
Erratum in "The art of R programming" (0) | 2013.02.08 |
---|---|
vector라 불리는 R 자료구조 (0) | 2013.02.06 |
random forest (0) | 2013.01.28 |
rpart 패키지를 이용해 decision tree 만들기 (0) | 2013.01.25 |
R 데이터 cleaning (0) | 2013.01.21 |