
AdaBoost

이현봉 2013. 2. 1. 17:12

AdaBoost Ensemble ML Algorithm

We build this model in much the same way as the Random Forest model in the earlier post.

First, create an environment to hold the AdaBoost model and its data.

> weatherADA <- new.env()

> evalq({
+ data <- na.omit(weather)          # drop rows of the weather dataset that contain NAs
+ nobs <- nrow(data)
+ form <- formula(RainTomorrow ~ .)
+ target <- all.vars(form)[1]
+ vars <- -grep('^(Date|Location|RISK_)', names(data))
+ set.seed(42)
+ train <- sample(nobs, 0.7*nobs)                               # randomly sample 70% of the indices 1..nobs for training
+ validate <- sample(setdiff(seq_len(nobs), train), 0.15*nobs)  # 15% of nobs, drawn from indices not in train, for validation
+ test <- setdiff(setdiff(seq_len(nobs), train), validate)      # the rest, in neither train nor validate, becomes the test set
+ }, weatherADA)

> dim(weatherADA$data)
[1] 328  24
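A quick sanity check, not part of the original session: sample() truncates the fractional sizes, so the 328 rows split 229/49/50, and the three index sets are disjoint by construction.

> with(weatherADA, length(train))      # 229 = trunc(0.7 * 328)
> with(weatherADA, length(validate))   # 49  = trunc(0.15 * 328)
> with(weatherADA, length(test))       # 50  = 328 - 229 - 49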

> install.packages("ada")    # install the ada (AdaBoost) package

> library(ada)               # load ada
Loading required package: rpart

> evalq({
+ control <- rpart.control(maxdepth=1,   # depth-1 trees, i.e. decision stumps
+ cp=0.010000,
+ minsplit=20,
+ xval=10)
+ modelADA <- ada(formula=form,
+ data=data[train, vars],
+ control=control,
+ iter=300)                              # grow 300 decision stumps, one per boosting round
+ }, weatherADA)
> weatherADA$modelADA
Call:
ada(form, data = data[train, vars], control = control, iter = 300)

Loss: exponential Method: discrete   Iteration: 300

Final Confusion Matrix for Data:
          Final Prediction
True value  No Yes
       No  184   5
       Yes  12  28

Train Error: 0.074

Out-Of-Bag Error:  0.079  iteration= 159

Additional Estimates of number of iterations:

train.err1 train.kap1
       249        249
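The out-of-bag estimate above suggests about 159 iterations would have been enough. predict.ada and, if I recall the ada API correctly, summary.ada accept an n.iter argument, so the ensemble can be evaluated as if training had stopped there; a minimal sketch:

> summary(weatherADA$modelADA, n.iter=159)    # error estimates using only the first 159 stumps
> evalq({
+ predict159 <- predict(modelADA, data[test, vars], n.iter=159)   # score test data with the truncated ensemble
+ }, weatherADA)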

> varplot(weatherADA$modelADA)   # variable importance plot
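If memory serves, varplot() can also return the importance scores themselves rather than only plotting them; treat the plot.it and type argument names below as an assumption to verify against ?varplot.

> varplot(weatherADA$modelADA, plot.it=FALSE, type="scores")   # importance scores as a named vector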


> plot(weatherADA$modelADA)   # Training Error vs. Iteration
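The validate set carved out earlier is not used above. Assuming ada's addtest() helper and the test argument of its plot method, a sketch that attaches the validation data to the model so the error curve shows held-out error next to training error:

> evalq({
+ modelADA <- addtest(modelADA,
+                     test.x=data[validate, vars],
+                     test.y=data[validate, target])
+ }, weatherADA)
> plot(weatherADA$modelADA, test=TRUE)   # training vs. held-out error by iteration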

> predictADA <- predict(weatherADA$modelADA, weatherADA$data[weatherADA$test, ])   # predict on the NA-omitted data, since the test indices were drawn from it

> tt <- data.frame(weatherADA$test, predictADA)
> tt
   weatherADA.test predictADA
1                4        No
2                7        No
3                9        No
4               13        No
5               29       Yes
6               34        No
7               35        No
8               48        No
9               54        No
10              62        No
11              67        No
12              68        No
13              82        No
14              83        No
15              91        No
16              96       Yes
17             102        No
18             104       Yes
19             105       Yes
20             117        No
21             125        No
22             130        No
23             152        No
24             153        No
25             164        No
26             167        No
27             176        No
28             187        No
29             189        No
30             191        No
31             196        No
32             207        No
33             224        No
34             226        No
35             227        No
36             232        No
37             239        No
38             246        No
39             252        No
40             258        No
41             259        No
42             262        No
43             266        No
44             276        No
45             280        No
46             285        No
47             322        No
48             326        No
49             327       Yes
50             328        No
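Beyond class labels, predict.ada can return class probabilities (type="probs"), which lets us move the Yes/No threshold away from 0.5 if false negatives are costly. A minimal sketch:

> probsADA <- predict(weatherADA$modelADA,
+                     weatherADA$data[weatherADA$test, ],
+                     type="probs")   # one column per class level (No, Yes)
> head(probsADA)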


> cm <- table(Observation = weatherADA$data[weatherADA$test, "RainTomorrow"], Prediction = predictADA)
> cm
           Prediction
Observation No Yes
        No  38   2
        Yes  7   3

There are many false negatives, so the model needs tuning. Perhaps increasing the decision tree depth would help?
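From this matrix, the overall test error is (2 + 7)/50 = 0.18 and recall on the Yes class is only 3/10 = 0.3. Below is a minimal sketch of computing those numbers, plus the tuning direction suggested above; maxdepth=3 is an illustrative guess, not a value tried in this post.

> 1 - sum(diag(cm)) / sum(cm)            # overall error: (2 + 7) / 50 = 0.18
> cm["Yes", "Yes"] / sum(cm["Yes", ])    # recall on "Yes": 3 / 10 = 0.3
> evalq({
+ control3 <- rpart.control(maxdepth=3, cp=0.010000, minsplit=20, xval=10)
+ modelADA3 <- ada(formula=form, data=data[train, vars], control=control3, iter=300)
+ }, weatherADA)
> weatherADA$modelADA3   # check train/OOB error; evaluate on validate before touching test again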



