[R][Windows 10] A simple example of using TensorFlow in R 4.0 (conda)


2020. 5. 4. 19:34 · Analysis R / Implementation


I followed along with a post written in 2017.

In short, apart from a few minor errors, almost everything worked exactly as written.

http://freesearch.pe.kr/archives/4546

I created a conda environment with reticulate and ran everything there.

For installing TensorFlow in a conda environment, see the following post.

https://data-newbie.tistory.com/508


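Activate the conda environment

library(reticulate)
use_condaenv(condaenv = "tensorflow_1", required = TRUE)  # bind reticulate to the conda env that has TensorFlow
py_config()                                                # confirm which Python / conda env is in use
library(tensorflow)                                        # provides the `tf` object used below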

Load the R libraries

library(Matrix)
library(caret)
library(pROC)
library(data.table)
library(mlbench)

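Working from RStudio differs from a plain Python environment in the following ways.

I may not have used it long enough to judge, but so far it does not feel awkward compared with Python.

Shapes are built with a list
None is written as NULL
tf. is written as tf$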
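You can check the conversions in the list above without TensorFlow. The minimal base-R sketch below builds the `shape` argument that is later passed to `tf$placeholder()`. It relies on reticulate converting an unnamed R list to a Python list and `NULL` to `None`, so `list(NULL, 60L)` should arrive in Python as `[None, 60]`. The value 60 (the number of Sonar predictors) is filled in by hand here as an assumption. The sketch also shows why the post writes integer literals such as `10L`: a bare R number is a double, and some TensorFlow arguments, such as layer sizes, expect Python ints.

```r
# Base-R sketch (no TensorFlow required) of the shape passed to tf$placeholder().
# reticulate maps an unnamed R list to a Python list and NULL to None.
num_x <- 60L                  # assumption: 60 predictor columns, as in Sonar
shape <- list(NULL, num_x)    # becomes [None, 60] on the Python side

length(shape)                 # 2 dimensions
is.null(shape[[1]])           # TRUE: batch dimension left unspecified

# R numbers are doubles by default; the L suffix makes an integer.
is.integer(10)                # FALSE
is.integer(10L)               # TRUE
```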

set.seed(1)  # seed R's RNG, used by createDataPartition() and createFolds()

data(Sonar, package = "mlbench")

# Class is a factor with levels "M" (mine) and "R" (rock); convert it to 0/1
Sonar[, 61] <- as.numeric(Sonar[, 61]) - 1

# Center and scale the predictors with caret (preProcess defaults to center + scale)
sonar_scaled <- predict(preProcess(Sonar[, -61]), Sonar[, -61])

# Split into training (70%) and test (30%) sets
train.ind <- createDataPartition(Sonar[, 61], p = 0.7, list = FALSE)

train.x <- data.matrix(sonar_scaled[train.ind[, 1], ])
train.y <- Sonar[train.ind[, 1], 61]
test.x  <- data.matrix(sonar_scaled[-train.ind[, 1], ])
test.y  <- Sonar[-train.ind[, 1], 61]


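num_x <- as.integer(ncol(train.x))   # number of input features
num_y <- 1L                          # one logit for binary classification
tf$reset_default_graph()
tf$set_random_seed(1L)               # graph-level seed: set after reset_default_graph(), which creates a new graph
x <- tf$placeholder(dtype = tf$float32, shape = list(NULL, num_x))
y <- tf$placeholder(dtype = tf$float32, shape = list(NULL, num_y))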
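The data-preparation steps above convert the class to 0/1, center and scale the predictors, and make a 70/30 split. The same steps can be sketched in base R on toy data. This is not the author's code. The toy matrix, its size and the seed are made up. The split is a plain per-class random sample rather than `caret::createDataPartition()`. There is also one deliberate difference. Here the means and standard deviations are estimated on the training rows only and then applied to the test rows. The post instead fits `preProcess()` on all rows before splitting, which lets test-set statistics leak into the scaling.

```r
set.seed(1)

# Hypothetical toy data standing in for Sonar: 100 rows, 3 predictors, M/R labels
X   <- matrix(rnorm(100 * 3, mean = 5, sd = 2), ncol = 3)
cls <- factor(rep(c("M", "R"), times = c(60, 40)))

# Labels to 0/1: factor levels are sorted ("M", "R"), so M -> 0, R -> 1
y <- as.numeric(cls) - 1

# Stratified 70/30 split: sample 70% of the row indices within each class
train_idx <- unname(unlist(lapply(split(seq_along(y), y), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
})))

# Center and scale with statistics estimated on the training rows only
mu      <- colMeans(X[train_idx, ])
sigma   <- apply(X[train_idx, ], 2, sd)
train_x <- scale(X[train_idx, ],  center = mu, scale = sigma)
test_x  <- scale(X[-train_idx, ], center = mu, scale = sigma)
train_y <- y[train_idx]
test_y  <- y[-train_idx]
```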

Build the Graph

with(tf$name_scope("DNN"),{ 
  fc1 <- tf$contrib$layers$fully_connected(x,10L,
                                           activation_fn=tf$nn$selu, 
                                           weights_initializer=tf$contrib$layers$xavier_initializer(uniform = F))
  pred <- tf$contrib$layers$fully_connected(fc1, num_y,activation_fn = NULL,
                                            weights_initializer=tf$contrib$layers$xavier_initializer(uniform = F))
})
# Nest the contexts: a `{a; b}` block evaluates to b only, so both must be entered explicitly
with(tf$name_scope("loss"), {
  with(tf$device("/cpu:0"), {
    loss <- tf$reduce_mean(tf$nn$sigmoid_cross_entropy_with_logits(logits = pred, labels = y))
    optimizer <- tf$train$AdamOptimizer(learning_rate = 0.01)$minimize(loss)
  })
})
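The loss above, `tf$nn$sigmoid_cross_entropy_with_logits()`, takes raw logits, which is why the last layer uses `activation_fn = NULL`. TensorFlow documents a numerically stable form of this loss: `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for logit `x` and label `z`. The base-R sketch below implements that formula and compares it with the naive probability-based version. `sigmoid_ce` and `naive_ce` are hypothetical helper names, not functions from any package.

```r
# Stable sigmoid cross-entropy, per element (formula from the TensorFlow docs)
sigmoid_ce <- function(logits, labels) {
  pmax(logits, 0) - logits * labels + log1p(exp(-abs(logits)))
}

# Naive version via probabilities, for comparison on moderate logits
naive_ce <- function(logits, labels) {
  p <- plogis(logits)                          # sigmoid
  -labels * log(p) - (1 - labels) * log(1 - p)
}

logits <- c(-2, -0.5, 0, 1.5, 3)               # hypothetical network outputs
labels <- c(0, 1, 1, 0, 1)

# The graph's loss is the mean over the batch (tf$reduce_mean)
loss <- mean(sigmoid_ce(logits, labels))

# For very large logits the naive form breaks down, the stable one does not
sigmoid_ce(800, 0)    # 800
naive_ce(800, 0)      # Inf, because plogis(800) rounds to exactly 1
```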

with(tf$name_scope("metric"), {
  tf_bool = tf$greater_equal(x = tf$sigmoid(pred),y= tf$constant(c(.5)) )
  compare_pred <- tf$cast(tf_bool, 
                          tf$float32)
  accuracy <- tf$reduce_mean(tf$cast(tf$equal(compare_pred, y),tf$float32))
})
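The `metric` scope turns logits into hard 0/1 predictions by thresholding the sigmoid at 0.5, then averages the matches with the labels. Below is a base-R equivalent on made-up numbers (the logits and labels are hypothetical). Because sigmoid(x) >= 0.5 exactly when x >= 0, thresholding the logits at 0 gives the same predictions.

```r
logits <- c(-1.2, 0.3, 2.0, -0.1, 0.0)   # hypothetical model outputs
labels <- c(0, 1, 0, 0, 1)

prob     <- plogis(logits)               # tf$sigmoid(pred)
pred_01  <- as.numeric(prob >= 0.5)      # tf$greater_equal + tf$cast
accuracy <- mean(pred_01 == labels)      # tf$equal + tf$reduce_mean

accuracy                                 # 0.8 (4 of 5 correct)
identical(pred_01, as.numeric(logits >= 0))
```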

Train

# training 
num_of_epoc <- 100L


sess <- tf$Session()

sess$run(tf$global_variables_initializer())

n_batch <- 5

aucs <- list()
for(i in 1:num_of_epoc){
  #print(sprintf("epoc %d", i))
  
  folds <- createFolds(train.y, k = n_batch)
  for(j in 1:n_batch){
    sess$run(optimizer, dict(x=train.x[folds[[j]],], y=matrix(train.y[folds[[j]]])))
  }
  
  accu <- sess$run(tf$sigmoid(pred), dict(x=test.x))
  accur <- sess$run(accuracy, dict(x=test.x, y=matrix(test.y)))
  
  aucs[[i]]<- data.table(epoc=i,aucs=as.numeric(auc(roc(test.y, accu[,1]))),  accuracy=accur)
}
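In the training loop above, `createFolds(train.y, k = n_batch)` reshuffles the training rows into 5 disjoint groups each epoch. The inner loop then uses each group as one mini-batch, so every training row is seen exactly once per epoch. The base-R sketch below reproduces that idea with a plain random split. `make_batches` is a hypothetical helper, and unlike `createFolds()` it does not stratify by label. The training-set size of 146 is also an assumed value.

```r
# Split row indices 1..n into k disjoint, randomly shuffled mini-batches
make_batches <- function(n, k) {
  shuffled <- sample.int(n)
  split(shuffled, rep_len(seq_len(k), n))
}

set.seed(1)
n_train <- 146      # hypothetical training-set size
n_batch <- 5

for (epoch in 1:3) {
  batches <- make_batches(n_train, n_batch)   # reshuffled every epoch
  for (j in seq_along(batches)) {
    idx <- batches[[j]]
    # the post runs sess$run(optimizer, ...) on train.x[idx, ] / train.y[idx] here
  }
}
```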

dt_aucs <- rbindlist(aucs)

knitr::kable(dt_aucs[order(-aucs)][1:5])
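The AUC column comes from `pROC::auc(roc(test.y, accu[,1]))`. It measures how well the predicted probabilities rank class-1 rows above class-0 rows, independent of the 0.5 threshold used for accuracy. For intuition, the same number can be computed from the rank-sum (Mann-Whitney) identity. `auc_rank` is a hypothetical helper and the scores are made up. The helper assumes higher scores mean class 1, whereas `roc()` picks the direction automatically by default (`direction = "auto"`).

```r
# AUC = P(score of a random positive > score of a random negative), ties count 1/2
auc_rank <- function(labels, scores) {
  r     <- rank(scores)                # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9)   # hypothetical predicted probabilities

auc_rank(labels, scores)                     # 8/9

# Brute-force check over all positive/negative pairs
pairs <- outer(scores[labels == 1], scores[labels == 0], ">") +
         0.5 * outer(scores[labels == 1], scores[labels == 0], "==")
mean(pairs)                                  # 8/9
```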

R has ggplot2, which in my experience makes visualization easier than in Python, so combining the strengths of both seems useful.

library(ggplot2)
dt_aucs <- rbindlist(aucs)
# Map colour to the metric name so each legend entry matches its line
ggplot(data = dt_aucs, aes(x = epoc)) +
  geom_line(aes(y = aucs, color = "AUC")) +
  geom_line(aes(y = accuracy, color = "Accuracy")) +
  labs(title = "Learning Curve", x = "Epoch", y = "Performance") +
  scale_color_discrete(name = "Y series")

# Reshape to long format (requires the tidyr package)
result_melt <- tidyr::gather(dt_aucs, key = "metric", value = "value", -epoc)
ggplot(data = result_melt, aes(x = epoc, y = value)) + geom_line(aes(color = metric)) +
  labs(title = "Learning Curve", x = "Epoch", y = "Performance") +
  scale_color_discrete(name = "Metric", breaks = c("aucs", "accuracy"), labels = c("AUC", "Accuracy")) +
  theme_classic() +
  theme(legend.position = c(0.7, 0.2),
        legend.direction = "horizontal")
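The second plot needs the results in long format. `gather(dt_aucs, key = "metric", value = "value", -epoc)` stacks the `aucs` and `accuracy` columns into one `value` column labelled by `metric`. That lets ggplot2 draw one line per metric and build the legend itself. Below is a base-R sketch of the same reshape on made-up numbers; the values are not results from the post.

```r
# Hypothetical wide results table, one row per epoch (same columns as dt_aucs)
wide <- data.frame(epoc     = 1:3,
                   aucs     = c(0.81, 0.85, 0.88),
                   accuracy = c(0.70, 0.74, 0.77))

# Stack every column except epoc into (metric, value) pairs
measure <- setdiff(names(wide), "epoc")
long <- data.frame(
  epoc   = rep(wide$epoc, times = length(measure)),
  metric = rep(measure, each = nrow(wide)),
  value  = unlist(wide[measure], use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

ggplot2 orders a character `metric` alphabetically, so `accuracy` comes before `aucs` in the legend. For that reason, legend labels for this column are safest when tied to explicit `breaks` rather than to position.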

I also found a guide that uses datasets through the tfdatasets package. I plan to try it later when I get the chance.

https://tensorflow.rstudio.com/guide/tfdatasets/introduction/
