All I Need Is Data.

[R] ggwordcloud example

2019. 9. 29. 22:26 · Analysis R / Implementation

wordcloud.zip

0.02MB

References

WordCloud

Library Load

Packages <- c("KoNLP", "dplyr", "ggplot2", "ggwordcloud", "tm")

# Attach every package; suppressMessages() must wrap the calls, not the function object
invisible(suppressMessages(lapply(Packages, library, character.only = TRUE)))
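Before attaching the packages it can help to check which ones are actually installed. The following is a minimal sketch, not part of the original post: `missing_packages` is a hypothetical helper name, and it only looks packages up with `find.package()` without loading them.

```r
# Hypothetical helper (not from the original post): return the names of
# packages that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  found <- vapply(
    pkgs,
    function(p) length(find.package(p, quiet = TRUE)) > 0,
    logical(1)
  )
  pkgs[!found]
}

required <- c("KoNLP", "dplyr", "ggplot2", "ggwordcloud", "tm")
to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```

Reporting the missing names instead of installing them automatically leaves the choice of installation source to the reader.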

Data Load

data : a collection of song lyrics
stopword : a list of stopwords

data <- read.table("text_data.txt",
           sep = "\t" , 
           fileEncoding = "utf-8"
           )
stopword <- read.table("stop.txt",
                   sep = "\t" , 
                   fileEncoding = "utf-8")$V1
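The loading pattern above can be tried without the original files. This is a minimal sketch under stated assumptions: the temporary file and its three words are made-up stand-ins for stop.txt, and `quote = ""` and `stringsAsFactors = FALSE` are additions that the original calls do not use. Disabling quoting may matter for lyrics, because `read.table` treats apostrophes as quote characters by default.

```r
# Write a tiny stand-in for stop.txt (hypothetical contents) to a temp file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("the", "a", "of"), tmp)

# Read it back as a one-column table and keep the column as a character vector.
stop_demo <- read.table(
  tmp,
  sep = "\t",
  fileEncoding = "utf-8",
  quote = "",               # assumption: treat ' and " as ordinary characters
  stringsAsFactors = FALSE  # assumption: keep strings as character, not factor
)$V1

print(stop_demo)
```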

Step 1 Data Handling

Remove stopwords and extract nouns.

text = paste(data$V1 , collapse = " " )
text = gsub("\n" , " ", text)
## first-pass stopword removal
text2 = removeWords(text , stopword )
## noun extraction
noun = extractNoun(text2)
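To see what whole-word stopword removal does, here is a rough base-R approximation. It is a sketch, not the `tm` implementation: `remove_words_sketch` is a hypothetical name and the English sample text is made up. Because matching happens at word boundaries, a stopword embedded in a longer token is left alone, which may be one reason a second stopword pass is still needed after noun extraction.

```r
# Hypothetical helper: delete stopwords only where they appear as whole words.
remove_words_sketch <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

lyrics  <- "you and me and the night"   # made-up sample text
cleaned <- remove_words_sketch(lyrics, c("and", "the"))

# Split on runs of whitespace left behind by the removed words.
tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
print(tokens)

# A stopword inside a longer word is not removed.
print(remove_words_sketch("android", "and"))
```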

Step 2 Data Handling

Count how often each noun occurs.

text <- data.frame(noun) %>% 
  group_by(noun) %>% 
  summarise(n=n()) %>% arrange(desc(n))
text %>% head()

## # A tibble: 6 x 2
##   noun      n
##   <fct> <int>
## 1 너       65
## 2 나       37
## 3 말       26
## 4 사랑     25
## 5 것       23
## 6 오늘     19
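The same frequency table can be built without dplyr. This is a minimal base-R sketch with a made-up noun vector; `freq` is an illustrative name, not one used in the original post.

```r
# Made-up nouns standing in for the output of extractNoun().
nouns <- c("love", "night", "love", "flower", "night", "love")

# table() counts each distinct value; sort() puts the most frequent first.
counts <- sort(table(nouns), decreasing = TRUE)

# Convert to a two-column data frame shaped like the tibble above.
freq <- data.frame(
  noun = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
print(freq)
```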

Step 3 Data Handling

Remove the stopwords that survived the first pass.

## second-pass stopword removal
notuse = text$noun %in% c("너","내","나","것" , 
                          "수","한","날" , 
                          "랄라라라랄라라라")
text = text[!notuse,]

text %>% head()

## # A tibble: 6 x 2
##   noun      n
##   <fct> <int>
## 1 말       26
## 2 사랑     25
## 3 오늘     19
## 4 생각     14
## 5 꽃       10
## 6 밤       10
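The filtering step relies on `%in%` returning one logical value per row, which is then negated to keep everything else. A small sketch on made-up data (the words and counts here are illustrative only):

```r
# Made-up frequency table.
freq <- data.frame(
  noun = c("you", "love", "it", "night"),
  n = c(65L, 25L, 23L, 10L),
  stringsAsFactors = FALSE
)

# Words to drop in the second pass (hypothetical list).
extra_stop <- c("you", "it")

# %in% binds tighter than !, so this reads as !(noun %in% extra_stop).
kept <- freq[!freq$noun %in% extra_stop, ]
print(kept)
```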

Wordcloud Visualization

ggplot(
  text,
  aes(label = noun, size = n, color = n)
) +
  geom_text_wordcloud_area(
    # rotate each word by a random multiple of 45 degrees between -90 and 90
    aes(angle = 45 * sample(-2:2,
                            nrow(text),
                            replace = TRUE,
                            prob = rep(1, 5))),
    area_corr_power = 1.1,
    # mask image that defines the shape of the cloud
    mask = png::readPNG("mike.png"),
    rm_outside = TRUE
  ) +
  scale_size_area(max_size = 11) +
  theme_minimal() +
  scale_color_gradient(low = "red", high = "darkred")
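The `angle` expression in the plot draws one rotation per word from five evenly weighted values. The sketch below isolates that step; the seed and the word count are assumptions chosen only to make it reproducible. Equal weights in `prob` describe the same uniform distribution that `sample()` uses by default, so the argument is omitted here.

```r
set.seed(42)   # hypothetical seed, only so the sketch is reproducible
n_words <- 10  # stand-in for nrow(text)

# Each word gets one of -90, -45, 0, 45 or 90 degrees with equal probability.
angles <- 45 * sample(-2:2, n_words, replace = TRUE)
print(table(angles))
```

Calling `set.seed()` before building the plot should likewise make the sampled rotations repeat between runs.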
