R: wordcloud

From OnnoWiki
Revision as of 11:08, 1 November 2018 by Onnowpurbo

The 5 main steps to create a word cloud in R:

  1. Create a text file
  2. Install and load the required packages
  3. Text mining: load the text as a corpus and clean it
  4. Build a term-document matrix (word frequencies)
  5. Generate the word cloud
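A word cloud ultimately plots a word-frequency table. Steps 3 and 4 reduce the text to a count per word, and step 5 draws each word scaled by its count. The following is a minimal base-R sketch of that reduction. It assumes the text is already cleaned and that splitting on whitespace is an adequate tokenizer, which is a simplification and not what tm does internally. In the tm workflow the same counts typically come from the term-document matrix, e.g. `sort(rowSums(as.matrix(TermDocumentMatrix(docs))), decreasing = TRUE)`.

```r
# Hypothetical, already-cleaned lines of text (made up for illustration)
lines <- c("dream dream freedom", "freedom ring dream")

# Split every line on runs of whitespace and pool all the words
words <- unlist(strsplit(lines, "\\s+"))
words <- words[nzchar(words)]  # drop empty strings left by leading spaces

# Count each word, then sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# A two-column data frame: the shape a word cloud function consumes
d <- data.frame(word = names(freq), freq = as.integer(freq),
                stringsAsFactors = FALSE)
print(d)
```

Here `d` lists "dream" (3), "freedom" (2) and "ring" (1), which is the kind of `word`/`freq` pair you would pass on to the word cloud step.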


Install Packages

# Install (needed only once per R installation)
install.packages("tm")  # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
# Load (needed in every new R session)
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
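Running `install.packages()` every time the script runs re-downloads the packages. One common pattern is to install only what is missing and then load everything. The sketch below does that. The helper names `missing_pkgs` and `install_and_load` are hypothetical, not part of any package.

```r
# Hypothetical helper: names in 'pkgs' that are not installed yet
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Hypothetical helper: install only the missing packages, then attach all
install_and_load <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) utils::install.packages(todo)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(todo)  # return what had to be installed
}
```

Usage for this page would be `install_and_load(c("tm", "SnowballC", "wordcloud", "RColorBrewer"))`.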

Text Mining

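# Option 1: read a local text file, choosing it interactively
# text <- readLines(file.choose())
# Option 2: read a local text file by name
# text <- readLines("out.txt")
# Option 3: read the text file from the internet
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
# Load the data as a corpus (each line becomes one document)
docs <- Corpus(VectorSource(text))
# Show the content of the corpus
inspect(docs)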
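`readLines()` returns one character element per line, and `VectorSource()` turns each element into one document. As a result, blank lines in the file become empty documents. A precaution you may want to add before building the corpus, which the steps above do not include, is to drop those lines. The following is a minimal sketch that uses a temporary file with made-up contents so it runs on its own:

```r
# Write a small sample file (name and contents are made up for illustration)
path <- tempfile(fileext = ".txt")
writeLines(c("I have a dream", "", "   ", "Let freedom ring"), path)

# readLines() gives one element per line of the file
text <- readLines(path, encoding = "UTF-8")

# Keep only lines that still contain something after trimming whitespace
text <- text[nzchar(trimws(text))]
print(text)
```

The filtered `text` can then be passed to `Corpus(VectorSource(text))` as before.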

Clean up

# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stopwords
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra whitespace
docs <- tm_map(docs, stripWhitespace)
# Text stemming (optional; uses the SnowballC package)
# docs <- tm_map(docs, stemDocument)
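The example below shows what each cleaning step does to a single sentence, using base-R string functions. It approximates the tm transformations and is not tm's actual implementation. The five-word stopword list is a hypothetical stand-in for `stopwords("english")`.

```r
x <- "I have a Dream, that 4 little children will one day live in a nation!"

# Hypothetical mini stopword list (stand-in for stopwords("english"))
stop_en <- c("i", "have", "a", "that", "will", "in")

x <- tolower(x)                          # roughly content_transformer(tolower)
x <- gsub("[[:digit:]]+", "", x)         # roughly removeNumbers
# Match stopwords only as whole words (\b = word boundary)
pattern <- paste0("\\b(", paste(stop_en, collapse = "|"), ")\\b")
x <- gsub(pattern, "", x, perl = TRUE)   # roughly removeWords
x <- gsub("[[:punct:]]+", "", x)         # roughly removePunctuation
x <- gsub("\\s+", " ", x)                # roughly stripWhitespace
x <- trimws(x)                           # also trim the ends for a tidy result
print(x)
```

The sentence ends up as "dream little children one day live nation". Stopwords are matched only as whole words, so "in" is removed while "children" and "nation" stay intact.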