R: tidy text dataset - tibble

From OnnoWiki

Revision as of 09:17, 31 October 2018 by Onnowpurbo (talk | contribs) (→‎Tidy Text Novel)


Text Vector

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
text

Tidy Text Dataset

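install.packages("dplyr")   # only needed once per R installation
library(dplyr)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
text_df <- tibble(line = 1:4, text = text)
text_df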

Tidy Text Novel

library(janeaustenr)
library(dplyr)
library(stringr)
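original_books <- austen_books() %>%
   group_by(book) %>%                       # number lines and chapters per book
   mutate(linenumber = row_number(),        # position of each line within its book
          # a line starting with "chapter" plus a digit or roman numeral opens a new chapter;
          # the running sum of those matches gives the chapter number of every line
          chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                            ignore_case = TRUE)))) %>%
   ungroup()
original_books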
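
The chapter column above relies on taking a running sum over a logical vector. The following is a minimal sketch of that idea on a toy vector; the contents of `lines` are made up for illustration and are not taken from `austen_books()`.

```r
library(stringr)

# Hypothetical toy input: a title, a blank line, and two chapter headings
lines <- c("A BOOK TITLE",
           "",
           "CHAPTER 1",
           "first line of chapter one",
           "Chapter II",
           "first line of chapter two")

# TRUE where a line starts with "chapter" followed by a digit or roman numeral
is_heading <- str_detect(lines, regex("^chapter [\\divxlc]", ignore_case = TRUE))

# Each TRUE adds 1 to the running total, so every line gets its chapter number;
# lines before the first heading stay at 0
chapter <- cumsum(is_heading)
chapter
```

Lines that come before the first chapter heading (titles, front matter) end up in chapter 0, which is why `original_books` also contains rows with `chapter` equal to 0.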

Convert to one-token-per-row

library(tidytext)
tidy_books <- original_books %>%
              unnest_tokens(word, text)
tidy_books
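
To see what one-token-per-row looks like on a small input, the same `unnest_tokens()` call can be applied to the four-line poem from the Text Vector section. This is a sketch that assumes the `dplyr` and `tidytext` packages are installed; the name `tidy_poem` is introduced here only for illustration.

```r
library(dplyr)
library(tidytext)

# Rebuild the small example so this snippet runs on its own
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
text_df <- tibble(line = 1:4, text = text)

# Split the text column into one word per row; by default the words are
# lowercased and punctuation is dropped, while the line column is kept
tidy_poem <- text_df %>%
  unnest_tokens(word, text)
tidy_poem
```

Each row of the result holds a single word together with the line it came from, so the 4-row tibble becomes a 20-row tibble.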

Interesting Links

R
