R: read CSV
Page created by Onnowpurbo.
Latest revision as of 17:48, 8 May 2024
I am trying to work with the tm package in R, and have a CSV file of customer feedback in which each line is a separate piece of feedback. I want to import all of this feedback into a corpus, with each line as a separate document, so that I can compare the feedback in a document-term matrix. There are over 10,000 rows in my data set.
Originally I did the following:
fdbk_corpus <-Corpus(VectorSource(fdbk), readerControl = list(language="eng"), sep="\t")
This creates a corpus with 1 document and >10,000 rows, and I want >10,000 docs with 1 row each.
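One possible cause, assuming fdbk is a data frame rather than a character vector: a data frame is a list of columns, so its length is the number of columns, not the number of rows. The sketch below uses a small made-up data frame (the column name comment and its contents are hypothetical) to show that passing the text column itself to VectorSource gives one document per row.

```r
library(tm)

# Hypothetical stand-in for the feedback data: one text column, three rows.
fdbk <- data.frame(comment = c("great service", "slow delivery", "great price"),
                   stringsAsFactors = FALSE)

# A data frame counts its columns, which may be why only one document appeared.
n_cols <- length(fdbk)

# Passing the character column gives VectorSource one element per row.
corp_rows <- VCorpus(VectorSource(fdbk$comment))
n_docs <- length(corp_rows)

print(c(columns = n_cols, documents = n_docs))
```

With real data, replace fdbk$comment with whichever column holds the feedback text.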
Here's a complete workflow to get what you want:
# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE)
require(tm)
corp <- Corpus(DataframeSource(x))
dtm <- DocumentTermMatrix(corp)
In the dtm object each row will be a doc, or a line of your original CSV file. Each column will be a word.
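A note for more recent tm releases: DataframeSource there expects the first two columns of the data frame to be named doc_id and text, so the workflow above may need one extra step. The following is a minimal sketch under that assumption, using a small made-up data frame in place of the CSV (the column name feedback and its contents are hypothetical).

```r
library(tm)

# Hypothetical feedback data standing in for the result of read.csv().
x <- data.frame(feedback = c("great service", "slow delivery", "great price"),
                stringsAsFactors = FALSE)

# Build the two columns that recent tm versions require, in this order.
docs <- data.frame(doc_id = as.character(seq_len(nrow(x))),
                   text = x$feedback,
                   stringsAsFactors = FALSE)

corp <- VCorpus(DataframeSource(docs))
dtm <- DocumentTermMatrix(corp)

# Rows are documents (CSV lines), columns are distinct terms.
print(dim(dtm))
print(Terms(dtm))
```

Any further columns in docs would be kept as document metadata, so other CSV fields can travel along with each piece of feedback.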