R: read CSV

From OnnoWiki

Latest revision as of 17:48, 8 May 2024

I am trying to work with the tm package in R, and have a CSV file of customer feedback in which each line is a different instance of feedback. I want to import all of this feedback into a corpus, but I want each line to be a separate document within the corpus, so that I can compare the feedback in a document-term matrix. There are over 10,000 rows in my data set.

Originally I did the following:

fdbk_corpus <- Corpus(VectorSource(fdbk), readerControl = list(language = "eng"), sep = "\t")

This creates a corpus with 1 document and >10,000 rows, and I want >10,000 docs with 1 row each.



Here's a complete workflow to get what you want:

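# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE)
library(tm)
# tm 0.7 and later expect DataframeSource input to have the columns
# doc_id and text; this assumes the feedback text is in the first column of x
docs <- data.frame(doc_id = as.character(seq_len(nrow(x))),
                   text = x[[1]],
                   stringsAsFactors = FALSE)
corp <- Corpus(DataframeSource(docs))
dtm <- DocumentTermMatrix(corp)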

In the dtm object, each row is a document, that is, one line of your original CSV file. Each column is a term (a word).
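
To check the one-row-per-document behaviour without touching your real file, here is a minimal sketch on made-up data. The column name feedback and the three sample lines are assumptions for illustration only, not part of the original question. It passes the text column to VectorSource as a character vector, which is an alternative to DataframeSource and gives one document per element. The tm part is skipped if the package is not installed.

```r
# Hypothetical sample standing in for the CSV file: a header line
# plus three lines of feedback (no commas, so there is one column).
csv_text <- c("feedback",
              "Great service and friendly staff",
              "Delivery was late",
              "Great product but late delivery")

# read.csv() can read from a character vector through the text argument.
x <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Pull out the text column as a character vector: one element per CSV row.
feedback <- x$feedback
n_docs <- length(feedback)

if (requireNamespace("tm", quietly = TRUE)) {
  # VectorSource() treats each element of the vector as one document.
  corp <- tm::VCorpus(tm::VectorSource(feedback))
  dtm <- tm::DocumentTermMatrix(corp)
  # The number of rows in the matrix equals the number of documents.
  stopifnot(nrow(dtm) == n_docs)
  print(dim(dtm))
}
```

If the row count of dtm matches the number of lines of feedback, the corpus was split the way you wanted.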