
Error: inherits(x, "Source") is not TRUE in R

Text analytics is interesting but challenging. I started with a simple goal: create a word cloud in R using the datasets from the Kaggle competition "Sentiment Analysis on Movie Reviews". I ran into a problem at every step.
First, I got an error while loading the .tsv files. The details are here. I resolved that issue and then loaded the text mining library, "tm". Below is the code to load the training dataset.
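As a rough sketch of the loading step, assuming a tab-separated file with the competition's Phrase and Sentiment columns: the snippet below writes a tiny stand-in file to a temporary path (the rows are made up for illustration) and reads it with read.delim(). The name train and the options shown are assumptions, not necessarily the code originally used.

```r
# Write a tiny stand-in for the competition's train.tsv (rows are made up).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PhraseId\tSentenceId\tPhrase\tSentiment",
  "1\t1\tA series of escapades\t1",
  "2\t1\tgood for the goose\t3",
  "3\t2\tThis quiet film doesn't disappoint\t3"
), tsv_path)

# quote = "" keeps apostrophes and quotation marks in the reviews from being
# treated as field delimiters; stringsAsFactors = FALSE keeps Phrase as text.
train <- read.delim(tsv_path, quote = "", stringsAsFactors = FALSE)

str(train)
```

Disabling quote handling is a common choice for free-text columns, since review text often contains unbalanced quotation marks.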
Next, I learned that I have to create a Corpus first, because "The main structure for managing documents in tm is a so-called Corpus, representing a collection of text documents". So, I used the code below and got an error.
movies_corpus
The error is as below:
Error: inherits(x, "Source") is not TRUE
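A minimal sketch of how this kind of error can be triggered, assuming the input is a plain character vector handed straight to a corpus constructor. The vector phrases is a hypothetical stand-in for the review text, and VCorpus() is used here because it is documented to take a Source object.

```r
library(tm)

# Hypothetical stand-in for the review phrases: a plain character vector.
phrases <- c("a charming film", "dull and predictable")

# Passing the character vector directly: VCorpus() expects a Source object,
# so the call fails. tryCatch() captures the error instead of stopping.
result <- tryCatch(VCorpus(phrases), error = function(e) e)

inherits(phrases, "Source")   # FALSE: a character vector is not a Source
conditionMessage(result)      # the text of the captured error
```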
I was not yet clear on the concept of a Corpus, and now I had an error on top of that. Some investigation was in order!
What is a Corpus?
A "Corpus" is a collection of text documents. The function Corpus() is a convenience alias for SimpleCorpus or VCorpus, depending on the arguments provided.
A SimpleCorpus is kept fully in memory and supports only DataframeSource, DirSource and VectorSource.
A VCorpus is a "volatile" corpus: it is stored in memory and is gone once the R object containing it is destroyed.
The syntax for creating such a corpus is as below:
VCorpus(x, readerControl). 
x:
a Source object which abstracts the input location. tm provides a set of predefined sources; getSources() lists the available sources, and users can create their own. VectorSource is for character vectors only.
readerControl:
a list with the named components reader and language. Again, tm provides a set of predefined readers, and getReaders() lists the currently available ones. Each source has a default reader, which can be overridden.
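The two arguments described above can be sketched as follows. The vector phrases is a hypothetical input, and readerControl is spelled out only to show its two components; leaving it out should give the same defaults.

```r
library(tm)

getSources()   # names of the predefined sources
getReaders()   # names of the predefined readers

# Hypothetical input: each element of the vector becomes one document.
phrases <- c("a charming film", "dull and predictable")
src <- VectorSource(phrases)
inherits(src, "Source")   # the check that VCorpus() performs on x

# reader(src) returns the default reader for this source.
vc <- VCorpus(src, readerControl = list(reader = reader(src), language = "en"))
length(vc)     # number of documents in the corpus
```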
Now, coming back to my error: it says 'inherits(x, "Source") is not TRUE', so the problem is with the Source argument x. Since I am passing character values, and VectorSource is the source for character vectors, let me try the code below:
movies_corpus
movies_corpus
It worked! So, the above code created a SimpleCorpus of 156060 documents.
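A small sketch of the working pattern, assuming the review text sits in a Phrase column of a data frame called train. Both names and the three rows are hypothetical stand-ins for the real 156060-row dataset.

```r
library(tm)

# Hypothetical stand-in for the training data; only the Phrase column matters.
train <- data.frame(
  Phrase = c("a charming film", "dull and predictable", "worth a second look"),
  stringsAsFactors = FALSE
)

# Wrap the character vector in a VectorSource so Corpus() receives a Source.
movies_corpus <- Corpus(VectorSource(train$Phrase))

class(movies_corpus)    # which corpus class Corpus() picked
length(movies_corpus)   # one document per phrase
```

Printing movies_corpus at the console shows a short summary of the corpus, including the document count.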
There is a lot more information about the tm package here.


This post first appeared on What The Data Says.
