
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 40 elements

I was trying my hand at the Kaggle competition "Sentiment Analysis on Movie Reviews" and got stuck at the very first step. The data files are in ".tsv" format, which is tab-delimited. I tried read.table() to load them and got the error below:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 40 elements
The error says line 1 did not have 40 elements, which is true. Line 1 is the header and has only four fields: "PhraseId", "SentenceId", "Phrase" and "Sentiment".
Usually, when I am stuck in R, the first thing I do is read the help file for the command in question. I ran the command below:
?read.table
The help file says read.table() "reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file". Well, that is what I thought, and it is the reason I was trying to use this function. I skipped a few lines, kept reading, and found another important piece of information: "read.delim and read.delim2 are for reading delimited files, defaulting to the TAB character for the delimiter". That is useful! I used the statement below and it worked like magic!
train <- read.delim(…)
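A minimal sketch of that step, assuming a small stand-in file rather than the competition data (the temporary path and the three sample rows are invented for illustration). read.delim() is read.table() with header = TRUE and sep = "\t" preset, so the first line supplies the column names:

```r
# Stand-in for the competition file: same four columns, invented rows.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PhraseId\tSentenceId\tPhrase\tSentiment",
  "1\t1\tA series of escapades\t1",
  "2\t1\tA series\t2",
  "3\t2\tgood fun\t3"
), path)

# read.delim() defaults to header = TRUE and sep = "\t".
train <- read.delim(path)

# Inspect the column names and types that were picked up.
str(train)
```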
Still, something was bothering me. What was the error about, and why was it expecting 40 elements in the first line? I re-read the help file and the answer was right there. It says "The number of data columns is determined by looking at the first five lines of input (or the whole input if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary". Since read.table() splits on white space by default, a multi-word phrase in those first five lines gets broken into many fields, which is presumably where the 40 came from. I tried passing the argument fill = TRUE to read.table(), as below.
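The column guessing described in the help file can be seen with count.fields(), which reports how many fields each line splits into. This is only a sketch on an invented four-line file, not the competition data, so the widest line here has 7 fields rather than 40:

```r
# Stand-in for the competition file: same four columns, invented rows.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PhraseId\tSentenceId\tPhrase\tSentiment",
  "1\t1\tA series of escapades\t1",
  "2\t1\tA series\t2",
  "3\t2\tgood fun\t3"
), path)

# Fields per line when splitting on any white space (read.table's default sep).
ws_fields <- count.fields(path, sep = "")

# Fields per line when splitting on tabs only.
tab_fields <- count.fields(path, sep = "\t")

print(ws_fields)
print(tab_fields)

# read.table() sizes the table from the widest of the first five lines,
# so the 4-field header on line 1 comes up short and the read stops.
msg <- tryCatch(
  { read.table(path); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With tabs as the separator every line has the same four fields, which is why the tab-aware readers do not hit this error.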
train_data <- read.table(…, fill = TRUE)
That command worked and created the data frame train_data, but with the warning message below:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string
Now, how do I get rid of this warning message? I searched the help file for anything about quoted strings and found this: "To disable quoting altogether, use quote = ""." So I tried the read.table() command below, and it ran without any error or warning. Yay!
train_data <- read.table(…,
                         sep = "\t", fill = TRUE,
                         quote = "")
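A sketch of why quote = "" helps, again on an invented stand-in file. By default read.table() treats both " and ' as quote characters, so an apostrophe inside a phrase can open a quoted string that never closes. Note that header = TRUE is an assumption added here so that the first line becomes the column names; it is not among the arguments shown above.

```r
# Stand-in file with an apostrophe inside one phrase (invented rows).
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "PhraseId\tSentenceId\tPhrase\tSentiment",
  "1\t1\tthe film doesn't work\t1",
  "2\t1\ta quiet gem\t3",
  "3\t2\tgood fun\t3"
), path)

# Default quoting: the lone apostrophe may swallow the rest of the file.
# Capture whatever happens (data frame, warning text, or error text).
with_quotes <- tryCatch(
  read.table(path, header = TRUE, sep = "\t", fill = TRUE),
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(with_quotes)

# Quoting disabled: every tab-separated field is taken literally.
train_data <- read.table(path, header = TRUE, sep = "\t",
                         fill = TRUE, quote = "")
print(dim(train_data))
print(train_data$Phrase)
```

Comparing the row count against the number of lines in the file is a quick way to check that no rows were merged by a stray quote.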
All the code that worked and that didn't work is below:



This post first appeared on What The Data Says; please read the original post there.
