user1702490 - 18 days ago
R Question

Reading big data in R with read.big.matrix

I am reading a data set of dimension 3131875 x 5 into R using read.big.matrix. My data has both character and numeric columns, including a date variable. The command I am trying to use is:

as1 <- read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt",
header=TRUE,
backingfile="session.bin",
descriptorfile="session.desc",
type = NA)


But type = NA is not accepted by R in this case, and I get an error:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, :
Problem creating filebacked matrix.
In addition: Warning messages:
1: In na.omit(as.integer(firstLineVals)) : NAs introduced by coercion
2: In na.omit(as.double(firstLineVals)) : NAs introduced by coercion
3: In read.big.matrix("C:/Documents and Settings/Arundhati.Mukherjee/My Documents/Arundhati/big data/MB07_Arundhati/sample2.txt", :
Because type was not specified, we chose double based on the first line of data.


I need to know what the type should be here. I tried options like double, but that throws the same error.

Please help me.

Answer

From ?read.big.matrix:

Files must contain only one atomic type (all integer, for example).

Therefore, you won't be able to read in a file that mixes character, numeric, integer, date, etc. columns. You could preprocess the file first, for instance using a different program to convert the character variables to integer representations (similar to converting to a factor in R).

EDIT:

On the bigmemory website there's an example of preprocessing data using a Python script to change character information to integers. The script is written for a specific dataset, but perhaps you could use it as a guideline for your data.
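To give a rough idea of what such a preprocessing step might look like, here is a minimal sketch. It is not the script from the bigmemory website. The function name encode_file, its parameters, and the date format are assumptions made for illustration. It streams the file row by row, replaces each distinct string in the listed text columns with an integer code (assigned in order of first appearance, unlike R factors, which sort their levels), and replaces dates with days since 1970-01-01.

```python
import csv
from datetime import date, datetime


def encode_file(src, dst, text_cols, date_cols=(), date_format="%Y-%m-%d", sep=","):
    """Stream src to dst, replacing text and date columns with integers.

    Hypothetical helper: column names and date_format are assumptions
    about your file. Returns {column name: {original string: code}} so
    the codes can be mapped back to labels later.
    """
    codes = {name: {} for name in text_cols}
    epoch = date(1970, 1, 1)
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=sep)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter=sep)
        writer.writeheader()
        for row in reader:
            for name in text_cols:
                table = codes[name]
                # Reuse the existing code, or assign the next one (1, 2, 3, ...).
                row[name] = table.setdefault(row[name], len(table) + 1)
            for name in date_cols:
                day = datetime.strptime(row[name], date_format).date()
                # Store the date as a whole number of days since 1970-01-01.
                row[name] = (day - epoch).days
            writer.writerow(row)
    return codes
```

For example, encode_file("sample2.txt", "sample2_numeric.txt", text_cols=["region"], date_cols=["day"]) would write an all-numeric copy of the file, where "region" and "day" stand in for your actual column names. The converted file then holds a single atomic type, so passing it to read.big.matrix with an explicit type (such as type = "double") and the matching sep should satisfy the restriction quoted above. Keep the returned dictionary if you need to translate the codes back to the original labels.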