R Question

Error loading data with too many levels/categories using h2o.importFile()

I am trying to import a large .csv file using h2o.importFile in R:

dataFile <- "big_file.csv"
h2o.importFile(dataFile,header=TRUE,destination_frame = "data.hex")

The file has a number of id columns. I get the following error message.

Error: water.parser.ParseDataset$H2OParseException: Exceeded categorical limit on columns [id1, id2]. Consider reparsing these columns as a string.

Is there a way to specify that these columns should be parsed as strings, similar to data.frame(stringsAsFactors = FALSE)?

Answer Source

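Specifying the col.types argument of h2o.importFile should work for you. The example below writes iris to a CSV file (write.csv adds a row-name column by default, hence six types) and forces the last column to be parsed as a string: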

library(h2o)
h2o.init()
write.csv(iris, "iris.csv")
hf0 <- h2o.importFile("iris.csv", col.types = c("int","real","real","real","real","string"))
unlist(h2o.getTypes(hf0))
[1] "int"    "real"   "real"   "real"   "real"   "string"
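
Applied to the file from the question, a minimal sketch might look like the following. It assumes the list form of col.types (by.col.name plus types), which overrides only the named columns and leaves the rest to H2O's type guessing. It also assumes that id1 and id2, the columns named in the error message, are the only ones that need overriding. The names id_cols, col_spec and data_hex are illustrative.

```r
library(h2o)
h2o.init()

# Columns reported in the "Exceeded categorical limit" error (assumed complete).
id_cols <- c("id1", "id2")

# Override only these columns; all other columns keep their guessed types.
col_spec <- list(by.col.name = id_cols,
                 types = rep("string", length(id_cols)))

data_hex <- h2o.importFile("big_file.csv",
                           header = TRUE,
                           destination_frame = "data.hex",
                           col.types = col_spec)

# Check how the id columns were parsed.
h2o.getTypes(data_hex[, id_cols])
```

Overriding by name avoids listing a type for every column, which is impractical for a wide file. If the column names differ from the header, the override can be given positionally instead (by.col.idx in place of by.col.name).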
