iboboboru, 4 years ago
R Question

Error loading data with too many levels/categories using h2o.importFile()

I am trying to import a large .csv file using h2o.importFile in R:

library(h2o)
h2o.init()
dataFile <- "big_file.csv"
h2o.importFile(dataFile,header=TRUE,destination_frame = "data.hex")


The file has a number of id columns. I get the following error message.

Error: water.parser.ParseDataset$H2OParseException: Exceeded categorical limit on columns [id1, id2]. Consider reparsing these columns as a string.

Is there a way to specify these column types as strings, similar to data.frame(stringsAsFactors = FALSE)?

Answer

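Specifying the col.types argument of h2o.importFile() should work for you. In the iris example below, write.csv() also writes the row names as a first column, so six types are given: "int" for the row names, "real" for the four measurements, and "string" for Species.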

write.csv(iris, "iris.csv")
hf0 <- h2o.importFile("iris.csv", col.types = c("int","real","real","real","real","string"))
unlist(h2o.getTypes(hf0))
[1] "int"    "real"   "real"   "real"   "real"   "string"
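
For the file in the question, listing a type for every column may be impractical. The sketch below applies the same col.types idea to only the two id columns. It assumes the list form with by.col.name and types, which I believe h2o.importFile() accepts (check ?h2o.importFile for your H2O version). The sample data and the temporary file name are made up for illustration; only id1 and id2 come from the error message.

```r
library(h2o)
h2o.init()

# Hypothetical stand-in for big_file.csv: two id columns plus one numeric column.
df <- data.frame(id1 = sprintf("a%04d", 1:100),
                 id2 = sprintf("b%04d", 1:100),
                 value = rnorm(100))
sample_path <- file.path(tempdir(), "ids_sample.csv")
write.csv(df, sample_path, row.names = FALSE)

# Force only the named columns to "string"; H2O still guesses the other types.
# The by.col.name list form is an assumption about the installed h2o version.
col_spec <- list(by.col.name = c("id1", "id2"),
                 types = c("string", "string"))

data.hex <- h2o.importFile(sample_path, header = TRUE,
                           destination_frame = "data.hex",
                           col.types = col_spec)

# Inspect the parsed column types.
unlist(h2o.getTypes(data.hex))
```

If the list form is not available in your version, the plain character vector shown above, with one entry per column, is the fallback.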