I have a cluster on AWS with H2O, Sparkling Water, and H2O Flow installed, which I use for machine learning on large amounts of data.
The input files arrive in JSON format from a streaming job. Let's say they are placed in S3 in a folder called
sc = SparkContext()
With Sparkling Water you can convert an RDD, DataFrame, or Dataset to an H2O frame quite easily. Something like this should work (shown in Scala; Python would look similar):
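    import org.apache.spark.sql.SparkSession
    import org.apache.spark.h2o._ // ai.h2o.sparkling._ in recent releases
    // read.json lives on SparkSession, not SparkContext
    val spark = SparkSession.builder().getOrCreate()
    val dataDF = spark.read.json("path/streamed-data")
    val h2oContext = H2OContext.getOrCreate(spark) // recent releases: getOrCreate()
    import h2oContext.implicits._
    val h2oFrame = h2oContext.asH2OFrame(dataDF, "my-frame-name")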
From then on you can work with the frame from code, from the Flow UI, or both.
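Since the question uses Python, here is a rough PySparkling sketch of the same conversion. It assumes the `pysparkling` package is installed on the cluster and matches your Spark version. The helper name `load_json_as_h2o_frame` and the app name are made up for illustration, and the exact `H2OContext` calls differ between Sparkling Water releases (older ones use `getOrCreate(spark)` and `as_h2o_frame`), so treat this as a starting point rather than the definitive API.

```python
def load_json_as_h2o_frame(spark, hc, path, frame_name):
    """Read JSON from `path` with Spark and publish it as a named H2O frame.

    `spark` is a SparkSession and `hc` an H2OContext (hypothetical helper,
    not part of Sparkling Water itself).
    """
    # spark.read.json infers the schema; `path` may be a directory of JSON files
    df = spark.read.json(path)
    # Assumption: recent PySparkling spells this asH2OFrame;
    # older releases use as_h2o_frame instead
    return hc.asH2OFrame(df, frame_name)


def main():
    # Imported here so the helper above stays importable without Spark installed
    from pyspark.sql import SparkSession
    from pysparkling import H2OContext

    spark = SparkSession.builder.appName("json-to-h2o").getOrCreate()
    # Assumption: recent releases take no argument; older ones expect getOrCreate(spark)
    hc = H2OContext.getOrCreate()
    return load_json_as_h2o_frame(spark, hc, "path/streamed-data", "my-frame-name")
```

Call `main()` from a script submitted with `spark-submit` (or from a `pyspark` shell) that has the Sparkling Water package available. The returned frame can then be used from Python, and it should also be visible in Flow under the name you passed in.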