Bob Wakefield - 8 months ago

Scala Question

When importing a CSV with Scala in Spark, date shows as null

I have a simple task of importing a CSV and then filtering the data on date. I started with just converting the string into a date, and I can't even seem to get that far. I've used some code samples from others' work, but it keeps failing. When I run the following, I get nothing but NULL. Dates look like this in the file: 8/29/2013 12:06. The ultimate goal here is to filter by date. Do you even need to cast the string as a date before you do that? I would assume so.

package net.massstreet.hour10

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.log4j._
import java.text._
import org.apache.spark.sql.functions._

object TempTest {

  def main(args: Array[String]) {

    Logger.getLogger("org").setLevel(Level.ERROR)

    // Use new SparkSession interface in Spark 2.0
    val spark = SparkSession
      .builder
      .appName("BayAreaBikeAnalysis")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
      .getOrCreate()

    // Load files into data sets
    import spark.implicits._
    val stations = spark.read.format("CSV").option("header", "true").load("Data/station.csv")
    stations.select(to_date($"installation_date")).show()
    spark.stop()
  }

}

Answer

Do you even need to cast the string as a date before you do that?

Nope. But if you did, you'd need to parse 8/29/2013 12:06 yourself: to_date in Spark 2.0 only understands the default yyyy-MM-dd format, so any other string comes back as null.

For example,

unix_timestamp($"installation_date", "M/dd/yyyy hh:mm").cast("timestamp")

(If the times are on a 24-hour clock, H:mm is the strictly correct pattern letter; hh denotes the 1-12 hour field.)
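Putting it together, here's a minimal sketch of the asker's program with the parsing step wired in and a date filter added at the end. The column name installation_date and the file path come from the question; the derived column name installed_at and the cutoff value 2013-09-01 are illustrative assumptions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TempTest {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder
      .appName("BayAreaBikeAnalysis")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    val stations = spark.read
      .option("header", "true")
      .csv("Data/station.csv")

    // Parse "8/29/2013 12:06"-style strings into a real timestamp column.
    // The pattern assumes 24-hour times (H) with a one- or two-digit month (M).
    val withDate = stations.withColumn(
      "installed_at",
      unix_timestamp($"installation_date", "M/dd/yyyy H:mm").cast("timestamp")
    )

    // With a proper timestamp column, filtering by date is an ordinary comparison.
    // The cutoff here is just an example value.
    withDate.filter($"installed_at" >= lit("2013-09-01").cast("timestamp")).show()

    spark.stop()
  }
}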