Bob Wakefield - 3 years ago
Scala Question

When importing a CSV with Scala in Spark, date shows as null

I have a simple task: import a CSV and then filter the data by date. I started by just converting the string into a date, and I can't even get that far. I've used code samples from others' work, but it keeps failing. When I run the following, I get nothing but NULL. Dates look like this in the file: 8/29/2013 12:06. The ultimate goal here is to filter by date. Do you even need to cast the string as a date before you do that? I would assume so.

package net.massstreet.hour10

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.log4j._
import java.text._
import org.apache.spark.sql.functions._

object TempTest {

  def main(args: Array[String]): Unit = {

    // Use new SparkSession interface in Spark 2.0
    // (.builder, .master and .getOrCreate are assumed; they are not shown in the post)
    val spark = SparkSession
      .builder
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
      .getOrCreate()

    // Load files into data sets
    import spark.implicits._
    val stations = spark.read.format("CSV").option("header", "true").load("Data/station.csv")

    // (to_date is assumed; the function wrapping the column is not shown in the post)
    stations.select(to_date($"installation_date")).show()
  }
}


Answer

Do you even need to cast the string as a date before you do that?

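Nope. But if you do, you'll need to parse 8/29/2013 12:06 yourself, because it is not in the ISO-style form (such as 2013-08-29 12:06:00) that a plain cast understands, which is why the cast comes back as NULL.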

For example,

unix_timestamp($"installation_date", "M/dd/yyyy hh:mm").cast("timestamp")
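To tie the parsing step back to the original goal of filtering by date, here is a minimal sketch, not a definitive implementation. It assumes a Spark 2.x or later session running locally. The object name StationDateFilter, the helper names, the installed_at column, and the two sample rows are all hypothetical stand-ins for Data/station.csv. It also swaps in the pattern "M/d/yyyy H:mm" instead of "M/dd/yyyy hh:mm", on the assumption that the file uses a 24-hour clock and may contain single-digit days; adjust the pattern if your data differs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, unix_timestamp}

object StationDateFilter {

  // Adds a timestamp column parsed from the raw string column.
  // Assumption: 24-hour times, hence H rather than hh in the pattern.
  def withInstalledAt(df: DataFrame): DataFrame =
    df.withColumn(
      "installed_at", // hypothetical column name
      unix_timestamp(col("installation_date"), "M/d/yyyy H:mm").cast("timestamp"))

  // Keeps rows installed on or after the cutoff, given as yyyy-MM-dd.
  def installedOnOrAfter(df: DataFrame, cutoff: String): DataFrame =
    df.filter(col("installed_at") >= lit(cutoff).cast("timestamp"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StationDateFilter") // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for Data/station.csv.
    val stations = Seq(
      ("Station A", "8/29/2013 12:06"),
      ("Station B", "1/5/2014 9:30")
    ).toDF("name", "installation_date")

    installedOnOrAfter(withInstalledAt(stations), "2014-01-01").show()

    spark.stop()
  }
}
```

Keeping the parsed value in its own column leaves the original string untouched, so rows that fail to parse show up as NULL in installed_at and can be inspected rather than silently dropped.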