I have a simple task of importing a CSV and then filtering the data on date. I started with just converting the string into a date, and I can't even seem to get that far. I've used some code samples from others' work, but it keeps failing. When I run the following, I get nothing but NULL. Dates look like this in the file: 8/29/2013 12:06. The ultimate goal here is to filter by date. Do you even need to cast the string as a date before you do that? I would assume so.
package net.massstreet.hour10

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.log4j._
import java.text._
import org.apache.spark.sql.functions._

object TempTest {

  def main(args: Array[String]) {

    Logger.getLogger("org").setLevel(Level.ERROR)

    // Use new SparkSession interface in Spark 2.0
    val spark = SparkSession
      .builder
      .appName("BayAreaBikeAnalysis")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
      .getOrCreate()

    // Load files into data sets
    import spark.implicits._
    val stations = spark.read.format("CSV").option("header", "true").load("Data/station.csv")

    stations.select(to_date($"installation_date")).show()

    spark.stop()
  }
}
Do you even need to cast the string as a date before you do that?
Nope. You're getting NULLs because to_date only understands the default yyyy-MM-dd layout, so a string like 8/29/2013 12:06 can't be converted that way. If you do want a real date/timestamp column, you'll need to parse 8/29/2013 12:06 yourself with an explicit pattern.
For example,
unix_timestamp($"installation_date", "M/dd/yyyy HH:mm").cast("timestamp")
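Since unix_timestamp in Spark 2.x interprets its format argument as a java.text.SimpleDateFormat pattern, you can sanity-check the pattern without spinning up a SparkSession. A minimal sketch (the extra sample rows and the cutoff date are made up for illustration) that round-trips the sample value and then does the same parse-then-compare step a date filter would do:

```scala
import java.text.SimpleDateFormat
import java.util.Date

object DateParseSketch {
  def main(args: Array[String]): Unit = {
    // Spark 2.x's unix_timestamp uses SimpleDateFormat patterns,
    // so "M/dd/yyyy HH:mm" can be checked outside Spark.
    val fmt = new SimpleDateFormat("M/dd/yyyy HH:mm")

    // Round-trip the sample value from the question to confirm the pattern fits.
    val sample: Date = fmt.parse("8/29/2013 12:06")
    println(fmt.format(sample)) // prints 8/29/2013 12:06

    // Hypothetical strings standing in for the installation_date column,
    // filtered against a made-up cutoff -- the same compare step your
    // eventual Spark filter would perform on the casted timestamp column.
    val rows = Seq("8/29/2013 12:06", "1/15/2014 09:30", "6/01/2012 18:45")
    val cutoff = fmt.parse("1/01/2014 00:00")
    val recent = rows.map(s => fmt.parse(s)).filter(!_.before(cutoff))
    println(recent.map(fmt.format)) // prints List(1/15/2014 09:30)
  }
}
```

In your job, the same pattern goes into unix_timestamp($"installation_date", "M/dd/yyyy HH:mm"), and once it's cast to "timestamp" you can compare that column directly in a filter. Note HH (hour 0-23) rather than hh (hour 1-12), since your strings carry no AM/PM marker.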