bobo32 bobo32 - 1 month ago 19
Scala Question

SparkSQL Timestamp query failure

I put some log files into sql tables through Spark and my schema looks like this:

|-- timestamp: timestamp (nullable = true)
|-- c_ip: string (nullable = true)
|-- cs_username: string (nullable = true)
|-- s_ip: string (nullable = true)
|-- s_port: string (nullable = true)
|-- cs_method: string (nullable = true)
|-- cs_uri_stem: string (nullable = true)
|-- cs_query: string (nullable = true)
|-- sc_status: integer (nullable = false)
|-- sc_bytes: integer (nullable = false)
|-- cs_bytes: integer (nullable = false)
|-- time_taken: integer (nullable = false)
|-- User_Agent: string (nullable = true)
|-- Referrer: string (nullable = true)


As you can notice I created a timestamp field which I read is supported by Spark (Date wouldn't work as far as I understood). I would love to use for queries like "where timestamp>(2012-10-08 16:10:36.0)" but when I run it I keep getting errors.
I tried these 2 following sintax forms:
For the second one I parse a string so Im sure Im actually pass it in a timestamp format.
I use 2 functions: parse and date2timestamp.

Any hint on how I should handle timestamp values?

Thanks!

1)
scala> sqlContext.sql("SELECT * FROM Logs as l where l.timestamp=(2012-10-08 16:10:36.0)").collect

java.lang.RuntimeException: [1.55] failure: ``)'' expected but 16 found

SELECT * FROM Logs as l where l.timestamp=(2012-10-08 16:10:36.0)
^


2)
sqlContext.sql("SELECT * FROM Logs as l where l.timestamp="+date2timestamp(formatTime3.parse("2012-10-08 16:10:36.0"))).collect

java.lang.RuntimeException: [1.54] failure: ``UNION'' expected but 16 found

SELECT * FROM Logs as l where l.timestamp=2012-10-08 16:10:36.0
^

Answer

I figured that the problem was the precision of the timestamp first of all and also the string that I pass representing the timestamp has to be casted as a String

So this query works now:

sqlContext.sql("SELECT * FROM Logs as l where cast(l.timestampLog as String) <= '2012-10-08 16:10:36'")