SQL Question

Load DATE column from raw data with '/' separator in Spark SQL

I have columns with data type DATE in Spark SQL,
e.g.

CREATE TABLE ABC(startDate DATE, EndDate DATE....


and I load the data with
LOAD DATA INPATH './input/user.txt' INTO TABLE ABC



In user.txt the data looks like


2016/06/12 2016/06/15
2016/06/12 2016/06/15


but the dates load as


null null
null null


If the file instead contains

2016-06-12 2016-06-15
2016-06-12 2016-06-15


then the data loads correctly.

How can I handle the data when the date separator is '/'?

I don't want to replace the separator in the input file.

Please help me. Thanks.

Answer

I faced this issue before in Hive and found a workaround. First, load the columns as STRING instead of DATE, for example:

CREATE TABLE ABC(startDate STRING, EndDate STRING....)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '   -- use whatever delimiter your file actually has
STORED AS TEXTFILE;

LOAD DATA INPATH './input/user.txt' INTO TABLE ABC;

Then I used string functions to extract the year/month/day from those fields. For example:

select substr(startDate,1,4) as year, substr(startDate,6,2) as month .... from ABC
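
If you would rather do the same extraction on a DataFrame, Spark's substring function works the same way. A minimal sketch, assuming the columns were loaded as plain strings (the SparkSession setup and the sample rows here are only for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.substring

val spark = SparkSession.builder().appName("dateparts").getOrCreate()
import spark.implicits._

// startDate/EndDate loaded as plain strings, e.g. "2016/06/12"
val df = Seq(("2016/06/12", "2016/06/15")).toDF("startDate", "EndDate")

df.select(
  substring($"startDate", 1, 4).as("year"),   // "2016"
  substring($"startDate", 6, 2).as("month"),  // "06"
  substring($"startDate", 9, 2).as("day")     // "12"
).show()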

Another way is to replace the '/' with '-' and then cast the result to DATE so you can use the date functions. For example:

select cast(regexp_replace(startDate, '/', '-') as date) as startDate from ABC
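
The same statement runs unchanged through Spark SQL if the Hive table is visible to your SparkSession (for example one built with enableHiveSupport()). A sketch, assuming a session named spark:

// assumes `spark` can see the Hive table ABC
val fixed = spark.sql(
  """select cast(regexp_replace(startDate, '/', '-') as date) as startDate,
    |       cast(regexp_replace(EndDate, '/', '-') as date) as EndDate
    |  from ABC""".stripMargin)
fixed.printSchema()   // startDate: date, EndDate: date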

All of the above is how to achieve it in Hive. In Spark the approach is the same: first load the values as strings into a DataFrame, then fix the separator.

import org.apache.spark.sql.functions.regexp_replace   // plus spark.implicits._ for $ and toDF outside the shell
val s1 = Seq(("2016/06/12", "2016/06/15"), ("2016/06/12", "2016/06/15")).toDF("startDate", "EndDate")
val result = s1.select(regexp_replace($"startDate", "/", "-").as("startDate"), regexp_replace($"EndDate", "/", "-").as("EndDate"))
result.show()

result 
+----------+----------+
| startDate|   EndDate|
+----------+----------+
|2016-06-12|2016-06-15|
|2016-06-12|2016-06-15|
+----------+----------+
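
If you need real DATE columns rather than reformatted strings, you can also parse the 'yyyy/MM/dd' format directly instead of replacing the separator. A sketch, assuming Spark 2.2+ where to_date accepts a format pattern (on older versions, unix_timestamp($"startDate", "yyyy/MM/dd") cast to timestamp and then to date does the same job):

import org.apache.spark.sql.functions.to_date

val typed = s1.select(
  to_date($"startDate", "yyyy/MM/dd").as("startDate"),   // DateType column
  to_date($"EndDate", "yyyy/MM/dd").as("EndDate"))
typed.printSchema()   // startDate: date, EndDate: date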

Hope this helps.