Abhishek Choudhary - 3 months ago
Python Question

how to change a Dataframe column from String type to Double type in pyspark

I have a dataframe with column as String.
I wanted to change the column type to Double type in pyspark.

Here is what I did:

from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import DoubleType

toDoublefunc = UserDefinedFunction(lambda x: x, DoubleType())
changedTypedf = joindf.withColumn("label", toDoublefunc(joindf['show']))

I just wanted to know whether this is the right way to do it, since I am
getting an error while running Logistic Regression and I wonder whether
this is the cause of the trouble.


There is no need for a UDF here. Column already provides a cast method that accepts a DataType instance:

from pyspark.sql.types import DoubleType

changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or a short string:

changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))