Sean Nguyen - 2 months ago
Python Question

What is causing 'unicode' object has no attribute 'toordinal' in pyspark?

I got this error but I don't know what causes it. My Python code ran in PySpark. The stack trace is long, so I show only part of it. None of the stack trace references my code, so I don't know where to look. What could be the cause of this error?

/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
306 raise Py4JJavaError(
307 "An error occurred while calling {0}{1}{2}.\n".
--> 308 format(target_id, ".", name), value)
309 else:
310 raise Py4JError(

Py4JJavaError: An error occurred while calling o107.parquet.

...
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
return self.dataType.toInternal(obj)
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 172, in toInternal
return d.toordinal() - self.EPOCH_ORDINAL
AttributeError: 'unicode' object has no attribute 'toordinal'


Thanks,

Answer

The specific exception is caused by trying to store a unicode value in a date datatype that is part of a struct. The conversion from the Python type to Spark's internal representation expects to be able to call the date.toordinal() method.

Presumably you have a dataframe schema somewhere that consists of a struct type with a date field, and something tried to stuff a string into that.
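
If you have the rows in hand before they go into the dataframe, a quick scan can show which values are not dates. This is only a sketch in plain Python: the helper name find_non_dates, the tuple row layout, and the sample data are all hypothetical and not taken from your code.

```python
import datetime

def find_non_dates(rows, index):
    """Return (row position, value) pairs whose field at `index` is not a date.

    Hypothetical helper: assumes each row is a tuple and that the field
    at `index` is the one declared as a date in the schema.
    """
    bad = []
    for position, row in enumerate(rows):
        value = row[index]
        # None is allowed, since toInternal() skips None values
        if value is not None and not isinstance(value, datetime.date):
            bad.append((position, value))
    return bad

# Made-up sample rows: the second one carries a string instead of a date
rows = [(u"alice", datetime.date(2016, 7, 1)), (u"bob", u"2016-07-02")]
offenders = find_non_dates(rows, 1)
```

Any entry in offenders is a value that would trigger the AttributeError once Spark tries to convert it.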

You can trace this based on the traceback you do have. The Apache Spark source code is hosted on GitHub, and your traceback points to the pyspark/sql/types.py file. The lines point to the StructField.toInternal() method, which delegates to the self.dataType.toInternal() method:

class StructField(DataType):
    # ...
    def toInternal(self, obj):
        return self.dataType.toInternal(obj)

which in your traceback ends up at the DateType.toInternal() method:

class DateType(AtomicType):
    # ...
    def toInternal(self, d):
        if d is not None:
            return d.toordinal() - self.EPOCH_ORDINAL
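
The same failure can be reproduced without Spark. The sketch below copies the logic of the method above into a standalone function; the name date_to_internal is made up for illustration, and EPOCH_ORDINAL is assumed to be the ordinal of 1970-01-01.

```python
import datetime

# Assumption: the epoch ordinal corresponds to 1970-01-01
EPOCH_ORDINAL = datetime.datetime(1970, 1, 1).toordinal()

def date_to_internal(d):
    """Standalone stand-in for the conversion shown above (hypothetical name)."""
    if d is not None:
        return d.toordinal() - EPOCH_ORDINAL

# A real date converts to a number of days since the epoch
ok = date_to_internal(datetime.date(2016, 7, 1))

# A string has no toordinal() method, so the call fails
try:
    date_to_internal(u"2016-07-01")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

On Python 2 the message names the 'unicode' type, as in your traceback; on Python 3 it names 'str' instead.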

So we know this is about a date field in a struct. The DateType.fromInternal() method shows you what Python type is produced in the opposite direction:

def fromInternal(self, v):
    if v is not None:
        return datetime.date.fromordinal(v + self.EPOCH_ORDINAL)

It is safe to assume that toInternal() expects that same type, a datetime.date object, when converting in the other direction.
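
One possible fix, then, is to convert string values to datetime.date before the rows reach the dataframe. The following is a minimal sketch, assuming the strings look like "2016-07-01"; the helper name parse_date, the format string, and the sample rows are assumptions, so adjust them to your data.

```python
import datetime

def parse_date(value, fmt="%Y-%m-%d"):
    """Turn a date string into datetime.date; pass through None and dates.

    Hypothetical helper: the default format is an assumption about the data.
    """
    if value is None or isinstance(value, datetime.date):
        return value
    return datetime.datetime.strptime(value, fmt).date()

# Made-up rows of (name, day) where day arrives as a string or None
rows = [(u"alice", u"2016-07-01"), (u"bob", None)]
clean_rows = [(name, parse_date(day)) for name, day in rows]
```

After this step every non-null value in the date position has a toordinal() method, which is what the conversion relies on.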