MBack - 1 month ago
Python Question

What should be the argument for sqlContext.createDataFrame()?

This code creates a DataFrame from the given lists:

# sqlContext is the SQLContext predefined in the PySpark shell
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]
sample_data_df = sqlContext.createDataFrame([(sample_one,), (sample_two,), (sample_three,)], ['features'])
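When the schema is given as a list of column names, createDataFrame matches each field of a row to a column name by position. The plain-Python sketch below mimics only that positional pairing with zip so it can run without Spark; it is not Spark's implementation, and the helper `to_records` is a hypothetical name used for illustration.

```python
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]

rows = [(sample_one,), (sample_two,), (sample_three,)]
columns = ['features']

def to_records(rows, columns):
    """Hypothetical helper: pair each row's fields with column names by position."""
    records = []
    for row in rows:
        # Every row must supply exactly one field per column name.
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} fields, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records

records = to_records(rows, columns)
print(records[0])  # {'features': [(0, 'mouse'), (1, 'black')]}
```

Each whole list ends up as the value of the single 'features' column, because each row tuple carries exactly one field.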


In the createDataFrame() call, why is there an extra comma after sample_one, as in (sample_one,)?

Answer

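This syntax creates a one-element tuple. It is the trailing comma, not the parentheses, that makes the tuple: without the comma, the parentheses only group, so (sample_one) is still the same list. You can try the following: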

>>> sample_one = [(0, 'mouse'), (1, 'black')]
>>> type((sample_one))
<type 'list'>
>>> type((sample_one,))
<type 'tuple'>
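A plain-Python sketch (no Spark needed) of what the trailing comma changes for the rows passed to createDataFrame. With the comma, each row is a one-element tuple, matching the single column name 'features'; without it, each "row" is the list itself, so its length (2 or 3) no longer matches one column. The helper `row_widths` is hypothetical, for illustration only.

```python
sample_one = [(0, 'mouse'), (1, 'black')]
sample_two = [(0, 'cat'), (1, 'tabby'), (2, 'mouse')]
sample_three = [(0, 'bear'), (1, 'black'), (2, 'salmon')]

grouped = (sample_one)      # parentheses only group: still the same list
one_tuple = (sample_one,)   # trailing comma: a tuple holding the list
same_tuple = tuple([sample_one])  # equivalent, spelled out explicitly

def row_widths(rows):
    """Hypothetical helper: number of fields in each row."""
    return [len(row) for row in rows]

with_comma = [(sample_one,), (sample_two,), (sample_three,)]
without_comma = [(sample_one), (sample_two), (sample_three)]

print(row_widths(with_comma))     # [1, 1, 1]: one field per row, one column name
print(row_widths(without_comma))  # [2, 3, 3]: each list's length, not 1
```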