data_garden - 1 month ago 12
Python Question

Pandas - Empty Dataframe

I am trying to load the following data into my pandas dataframe:

jsons_data = pd.DataFrame(columns=['playlist', 'user', 'track', 'count'])

for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)
        #my json layout
        user = json_text.keys()
        playlist = 'all_playlists'
        track = [p for p in json_text.values()[0]]
        count = [p.values() for p in json_text.values()]
print jsons_data


but I get an empty dataframe:

[u'user1']
all_playlists
[{u'Make You Feel My Love': 1.0, u'I See Fire': 1.0, u'High And Dry': 1.0, u'Fake Plastic Trees': 1.0, u'One': 1.0, u'Goodbye My Lover': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user2']
all_playlists
[{u'Codex': 1.0, u'No Surprises': 1.0, u'O': 1.0, u'Go It Alone': 1.0}]
[[1.0, 1.0, 1.0, 1.0]]
[u'user3']
all_playlists
[{u'Fake Plastic Trees': 1.0, u'High And Dry': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0]]
[u'user4']
all_playlists
[{u'No Distance Left To Run': 1.0, u'Running Up That Hill': 1.0, u'Fake Plastic Trees': 1.0, u'The Numbers': 1.0, u'No Surprises': 1.0}]
[[1.0, 1.0, 1.0, 1.0, 1.0]]
[u'user5']
all_playlists
[{u'Wild Wood': 1.0, u'You Do Something To Me': 1.0, u'Reprise': 1.0}]
[[1.0, 1.0, 1.0]]
Empty DataFrame
Columns: [playlist, user, track, count]
Index: []


What is wrong with the code?

EDIT:

The json files are structured in this fashion:

{
    'user1': {
        'Karma Police': 1.0,
        'Roxanne': 1.0,
        'Sonnet': 1.0,
        'We Will Rock You': 1.0,
    }
}

Answer

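The DataFrame stays empty because the loop computes user, playlist, track and count but never adds them to jsons_data. Rather than filling it row by row, it is easier to build one Series per file and concatenate them. First, let's make some dummy data that makes the problem easier to follow: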

import pandas as pd

# Dummy data to play with
data1 = {
    'user1': {
        'Karma Police': 1.0,
        'Roxanne': 1.0,
        'Sonnet': 1.0,
        'We Will Rock You': 1.0,
    }
}

data2 = {
    'user2': {
        'Karma Police': 1.0,
        'Creep': 1.0,
    }
}

Building a DataFrame from one of these dicts and calling unstack() gives a Series indexed by (user, track), which is what we'll use below:

In : pd.DataFrame(data1).unstack()

Out:
user1  Karma Police        1.0
       Roxanne             1.0
       Sonnet              1.0
       We Will Rock You    1.0
dtype: float64

# This is where you would normally iterate on the files
mylist = []
for data in [data1, data2]:
    # Make a dataframe, then unstack it,
    # producing a Series with a two-level MultiIndex as above,
    # and append it to the list
    mylist.append(pd.DataFrame(data).unstack())
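
With real files instead of dummy dicts, the same loop might look like the sketch below. It assumes Python 3, that each file holds a single {user: {track: count}} object as in your edit, and that the file names end in .json. load_user_series is a hypothetical helper name, not something pandas provides.

```python
import json
import os

import pandas as pd


def load_user_series(path_to_json):
    """Return one (user, track)-indexed Series per .json file in a directory.

    Hypothetical helper; assumes each file looks like
    {"user1": {"Karma Police": 1.0, ...}}.
    """
    series_list = []
    # sorted() keeps the file order deterministic
    for name in sorted(os.listdir(path_to_json)):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(path_to_json, name)) as json_file:
            data = json.load(json_file)
        # Columns are users and rows are tracks; unstack() turns that
        # into a Series with a (user, track) MultiIndex
        series_list.append(pd.DataFrame(data).unstack())
    return series_list
```

Calling mylist = load_user_series(path_to_json) should then give the same kind of list as the loop above.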

Now we concat that list and do a little bit of cleaning up:

merged = pd.concat(mylist)
# Renaming to get the right column names
merged.index.names = ['User', 'Track']
merged.name = 'Count'
# Convert the Series to a DataFrame
merged = merged.to_frame()
# Adding a new column with the same value throughout
merged['Playlist'] = 'all_playlists'


merged

Out: [image of the resulting DataFrame]

You could then call merged.reset_index() if you would rather have User and Track as regular columns instead of an index.
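
As a rough sketch of that last step, the snippet below repeats the pipeline on shortened dummy data and then flattens the index. The variable name flat and the final column reordering are my own additions, chosen to match the column order in the question.

```python
import pandas as pd

# Shortened dummy data in the same layout as above
data1 = {'user1': {'Karma Police': 1.0, 'Roxanne': 1.0}}
data2 = {'user2': {'Karma Police': 1.0, 'Creep': 1.0}}

# One (user, track)-indexed Series per dict, concatenated into one
merged = pd.concat([pd.DataFrame(d).unstack() for d in (data1, data2)])
merged.index.names = ['User', 'Track']
merged.name = 'Count'
merged = merged.to_frame()
merged['Playlist'] = 'all_playlists'

# Move the User and Track index levels into regular columns
flat = merged.reset_index()
# Reorder to match the column order from the question
flat = flat[['Playlist', 'User', 'Track', 'Count']]
print(flat)
```

This gives one row per (user, track) pair with a plain integer index.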