RomainD RomainD - 1 year ago 45
Python Question

Pandas changing cell values based on another cell

I am currently formatting data from two different data sets.
One of the dataset reflects an observation count of people in room on hour basis, the second one is a count of people based on wifi logs generated in 5 minutes interval.

After merging these two dataframes into one, I run into the issue where each hour (as "10:00:00") has the data from the original set, but the other data (every 5min like "10:47:14") does not include this data.

Here is how the merge dataframe looks:

room time con auth capacity % Count module size
0 B002 Mon Nov 02 10:32:06 23 23 90 NaN NaN NaN NaN`
1 B002 Mon Nov 02 10:37:10 25 25 90 NaN NaN NaN NaN`
12527 B002 Mon Nov 02 10:00:00 NaN NaN 90 50% 45.0 COMP30520 60`
12528 B002 Mon Nov 02 11:00:00 NaN NaN 90 0% 0.0 COMP30520 60`

Is there a way for me to go through the dataframe and find all the information regarding the "occupancy", "occupancyCount", "module" and "size" from 11:00:00 and write it to all the cells that are of the same day and where the hour is between 10:00:00 and 10:59:59?

That would allow me to have all the information on each row and then allow me to gather the
based on 'day' and 'hour'.

To answer the comment for the original dataframes, here there are:

first dataframe:

time room module size
0 Mon Nov 02 09:00:00 B002 COMP30190 29
1 Mon Nov 02 10:00:00 B002 COMP40660 53

second dataframe:

room time con auth capacity % Count
0 B002 Mon Nov 02 20:32:06 0 0 NaN NaN NaN
1 B002 Mon Nov 02 20:37:10 0 0 NaN NaN NaN
2 B002 Mon Nov 02 20:42:12 0 0 NaN NaN NaN
12797 B008 Wed Nov 11 13:00:00 NaN NaN 40 25 10.0
12798 B008 Wed Nov 11 14:00:00 NaN NaN 40 50 20.0
12799 B008 Wed Nov 11 15:00:00 NaN NaN 40 25 10.0

this is how these two dataframes were merged together:

DFinal = pd.merge(DF, d3, left_on=["room", "time"], right_on=["room", "time"], how="outer", left_index=False, right_index=False)

Any help with this would be greatly appreciated.

Thanks a lot,


Answer Source

Actually I was able to fix this by:

First: using partition on "time" feature in order to generate two additional columns, one for the day showed in "time" and one for the hour in the "time" column. I used the lambda functions to get these columns:

df['date'] = df['date'].map(lambda x: x[10:-6])
df['time'] = df['time'].map(lambda x: x[8:-8])

Based on these two new columns I modified the way the dataframes were being merged.

here is the code I used to fix it:

dataframeFinal = pd.merge(dataframe1, dataframe2, left_on=["room", "date", "hour"],
                right_on=["room", "date", "hour"], how="outer",
                left_index=False, right_index=False, copy=False)

After this merge I ended up having duplicate time columns ('time_y' and "time_x').
So I replaced the NaN values as follows:

dataframeFinal.time_y.fillna(dataframeFinal.time_x, inplace=True)

Now the column "time_y" contains all the time values, no more NaN. I do not need the "time_x" column so I drop it from the dataframe

dataframeFinal = dataframeFinal.drop('time_x', axis=1)