RomainD RomainD - 4 months ago 13
Python Question

Pandas changing cell values based on another cell

I am currently formatting data from two different data sets.
One of the dataset reflects an observation count of people in room on hour basis, the second one is a count of people based on wifi logs generated in 5 minutes interval.

After merging these two dataframes into one, I run into the issue where each hour (as "10:00:00") has the data from the original set, but the other data (every 5min like "10:47:14") does not include this data.

Here is how the merge dataframe looks:

room time con auth capacity % Count module size
0 B002 Mon Nov 02 10:32:06 23 23 90 NaN NaN NaN NaN`
1 B002 Mon Nov 02 10:37:10 25 25 90 NaN NaN NaN NaN`
12527 B002 Mon Nov 02 10:00:00 NaN NaN 90 50% 45.0 COMP30520 60`
12528 B002 Mon Nov 02 11:00:00 NaN NaN 90 0% 0.0 COMP30520 60`


Is there a way for me to go through the dataframe and find all the information regarding the "occupancy", "occupancyCount", "module" and "size" from 11:00:00 and write it to all the cells that are of the same day and where the hour is between 10:00:00 and 10:59:59?

That would allow me to have all the information on each row and then allow me to gather the
min()
,
max()
and
median()
based on 'day' and 'hour'.

To answer the comment for the original dataframes, here there are:

first dataframe:

time room module size
0 Mon Nov 02 09:00:00 B002 COMP30190 29
1 Mon Nov 02 10:00:00 B002 COMP40660 53


second dataframe:

room time con auth capacity % Count
0 B002 Mon Nov 02 20:32:06 0 0 NaN NaN NaN
1 B002 Mon Nov 02 20:37:10 0 0 NaN NaN NaN
2 B002 Mon Nov 02 20:42:12 0 0 NaN NaN NaN
12797 B008 Wed Nov 11 13:00:00 NaN NaN 40 25 10.0
12798 B008 Wed Nov 11 14:00:00 NaN NaN 40 50 20.0
12799 B008 Wed Nov 11 15:00:00 NaN NaN 40 25 10.0


this is how these two dataframes were merged together:

DFinal = pd.merge(DF, d3, left_on=["room", "time"], right_on=["room", "time"], how="outer", left_index=False, right_index=False)


Any help with this would be greatly appreciated.

Thanks a lot,

-Romain

Answer

Actually I was able to fix this by:

First: using partition on "time" feature in order to generate two additional columns, one for the day showed in "time" and one for the hour in the "time" column. I used the lambda functions to get these columns:

df['date'] = df['date'].map(lambda x: x[10:-6])
df['time'] = df['time'].map(lambda x: x[8:-8])

Based on these two new columns I modified the way the dataframes were being merged.

here is the code I used to fix it:

dataframeFinal = pd.merge(dataframe1, dataframe2, left_on=["room", "date", "hour"],
                right_on=["room", "date", "hour"], how="outer",
                left_index=False, right_index=False, copy=False)

After this merge I ended up having duplicate time columns ('time_y' and "time_x').
So I replaced the NaN values as follows:

dataframeFinal.time_y.fillna(dataframeFinal.time_x, inplace=True)

Now the column "time_y" contains all the time values, no more NaN. I do not need the "time_x" column so I drop it from the dataframe

dataframeFinal = dataframeFinal.drop('time_x', axis=1)