Ilya Ilya - 2 months ago 23
Python Question

Pandas: add column with the most recent values

I have two pandas dataframes, both index with datetime entries. The

df1
has non-unique time indices, whereas
df2
has unique ones. I would like to add a column
df2.a
to
df1
in the following way: for every row in
df1
with timestamp
ts
,
df1.a
should contain the most recent value of
df2.a
whose timestamp is less then
ts
.

For example, let's say that
df2
is sampled every minute, and there are rows with timestamps
08:00:15
,
08:00:47
,
08:02:35
in
df1
. In this case I would like the value from
df2.a[08:00:00]
to be used for the first two rows, and
df2.a[08:02:00]
for the third. How can I do this?

Answer

apply to rows of df1, reindex on df2 with ffill.

df1['df2.a'] = df1.apply(lambda x: pd.Series(df2.a.reindex([x.name]).ffill().values), axis=1)