Chuck Chuck - 2 years ago 98
Python Question

Concatenate Two Dataframes Pandas with single Row

[Most likely a duplicate, let me know, I'll delete]

I have a dataframe

df
that looks like:

one three two
0 1.0 10.0 4.0
1 2.0 3.0 3.0
2 3.0 22.0 2.0
3 4.0 1.0 1.0


I have another single row dataframe
df2
that looks like:

a b m u
0 1.0 2.0 1.0 4.0


I want to concatenate the two to end up with:

one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0


I tried:

df3 = pd.concat([df, df2], axis=1, ignore_index=True)

0 1 2 3 4 5 6
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 NaN NaN NaN NaN
2 3.0 22.0 2.0 NaN NaN NaN NaN
3 4.0 1.0 1.0 NaN NaN NaN NaN


Err Wrong answer...

How can I sort this out?

Many thanks.

Answer Source

I think you can use numpy.tile for repeat data:

df2 = pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), 
                   columns=df2.columns)
print (df2)
     a    b    m    u
0  1.0  2.0  1.0  4.0
1  1.0  2.0  1.0  4.0
2  1.0  2.0  1.0  4.0
3  1.0  2.0  1.0  4.0

df3 = df.join(df2)
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0

Or improved John Galt solution - only replaced NaNs of columns from df2:

df3 = df.join(df2)
df3[df2.columns] = df3[df2.columns].ffill()
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0

Another solution with assign by Series created by iloc, but columns names has to be strings:

df3 = df.assign(**df2.iloc[0])
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0

Timings:

np.random.seed(44)
N = 1000000

df = pd.DataFrame(np.random.random((N,5)), columns=list('ABCDE'))

df2 = pd.DataFrame(np.random.random((1, 50)))
df2.columns = 'a' + df2.columns.astype(str)


In [369]: %timeit df.join(pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), columns=df2.columns))
1 loop, best of 3: 897 ms per loop

In [370]: %timeit df.assign(**df2.iloc[0])
1 loop, best of 3: 467 ms per loop

In [371]: %timeit df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
1 loop, best of 3: 1.55 s per loop

In [372]: %%timeit
     ...: df3 = df.join(df2)
     ...: df3[df2.columns] = df3[df2.columns].ffill()
     ...: 
1 loop, best of 3: 1.9 s per loop
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download