piRSquared piRSquared - 2 years ago 212
Python Question

pandas .at versus .loc

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Fast label-based scalar accessor

Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.

So I ran some samples:


import pandas as pd
import numpy as np
from string import ascii_letters, ascii_lowercase, ascii_uppercase

lt = list(ascii_letters)
lc = list(ascii_lowercase)
uc = list(ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass
    to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
    np.random.seed(seed)  # seed the generator so the sample data is reproducible
    df.iloc[:, :] = np.random.rand(*df.shape)
    return df

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)

# show only the top-left 5x5 corner of the frame
print(df.head().T.head().T)

looks like:

            a
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576

Let's use .at and .loc and ensure I get the same thing:

print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])

using .loc 0.37374090276
using .at 0.37374090276

Test speed using .loc:

%timeit df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 µs per loop

Test speed using .at:

%timeit df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop

This looks to be a huge speed increase. Even at the caching stage, 6.11 * 8 µs (about 49 µs) is a lot faster than the 180 µs for .loc.


What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc, but it doesn't behave similarly. Example:

# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)

print(sdf.loc[:, :])

          A         B
a  0.444939  0.407554
b  0.460148  0.465239

whereas
print(sdf.at[:, :])
results in
TypeError: unhashable type

So the two are obviously not the same, even if the intent is for them to be similar.

That said, who can provide guidance on what can and cannot be done with the .at method?

Answer Source

df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.
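
As a minimal sketch of that difference (the small frame and its labels are made up for illustration, not taken from the question): .at reads or writes exactly one cell, while .loc also accepts slices and lists. The exact exception raised when .at is given a slice varies between pandas versions, so the sketch catches a broad Exception and only records its name.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# .at: exactly one row label and one column label, returning one scalar.
single = df.at["b", "y"]

# .at can also set a single cell in place.
df.at["b", "y"] = 50.0

# .loc handles the scalar case too, and additionally slices and lists.
same_cell = df.loc["b", "y"]
block = df.loc["a":"b", ["x", "y"]]  # label slices include both endpoints

# .at rejects anything that is not a single label per axis.
try:
    df.at[:, "x"]
    at_slice_error = None
except Exception as exc:  # exception type differs across pandas versions
    at_slice_error = type(exc).__name__

print(single, same_cell, block.shape, at_slice_error)
```

So a reasonable rule of thumb is to reach for .at inside tight loops that touch one cell at a time, and to keep .loc for anything that selects more than one value.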

Note that there is also df.get_value, which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop

Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
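
Outside IPython, the same comparison can be sketched with the standard-library timeit module. This is an illustration under assumptions, not the original benchmark: the frame is rebuilt with a different random generator and seed, the helper names via_loc and via_at are invented here, and the absolute numbers will depend on the machine and pandas version. df.get_value was deprecated in pandas 0.21 and removed in 1.0, so it is left out.

```python
import timeit
from string import ascii_lowercase, ascii_uppercase

import numpy as np
import pandas as pd

lc, uc = list(ascii_lowercase), list(ascii_uppercase)

# Same shape as the frame in the question: 676 x 676 with MultiIndex on both axes.
index = pd.MultiIndex.from_product([lc, uc])
columns = pd.MultiIndex.from_product([lc, uc])
rng = np.random.default_rng(0)  # assumed seed, not the one from the question
df = pd.DataFrame(rng.random((len(index), len(columns))), index=index, columns=columns)

row, col = ("a", "A"), ("c", "C")


def via_loc():
    """Scalar lookup through the general-purpose indexer."""
    return df.loc[row, col]


def via_at():
    """Scalar lookup through the scalar-only accessor."""
    return df.at[row, col]


n = 1000
# Best of 3 repeats, converted to microseconds per call.
loc_us = min(timeit.repeat(via_loc, number=n, repeat=3)) / n * 1e6
at_us = min(timeit.repeat(via_at, number=n, repeat=3)) / n * 1e6

print("loc: %.2f us per call" % loc_us)
print("at:  %.2f us per call" % at_us)
```

Both helpers return the same value; only the lookup path differs.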
