Kaushik - 9 months ago 54

Python Question

I have a series

`x=pd.Series(np.random.randn(16),index=[[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],['a','b','c','d','a','b','c','d','a','b','c','d','a','b','c','d']])`

that looks like this:

```
1  a   -0.068167
   b   -1.036551
   c   -0.246619
   d    1.318381
2  a   -0.119061
   b    0.249653
   c    0.819153
   d    1.334510
3  a    0.029305
   b   -0.879798
   c    1.081574
   d   -1.590322
4  a    0.620149
   b   -2.197523
   c    0.927573
   d   -0.274370
dtype: float64
```

What is the difference between `x[1,'a']` and `x[1]['a']`? Both give me the same answer, and I am confused about what the difference means internally. When should I use each of these two forms of indexing?

Answer

The following explanation comes from the NumPy docs. I believe something similar happens in pandas, which uses NumPy internally and relies on "indexers" to map a (possibly named) index onto the underlying integer-based index.

So note that `x[0, 2] == x[0][2]`, though the second case is less efficient because a new temporary array is created after the first index and is then indexed by 2.
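A minimal NumPy sketch of that statement, assuming a small hypothetical 3x4 array named `a` (it is not part of the question). It only illustrates that the chained form goes through an intermediate array object:

```python
import numpy as np

# Hypothetical example array, values 0..11 laid out in 3 rows of 4.
a = np.arange(12).reshape(3, 4)

# Single indexing operation: both indices are handled in one call.
direct = a[0, 2]

# Chained indexing: a[0] first builds an intermediate array object
# (a view of the first row), which is then indexed a second time.
row = a[0]
chained = row[2]

print(direct == chained)         # both forms select the same element
print(type(row), row.shape)      # the intermediate is itself an ndarray
print(np.shares_memory(row, a))  # basic indexing returns a view, not a data copy
```

The results match, but the chained form pays for creating the extra array object before the second lookup.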

Here are the timings for your series; the first method is around 30 times faster:

```
In [80]: %timeit x[1, 'a']
100000 loops, best of 3: 8.46 µs per loop
In [79]: %timeit x[1]['a']
1000 loops, best of 3: 274 µs per loop
```
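As a rough sketch of why the chained form is slower in pandas, the example below rebuilds a series with the same index shape as in the question (the values are placeholders chosen for illustration) and shows that `x[1]` produces an intermediate `Series` before `'a'` is looked up. The `timeit` call is only an assumption about how one might reproduce the comparison outside IPython; absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same two-level index as in the question; the values are placeholders.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], list("abcd")])
x = pd.Series(np.arange(16, dtype=float), index=index)

# One lookup with the full key.
direct = x[1, "a"]

# Two lookups: x[1] first builds a new Series indexed by 'a'..'d',
# and that intermediate Series is then indexed by 'a'.
sub = x[1]
chained = sub["a"]

# Explicit label-based form of the single lookup.
explicit = x.loc[(1, "a")]

print(direct, chained, explicit)
print(type(sub).__name__, list(sub.index))

# Rough timing comparison; no particular ratio is guaranteed.
n = 1000
t_direct = timeit.timeit(lambda: x[1, "a"], number=n)
t_chained = timeit.timeit(lambda: x[1]["a"], number=n)
print(f"direct:  {t_direct / n * 1e6:.1f} µs per loop")
print(f"chained: {t_chained / n * 1e6:.1f} µs per loop")
```

When the full key is known up front, the single lookup avoids building the intermediate `Series`.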