Mike Williamson - 2 months ago
R Question

Python list or pandas dataframe arbitrary indexing and slicing

I have used both R and Python extensively in my work, and at times I get the syntax between them confused.

In R, if I want to create a model from only some features of my data set, I can do something like this:

subset = df[1:1000, c(1,5,14:18,24)]


This would take the first 1000 rows (yes, R starts on index 1), and it would take the 1st, 5th, 14th through 18th, and 24th columns.

I have tried various combinations of slice, range, and similar functions, but have not been able to duplicate this sort of flexibility. In the end, I just enumerated all of the values.

How can this be done in Python?


Specifically, how do I pick an arbitrary subset of elements from a list, some selected individually (as with the commas above) and some selected sequentially (as with the colons above)?

Answer

In its index_tricks file, numpy defines a class instance, r_, that converts scalars and slices into an enumerated array:

In [560]: np.r_[1,5,14:18,24]
Out[560]: array([ 1,  5, 14, 15, 16, 17, 24])

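It's an instance with a __getitem__ method, so it is used with indexing syntax (square brackets) rather than called like a function. It expands 14:18 into np.arange(14,18), which stops before 18. Given an imaginary step, it can also expand a slice with linspace.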
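A small sketch of both expansions may make this concrete. It uses only np.r_, np.arange and np.linspace; the variable names idx and pts are made up for illustration.

```python
import numpy as np

# Scalars and slices are concatenated into one integer array.
# The slice stop (18) is excluded, as with np.arange(14, 18).
idx = np.r_[1, 5, 14:18, 24]
print(idx)

# An imaginary "step" is read as a number of points, and the stop is
# then included, so this matches np.linspace(0, 1, 5).
pts = np.r_[0:1:5j]
print(pts)
```

Because the result is an ordinary integer array, it can also drive a list comprehension such as [my_list[i] for i in idx] when the data is a plain Python list rather than a DataFrame.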

So I think you'd rewrite

subset = df[1:1000, c(1,5,14:18,24)]

as

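df.iloc[:1000, np.r_[0,4,13:18,23]]

Every index shifts down by one because Python counts from 0. The slice stop stays at 18 because R's 14:18 includes column 18, while a Python slice excludes its stop.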
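One way to check the translation is to name the columns after their R positions and see which ones come back. The frame below is a hypothetical stand-in for df (1200 rows, 30 columns called c1 to c30), not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: column "cN" sits at R position N.
df = pd.DataFrame(
    np.arange(1200 * 30).reshape(1200, 30),
    columns=[f"c{i}" for i in range(1, 31)],
)

# R: df[1:1000, c(1, 5, 14:18, 24)]
# Zero-based positions are one lower; the slice stop is 18 because
# Python excludes the stop, whereas R's 14:18 includes column 18.
cols = np.r_[0, 4, 13:18, 23]
subset = df.iloc[:1000, cols]

print(subset.shape)
print(list(subset.columns))
```

If the selected names are c1, c5, c14 through c18, and c24, the positions match the R call.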