Mike Williamson - 8 months ago 43

R Question

I have used both R and Python extensively in my work, and at times I get the syntax between them confused.

In R, if I want to create a model from only **some** features of my data set, I can do something like this:

`subset = df[1:1000, c(1,5,14:18,24)]`

This would take the first 1000 rows (yes, R starts on index 1), and it would take the 1st, 5th, 14th through 18th, and 24th columns.

I have tried various combinations of `slice` and `range`.

How can this be done in Python?

That is, how can I pick an arbitrary subset of elements from a list, some of which are selected individually (as with the commas shown above) and some selected sequentially (as with the colons shown above)?

Answer

In its `index_tricks` file, `numpy` defines a class instance, `r_`, that converts scalars and slices into an enumerated array:

```
In [560]: np.r_[1,5,14:18,24]
Out[560]: array([ 1, 5, 14, 15, 16, 17, 24])
```

It's an instance with a `__getitem__` method, so it uses the indexing syntax. It expands `14:18` into `np.arange(14,18)`. It can also expand values with `linspace`.
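Here is a small sketch of those expansions, in case it helps to see them side by side. The variable names (`idx`, `same`, `pts`) are just mine for illustration. The `linspace` form is triggered by giving an imaginary number as the slice "step".

```python
import numpy as np

# Scalars and slices are concatenated into one integer array.
idx = np.r_[1, 5, 14:18, 24]
print(idx)  # [ 1  5 14 15 16 17 24]

# A plain slice behaves like np.arange: the stop value is excluded.
same = np.concatenate(([1, 5], np.arange(14, 18), [24]))
print(np.array_equal(idx, same))  # True

# An imaginary "step" is read as a number of points, as in np.linspace,
# and in that case the stop value is included.
pts = np.r_[0:1:5j]
print(np.allclose(pts, np.linspace(0, 1, 5)))  # True
```

Note that the stop of a plain slice is excluded, unlike R's `14:18`, which includes 18.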

So I think you'd rewrite

```
subset = df[1:1000, c(1,5,14:18,24)]
```

as

```
df.iloc[:1000, np.r_[0,4,13:18,23]]
```
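To check the index translation, here is a minimal runnable sketch on made-up data. The frame and its column names `c1`..`c30` are hypothetical and are named 1-based only to make the comparison with R easy. Because R's `14:18` includes column 18 while a Python slice excludes its stop, the zero-based equivalent is `13:18`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1200 rows, 30 columns named c1..c30 (1-based, like R).
df = pd.DataFrame(
    np.arange(1200 * 30).reshape(1200, 30),
    columns=[f"c{i}" for i in range(1, 31)],
)

# R: df[1:1000, c(1, 5, 14:18, 24)]
# R is 1-based with inclusive ranges; Python is 0-based with exclusive stops.
cols = np.r_[0, 4, 13:18, 23]
subset = df.iloc[:1000, cols]

print(subset.shape)          # (1000, 8)
print(list(subset.columns))  # ['c1', 'c5', 'c14', 'c15', 'c16', 'c17', 'c18', 'c24']
```

The same `np.r_` array also works for indexing a plain `numpy` array, e.g. `arr[:1000, cols]`.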