user2643394 - 4 months ago 33

Python Question

Variations of this question have been asked before, but I'm still having trouble understanding how to actually slice a pandas Series/DataFrame based on conditions that I'd like to set.

In R, what I'm trying to do is:

`df[which(df[,colnumber] > somenumberIchoose),]`

The which() function finds indices of row entries in a column in the dataframe which are greater than somenumberIchoose, and returns this as a vector. Then, I slice the dataframe by using these row indices to indicate which rows of the dataframe I would like to look at in the new form.

Is there an equivalent way to do this in Python? I've seen references to enumerate, which I don't fully understand after reading the documentation. My current attempt at getting the row indices looks like this:

`indexfuture = [ x.index(), x in enumerate(df['colname']) if x > yesterday]`

However, I keep getting an invalid syntax error. I can hack a workaround by looping through the values with a for loop and doing the search manually, but that seems extremely non-pythonic and inefficient.

What exactly does enumerate() do? What is the pythonic way of finding indices of values in a vector that fulfill desired parameters?

Note: I'm using Pandas for the dataframes

Answer

I may have misunderstood the question, but I think the answer is simpler than you expect.

Using a pandas DataFrame:

```
df['colname'] > somenumberIchoose
```

returns a pandas Series of True / False values that keeps the original index of the DataFrame.
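To make that concrete, here is a minimal sketch of what the comparison produces. The sample data, the index labels and the threshold value are made up for illustration; only `colname` and `somenumberIchoose` come from the question.

```python
import pandas as pd

# Hypothetical sample data; the values and index labels are invented.
df = pd.DataFrame({"colname": [1, 5, 3, 8]}, index=["a", "b", "c", "d"])
somenumberIchoose = 4

# Element-wise comparison: one True/False per row, same index as df.
mask = df["colname"] > somenumberIchoose

print(mask.tolist())        # [False, True, False, True]
print(list(mask.index))     # ['a', 'b', 'c', 'd']
```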

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

```
df[df['colname'] > somenumberIchoose]
```

That should be enough.
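If you also want the row indices themselves, as `which()` gives you in R, the following sketch shows one way to get them. It assumes the same made-up sample data as before, and using `np.flatnonzero` for the integer positions is my own suggestion rather than the only option. Note that the positions are 0-based here, whereas R's `which()` is 1-based.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values and index labels are invented.
df = pd.DataFrame({"colname": [1, 5, 3, 8]}, index=["a", "b", "c", "d"])
somenumberIchoose = 4

mask = df["colname"] > somenumberIchoose

# Boolean indexing: keep only the rows where the mask is True.
subset = df[mask]

# Index labels of the matching rows.
labels = list(subset.index)

# 0-based integer positions of the matching rows.
positions = np.flatnonzero(mask.to_numpy())

# Selecting by position gives the same rows as boolean indexing.
same_subset = df.iloc[positions]

print(labels)               # ['b', 'd']
print(positions.tolist())   # [1, 3]
```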

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
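As for `enumerate()`: it walks over a sequence and yields `(position, value)` pairs, so a list comprehension built on it has to unpack both parts with `for i, x in enumerate(...)`. Below is a rough sketch of how the attempt in the question could be written; the plain list and the value of `yesterday` are stand-ins I made up, and with pandas the boolean indexing above is usually the simpler route.

```python
# Hypothetical stand-in for df['colname'] and for the threshold.
values = [3, 10, 7, 12]
yesterday = 8

# enumerate() pairs each value with its 0-based position.
pairs = list(enumerate(values))
print(pairs)          # [(0, 3), (1, 10), (2, 7), (3, 12)]

# Keep the position i of every value x that passes the condition.
indexfuture = [i for i, x in enumerate(values) if x > yesterday]
print(indexfuture)    # [1, 3]
```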