Bong Kyo Seo Bong Kyo Seo -4 years ago 168
Python Question

Python - pickup data based on a list of values or conditions

I have a data set that has 9 columns, and I managed to extract two of the columns using pandas (Thank you Stack members for your help before!). Now, my question is: I have a list of values that will be used to pickup from the data set and extract the corresponding values. The extracted data set looks like:

Exp. m/z Intensity
1000 2000
2000 3000
3000 4000
4000 5000


and so on (there are about 500+ rows for each data set). The list used for pickup looks like:

mass
1200
1300


and so on (about 200 rows for the pickup list). Each mass value will be used to calculate the upper and lower bins, and they will be used to pickup the Exp. m/z values from the data set. So, for example, mass 1200 will be calculated as 1250 (upper) and 1150 (lower), and whatever falls within this range from the data set will be picked up, and their corresponding Intensity values are what I want. If none are picked up, I want the result to be empty value if possible as I believe 0 values affect average and other statistical analysis.

Below is my code, where file is the data set and pickupfile is the pickup list:

from pandas import DataFrame

import pandas as pd
import numpy as np

file = 'C09.xls'
pickupfile = 'pickuplist.xlsx'

xl = pd.ExcelFile(file)
pl = pd.ExcelFile(pickupfile)

plist = pd.read_excel(xl)
pickuplist = pd.read_excel(pl)

cmass = plist['Exp. m/z']
height = plist['Intensity']


plistcollect = pd.concat([cmass, height], axis=1)


ppm = 150

peak1upper = round(pickuplist*(1+ppm/1000000),4)

peak1lower = round(pickuplist*(1-ppm/1000000),4)

pickup = plistcollect[((plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper))]
print(pickup['Intensity'])


When I execute this code, I get an error message: ValueError: Boolean array expected for the condition, not float64. I do understand my data set are floating values, so do you guys have some hints or tips/recommendations?

Thank you so much!!

EDIT:
Forgot to mention that my data and the two limits (peak1lower & peak1upper) are float64 types.

I also tried the pickup with isin like:

pickup = plistcollect[plistcollect.isin(np.arange(peak1lower,peak1upper))]

Answer Source

If you have more than one condition while indexing a dataframe, all the conditions have be within another bracket together.

plistcollect[(plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper)]

should be

plistcollect[((plistcollect['Exp. m/z']>peak1lower) & (plistcollect['Exp. m/z'] < peak1upper))]

Edit: Since you need to perform it on every element, you have to do something like this:

limit_df = pd.DataFrame([peak1lower['Exp. m/z'],peak1upper['Exp. m/z']], index=['lower','upper']).T
filtered_df = limit_df.apply(lambda x: ((plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper)), axis=1)

filtered_df will give you a boolean DataFrame, each row will have True, False corresponding to DataFrame entry that fall within the given element of mass list.

The simpler way can be to save the file individually:

def filter_df(x):
    plistcollect[((plistcollect['Exp. m/z'] > x.lower) & (plistcollect['Exp. m/z'] < x.upper))].to_csv("test_%s.csv"%x.name)

limit_df.apply(lambda x: filter_df(x), axis=1)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download