Alter Native - 3 years ago
Python Question

Pandas data frame

I have a question regarding the following code.
I have a data set and a list. I want to compare each value in the data set against two conditions: if both conditions are true, keep the existing value in the data frame; otherwise set it to None. My code works perfectly for a small data set, but on my large data set it takes far too long and produces no values. Is there a better solution?

for col in df.columns:
    for i in range(len(df)):
        if (df.iloc[i][col] > list_min[i]) & (df.iloc[i][col] < list_max[i]):
            ...  # keep the value; otherwise set it to None

Thanks for any comments or alternative solutions.

This is my code that does not work:

import numpy as np
import pandas as pd

data = pd.read_csv('./dataset/RMSSD/RMSSD_Exam_new.csv')
data = data.applymap(np.log)
data = data.drop('time', axis=1)
q75_list = []
q25_list = []
iqr_list = []
min_list = []
max_list = []
for col in data.columns.values:
    q75_list.append(np.nanpercentile(data[col], 75))
    q25_list.append(np.nanpercentile(data[col], 25))

iqr_list = np.array(q75_list) - np.array(q25_list)
min_list = np.array(q25_list) - (np.array(iqr_list * 1.5))
max_list = np.array(q75_list) + (np.array(iqr_list * 1.5))

print("Max :\n",max_list,"\n Min :\n",min_list)

for col in data.columns:
    for (i, j) in [(i, j) for i in range(len(data)) for j in range(len(min_list))]:
        if (data.iloc[i][col] > min_list[j]) & (data.iloc[i][col] < max_list[j]):
            ...  # keep the value; otherwise set it to None


Answer

If I understand correctly what you are doing, there are a couple of places where you could vectorize things. See if this speeds things up:

q75s = data.quantile(.75)
q25s = data.quantile(.25)
mins = 2.5*q25s - 1.5*q75s
maxs = 2.5*q75s - 1.5*q25s

newdata = data.copy()
newdata[(data < mins) | (data > maxs)] = None
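To see the vectorized idea on something small, here is a minimal sketch. The toy frame and its column names `a` and `b` are made up for illustration; they are not the RMSSD data from the question. It spells out the fences as quartile plus or minus 1.5 times the IQR, which is algebraically the same as the `2.5*q25s - 1.5*q75s` form above, and uses `DataFrame.where` so that only values strictly inside the fences are kept, as the question describes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the log-transformed columns.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, -50.0],
})

# Per-column quartiles; quantile() skips NaN by default.
q25s = data.quantile(0.25)
q75s = data.quantile(0.75)
iqrs = q75s - q25s

# One lower and one upper fence per column.
mins = q25s - 1.5 * iqrs  # same as 2.5*q25s - 1.5*q75s
maxs = q75s + 1.5 * iqrs  # same as 2.5*q75s - 1.5*q25s

# Compare every column against its own fences in one shot.
inside = data.gt(mins, axis="columns") & data.lt(maxs, axis="columns")

# Keep values strictly inside the fences; everything else becomes NaN.
newdata = data.where(inside)
print(newdata)
```

Because the comparison is aligned by column label, there is no need to loop over rows or to index the bounds by position, which is where the explicit loops spend most of their time.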