duckertito - 11 days ago
Python Question

Making apply() function's code applicable to 2GB

I have approximately 2 GB of data and I want to create some new columns based on the values of other columns. The following code works fine on a smaller data set, but it consistently fails with a memory error when applied to the full 2 GB.

Is it possible to replace this code with something more efficient that requires less RAM?

def calculate(row):
    features = [111, 222, 333, 444, 555]
    if row['C_1'] in features:
        return 1
    if row['C_2'] in features:
        return 1
    if row['C_3'] in features:
        return 1
    if row['C_4'] in features:
        return 1
    if row['C_5'] in features:
        return 1
    return 0

result["NEW_COL"] = result.apply(calculate, axis=1)

Answer

Other than processing the data in chunks, something like the following may be more efficient, because it replaces the row-by-row apply() with vectorized column operations:

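features = (111, 222, 333, 444, 555)
cols = ['C_1', 'C_2', 'C_3', 'C_4', 'C_5']
# Test only the five relevant columns, then flag rows where any of them matches.
t = result[cols].isin(features)
result['NEW_COL'] = t.any(axis=1).astype(int)  # 1/0, like the original calculate()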
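
The "processing in parts" option mentioned above could look roughly like the sketch below. It assumes the data lives in a CSV file, which the question does not state, and the helper name iter_flagged and the chunk size are made up for illustration. Each chunk is flagged with the same vectorized test, so only one chunk needs to be in memory at a time.

```python
import io

import pandas as pd

FEATURES = (111, 222, 333, 444, 555)
COLS = ["C_1", "C_2", "C_3", "C_4", "C_5"]


def iter_flagged(source, chunksize=100_000):
    """Yield chunks of a CSV source with NEW_COL added (hypothetical helper).

    `source` is anything pd.read_csv accepts: a path or a file-like object.
    """
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # 1 where any of the five columns holds a feature value, else 0.
        chunk["NEW_COL"] = chunk[COLS].isin(FEATURES).any(axis=1).astype("int8")
        yield chunk


# Small in-memory stand-in for the real 2 GB file (assumed CSV layout).
sample = io.StringIO(
    "C_1,C_2,C_3,C_4,C_5\n"
    "111,1,2,3,4\n"
    "5,6,7,8,9\n"
    "0,0,0,0,555\n"
)
flags = [int(v) for chunk in iter_flagged(sample, chunksize=2) for v in chunk["NEW_COL"]]
print(flags)  # [1, 0, 1]
```

With a real file, each yielded chunk could be appended to an output file with to_csv(mode="a") instead of being collected, so the full result never has to fit in RAM at once.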