meepl - 2 months ago
Python Question

Chaining functions for cleaner code

I have the following functions that I would like to chain together for cleaner code:

import numpy as np
from sklearn import preprocessing

def label_encoder(dataframe, column):
    """
    Encodes categorical variables
    """
    le = preprocessing.LabelEncoder()
    le.fit(dataframe[column])
    dataframe[column] = le.transform(dataframe[column])
    return dataframe

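def remove_na_and_inf(dataframe):
    """
    Removes rows containing NaNs, inf or -inf from dataframes
    """
    # replace(..., inplace=True) returns None, so it cannot be chained;
    # use the non-inplace form and chain on the returned dataframe instead.
    # Note: how="all" drops only rows that are entirely NaN; use how="any"
    # to drop every row that contains at least one NaN.
    dataframe = dataframe.replace([np.inf, -np.inf], np.nan).dropna(how="all")
    return dataframe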

def create_share_reate_vars(dataframe):
    """
    Generate share rate to use as interaction var
    """
    for interval in range(300, 3900, 300):
        interval = str(interval)
        dataframe[interval + '_share_rate'] = dataframe[interval + '_shares'] / dataframe[interval + '_video_views']
    return dataframe

def generate_logged_values(dataframe):
    """
    Generate logged values for all features which can be logged
    """
    columns = list(dataframe.columns)

    for feature in columns:
        try:
            dataframe[str(feature) + '_log'] = np.log(dataframe[feature])
        except (AttributeError, TypeError):
            # Non-numeric columns cannot be logged; which exception is raised
            # depends on the NumPy version, so both are caught here.
            continue
    return dataframe


I would like to do something like this:

new_df = reduce(lambda x, y: y(x), reversed([label_encoder, remove_na_and_inf, create_share_reate_vars, generate_logged_values]), df)


but since the first function, label_encoder, takes two arguments, this will not work. Is there a way around this, or maybe a completely different paradigm?

Answer

You could partially apply label_encoder first using functools.partial, which fixes the column argument and leaves a function of the dataframe alone. You can then pass that version to reduce along with the other functions. E.g.

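from functools import partial, reduce  # reduce is not a builtin in Python 3

# Fix the column argument so tmp_encoder takes only the dataframe
tmp_encoder = partial(label_encoder, column=2)
new_df = reduce(lambda x, y: y(x), reversed([tmp_encoder, remove_na_and_inf, create_share_reate_vars, generate_logged_values]), df)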
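To see the pattern in isolation, here is a minimal sketch that uses plain lists instead of dataframes, so it runs with the standard library only. The names scale, drop_negatives, add_one and pipeline are hypothetical and are not part of the question's code. scale stands in for the two-argument label_encoder.

```python
from functools import partial, reduce


def scale(values, factor):
    """Two-argument step, standing in for label_encoder(dataframe, column)."""
    return [v * factor for v in values]


def drop_negatives(values):
    """One-argument step, standing in for remove_na_and_inf(dataframe)."""
    return [v for v in values if v >= 0]


def add_one(values):
    """Another one-argument step."""
    return [v + 1 for v in values]


def pipeline(data, *steps):
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda acc, step: step(acc), steps, data)


# partial fixes the extra argument, leaving a one-argument callable.
double = partial(scale, factor=2)

result = pipeline([1, -2, 3], double, drop_negatives, add_one)
print(result)  # [3, 7]
```

Here the steps run in the order they are listed. Wrapping the list in reversed(), as in the snippet above, runs the last function in the list first, so it is worth checking that this is the order you intend.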