user2573355 - 2 months ago
Python Question

Use a class in a loop - Python

I'm new to using classes in Python and could use some guidance on which resources to consult and how to use a class in a loop.

Sample data:

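import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0, 2, size=(100, 1)), columns=list('E'))  # upper bound is exclusive, so 2 yields both 0 and 1
df['E'] = df2['E']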

Here's the code outside of a class:

from sklearn.linear_model import LogisticRegression

styles = [1, 3, 7]

def train_model(X, y):
    clf = LogisticRegression(random_state=0, C=1, penalty='l1').fit(X, y)

for value in styles:
    X = df[['A',
    y = df['E'][df['D']==value]
    train_model(X, y)

I need to translate this into a class, like so:

class BaseTrainer(object):
    """ Abstract class to define run order """

    def run(self):
        for value in [1, 3, 7]:
            # I think there's a better way to do this
            if value == 1:
                pickle_model(self.model, self.model_file)
            if value == 3:
                pickle_model(self.model, self.model_file2)
            if value == 7:
                pickle_model(self.model, self.model_file3)

class ModelTrainer(BaseTrainer):
    """ Class to train model for predicting Traits of Customers """

    def __init__(self):
        self.model_file = '/wayfair/mnt/crunch_buckets/central/data_science/customer_style/train_modern.pkl'
        self.model_file2 = '/wayfair/mnt/crunch_buckets/central/data_science/customer_style/train_traditional.pkl'
        self.model_file3 = '/wayfair/mnt/crunch_buckets/central/data_science/customer_style/train_rustic.pkl'

    def import_training_data(self):
        self.df = _read_data('training_data.csv')
        self.df.columns = ['CuID', 'StyID', 'StyName',
                           'Filter', 'PropItemsViewed', 'PropItemsOrdered', 'DaysSinceView']

    def extract_variables(self, value):
        # Take subset of columns for training purposes (remove CuID, Segment)
        self.X = self.df[['PropItemsViewed', 'PropItemsOrdered',

        y = self.df[['Filter']][self.df['StyID']==value]
        self.y = y.values.flatten()

    def train_model(self):
        self.model = LogisticRegression(C=1, penalty='l1').fit(self.X, self.y)

I think there must be a better way to structure it or run through the three different values in the styles list. But I don't even know what to search for to improve this. Any suggestions, pointers, etc. would be appreciated!


An elegant way to do it is to iterate through both lists at the same time using zip:

def run(self):
    for value, model_file in zip([1, 3, 7], [self.model_file, self.model_file2, self.model_file3]):
        pickle_model(self.model, model_file)
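To see the zip pairing work end to end, here is a minimal, self-contained sketch. It is not the asker's actual pipeline: `StyleTrainer`, its `train_model` placeholder, the `output_dir` argument and the `demo` helper are all hypothetical names introduced for illustration, and the sketch assumes the model should be retrained for each style value before it is pickled. Only the standard library is used, so the placeholder "model" is a plain dict rather than a fitted classifier.

```python
import os
import pickle
import tempfile


class StyleTrainer:
    """Hypothetical trainer that fits and pickles one model per style value."""

    def __init__(self, output_dir):
        # File names mirror the question; output_dir is an assumption
        # standing in for the real model directory.
        file_names = ['train_modern.pkl', 'train_traditional.pkl', 'train_rustic.pkl']
        self.styles = [1, 3, 7]
        self.model_files = [os.path.join(output_dir, name) for name in file_names]

    def train_model(self, value):
        # Placeholder for the real fitting step on the rows for this style.
        return {'style': value}

    def run(self):
        # zip pairs each style value with the file at the same position.
        for value, model_file in zip(self.styles, self.model_files):
            model = self.train_model(value)
            with open(model_file, 'wb') as handle:
                pickle.dump(model, handle)


def demo():
    # Run the trainer in a temporary directory and read the pickles back.
    with tempfile.TemporaryDirectory() as tmp:
        trainer = StyleTrainer(tmp)
        trainer.run()
        saved = {}
        for path in trainer.model_files:
            with open(path, 'rb') as handle:
                saved[os.path.basename(path)] = pickle.load(handle)
        return saved
```

Calling `demo()` returns a dict keyed by file name, which makes it easy to check that each style value ended up in its own file. Because the pairing lives in two parallel lists, adding a fourth style only means appending one value and one file name.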

As for the design, it could be improved.

For instance, define your model files as a list directly:

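# Needs `import os`; a list (unlike a Python 3 map object) can be iterated more than once
self.model_file_list = [os.path.join('/wayfair/mnt/crunch_buckets/central/data_science/customer_style', name) for name in ['train_modern.pkl', 'train_traditional.pkl', 'train_rustic.pkl']]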

Which gives:

def run(self):
    for value,model_file in zip([1, 3, 7],self.model_file_list):