K. ossama - 3 months ago
Python Question

What's the use of a "flag" in pandas?

While working through a predictive modeling exercise, I couldn't understand the use of flags. I googled it but couldn't find a good explanation.

import pandas as pd

train = pd.read_csv('C:/Users/Analytics Vidhya/Desktop/challenge/Train.csv')
test = pd.read_csv('C:/Users/Analytics Vidhya/Desktop/challenge/Test.csv')
train['Type'] = 'Train'  # Create a flag column marking rows from the Train data set
test['Type'] = 'Test'    # ... and rows from the Test data set
fullData = pd.concat([train, test], axis=0)  # Combine both data sets row-wise


Can you explain what "flag" means in Python pandas and why flags are important? Thank you.

Answer

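A "flag" here is not a special pandas feature. It is just an ordinary column whose value marks where each row came from. It's easiest to show with an example (assuming numpy and pandas are imported as np and pd):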

In [102]: train = pd.DataFrame(np.random.randint(0, 5, (5, 3)), columns=list('abc'))

In [103]: test = pd.DataFrame(np.random.randint(0, 5, (3, 3)), columns=list('abc'))

In [104]: train
Out[104]:
   a  b  c
0  3  4  0
1  0  0  1
2  2  4  1
3  4  2  0
4  2  4  0

In [105]: test
Out[105]:
   a  b  c
0  1  0  3
1  3  3  0
2  4  4  3

Let's add a Type column to each DataFrame:

In [106]: train['Type'] = 'Train'

In [107]: test['Type'] = 'Test'

Now let's concatenate both DataFrames vertically. The Type column lets us tell which rows came from which DataFrame:

In [108]: fullData = pd.concat([train,test], axis=0)

In [109]: fullData
Out[109]:
   a  b  c   Type
0  3  4  0  Train
1  0  0  1  Train
2  2  4  1  Train
3  4  2  0  Train
4  2  4  0  Train
0  1  0  3   Test
1  3  3  0   Test
2  4  4  3   Test
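
To show why the flag is useful afterwards, here is a minimal sketch of splitting the combined frame back into its two parts. It assumes the exercise combines the sets so that some preprocessing can be applied once to both; the a_plus_b column and the train_again / test_again names are hypothetical, made up only for this illustration, and the data is random stand-in data rather than the real CSV files.

```python
import numpy as np
import pandas as pd

# Small stand-in frames; the values are random and only for illustration.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.integers(0, 5, (5, 3)), columns=list("abc"))
test = pd.DataFrame(rng.integers(0, 5, (3, 3)), columns=list("abc"))

# Tag each row with its origin before combining.
train["Type"] = "Train"
test["Type"] = "Test"
fullData = pd.concat([train, test], axis=0)

# Shared preprocessing applied once to the combined data
# (a hypothetical derived column, not part of the original exercise).
fullData["a_plus_b"] = fullData["a"] + fullData["b"]

# Use the flag to split the rows back out, then drop the flag column.
train_again = fullData[fullData["Type"] == "Train"].drop(columns="Type")
test_again = fullData[fullData["Type"] == "Test"].drop(columns="Type")

print(train_again.shape, test_again.shape)  # (5, 4) (3, 4)
```

Note that the row index cannot do this job on its own: after the concat, labels 0, 1 and 2 each appear twice (once per source), so the Type column is the only thing that tells the two sets apart.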