Ruffy26 Ruffy26 - 15 days ago 5
Python Question

How to assign values to a column of a dataframe based on a condition?

I have a dataframe which looks like the following:

POSITION Code_Count
S1 {"[471E;1]"}
S2 {"[471E;1]"}
S3 {"[471E;1]"}
S4 {"[471E;1]"}
S5 {"[471E;1]"}
S6 {"[5812;1]"}
S7 {"[471E;1]"}
S8 {"[471E;1]"}
T1 {"[7A2A;1]"}
T2 {"[471E;1]"}
T3 {"[7C95;1]"}
T4 {"[471E;1]"}
T5 {"[471E;1]"}
T6 {"[471E;1]"}
T7 {"[471E;1]"}
T8 {"[471E;1]"}


In the Code_Count column the first string is a code and the number is the count.
Furthermore, the codes are classified into 4 categories A through D. The list of all the codes present in the categories is given below:
The codes are classified into 4 categories say A through D as follows:

Category A contains these codes:  7749  7783  7784  7786  7A14  7AC5  7C88  7C92  7C93  7C95  C749  C783  C784  C786  CA14  CAC5  CC88  CC92  CC93  CC95  442A  49C2

Category B has these codes:  1D  32  430B  4415  448E  4490  4492  457A  457B  496C  4970  778A  7A09  7A2A  7A2C  7C7C  7C80  C78A  CA09  CA2A  CA2C

Category C has these codes:  7A7F  7A80  7C7E  CA7F  CA80  CAC8 7AC8 C77E  445A  496E  471E  49CA

Category D: 7AF0 7AF1 7AF2 7AF3 CAF0 CAF1 CAF2 CAF3 4616 4617 4618 5812

I would like for my final dataframe to contain the counts of codes to corresponding positions according to the codes present in the initial dataframe by sorting them out according to the Category they belong to. For example, the output dataframe according to the above dataframe should be:

POSITION Category A Category B Category C Category D
S1 0 0 1 0
S2 0 0 1 0
S3 0 0 1 0
S4 0 0 1 0
S5 0 0 1 0
S6 0 0 0 1
S7 0 0 1 0
S8 0 0 1 0
T1 0 1 0 0
T2 0 0 1 0
T3 1 0 0 0
T4 0 0 1 0
T5 0 0 1 0
T6 0 0 1 0
T7 0 0 1 0
T8 0 0 1 0


I have tried using str.contains method but without any success. Any help would be much appreciated. Thanks a lot in advance!

Answer

I think you can use for Category A:

catA = ['7749','7783','7784','7786','7A14','7AC5','7C88','7C92','7C93','7C95','C749','C783','C784','C786','CA14','CAC5','CC88','CC92','CC93','CC95','442A','49C2']


df[['Code','Count']] = df.Code_Count.str.strip('{["]}').str.split(';', expand=True)
df['Category A'] = df.ix[df.Code.isin(catA), 'Count']
df['Category A'] = df['Category A'].fillna(0)
print (df)
   POSITION    Code_Count  Code Count Category A
0        S1  {"[471E;1]"}  471E     1          0
1        S2  {"[471E;1]"}  471E     1          0
2        S3  {"[471E;1]"}  471E     1          0
3        S4  {"[471E;1]"}  471E     1          0
4        S5  {"[471E;1]"}  471E     1          0
5        S6  {"[5812;1]"}  5812     1          0
6        S7  {"[471E;1]"}  471E     1          0
7        S8  {"[471E;1]"}  471E     1          0
8        T1  {"[7A2A;1]"}  7A2A     1          0
9        T2  {"[471E;1]"}  471E     1          0
10       T3  {"[7C95;1]"}  7C95     1          1
11       T4  {"[471E;1]"}  471E     1          0
12       T5  {"[471E;1]"}  471E     1          0
13       T6  {"[471E;1]"}  471E     1          0
14       T7  {"[471E;1]"}  471E     1          0
15       T8  {"[471E;1]"}  471E     1          0
Comments