Sanchit Aluna - 3 months ago
Python Question

Need a warning message if the count of any country code is less than 5

I am trying to print a warning message if the count (frequency) of a particular country code is less than 5.

QuoteID
1500759-BE
1500759-BE
1500759-BE
1500759-BE
1605101-FR
1605101-FR
1605101-FR
1605119-FR
1605119-FR
1605119-FR
1605119-FR
1605119-FR
1600896-NL
1600896-NL
1600896-NL
1600898-NL
1600898-NL
1600898-NL
1600898-NL
1600898-NL
1600898-NL


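I tried the code below:

chars = ('BE', 'FR', 'NL')
check_string = OutputData['QuoteID']

for char in chars:
    # Series.count() counts non-null values, not substrings;
    # .str.contains() flags each row containing the code, .sum() counts them
    count = check_string.str.contains(char).sum()
    if count < 5:
        print('count of %s is less than 5' % char)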


The expected result is "warning 'category BE' has less than 5 records".

OutputData is the data set name and QuoteID is the variable name. Values such as 1500759-BE are stored in that variable. The frequency of 'BE', 'FR' and 'NL' has to be counted, and a warning message is required if any count is less than 5.

Many thanks in advance

Answer

You could use str.extract to extract the country codes from each QuoteID string as follows:

In [16]: df['CountryCode'] = df['QuoteID'].str.extract('(BE|FR|NL)', expand=False)

In [17]: df
Out[17]: 
       QuoteID CountryCode
0   1500759-BE          BE
1   1500759-BE          BE
2   1500759-BE          BE
3   1500759-BE          BE
4   1605101-FR          FR
5   1605101-FR          FR
6   1605101-FR          FR
7   1605119-FR          FR
8   1605119-FR          FR
9   1605119-FR          FR
10  1605119-FR          FR
11  1605119-FR          FR
12  1600896-NL          NL
13  1600896-NL          NL
14  1600896-NL          NL
15  1600898-NL          NL
16  1600898-NL          NL
17  1600898-NL          NL
18  1600898-NL          NL
19  1600898-NL          NL
20  1600898-NL          NL
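For reference, here is a self-contained sketch that rebuilds the sample data from the question and extracts the code with a more general pattern. It assumes (this is not stated in the question) that every QuoteID ends with a hyphen followed by a two-letter upper-case country code, so codes other than BE, FR and NL would be picked up too.

```python
import pandas as pd

# Sample data from the question: 4 BE rows, 8 FR rows, 9 NL rows
quote_ids = (["1500759-BE"] * 4
             + ["1605101-FR"] * 3 + ["1605119-FR"] * 5
             + ["1600896-NL"] * 3 + ["1600898-NL"] * 6)
df = pd.DataFrame({"QuoteID": quote_ids})

# Assumption: the country code is the two capital letters after the last hyphen.
# expand=False returns a Series because the pattern has a single capture group.
df["CountryCode"] = df["QuoteID"].str.extract(r"-([A-Z]{2})$", expand=False)

print(df["CountryCode"].value_counts())
```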

Use value_counts to compute the count of each unique code, convert the resulting Series to a dictionary with to_dict(), and then build the messages with a list comprehension:

In [18]: ["count of %s is %d" % (key, value) if value >= 5 else
         "WARN!: count of category %s is less than 5" % key
         for key, value in df['CountryCode'].value_counts().to_dict().items()]
Out[18]: 
['WARN!: count of category BE is less than 5',
 'count of NL is 9',
 'count of FR is 8']
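If you want a real Python warning rather than a list of strings, a minimal sketch could use the standard warnings module. The helper name warn_low_counts, the default threshold of 5 and the list of expected codes are assumptions for illustration. Reindexing by the expected codes also reports a code that never appears at all, which value_counts on its own would silently leave out.

```python
import warnings

import pandas as pd


def warn_low_counts(codes, expected, threshold=5):
    """Hypothetical helper: warn for every expected code seen fewer than `threshold` times.

    `codes` is a Series of country codes, `expected` the codes that should be present.
    Returns the list of codes that triggered a warning.
    """
    # value_counts omits codes with zero rows, so reindex with fill_value=0
    counts = codes.value_counts().reindex(expected, fill_value=0)
    low = []
    for code, n in counts.items():
        if n < threshold:
            warnings.warn("category %s has less than %d records (%d)" % (code, threshold, n))
            low.append(code)
    return low


codes = pd.Series(["BE"] * 4 + ["FR"] * 8 + ["NL"] * 9)
print(warn_low_counts(codes, ["BE", "FR", "NL", "DE"]))
```

Here BE (4 rows) and DE (no rows at all) are reported, while FR and NL pass the threshold.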