Kiran Kiran - 3 years ago 65
Python Question

Is there a dynamic way (a for loop or any other loop) to run the same code over big data?

Data.csv file (sample data)

Taluka Crop Village Area
T1 C1 V1 11
T1 C1 V2 15
T1 C1 V3 3
T1 C1 V4 1
T1 C1 V5 2
T1 C2 V1 12
T1 C2 V2 16
T1 C2 V3 4
T1 C2 V4 100
T1 C2 V5 52
T1 C3 V1 47
T1 C3 V2 15
T1 C3 V3 21
T1 C3 V4 5
T1 C3 V5 7
T1 C4 V1 20
T1 C4 V2 14
T1 C4 V3 18
T1 C4 V4 5
T1 C4 V5 24
T2 C1 V1 21
T2 C1 V2 20
T2 C1 V3 14
T2 C1 V4 7
T2 C1 V5 8
T2 C2 V1 18
T2 C2 V2 3
T2 C2 V3 12
T2 C2 V4 78
T2 C2 V5 56
T2 C3 V1 16
T2 C3 V2 11
T2 C3 V3 15
T2 C3 V2 45
T2 C3 V3 2
T2 C4 V1 3
T2 C4 V2 12
T2 C4 V3 12
T2 C4 V4 44
T2 C4 V5 10


I want to find out which villages have high-risk, medium-risk and low-risk areas for a particular crop in a particular taluka.

I have 500 talukas in total. Each taluka has 10 to 14 crops and 100 to 200 villages.

So, for Taluka 1 (i.e. Thane) and Crop 1 (i.e. Paddy), I want to find out which villages are high risk, medium risk and low risk, using the percentile method.

I have done some work, but the problem is that my code is not dynamic. I have to type out each taluka and each crop, and there are far too many combinations. I need to do this dynamically, using some kind of loop (e.g. a for loop), but I am stuck on this part.

Please see my code.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df=pd.read_csv("/home/desktop/Data.csv")


df.head()

## Part 1: partition by taluka
T1= df[df['Taluka'] == 'T1']
T2= df[df['Taluka'] == 'T2']


## Part 2: partition by crop within each taluka

T1_C1= T1[T1['Crop'] == 'C1']
T1_C2= T1[T1['Crop'] == 'C2']
T1_C3= T1[T1['Crop'] == 'C3']
T1_C4= T1[T1['Crop'] == 'C4']

T2_C1= T2[T2['Crop'] == 'C1']
T2_C2= T2[T2['Crop'] == 'C2']
T2_C3= T2[T2['Crop'] == 'C3']
T2_C4= T2[T2['Crop'] == 'C4']


## Descending order (DataFrame.sort was removed from pandas; use sort_values)
T1_C1 = T1_C1.sort_values('Area', ascending=False)
T1_C2 = T1_C2.sort_values('Area', ascending=False)
T1_C3 = T1_C3.sort_values('Area', ascending=False)
T1_C4 = T1_C4.sort_values('Area', ascending=False)

T2_C1 = T2_C1.sort_values('Area', ascending=False)
T2_C2 = T2_C2.sort_values('Area', ascending=False)
T2_C3 = T2_C3.sort_values('Area', ascending=False)
T2_C4 = T2_C4.sort_values('Area', ascending=False)


## Add risk levels for each crop in each taluka

T1_C1['Level'] = pd.qcut(T1_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C2['Level'] = pd.qcut(T1_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C3['Level'] = pd.qcut(T1_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C4['Level'] = pd.qcut(T1_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])

T2_C1['Level'] = pd.qcut(T2_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C2['Level'] = pd.qcut(T2_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C3['Level'] = pd.qcut(T2_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C4['Level'] = pd.qcut(T2_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])


print(T1_C1)


So here I get, for crop C1 in taluka T1, which villages are in the high-risk area, the low-risk area, and so on.

How can I do this in a loop, so that the code is shorter and works for all 500 talukas?

Answer Source

I think you need groupby with apply and a custom function, which runs the same qcut once per (Taluka, Crop) group:

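def f(x):
    # Split this group's areas into three quantile bins and label them.
    labels = ['Low Risk', 'Medium Risk', 'High Risk']
    x = x.copy()  # do not mutate the group that groupby passes in
    # qcut aligns on the index, so no sorting is needed beforehand
    x['Level'] = pd.qcut(x['Area'], 3, labels=labels)
    return x


# group_keys=False keeps the original flat index in the result
df1 = df.groupby(['Taluka', 'Crop'], group_keys=False).apply(f)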

print (df1)
   Taluka Crop Village  Area        Level
0      T1   C1      V1    11    High Risk
1      T1   C1      V2    15    High Risk
2      T1   C1      V3     3  Medium Risk
3      T1   C1      V4     1     Low Risk
4      T1   C1      V5     2     Low Risk
5      T1   C2      V1    12     Low Risk
6      T1   C2      V2    16  Medium Risk
7      T1   C2      V3     4     Low Risk
8      T1   C2      V4   100    High Risk
9      T1   C2      V5    52    High Risk
10     T1   C3      V1    47    High Risk
11     T1   C3      V2    15  Medium Risk
12     T1   C3      V3    21    High Risk
13     T1   C3      V4     5     Low Risk
14     T1   C3      V5     7     Low Risk
15     T1   C4      V1    20    High Risk
16     T1   C4      V2    14     Low Risk
17     T1   C4      V3    18  Medium Risk
18     T1   C4      V4     5     Low Risk
19     T1   C4      V5    24    High Risk
20     T2   C1      V1    21    High Risk
21     T2   C1      V2    20    High Risk
22     T2   C1      V3    14  Medium Risk
23     T2   C1      V4     7     Low Risk
24     T2   C1      V5     8     Low Risk
25     T2   C2      V1    18  Medium Risk
26     T2   C2      V2     3     Low Risk
27     T2   C2      V3    12     Low Risk
28     T2   C2      V4    78    High Risk
29     T2   C2      V5    56    High Risk
30     T2   C3      V1    16    High Risk
31     T2   C3      V2    11     Low Risk
32     T2   C3      V3    15  Medium Risk
33     T2   C3      V2    45    High Risk
34     T2   C3      V3     2     Low Risk
35     T2   C4      V1     3     Low Risk
36     T2   C4      V2    12  Medium Risk
37     T2   C4      V3    12  Medium Risk
38     T2   C4      V4    44    High Risk
39     T2   C4      V5    10     Low Risk
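
With 100 to 200 villages per group, repeated Area values can make two quantile edges coincide, and pd.qcut then raises a "Bin edges must be unique" error. A minimal sketch of one common workaround follows. It assumes that breaking ties by row order is acceptable, which the original answer does not state. It ranks the areas first and cuts the ranks instead. The helper name add_risk_levels and the small sample frame are hypothetical.

```python
import pandas as pd

LABELS = ['Low Risk', 'Medium Risk', 'High Risk']

def add_risk_levels(df, labels=LABELS):
    """Return a copy of df with a 'Level' column computed per (Taluka, Crop)."""
    def tercile(area):
        # rank(method='first') gives every row a distinct rank, so the
        # quantile edges are always unique. Ties are broken by row order
        # (an assumption, not part of the original answer).
        ranks = area.rank(method='first')
        return pd.qcut(ranks, len(labels), labels=labels)

    out = df.copy()
    # transform returns one value per input row, aligned on the original index
    out['Level'] = out.groupby(['Taluka', 'Crop'])['Area'].transform(tercile)
    return out

# Hypothetical group with heavy ties; a plain qcut on 'Area' would fail here.
sample = pd.DataFrame({
    'Taluka':  ['T1'] * 6,
    'Crop':    ['C1'] * 6,
    'Village': ['V1', 'V2', 'V3', 'V4', 'V5', 'V6'],
    'Area':    [5, 5, 5, 5, 5, 9],
})
result = add_risk_levels(sample)
print(result)
```

Because the ranks are cut rather than the raw areas, each group is split into three equally sized classes, so tied villages can land in different classes. Whether that is acceptable depends on how the risk classes are meant to be used.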
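
To read off which villages fall in each risk class for a given taluka and crop, the labelled frame can be grouped once more. This is a minimal sketch, assuming a frame with the same columns as the labelled output (Taluka, Crop, Village, Area, Level). The small frame below reuses the T1/C1 rows, and the name villages_by_level is illustrative only.

```python
import pandas as pd

# A few labelled rows with the same columns as the labelled result (T1/C1 group).
labelled = pd.DataFrame({
    'Taluka':  ['T1', 'T1', 'T1', 'T1', 'T1'],
    'Crop':    ['C1', 'C1', 'C1', 'C1', 'C1'],
    'Village': ['V1', 'V2', 'V3', 'V4', 'V5'],
    'Area':    [11, 15, 3, 1, 2],
    'Level':   ['High Risk', 'High Risk', 'Medium Risk', 'Low Risk', 'Low Risk'],
})

# Collect the village names for every (Taluka, Crop, Level) combination.
villages_by_level = (
    labelled.astype({'Level': str})   # plain strings avoid empty categorical groups
            .groupby(['Taluka', 'Crop', 'Level'])['Village']
            .apply(list)
)

# Look up one combination: high-risk villages for crop C1 in taluka T1.
print(villages_by_level.loc[('T1', 'C1', 'High Risk')])
```

The result is a Series indexed by (Taluka, Crop, Level), so any of the 500 talukas can be looked up the same way without writing a separate variable per combination.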