david david - 1 month ago 8
Python Question

How to do a substring using pandas or numpy

I'm trying to take a substring of the data in column "ORG". I only need the 2nd and 3rd characters, so for 413 I only need 13. I've tried the following:

Attempt 1: dr2['unit'] = dr2[['ORG']][1:2]
Attempt 2: dr2['unit'] = dr2[['ORG'].str[1:2]
Attempt 3: dr2['unit'] = dr2[['ORG'].str([1:2])


My dataframe:

REGION ORG
90 4 413
91 4 413
92 4 413
93 5 503
94 5 503
95 5 503
96 5 503
97 5 504
98 5 504
99 1 117
100 1 117
101 1 117
102 1 117
103 1 117
104 1 117
105 1 117
106 3 3
107 3 3
108 3 3
109 3 3


Expected output:

REGION ORG UNIT
90 4 413 13
91 4 413 13
92 4 413 13
93 5 503 03
94 5 503 03
95 5 503 03
96 5 503 03
97 5 504 04
98 5 504 04
99 1 117 17
100 1 117 17
101 1 117 17
102 1 117 17
103 1 117 17
104 1 117 17
105 1 117 17
106 3 3 03
107 3 3 03
108 3 3 03
109 3 3 03


Thanks for any and all help!

Answer

Your square brackets don't match, and you can take the last two characters by slicing with .str[-2:].
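
To illustrate the slicing itself, here is a small sketch using plain Python strings (the values are made up for the example). The .str accessor in pandas follows the same slice rules, so the same start and end indices apply per element:

```python
org = "413"

middle = org[1:2]   # end index is exclusive, so this is a single character
tail_a = org[1:3]   # 2nd and 3rd characters
tail_b = org[-2:]   # last two characters, whatever the length
short = "3"[-2:]    # a one-character string just comes back unchanged

print(middle, tail_a, tail_b, short)  # 1 13 13 3
```

Slicing from the end with [-2:] has the advantage of not failing or returning an empty string on short values such as "3".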

Then apply str.zfill with a width of 2 to left-pad the items in the new Series with zeros:

>>> import pandas as pd
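>>> # ORG values are strings here; the third row is a short value that needs padding
>>> ld = [{'ORG': '413', 'REGION': '4'}, {'ORG': '414', 'REGION': '4'}, {'ORG': '3', 'REGION': '4'}]
>>> df = pd.DataFrame(ld)
>>> df
   ORG REGION
0  413      4
1  414      4
2    3      4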
>>> df['UNIT'] = df['ORG'].str[-2:].apply(str.zfill, args=(2,))
>>> df
   ORG REGION UNIT
0  413      4   13
1  414      4   14
2    3      4   03
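
If ORG is stored as integers rather than strings (which the question's dataframe suggests, though that is an assumption), the .str accessor is not available until the column is converted. A minimal sketch of one way to handle that, using the vectorized .str.zfill instead of apply; the helper name add_unit and its parameters are hypothetical, chosen only for this example:

```python
import pandas as pd

def add_unit(frame, source="ORG", target="UNIT"):
    """Return a copy of frame with a zero-padded two-character unit column."""
    out = frame.copy()
    # Convert to str first so .str works on integer columns too,
    # keep the last two characters, then left-pad with zeros to width 2.
    out[target] = out[source].astype(str).str[-2:].str.zfill(2)
    return out

df = pd.DataFrame({"REGION": [4, 5, 1, 3], "ORG": [413, 503, 117, 3]})
result = add_unit(df)
print(result["UNIT"].tolist())  # ['13', '03', '17', '03']
```

The resulting UNIT column holds strings, which is what keeps the leading zero in values like 03.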