Debayan Chakraborty - 2 months ago
Python Question

Mask numbers in pandas

I have numbers such as 12345 in a DataFrame column and want to write them to an Excel sheet masked as 1XXX5. How can I do this? The data type of the DataFrame column is integer.

import pandas as pd

df = pd.read_excel('INVAMF.xls', sheet_name=4, header=0, skiprows=0)
# df created

print(df.dtypes)
print(df['ACCOUNT_NUMBER'].nunique())  # number of distinct account numbers

s = df['ACCOUNT_NUMBER']

print(s)
s = s.astype(str)

# mask all but the first and last digit
s.apply(lambda x: x[0] + 'X' * (len(x) - 2) + x[-1])

print(s)

0 32642
1 32643
2 32644
3 32677
4 32680
5 32680
6 32688
7 32688
8 32695
9 32708
10 32708
11 32709
12 32710
13 32734
14 32734
15 32738
16 32738
17 6109
18 6109



Answer

As you haven't posted any data or code, here is a general form, assuming that the numbers vary in length:

In [141]:
s = pd.Series([8815392,2983])
s = s.astype(str)
s.apply(lambda x: x[0] + 'X' * (len(x) - 2) + x[-1])

Out[141]:
0    8XXXXX2
1       2XX3
dtype: object
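One possible refinement of the general form above, offered as a sketch rather than part of the original answer: the hypothetical helper mask_middle wraps the same slicing logic but returns values shorter than three digits unchanged. Without that guard a single digit gets doubled, e.g. '7' becomes '77', because x[0] and x[-1] are the same character and 'X' * -1 is an empty string.

```python
import pandas as pd


def mask_middle(value) -> str:
    """Replace every digit except the first and last with 'X'.

    Hypothetical helper: values with fewer than three digits are
    returned unchanged, since there is no middle part to hide.
    """
    text = str(value)
    if len(text) < 3:
        return text
    return text[0] + "X" * (len(text) - 2) + text[-1]


# Values from the question and this answer, plus a single-digit edge case
s = pd.Series([32642, 6109, 8815392, 7])
masked = s.apply(mask_middle)
print(masked.tolist())  # ['3XXX2', '6XX9', '8XXXXX2', '7']
```

Because the helper calls str() itself, it can be applied directly to the integer column without a separate astype(str) step.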

If the numbers are all the same length, you can use a vectorised method to set the entire column:

In [142]:
s = pd.Series([8815392,1291283])
s = s.astype(str)
s.str[0] + 'X' * (s.str.len() - 2)[0] + s.str[-1]

Out[142]:
0    8XXXXX2
1    1XXXXX3
dtype: object
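Note that the expression above takes the number of X characters from the first row only, via (s.str.len() - 2)[0], so with mixed lengths such as 32642 and 6109 from the question, 6109 would come out as 6XXX9. A sketch of one alternative that still avoids apply is a regular-expression replacement through the .str accessor; the pattern is an assumption of this sketch, not part of the original answer:

```python
import pandas as pd

# Values of different lengths, as in the question (32642 vs 6109)
s = pd.Series([32642, 6109, 8815392]).astype(str)

# Replace any digit that is neither the first nor the last character:
#   (?<!^)  not preceded by the start of the string
#   (?!$)   not followed by the end of the string
masked = s.str.replace(r"(?<!^)\d(?!$)", "X", regex=True)
print(masked.tolist())  # ['3XXX2', '6XX9', '8XXXXX2']
```

Each row is masked according to its own length, and values of one or two digits pass through unchanged because no digit in them is both after the start and before the end.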

Also, just to clarify a common problem: you need to assign the result of the operation back, because most pandas methods return a copy rather than working in place (although some methods do have an inplace argument). So you need to do the following:

s = s.apply(lambda x: x[0] + 'X' * (len(x) - 2) + x[-1])
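To tie this back to the question, here is a minimal sketch on a small stand-in DataFrame (the values are copied from the question's output; the helper name mask is hypothetical). It shows that an unassigned apply call leaves the column unchanged, and that assigning the result back replaces the integers with masked strings:

```python
import pandas as pd

# Small stand-in for the frame read from the Excel file in the question
df = pd.DataFrame({"ACCOUNT_NUMBER": [32642, 32643, 6109]})


def mask(x: str) -> str:
    # Same slicing logic as the lambda in the answer
    return x[0] + "X" * (len(x) - 2) + x[-1]


# apply() returns a new Series; without an assignment the result is discarded
df["ACCOUNT_NUMBER"].astype(str).apply(mask)
print(df["ACCOUNT_NUMBER"].tolist())  # [32642, 32643, 6109]

# Assigning the result back replaces the integer column with masked strings
df["ACCOUNT_NUMBER"] = df["ACCOUNT_NUMBER"].astype(str).apply(mask)
print(df["ACCOUNT_NUMBER"].tolist())  # ['3XXX2', '3XXX3', '6XX9']
```

To finish the original task, the frame could then be written out with df.to_excel('masked.xlsx', index=False); the filename is a placeholder, and writing .xlsx files requires an Excel engine such as openpyxl to be installed.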