Akshay Hazari Akshay Hazari - 3 months ago 22
Python Question

Move non empty cells to left in grouped columns pandas

I have a dataframe where there are multiple columns with similar column names. I want the empty cells to be populated with those columns which have data to the right.

Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
ABC nan def nan 9091-XYz nan nan XYZ-ABZ


Should be column shifted to something like

Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
ABC def nan nan 9091-XYz XYZ-ABZ nan nan


There's another question which solves a similar problem.

pdf = pd.read_csv('Data.txt',sep='\t')

# gets a set of columns removing the numerical part
columns = set(map(lambda x : x.rstrip('0123456789'),pdf.columns))

for col_pattern in columns:
# get columns with similar names
current = [col for col in pdf.columns if col_pattern in col]
coldf= pdf[current]
# shift columns to the left


The file
Data.txt
has columns sorted by column names so all the columns with similar names come together.

Any help with this is appreciated

I had tried adding this to the above code from the link, which ran out of memory :

newdf=pd.read_csv(StringIO(u''+re.sub(',+',',',df.to_csv()).decode('utf-8')))
list_.append(newdf)
pd.concat(list_,axis=0).to_csv('test.txt')

Answer

pushna
Pushes all null values to the end of the series

coltype
Uses regex to extract the non-numeric prefix from all column names

def pushna(s):
    notnull = s[s.notnull()]
    isnull = s[s.isnull()]
    values = notnull.append(isnull).values
    return pd.Series(values, s.index)

coltype = df.columns.to_series().str.extract(r'(\D*)', expand=False)

df.groupby(coltype, axis=1).apply(lambda df: df.apply(pushna, axis=1))

enter image description here