Federico Gentile - 29 days ago
Python Question

Identify string in dataframe and replace content using Python

I have a CSV file that I loaded with pandas. First I renamed the columns. The dataframe looks like this:

[image: the input dataframe]

My goal is to check every cell of each row for the newline character \n. If a cell contains it, the cell must be changed so that the only content left is what comes after the \n. The output should look like this:

[image: the expected output dataframe]

This is my code so far, but I got stuck on finding and removing \n along with whatever precedes it.

import pandas as pd
df = pd.read_csv('prova.csv', sep=',', skiprows=0, header=None, low_memory=False)
df.columns = ['A','B','C','D','E','F']
for index, row in df.iterrows():
    if '\n' in row[?]:
        # how do I remove the unwanted characters for each cell?


Note: I want to check all the columns, not only those where \n appears. The column dtypes are:

A object
B object
C object
D object
E int64
F object
dtype: object

Answer

Solution
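Convert every cell to a string, stack the dataframe into a single Series, use the str accessor to split on \n and keep the last piece, then unstack back into a dataframe. Note that astype(str) also turns the integer column E into strings.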

df.astype(str).stack().str.split('\n').str[-1].unstack()
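
To make the chain easier to follow, here is a minimal sketch that runs the same one-liner step by step on the sample data from the Setup Reference. The intermediate names (stacked, pieces, last, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the Setup Reference section
df = pd.DataFrame([
    ['bello', 'bot', 'corpo', '105', 245, 'Yes'],
    ['bello', 'par\nsot', 'testo\ncorpo', '105', 660, 'Yes\nno'],
    ['bello', 'pic\nhot', 'fallo', '195\n250', 660, 'Yes'],
    ['bello', 'hot', 'fallo\nbacca', '105', 245, 'Yes'],
], columns=list('ABCDEF'))

# 1. Cast everything to str so the .str accessor works on every cell,
#    then stack into one long Series indexed by (row, column).
stacked = df.astype(str).stack()

# 2. Split each cell on the newline; cells without '\n' become one-item lists.
pieces = stacked.str.split('\n')

# 3. Keep only the last piece, i.e. whatever follows the final '\n'.
last = pieces.str[-1]

# 4. Pivot the column level back out to rebuild the original shape.
result = last.unstack()

print(result)
```

Because of the astype(str) call, the values in column E come back as strings such as '245'. If the integers are needed afterwards, something like result['E'].astype(int) should restore them for this data.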

[image: the resulting dataframe]


Setup Reference

import pandas as pd

df = pd.DataFrame([
    ['bello', 'bot', 'corpo', '105', 245, 'Yes'],
    ['bello', 'par\nsot', 'testo\ncorpo', '105', 660, 'Yes\nno'],
    ['bello', 'pic\nhot', 'fallo', '195\n250', 660, 'Yes'],
    ['bello', 'hot', 'fallo\nbacca', '105', 245, 'Yes']
], columns=list('ABCDEF'))
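
If the numeric column should keep its dtype, one possible alternative (not from the original answer) is DataFrame.replace with a regular expression, which only touches string cells. This is a sketch under that assumption; the name cleaned and the pattern are illustrative choices. The (?s) flag lets the dot match newlines, so everything up to and including the last \n is removed, matching the behaviour of split('\n') followed by str[-1].

```python
import pandas as pd

# Same sample data as the Setup Reference
df = pd.DataFrame([
    ['bello', 'bot', 'corpo', '105', 245, 'Yes'],
    ['bello', 'par\nsot', 'testo\ncorpo', '105', 660, 'Yes\nno'],
    ['bello', 'pic\nhot', 'fallo', '195\n250', 660, 'Yes'],
    ['bello', 'hot', 'fallo\nbacca', '105', 245, 'Yes'],
], columns=list('ABCDEF'))

# Remove everything up to and including the last newline in each string cell.
# Non-string cells (the integers in column E) are left as they are.
cleaned = df.replace(r'(?s)^.*\n', '', regex=True)

print(cleaned)
print(cleaned.dtypes)
```

Compared with the stack and unstack approach, this avoids casting the whole dataframe to strings, at the cost of relying on a regular expression instead of the str accessor.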