Hoap Humanoid - 3 months ago
Python Question

Read file into dataframe, splitting the text after the first word, in Python

I have a file of strings, file.txt, where the first word on each line is a class name and the rest is a description, like the following:

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias


I would like to read the file into a dataframe with two columns: df['class'] with the class and df['description'] with the rest of the content.
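The split being asked for happens at the first run of whitespace only: everything before it is the class, everything after it (commas and inner spaces included) is the description. Python's built-in str.split with maxsplit=1 behaves exactly this way, as a quick check on the third sample line shows (the variable names here are just for illustration):

```python
line = ("n01484850 great white shark, white shark, man-eater, "
        "man-eating shark, Carcharodon carcharias")

# maxsplit=1: split once, at the first run of whitespace, keep the rest intact
cls, description = line.split(maxsplit=1)

print(cls)          # n01484850
print(description)  # great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
```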

Answer

You could read each whole line into a single column and then split that column at its first whitespace:

import pandas as pd

# Read each whole line into one column; the separator (2+ whitespace
# characters) is assumed never to appear inside a line
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python', names=['col'])

# Split on the first occurrence of whitespace only: class id, then the rest
df[['class', 'description']] = df['col'].str.split(n=1, expand=True)
del df['col']

print(df)

       class                                        description
0  n01440764                                 tench, Tinca tinca
1  n01443537                        goldfish, Carassius auratus
2  n01484850  great white shark, white shark, man-eater, man...
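If a description could ever contain two or more consecutive spaces, the regex separator would break that line into extra fields. A minimal alternative sketch, assuming the file fits in memory, skips read_csv entirely: split each line at its first whitespace with plain Python and build the DataFrame from the pairs. Here io.StringIO stands in for file.txt (an assumption so the snippet runs on its own); with the real file, iterate over open('file.txt') instead.

```python
import io

import pandas as pd

# Stand-in for file.txt: the three sample lines from the question
sample = io.StringIO(
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n01484850 great white shark, white shark, man-eater, "
    "man-eating shark, Carcharodon carcharias\n"
)

rows = []
for line in sample:
    line = line.strip()
    if line:  # skip blank lines
        # maxsplit=1 cuts at the first run of whitespace only
        rows.append(line.split(maxsplit=1))

df = pd.DataFrame(rows, columns=["class", "description"])
print(df)
```

This avoids any assumption about what separators may appear inside a description, at the cost of reading the lines in Python rather than through the CSV parser.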