Edward Edward - 1 year ago 89
Python Question

How to remove square brackets from the result of pos_tag

I want to extract the nouns from a dataframe. This is what I do:

import pandas as pd
import nltk
from nltk.tag import pos_tag
df = pd.DataFrame({'pos': ['noun', 'Alice', 'good', 'well', 'city']})
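noun = []  # collects one list of nouns per row
for index, row in df.iterrows():
    # keep only the words tagged as singular common nouns ('NN')
    noun.append([word for word, pos in pos_tag(row) if pos == 'NN'])
df['noun'] = noun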

and I get this for df['noun']:

0 [noun]
1 [Alice]
2 []
3 []
4 [city]

I then tried to strip the brackets with a regex:

df['noun'].replace('[^a-zA-Z0-9]', '', regex = True)

but I get the same result again:

0 [noun]
1 [Alice]
2 []
3 []
4 [city]
Name: noun, dtype: object

What's wrong?

Answer Source

The brackets mean that each cell of the column holds a list, not a string. A regex replace only acts on string values, so it leaves the lists untouched. If you are sure there is at most one element in each list, you can use the str accessor on the noun column and extract the first element:

df['noun'] = df.noun.str[0]

#    pos    noun
#0  noun    noun
#1  Alice   Alice
#2  good    NaN
#3  well    NaN
#4  city    city
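
If a cell could hold more than one noun, str[0] would silently drop the rest. Here is a minimal sketch of the difference, assuming a made-up noun column whose last row has two entries (the 'hall' value is invented for illustration and is not from the question). It compares taking the first element with joining all elements into one string:

```python
import pandas as pd

# Hypothetical data: each cell is a list, like the output of the
# list comprehension in the question. 'hall' is an invented extra noun.
df = pd.DataFrame({'noun': [['noun'], ['Alice'], [], [], ['city', 'hall']]})

# .str[0] indexes into each list; empty lists have no first element and give NaN.
first = df['noun'].str[0]

# Joining keeps every element; an empty list becomes an empty string.
joined = df['noun'].apply(', '.join)

print(first)
print(joined)
```

Which one fits depends on whether an empty result should show up as a missing value (NaN) or as an empty string.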