How to remove square brackets from the pos_tag result
I want to extract the nouns from a data frame. This is what I did:
import pandas as pd
import nltk
from nltk.tag import pos_tag
df = pd.DataFrame({'pos': ['noun', 'Alice', 'good', 'well', 'city']})
noun=[]
for index, row in df.iterrows():
    noun.append([word for word, pos in pos_tag(row) if pos == 'NN'])
df['noun'] = noun
This gives me df['noun']:
0 [noun]
1 [Alice]
2 []
3 []
4 [city]
I then tried a regular expression:
df['noun'].replace('[^a-zA-Z0-9]', '', regex = True)
But I get the same result again:
0 [noun]
1 [Alice]
2 []
3 []
4 [city]
Name: noun, dtype: object
What is wrong?
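The brackets mean that each cell of the data frame holds a list, not a string, which is why the regex replacement has no effect. If you are sure each list contains at most one element, you can use the .str accessor on the noun column and extract the first element: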
df['noun'] = df.noun.str[0]
df
# pos noun
#0 noun noun
#1 Alice Alice
#2 good NaN
#3 well NaN
#4 city city
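If a list might hold more than one noun, taking only the first element would silently drop the rest. As a rough sketch (the data frame below is rebuilt by hand so it runs without nltk, and the names first and joined are only illustrative), joining the list elements with .str.join is one alternative to .str[0]:

```python
import pandas as pd

# Hand-built stand-in for the question's result: each 'noun' cell is a list.
df = pd.DataFrame({
    'pos': ['noun', 'Alice', 'good', 'well', 'city'],
    'noun': [['noun'], ['Alice'], [], [], ['city']],
})

# Option 1: take the first element of each list; empty lists become NaN.
first = df['noun'].str[0]

# Option 2: join every element of each list into one string;
# empty lists become an empty string instead of NaN.
joined = df['noun'].str.join(', ')

print(first)
print(joined)
```

Which one fits depends on whether you would rather see NaN or an empty string for rows without a noun.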