How to convert a pandas text column to an nltk Text object
I have the following dataframe in pandas:
publish_date headline_text
20030219 aba decides against community broadcasting
20030219 act fire witnesses must be aware of defamation
20030219 a g calls for infrastructure protection summit
20030219 air nz staff in aust strike for pay rise
20030219 air nz strike to affect australian travellers
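I want to convert the headline_text column to an nltk Text object so that I can apply all the nltk methods to it.
I tried the following, but it doesn't seem to work: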
headline_text = nlp_df['headline_text'].apply(lambda x: ''.join(x))
You can do it like this, which builds one nltk.Text object per row:
nltk_col = df.headline_text.apply(lambda row: nltk.Text(row.split(' ')))
To assign this column to the dataframe, you can do the following:
df = df.assign(nltk_texts=nltk_col)
We can then check the type of the first row in the new nltk_texts column:
print(type(df.nltk_texts.loc[0]))  # outputs: <class 'nltk.text.Text'>
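For anyone who wants to run the per-row approach end to end, here is a minimal sketch. It assumes the dataframe is rebuilt by hand from the sample rows in the question, and it swaps in `split()` with no argument (my choice, not part of the snippet above) so that repeated spaces do not produce empty tokens.

```python
import nltk
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "publish_date": [20030219] * 5,
    "headline_text": [
        "aba decides against community broadcasting",
        "act fire witnesses must be aware of defamation",
        "a g calls for infrastructure protection summit",
        "air nz staff in aust strike for pay rise",
        "air nz strike to affect australian travellers",
    ],
})

# One nltk.Text per headline. split() without arguments splits on any
# run of whitespace, unlike split(' ').
nltk_col = df["headline_text"].apply(lambda row: nltk.Text(row.split()))
df = df.assign(nltk_texts=nltk_col)

first = df["nltk_texts"].iloc[0]
print(type(first))
print(first.tokens)
```

Each cell in nltk_texts is then an independent Text object, so methods such as count() only see the words of that one headline.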
To combine all rows into a single NLTK Text object, you can do this:
single = nltk.Text([word for row in df.headline_text for word in row.split(' ')])
Then print(type(single)) outputs <class 'nltk.text.Text'>.
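As a rough illustration of what the single Text object gives you, the sketch below flattens the sample headlines into one token list and calls a few Text methods on it. The Series is rebuilt by hand from the question's sample rows, and `split()` is used in place of `split(' ')`; both are assumptions on my side.

```python
import nltk
import pandas as pd

# Sample headlines copied from the question.
headlines = pd.Series([
    "aba decides against community broadcasting",
    "act fire witnesses must be aware of defamation",
    "a g calls for infrastructure protection summit",
    "air nz staff in aust strike for pay rise",
    "air nz strike to affect australian travellers",
], name="headline_text")

# Flatten every headline into one list of tokens, then wrap it once.
tokens = [word for row in headlines for word in row.split()]
single = nltk.Text(tokens)

print(len(single))             # total number of tokens: 36
print(single.count("strike"))  # occurrences of one word: 2
print(single.vocab()["air"])   # vocab() returns a FreqDist keyed by token
```

Note that a single Text loses the row boundaries, so this form suits corpus-wide statistics, while the per-row column is the better fit when each headline has to be handled separately.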