Python word count (defaultdict): column not showing

Question

import pandas as pd
from collections import defaultdict
word_name = []
y = 0

text_list = ['france', 'spain', 'spain beaches', 'france beaches', 'spain best beaches']

word_freq = defaultdict(int)

for text in text_list:
    for word in text.split():
        word_freq[word] += 1
        word_name.append(word)


df = pd.DataFrame.from_dict(word_freq, orient='index') \
.sort_values(0, ascending=False) \
.rename(columns={0: 'Word_freq'}) \
.rename(columns={0: 'Word'})

I have tried several ways of converting this to a DataFrame, but it does not show a column name for the words. How can I make it show one?

Answer 1

I'm not quite sure what you mean by "it does not show the column name for the words," but assuming you want to set the column and index names properly, you can do this:

>>> df = pd.DataFrame.from_dict(word_freq, orient='index')
>>> df = df.rename(columns={0: 'WordFreq'})
>>> df.index.name = 'Word'
>>> df
         WordFreq
Word
france          2
spain           3
beaches         3
best            1
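
Note that the second .rename(columns={0: 'Word'}) in the question has no effect: by then column 0 has already been renamed, and the words are stored in the index rather than in a column. If what you want is for the words to appear as an ordinary column, a minimal sketch (assuming a two-column frame sorted by frequency is the goal, which the question does not state outright) is to name the index and then call reset_index:

```python
from collections import defaultdict

import pandas as pd

text_list = ['france', 'spain', 'spain beaches', 'france beaches', 'spain best beaches']

# Count every whitespace-separated word across all strings
word_freq = defaultdict(int)
for text in text_list:
    for word in text.split():
        word_freq[word] += 1

df = (
    pd.DataFrame.from_dict(word_freq, orient='index')
    .rename(columns={0: 'WordFreq'})         # the single data column is labelled 0 by default
    .rename_axis('Word')                     # name the index that holds the words
    .reset_index()                           # move that index into a regular 'Word' column
    .sort_values('WordFreq', ascending=False)
    .reset_index(drop=True)                  # renumber the rows after sorting
)
print(df)
```

After this, df has two regular columns, Word and WordFreq, and a plain integer index.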

Answer 2

Are you aware of the Counter class in the collections module? You can simplify your code by using it in place of the defaultdict.

from collections import Counter

import pandas as pd

text_list = ['france', 'spain', 'spain beaches', 'france beaches', 'spain best beaches']

# Count every whitespace-separated word across all strings
counter_dict = Counter(word for text in text_list for word in text.split())
# Counter({'spain': 3, 'beaches': 3, 'france': 2, 'best': 1})

Then build your DataFrame with the from_dict constructor.

df = pd.DataFrame.from_dict(
    counter_dict,
    orient="index",
    columns=["WordFreq"],
).rename_axis('Word')

         WordFreq
Word             
france          2
spain           3
beaches         3
best            1
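
If you would rather have the words in a regular column and the rows ordered by frequency, one possible variation (a sketch, not the only way to do it) is to feed Counter.most_common() straight into the DataFrame constructor. The column names Word and WordFreq are just the ones used above.

```python
from collections import Counter

import pandas as pd

text_list = ['france', 'spain', 'spain beaches', 'france beaches', 'spain best beaches']

counter = Counter(word for text in text_list for word in text.split())

# most_common() returns (word, count) pairs, highest count first;
# ties keep the order in which the words were first encountered
df = pd.DataFrame(counter.most_common(), columns=['Word', 'WordFreq'])
print(df)
```

This skips the index handling entirely, since each (word, count) pair becomes one row.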
