Word Count Distribution Pandas Dataframe
I need to count the distribution of words in a DataFrame column. Does anyone know how to do this?
Raw data:
word
apple pear
pear
best apple pear
Expected output:
word count
apple 2
pear 3
best 1
Running this code:
rawData = pd.concat([rawData.groupby(rawData.word.str.split().str[0]).sum(),rawData.groupby(rawData.word.str.split().str[-1]).sum()]).reset_index()
produces this error:
ValueError: cannot insert keyword, already exists
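The error most likely comes from `reset_index`. Grouping by `rawData.word.str.split().str[0]` uses a key Series that is itself named `word`, so the grouped result gets an index named `word` while the original `word` column survives the `.sum()`. `reset_index` then tries to insert that index as a column whose name already exists. Separately, using only `.str[0]` and `.str[-1]` would miss middle words (such as `apple` in `best apple pear`) and would count a one-word row like `pear` twice. The sketch below reproduces only the name clash on a hypothetical one-row frame, not the asker's exact data. The question's message names `keyword`; the sketch uses `word` to match the sample data.

```python
import pandas as pd

# Hypothetical minimal frame: an index named 'word' AND a column named 'word',
# which is most likely what the asker's groupby(...).sum() produced.
counts = pd.DataFrame({'word': ['applepear']}, index=pd.Index(['apple'], name='word'))

try:
    counts.reset_index()  # tries to insert the index as a new column called 'word'
    message = None
except ValueError as err:
    message = str(err)

print(message)  # cannot insert word, already exists

# Renaming the index before resetting it avoids the clash.
fixed = counts.rename_axis('first_word').reset_index()
print(fixed.columns.tolist())  # ['first_word', 'word']
```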
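Use str.split to turn each string into a list of words, then explode to put each list element on its own row, and finally value_counts to count the occurrences of each word: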
import pandas as pd
df = pd.DataFrame({'word': ['apple pear', 'pear', 'best apple pear']})
out = df['word'].str.split().explode().value_counts()
print(out)
# Output:
pear 3
apple 2
best 1
Name: word, dtype: int64
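As a cross-check, the same tally can be done in plain Python with the standard library's `collections.Counter`. This is a swapped-in technique rather than the pandas approach above; it is a minimal sketch assuming the words are separated by whitespace, as in the sample data.

```python
from collections import Counter

# Same sample rows as the question
rows = ['apple pear', 'pear', 'best apple pear']

# str.split() with no argument splits on any run of whitespace,
# which matches pandas' Series.str.split() default
counts = Counter(word for row in rows for word in row.split())

print(counts.most_common())  # [('pear', 3), ('apple', 2), ('best', 1)]
```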
Step by step:
>>> df['word'].str.split()
0 [apple, pear]
1 [pear]
2 [best, apple, pear]
Name: word, dtype: object
>>> df['word'].str.split().explode()
0 apple
0 pear
1 pear
2 best
2 apple
2 pear
Name: word, dtype: object
>>> df['word'].str.split().explode().value_counts()
pear 3
apple 2
best 1
Name: word, dtype: int64
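The repeated labels `0, 0, 1, 2, 2, 2` in the exploded output are expected: `explode` keeps the original row label for every element of that row's list. If a fresh 0..n-1 index is preferred, `Series.explode` accepts `ignore_index=True` (available in pandas 1.1 and later). A small sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'word': ['apple pear', 'pear', 'best apple pear']})

# Default: each word keeps the label of the row it came from
exploded = df['word'].str.split().explode()
print(exploded.index.tolist())  # [0, 0, 1, 2, 2, 2]

# ignore_index=True renumbers the result instead
renumbered = df['word'].str.split().explode(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(renumbered.tolist())  # ['apple', 'pear', 'pear', 'best', 'apple', 'pear']
```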
Update
To get exactly the expected output:
>>> df['word'].str.split().explode().value_counts(sort=False) \
.rename('count').rename_axis('word').reset_index()
word count
0 apple 2
1 pear 3
2 best 1
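An equivalent route, shown here as an alternative sketch rather than the answer's method, is to explode into a one-column frame and count group sizes with `groupby(..., sort=False)`, which keeps the words in order of first appearance just like `value_counts(sort=False)` does above.

```python
import pandas as pd

df = pd.DataFrame({'word': ['apple pear', 'pear', 'best apple pear']})

# One word per row, as a DataFrame with a single 'word' column
words = df['word'].str.split().explode().to_frame()

# Count rows per word, keeping first-appearance order, and name the count column
out = words.groupby('word', sort=False).size().reset_index(name='count')
print(out)
```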
Update 2
To get the word counts per country:
import pandas as pd
data = {'country': [' US', ' US', ' US', ' UK', ' UK', ' UK', ' UK'],
'word': ['best pear', 'apple', 'apple pear',
'apple', 'apple', 'pear', 'apple pear ']}
df = pd.DataFrame(data)
out = df.assign(word=df['word'].str.split()) \
.explode('word').value_counts() \
.rename('count').reset_index()
print(out)
# Output:
country word count
0 UK apple 3
1 UK pear 2
2 US apple 2
3 US pear 2
4 US best 1
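If a country-by-word table is more convenient than the long format, the exploded frame can be counted with `groupby(...).size()` and pivoted with `unstack(fill_value=0)`. This sketch also strips the whitespace around the country codes, on the assumption that the leading spaces in the sample data are accidental; if they are meaningful, drop that line.

```python
import pandas as pd

data = {'country': [' US', ' US', ' US', ' UK', ' UK', ' UK', ' UK'],
        'word': ['best pear', 'apple', 'apple pear',
                 'apple', 'apple', 'pear', 'apple pear ']}
df = pd.DataFrame(data)

# Assumption: ' US' and 'US' should be the same group, so trim the stray spaces
df['country'] = df['country'].str.strip()

# One row per (country, word) pair; str.split() also drops the trailing space
long = df.assign(word=df['word'].str.split()).explode('word')

# Count each pair, then pivot the words into columns, filling missing pairs with 0
table = long.groupby(['country', 'word']).size().unstack(fill_value=0)
print(table)
```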