Python, Pandas: Filter dataframe to a subset and update this subset in place

Question

I have a pandas dataframe that looks like:

cleanText.head()
    source      word    count
0   twain_ess            988
1   twain_ess   works    139
2   twain_ess   short    139
3   twain_ess   complete 139
4   twain_ess   would    98
5   twain_ess   push     94

and a dictionary containing the total word count for each source:

titles
{'orw_ess': 1729, 'orw_novel': 15534, 'twain_ess': 7680, 'twain_novel': 60004}

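My goal is to normalize the word counts for each source by the total number of words in that source, i.e. turn them into a percentage. This seems like it should be trivial, but Python seems to make it very difficult (if anyone could explain the rules for in-place operations to me, that would be great).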

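The caveat is that I need to filter the entries in cleanText down to those from a single source, and then I attempt to divide the counts for this subset by the value in the dictionary.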

# Adjust total word counts and normalize
for key, value in titles.items():

    # This corrects the total words for overcounting the '' entries
    overcounted= cleanText[cleanText.iloc[:,0]== key].iloc[0,2]
    titles[key]= titles[key]-overcounted

    # This is where I divide by total words, however it does not save inplace, or at all for that matter
    cleanText[cleanText.iloc[:,0]== key].iloc[:,2]= cleanText[cleanText.iloc[:,0]== key]['count']/titles[key]

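If anyone could explain how to alter this division statement so that the output is actually saved in the original column, that would be great.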

Thanks

Answer 1

If I understand correctly:

cleanText['count']/cleanText['source'].map(titles)

This gives you:

0    0.128646
1    0.018099
2    0.018099
3    0.018099
4    0.012760
5    0.012240
dtype: float64

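To assign these normalized values (fractions of each source's total; multiply by 100 if you want actual percentages) back into your count column, use: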

cleanText['count'] = cleanText['count']/cleanText['source'].map(titles)
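
As for why the original loop never saved anything: cleanText[mask].iloc[:, 2] = ... first builds a filtered copy and then assigns into that copy, so cleanText itself is left untouched. Below is a minimal sketch of two ways around that, assuming the sample frame from the question. The names by_loc, by_map, overcount and adjusted are made up for illustration, and the overcount correction assumes the rows with an empty word are the ones to subtract from each total.

```python
import pandas as pd

# Sample data rebuilt from the question; the '' row is the overcounted entry.
cleanText = pd.DataFrame({
    "source": ["twain_ess"] * 6,
    "word": ["", "works", "short", "complete", "would", "push"],
    "count": [988, 139, 139, 139, 98, 94],
})
titles = {"orw_ess": 1729, "orw_novel": 15534,
          "twain_ess": 7680, "twain_novel": 60004}

# Option 1: keep the loop, but select rows and column in ONE .loc call,
# so the assignment targets the frame itself rather than a temporary copy.
by_loc = cleanText.copy()
by_loc["count"] = by_loc["count"].astype(float)  # results are fractions
for key, total in titles.items():
    mask = by_loc["source"] == key
    by_loc.loc[mask, "count"] = by_loc.loc[mask, "count"] / total

# Option 2: no loop; map each row's source to its total and divide.
by_map = cleanText.copy()
by_map["count"] = by_map["count"] / by_map["source"].map(titles)

# Optional (assumption): subtract the '' counts from each total first.
overcount = cleanText.loc[cleanText["word"] == ""].groupby("source")["count"].sum()
adjusted = pd.Series(titles).sub(overcount, fill_value=0)
corrected = cleanText["count"] / cleanText["source"].map(adjusted)
```

Both options produce the same column. The map version is usually the simpler choice because it needs no per-source filtering at all.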
