How to fill a column given a condition that checks a list against an index and assigns based on that index

Question

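I'll explain the full context just in case. I found some solutions, but only ones that use an explicit `for i in range` loop or set a simple condition, which is not what I need.

I have a DataFrame with the following columns: post, author, DateTime, day_of_week, hours

Now I want to compute the following probability: that any author publishes a post on a specific day of the week, i.e. number_post_that_week_day / total_post

This is simple and can be done as follows (maybe not the best way, but acceptable):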

count_by_field = data_set.groupby('day_of_week').count()['post']
total_by_field = data_set.groupby('day_of_week').count()['post'].sum()
temp_prob_by_field = count_by_field / total_by_field

# temp_prob_by_field should always have 7 entries (one per weekday),
# but in some cases my sample only contains e.g. Monday and Saturday.
# The loop below adds every missing day with probability 0.
size = 7
for index in range(size):
    if index not in temp_prob_by_field.index:
        temp_prob_by_field.loc[index] = 0
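
The same per-day probabilities can probably be built without the padding loop. The following is only a sketch on a small hypothetical sample (the `data_set` contents and the name `prob_by_day` are made up for illustration), and it assumes the `post` column has no missing values, so counting rows per day matches counting posts:

```python
import pandas as pd

# Hypothetical sample: posts only on day 0 and day 5.
data_set = pd.DataFrame({
    "post": ["a", "b", "c", "d"],
    "day_of_week": [0, 5, 5, 0],
})

# Share of rows per weekday; normalize=True divides each count by the total.
prob_by_day = data_set["day_of_week"].value_counts(normalize=True)

# Guarantee one entry per weekday (0..6); days not in the sample get 0.
prob_by_day = prob_by_day.reindex(range(7), fill_value=0.0)

print(prob_by_day)
```

`reindex` also sorts the result by weekday, which the loop version does not do for the appended days.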

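The problem

I want to assign my probability values to the original data_set in a new column (probs), matched against the day_of_week column. That is: if a record has 3 in the day_of_week column (meaning Wednesday), I want the corresponding probability in that record's probs column.

What I've been trying (without success):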

data_set[data_set.loc[ data_set['hours'] in  temp_prob_by_field.index, temp_prob_by_field ]] 
= temp_prob_by_field.loc[data_set.loc[ data_set['hours'] in  temp_prob_by_field.index]

I can do it with a for loop like this:

for i in range(7):
  data_set.loc[data_set['day_of_week'] == i, 'probs'] = temp_prob_by_field.loc[i]

I'm really new to pandas, and this doesn't look to me like a good way to solve the problem, though maybe I'm wrong.

For @not_speshai, here is a data_sample to play with:

import pandas as pd
import numpy as np
np.random.seed(1213)
c = ['post', 'author', 'datetime', 'day_of_week', 'hours']
data = pd.DataFrame(np.random.choice([1,0,3,5], size=(10,5)), columns=c)
data['post'] = 'A post about something'


"""                  post  author  datetime  day_of_week  hours
0  A post about something       5         5            0      3
1  A post about something       1         1            1      5
2  A post about something       3         1            3      5
3  A post about something       5         3            5      1
4  A post about something       0         5            3      0
5  A post about something       3         3            0      1
6  A post about something       0         5            5      0
7  A post about something       3         3            5      3
8  A post about something       5         1            1      0
9  A post about something       1         0            0      3
"""

Answer 1

I think what you're looking for is pd.merge. Try:

data.merge(temp_prob_by_field, left_on="day_of_week", right_index=True)
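
One caveat, as far as I can tell: `temp_prob_by_field` in the question comes from counting `post`, so the Series is still named `post`, and merging it as-is would collide with the existing `post` column (pandas would then suffix both columns). Renaming the Series first avoids that. Below is a minimal sketch on a hypothetical sample; the column name `probs` comes from the question, while the variable `prob_by_day` and the sample values are assumptions for illustration. `Series.map` is shown as an alternative that does the same index lookup without a merge:

```python
import pandas as pd

# Hypothetical sample in the shape of the question's data.
data = pd.DataFrame({
    "post": ["A post about something"] * 4,
    "day_of_week": [0, 1, 3, 3],
})

# Probability of a post per weekday, padded to all 7 days.
prob_by_day = (
    data["day_of_week"]
    .value_counts(normalize=True)
    .reindex(range(7), fill_value=0.0)
    .rename("probs")  # the Series name becomes the merged column name
)

# Option 1: merge on the Series index; how="left" keeps every row of `data`.
merged = data.merge(prob_by_day, left_on="day_of_week", right_index=True, how="left")

# Option 2: look each day up in the Series directly.
data["probs"] = data["day_of_week"].map(prob_by_day)

print(merged)
print(data)
```

Both options replace the `for i in range(7)` loop from the question. `map` writes the column in place, while `merge` returns a new DataFrame.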

python

multiple-columns

conditional-statements

assign

pandas