How to fill a column by looking up each row's value in another Series' index and assigning the matching entry
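I'll explain the full context just in case. I have found some solutions, but only ones that use an explicit for i in range loop or that set simple conditions, which is not what I need.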
I have a DataFrame with the following columns: post, author, DateTime, day_of_week, hours
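Now I want to compute the following probability:
that any author publishes a post on a specific day of the week,
i.e. number_post_that_week_day / total_post
This is simple and can be done as follows (probably not the best way, but acceptable):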
# Number of posts per day of the week
count_by_field = data_set.groupby('day_of_week').count()['post']
# Total number of posts across all days
total_by_field = count_by_field.sum()
temp_prob_by_field = count_by_field / total_by_field
# In case I need the size of temp_prob_by_field to be 7
# but my sample, in some cases, only has Monday and Saturday.
# With the next lines I will always have 7 records.
size = 7  # days in a week
for index in range(size):
    if index not in temp_prob_by_field.index:
        temp_prob_by_field.loc[index] = 0
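The counting, normalizing, and zero-filling steps above can probably be collapsed into a single chain. The following is only a sketch under some assumptions: the small data_set here is made up for illustration, the name prob_by_day is hypothetical, and value_counts counts rows by day_of_week rather than non-null post values, so it matches the groupby version only when post has no missing entries.

```python
import pandas as pd

# Hypothetical sample in which only days 0 and 5 occur.
data_set = pd.DataFrame({
    "post": ["a", "b", "c", "d"],
    "day_of_week": [0, 0, 5, 0],
})

# value_counts(normalize=True) returns the share of rows per day.
# reindex(range(7), fill_value=0.0) adds the missing days with probability 0.
prob_by_day = (
    data_set["day_of_week"]
    .value_counts(normalize=True)
    .reindex(range(7), fill_value=0.0)
    .rename("probs")
)

print(prob_by_day)
```

Because reindex takes the full list of expected labels, the result always has seven entries in day order, without an explicit loop.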
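Question
I want to assign my probability values to a new column (probs) on the original data_set, matched against the day-of-week column. That is:
if a record has 3 in the day_of_week column (meaning Wednesday), I want the probs column of that record to hold the corresponding probability.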
What I have tried (without success):
data_set[data_set.loc[ data_set['hours'] in temp_prob_by_field.index, temp_prob_by_field ]]
= temp_prob_by_field.loc[data_set.loc[ data_set['hours'] in temp_prob_by_field.index]
I can do it with a for loop like this:
for i in range(7):
    # Match on day_of_week, since the probabilities are indexed by day of the week
    data_set.loc[data_set['day_of_week'] == i, 'probs'] = temp_prob_by_field.loc[i]
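I am really new to pandas, and this does not look to me like a good way to solve the problem, but maybe I am wrong.
As @not_speshai requested, here is a data sample to play with: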
import pandas as pd
import numpy as np
np.random.seed(1213)
c = ['post', 'author', 'datetime', 'day_of_week', 'hours']
data = pd.DataFrame(np.random.choice([1,0,3,5], size=(10,5)), columns=c)
data['post'] = 'A post about something'
"""
                     post  author  datetime  day_of_week  hours
0  A post about something       5         5            0      3
1  A post about something       1         1            1      5
2  A post about something       3         1            3      5
3  A post about something       5         3            5      1
4  A post about something       0         5            3      0
5  A post about something       3         3            0      1
6  A post about something       0         5            5      0
7  A post about something       3         3            5      3
8  A post about something       5         1            1      0
9  A post about something       1         0            0      3
"""
I think what you are looking for is pd.merge. Try:
data.merge(temp_prob_by_field, left_on="day_of_week", right_index=True)
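One thing to watch with the merge above: temp_prob_by_field comes from the post column, so the Series is most likely named post, and merging it into a frame that already has a post column should yield post_x and post_y rather than a probs column. Below is a hedged sketch of two ways around that, using a made-up five-row frame; the names probs and merged are illustrative, not from the original post. Renaming the Series before the merge gives the new column a clear name, and Series.map is an alternative that writes the column directly.

```python
import pandas as pd

# Hypothetical sample standing in for the question's data frame.
data = pd.DataFrame({
    "post": ["A post about something"] * 5,
    "day_of_week": [0, 1, 3, 5, 3],
})

# Probability of each day of the week, with absent days filled as 0.
probs = (
    data["day_of_week"]
    .value_counts(normalize=True)
    .reindex(range(7), fill_value=0.0)
)

# Option 1: merge, after renaming the Series so the new column is called "probs".
# how="left" keeps every row of data in its original order.
merged = data.merge(
    probs.rename("probs"),
    left_on="day_of_week",
    right_index=True,
    how="left",
)

# Option 2: map each day_of_week value to its probability via the Series index.
data["probs"] = data["day_of_week"].map(probs)

print(data)
```

Both options replace the for i in range(7) loop with a single vectorized lookup. map is the shorter of the two when only one column has to be attached.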