Distribute a row's value over other rows sharing the same key
I have a dataframe that looks like this:
+------+------------+-------+--------------+
| name | date | value | replacement |
+------+------------+-------+--------------+
| A | 20/11/2016 | 10 | NaN |
| C | 20/11/2016 | 8 | [A,B] |
| B | 20/11/2016 | 12 | NaN |
| A | 25/12/2016 | 16 | NaN |
| B | 25/12/2016 | 18 | NaN |
| D | 25/12/2016 | 11 | [A,B] |
+------+------------+-------+--------------+
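What I want to do:
For each row that has a list of names in the 'replacement' column, I want to distribute its 'value' equally over the rows of those replacement names that have the same date.
For the example above, the output would look like this: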
+------+------------+-------+------------------+
| name | date | value | additional value |
+------+------------+-------+------------------+
| A | 20/11/2016 | 10 | 4 |
| B | 20/11/2016 | 12 | 4 |
| A | 25/12/2016 | 16 | 5.5 |
| B | 25/12/2016 | 18 | 5.5 |
+------+------------+-------+------------------+
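I managed to find a way to do the distribution directly, without creating a new column, by splitting those rows and grouping by name + date, but 1/ it was too slow and 2/ I really do need to create that additional column and can't find a way to do it.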
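The idea is to create a new column holding the length of each replacement list with Series.str.len, then use DataFrame.explode (pandas 0.25+) to turn the lists into scalars. Divide the value column by new, then merge the result back onto the original rows, matching name and date against replacement and date: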
df1 = df.assign(new=df['replacement'].str.len()).explode('replacement')
df1['new'] = df1['value'].div(df1['new'])
df1 = df1[['name','date','value']].merge(df1[['replacement','date','new']],
                                         left_on=['name','date'],
                                         right_on=['replacement','date'])
df1['replacement'] = df1.pop('new')
print(df1)
  name        date  value  replacement
0    A  20/11/2016     10          4.0
1    B  20/11/2016     12          4.0
2    A  25/12/2016     16          5.5
3    B  25/12/2016     18          5.5
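To try the snippet above, the replacement column has to hold real Python lists rather than strings. Here is a minimal sketch of the first two steps on a hand-built frame. The sample data is my assumption (names A and B on both dates, as in the printed output), and the names `n`, `share` and `shares` are only illustrative:

```python
import numpy as np
import pandas as pd

# Assumed sample data: real lists in 'replacement', NaN elsewhere.
df = pd.DataFrame({
    "name": ["A", "C", "B", "A", "B", "D"],
    "date": ["20/11/2016"] * 3 + ["25/12/2016"] * 3,
    "value": [10, 8, 12, 16, 18, 11],
    "replacement": [np.nan, ["A", "B"], np.nan, np.nan, np.nan, ["A", "B"]],
})

# Record each list's length, then explode to one row per listed name.
shares = df.assign(n=df["replacement"].str.len()).explode("replacement")

# Rows that had no list carry nothing to distribute, so drop them here.
shares = shares.dropna(subset=["replacement"])

# Each listed name receives an equal part of the donor row's value.
shares["share"] = shares["value"] / shares["n"]

print(shares[["replacement", "date", "share"]])
```

Each row of `shares` now says which name receives how much on which date, which is what the merge then joins back onto the original rows.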
A similar solution, dropping columns instead of selecting them:
df1 = df.assign(new=df['replacement'].str.len()).explode('replacement')
df1['new'] = df1['value'].div(df1['new'])
# axis must be passed by keyword; the positional form was removed in pandas 2.0
df1 = df1.drop(['replacement','new'], axis=1).merge(df1.drop(['name','value'], axis=1),
                                                    left_on=['name','date'],
                                                    right_on=['replacement','date'])
df1['replacement'] = df1.pop('new')
print(df1)
  name        date  value  replacement
0    A  20/11/2016     10          4.0
1    B  20/11/2016     12          4.0
2    A  25/12/2016     16          5.5
3    B  25/12/2016     18          5.5
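Both merges above are inner joins, so only the receiving rows survive, and a name listed by two different rows on the same date would come out as two rows. If you want every original row kept and the shares summed, something along these lines might work. This is only a sketch: `add_shares` and `additional_value` are names I made up, and the demo frame is invented to show two donors listing the same name:

```python
import numpy as np
import pandas as pd

def add_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add an 'additional_value' column of received shares."""
    # Donor rows are the ones whose 'replacement' holds a list.
    donors = df.dropna(subset=["replacement"])
    exploded = donors.assign(n=donors["replacement"].str.len()).explode("replacement")
    exploded["share"] = exploded["value"] / exploded["n"]

    # Sum the shares per receiving (name, date) in case several donors list the same name.
    received = (
        exploded.groupby(["replacement", "date"], as_index=False)["share"].sum()
        .rename(columns={"replacement": "name", "share": "additional_value"})
    )

    # A left merge keeps every original row; rows that receive nothing get 0.
    out = df.merge(received, on=["name", "date"], how="left")
    out["additional_value"] = out["additional_value"].fillna(0.0)
    return out

# Invented demo data: C splits 8 between A and B, D gives all of its 6 to A.
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "date": ["20/11/2016"] * 4,
    "value": [10, 12, 8, 6],
    "replacement": [np.nan, np.nan, ["A", "B"], ["A"]],
})
out = add_shares(demo)
print(out)
```

To get back to the shape in the question, you could then filter out the donor rows, for example with `out[out["replacement"].isna()]`.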
Here is another approach using explode (requires pandas 0.25+) and groupby:
# rows of df that have a list in the replacement column
m = df[[isinstance(i, list) for i in df.replacement]]
# explode the lists and group by date
g = m.explode('replacement').groupby('date')
# drop the rows of m and assign the divided value, aligned on date
# (this relies on one list row per date, with every remaining row on that date listed in it)
final = df.drop(m.index).set_index('date').assign(
    replacement=(g['value'].mean()/g.size())).reset_index()
print(final)
         date name  value  replacement
0  20/11/2016    A   10.0          4.0
1  20/11/2016    B   12.0          4.0
2  25/12/2016    A   16.0          5.5
3  25/12/2016    B   18.0          5.5
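One thing to watch with the groupby version: as far as I can tell, the share is aligned on date alone, so any remaining row on that date receives it, whether or not its name is in the list. A small sketch with an invented extra row G (not in the question's data) shows the effect:

```python
import numpy as np
import pandas as pd

# Invented data: G shares the date but is NOT listed in C's replacement list.
df = pd.DataFrame({
    "name": ["A", "C", "B", "G"],
    "date": ["20/11/2016"] * 4,
    "value": [10, 8, 12, 7],
    "replacement": [np.nan, ["A", "B"], np.nan, np.nan],
})

# Same steps as the groupby approach above.
m = df[[isinstance(i, list) for i in df.replacement]]
g = m.explode("replacement").groupby("date")
final = df.drop(m.index).set_index("date").assign(
    replacement=(g["value"].mean() / g.size())).reset_index()

print(final)
# G ends up with a share of 4.0 even though only A and B were listed.
```

If your data can contain such rows, the merge on name and date is probably the safer choice, since it matches on the listed names as well as the date.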