Creating a new column based on other columns from another dataframe
I have 2 dataframes:
df1
Name Apples Pears Grapes Peachs
James 3 5 5 2
Harry 1 0 2 9
Will 20 2 7 3
df2
Class User Factor
A Harry 3
A Will 2
A James 5
B NaN 4
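I want to create a new column in df2 called Total. It should be a list of all of that user's column values from df1, each multiplied by the user's Factor, but only if the user is in Class A.
The final df should look like this: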
df2
Class User Factor Total
A Harry 3 [3,0,6,27]
A Will 2 [40,4,14,6]
A James 5 [15,25,25,10]
B NaN 4
This is what I tried:
df2['Total'] = list(df1.Name.isin((df2.User) and (df2.Class==A)) * df2.Factor)
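For reference, the attempt above fails before it does any arithmetic: `A` without quotes is an undefined name, and Python's `and` cannot combine two Series. The sketch below shows the error and one way the intended Class A filter could be written with `&`. The frame construction and the names `mask` and `and_error` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Name": ["James", "Harry", "Will"]})
df2 = pd.DataFrame({
    "Class": ["A", "A", "A", "B"],
    "User": ["Harry", "Will", "James", np.nan],
    "Factor": [3, 2, 5, 4],
})

# `and` asks each Series for a single truth value, which pandas refuses to give.
try:
    df2["User"] and (df2["Class"] == "A")
    and_error = None
except ValueError as exc:
    and_error = exc

# Element-wise AND of two boolean masks uses `&`, with parentheses
# around each comparison because `&` binds tighter than `==`.
mask = df2["User"].isin(df1["Name"]) & (df2["Class"] == "A")
print(mask.tolist())
```

The mask only selects rows; the multiplication still needs a lookup from each user to the matching df1 row, which is what the answers do.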
You can use:
# Columns of df1 to scale
cols = ['Apples', 'Pears', 'Grapes', 'Peachs']
# First lookup
factor = df2[df2['Class'] == 'A'].set_index('User')['Factor']
df1['Total'] = df1[cols].mul(df1['Name'].map(factor), axis=0).agg(list, axis=1)
# Second lookup
df2['Total'] = df2['User'].map(df1.set_index('Name')['Total'])
Output:
>>> df2
Class User Factor Total
0 A Harry 3 [3, 0, 6, 27]
1 A Will 2 [40, 4, 14, 6]
2 A James 5 [15, 25, 25, 10]
3 B NaN 4 NaN
>>> df1
Name Apples Pears Grapes Peachs Total
0 James 3 5 5 2 [15, 25, 25, 10]
1 Harry 1 0 2 9 [3, 0, 6, 27]
2 Will 20 2 7 3 [40, 4, 14, 6]
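For a version that can be pasted and run as-is, here is a minimal sketch of the same two-step lookup. The frame construction and the names `cols`, `scaled` and `totals` are assumptions made for illustration. Unlike the snippet above, it keeps the lists in a separate Series instead of adding a column to `df1`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["James", "Harry", "Will"],
    "Apples": [3, 1, 20],
    "Pears": [5, 0, 2],
    "Grapes": [5, 2, 7],
    "Peachs": [2, 9, 3],
})
df2 = pd.DataFrame({
    "Class": ["A", "A", "A", "B"],
    "User": ["Harry", "Will", "James", np.nan],
    "Factor": [3, 2, 5, 4],
})

# Assumed list of the df1 columns that should be scaled.
cols = ["Apples", "Pears", "Grapes", "Peachs"]

# First lookup: Factor per user, Class A rows only, indexed by user name.
factor = df2.loc[df2["Class"] == "A"].set_index("User")["Factor"]

# Multiply every df1 row by the factor found for its Name.
scaled = df1[cols].mul(df1["Name"].map(factor), axis=0)

# One list per name, indexed by Name so it can serve as a lookup table.
totals = pd.Series(scaled.values.tolist(), index=df1["Name"])

# Second lookup: fetch each user's list; users without a match get NaN.
df2["Total"] = df2["User"].map(totals)
print(df2)
```

Because both steps match on names rather than on row position, the result does not depend on the two frames being in the same order.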
This will solve your problem (`df` here is the question's `df1`, as defined in the full test code below):
df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
Full test code:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=[
x.strip() for x in 'Name Apples Pears Grapes Peachs'.split()], data =[
['James', 3, 5, 5, 2],
['Harry', 1, 0, 2, 9],
['Will', 20, 2, 7, 3]])
print(df)
df2 = pd.DataFrame(columns=[
x.strip() for x in 'Class User Factor'.split()], data =[
['A', 'Harry', 3],
['A', 'Will', 2],
['A', 'James', 5],
['B', np.nan, 4]])
print(df2)
df2 = df2[df2.Class=='A'].join(df.set_index('Name'), on='User').set_index(['Class','User'])
df2['Total'] = df2.apply(lambda x: list(x * x.Factor)[1:], axis=1)
df2 = df2.reset_index()[['Class','User','Factor','Total']]
print(df2)
Input:
Name Apples Pears Grapes Peachs
0 James 3 5 5 2
1 Harry 1 0 2 9
2 Will 20 2 7 3
Class User Factor
0 A Harry 3
1 A Will 2
2 A James 5
3 B NaN 4
Output:
Class User Factor Total
0 A Harry 3 [3, 0, 6, 27]
1 A Will 2 [40, 4, 14, 6]
2 A James 5 [15, 25, 25, 10]
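The output above contains only the Class A rows, whereas the question's expected result keeps the Class B row with an empty Total. A possible variation on the same join idea, sketched under the assumption that every Class A user appears exactly once in df1, computes the lists on the Class A subset and assigns them back by index. The names `cols`, `a_rows`, `joined` and `scaled` are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["James", "Harry", "Will"],
    "Apples": [3, 1, 20],
    "Pears": [5, 0, 2],
    "Grapes": [5, 2, 7],
    "Peachs": [2, 9, 3],
})
df2 = pd.DataFrame({
    "Class": ["A", "A", "A", "B"],
    "User": ["Harry", "Will", "James", np.nan],
    "Factor": [3, 2, 5, 4],
})

cols = ["Apples", "Pears", "Grapes", "Peachs"]

# Work on the Class A rows only; the subset keeps df2's original index labels.
a_rows = df2[df2["Class"] == "A"]

# join(..., on="User") looks each user up in df1 and preserves a_rows' index.
joined = a_rows.join(df1.set_index("Name"), on="User")

# Scale the fruit columns row by row.
scaled = joined[cols].mul(joined["Factor"], axis=0)

# Assigning a Series aligns on index, so rows outside Class A receive NaN.
df2["Total"] = pd.Series(scaled.values.tolist(), index=scaled.index)
print(df2)
```

Since the assignment aligns on the index, `df2` keeps all four rows and its original column order without a `reset_index` step.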
Greetings from a one-liner masochist ;)
df2['Total'] = pd.Series(df1.sort_values(by='Name').reset_index(drop=True).iloc[:,1:5]\
.mul(df2[df2.Class == 'A'].sort_values(by='User')['Factor'].reset_index(drop=True), axis=0)\
.values.tolist())
df2
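Output (note that the Total values for Will and James are swapped relative to the expected result, because both frames are sorted by name before multiplying while df2 keeps its original row order):
index | Class | User | Factor | Total |
---|---|---|---|---|
0 | A | Harry | 3 | 3,0,6,27 |
1 | A | Will | 2 | 15,25,25,10 |
2 | A | James | 5 | 40,4,14,6 |
3 | B | NaN | 4 | NaN |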