Lookup on two values and fill in
I have a dataframe (my real dataframe has 50,000 rows and 34 columns):
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'ORANGE COMPANY C', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': [0, 0, 0, 0],
    'GOLD LTD': [0, 0, 0, 0],
    'GAS LTD': [0, 0, 0, 0],
})
               NAME INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A     OIL LTD     100        0         0        0
1  BANANA COMPANY B    GOLD LTD     200        0         0        0
2  ORANGE COMPANY C     GAS LTD     300        0         0        0
3   APPLE COMPANY A     GAS LTD     400        0         0        0
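How can I look up the value in column STOCKS based on the value in NAME and on the column name? For example, for the first value in column OIL LTD, it searches for APPLE COMPANY A in column NAME and for OIL LTD in column INVESTMENTS (based on the column with the same name), which gives the value 100, as can be seen below. So the values it searches for come from the column names OIL LTD, GOLD LTD, GAS LTD, etc., combined with the values of NAME and INVESTMENTS.
I would like the output to look like this: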
               NAME INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A     OIL LTD     100      100         0      400
1  BANANA COMPANY B    GOLD LTD     200        0       200        0
2  ORANGE COMPANY C     GAS LTD     300        0         0      300
3   APPLE COMPANY A     GAS LTD     400        0         0      400
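To pin the rule down, here is a minimal reference sketch that applies it literally with a plain dictionary keyed by (NAME, INVESTMENTS). The names stocks_by_pair, target_cols and rule_result are made up for illustration, and the sketch assumes each pair occurs at most once. Read this way, the rule also puts 100 in OIL LTD for row 3, because that row is APPLE COMPANY A too, whereas the table above lists 0 there.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'ORANGE COMPANY C', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': [0, 0, 0, 0],
    'GOLD LTD': [0, 0, 0, 0],
    'GAS LTD': [0, 0, 0, 0],
})

# Map each (NAME, INVESTMENTS) pair to its STOCKS value.
# Assumption: each pair occurs at most once; a repeated pair would keep the last value.
stocks_by_pair = dict(zip(zip(df['NAME'], df['INVESTMENTS']), df['STOCKS']))

# Every column other than the three key columns is treated as a lookup target.
target_cols = [c for c in df.columns if c not in ('NAME', 'INVESTMENTS', 'STOCKS')]

rule_result = df.copy()
for col in target_cols:
    # Look up (this row's NAME, this column's name); pairs that never occur give 0.
    rule_result[col] = [stocks_by_pair.get((name, col), 0) for name in df['NAME']]

print(rule_result)
```

This is only meant to make the intended semantics explicit; a Python-level loop like this is unlikely to be the fastest option on 50,000 rows.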
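If I wanted to look up a single value, I would normally use pd.merge(), but I am not sure whether that works with two values. It works in Excel, but running the function takes 15 minutes per column, which is not efficient.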
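For what it is worth, pd.merge() does accept a list of key columns, so a two-value lookup can be written as one merge per target column. The following is a rough sketch under the assumption that every (NAME, INVESTMENTS) pair is unique; the names target_cols, lookup, keys, hit and merged are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'ORANGE COMPANY C', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': [0, 0, 0, 0],
    'GOLD LTD': [0, 0, 0, 0],
    'GAS LTD': [0, 0, 0, 0],
})

target_cols = ['OIL LTD', 'GOLD LTD', 'GAS LTD']

# Lookup table: one row per (NAME, INVESTMENTS) pair with its STOCKS value.
lookup = df[['NAME', 'INVESTMENTS', 'STOCKS']]

merged = df[['NAME', 'INVESTMENTS', 'STOCKS']].copy()
for col in target_cols:
    # Left side: each row's NAME, paired with the column name as the second key.
    keys = pd.DataFrame({'NAME': df['NAME'], 'INVESTMENTS': col})
    # validate='many_to_one' raises if a pair is repeated in the lookup table,
    # which would otherwise duplicate rows and break the assignment below.
    hit = keys.merge(lookup, on=['NAME', 'INVESTMENTS'], how='left',
                     validate='many_to_one')
    # Pairs with no match come back as NaN; treat them as 0.
    merged[col] = hit['STOCKS'].fillna(0).astype(int).to_numpy()

print(merged)
```

With 34 columns this means a few dozen merges over the whole frame, so it may still be slower than reshaping the data once.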
If the last columns are filled only with 0, one solution is to pivot, then drop those columns and finally join:
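# keyword arguments are required by pandas 2.0+ (positional arguments were removed)
df1 = df.pivot(index='NAME', columns='INVESTMENTS', values='STOCKS').fillna(0).astype(int)
# drop the zero-filled placeholder columns, then join the pivoted values back on NAME
df = df.drop(df1.columns, axis=1).join(df1, on='NAME')
print(df)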
               NAME INVESTMENTS  STOCKS  GAS LTD  GOLD LTD  OIL LTD
0   APPLE COMPANY A     OIL LTD     100      400         0      100
1  BANANA COMPANY B    GOLD LTD     200        0       200        0
2  ORANGE COMPANY C     GAS LTD     300      300         0        0
3   APPLE COMPANY A     GAS LTD     400      400         0      100
If you need the same column order as in the original DataFrame:
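# run this on the original df so that cols keeps the original column order
cols = df.columns.drop(['NAME', 'INVESTMENTS', 'STOCKS'])
# keyword arguments are required by pandas 2.0+; [cols] restores the original order
df1 = df.pivot(index='NAME', columns='INVESTMENTS', values='STOCKS').fillna(0).astype(int)[cols]
df = df.drop(df1.columns, axis=1).join(df1, on='NAME')
print(df)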
               NAME INVESTMENTS  STOCKS  OIL LTD  GOLD LTD  GAS LTD
0   APPLE COMPANY A     OIL LTD     100      100         0      400
1  BANANA COMPANY B    GOLD LTD     200        0       200        0
2  ORANGE COMPANY C     GAS LTD     300        0         0      300
3   APPLE COMPANY A     GAS LTD     400      100         0      400
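One caveat with a real 50,000-row frame: pivot raises a ValueError when the same (NAME, INVESTMENTS) pair appears more than once. A possible variant, sketched below, uses pivot_table with an aggregation instead. Summing the duplicates is purely an assumption here (taking the first or last value may be what is actually wanted), and the frame dup with its repeated pair is hypothetical data made up for the example.

```python
import pandas as pd

# Hypothetical data: ('APPLE COMPANY A', 'GAS LTD') appears twice.
dup = pd.DataFrame({
    'NAME': ['APPLE COMPANY A', 'BANANA COMPANY B', 'APPLE COMPANY A', 'APPLE COMPANY A'],
    'INVESTMENTS': ['OIL LTD', 'GOLD LTD', 'GAS LTD', 'GAS LTD'],
    'STOCKS': [100, 200, 300, 400],
    'OIL LTD': 0,
    'GOLD LTD': 0,
    'GAS LTD': 0,
})

# Placeholder columns, in their original order.
cols = dup.columns.drop(['NAME', 'INVESTMENTS', 'STOCKS'])

# pivot_table aggregates repeated pairs (assumption: sum) instead of raising.
# reindex restores the original column order and adds any placeholder column
# that never occurs in INVESTMENTS, filled with 0.
wide = (dup.pivot_table(index='NAME', columns='INVESTMENTS', values='STOCKS',
                        aggfunc='sum', fill_value=0)
           .reindex(columns=cols, fill_value=0))

filled = dup.drop(columns=cols).join(wide, on='NAME')
print(filled)
```

Here the two GAS LTD rows for APPLE COMPANY A are combined into 700 on every APPLE COMPANY A row; swapping aggfunc for 'first' or 'last' would keep a single value instead.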