Is there a faster way to match and multiply Dataframe values based on index values?

Question

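I have two dataframes. One, of size (1113, 7987), has multi-index columns holding values for different countries and sectors, and different IDs in the rows. Example: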

F_frame:

     AT              BE            ...
     Food   Energy   Food   Energy ...
ID1  
ID2
...

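In the other dataframe (CF_LO) I have factor values with their corresponding countries and IDs. I want to match them against the first dataframe (F_frame) so that I can multiply the values in F_frame by the factor values in CF_LO wherever country and ID match. Where they do not match, I enter zero.

My code so far seems to work, but it runs very slowly. Is there a smarter way to match the tables based on index/header names? (The code loops over 49 countries and multiplies all 163 sectors within a country by the same factor.)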

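import numpy as np
import pandas as pd

# Result frame, pre-filled with zeros so unmatched cells stay 0
LO_impacts = pd.DataFrame(np.zeros((1113, 7987)))

for i in range(len(F_frame)):          # every row (ID)
    for j in range(49):                # every country block of 163 sectors
        for k in range(len(CF_LO)):    # every factor row
            if (F_frame.index.get_level_values(1)[i] == CF_LO.iloc[k, 1] and
                    F_frame.columns.get_level_values(0)[j*163] == CF_LO.iloc[k, 2]):
                LO_impacts.iloc[i, (j*163):((j+1)*163)] = F_frame.iloc[i, (j*163):((j+1)*163)] * CF_LO.iloc[k, 4]
            # The original else branch used "== 0", a comparison with no effect;
            # it is dropped because unmatched cells are already zero.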

Answer 1

I created two dataframes, then set a new index on the second one, as follows:

Then I used the assign() function to create a new column in df2:

df2=df2.assign(gre_multiply=lambda x: x.gre*df1.gre)

Don't forget the df2= assignment; I left it out in the picture.

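I got the following dataframe:

The multiplication aligns on the index, which you can verify with a calculator. The returned values are floats, which are easy to convert to int with df2.gre_multiply.astype(int). Before that you need fillna, because wherever the indexes of the two dataframes do not match the result is NaN: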

df2.gre_multiply=df2.gre_multiply.fillna(0).astype(int)
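To make the index alignment concrete, here is a minimal sketch of the same steps. The frames df1 and df2, their labels, and their values are made up for illustration; only the gre column name and the assign/fillna calls come from the answer above.

```python
import pandas as pd

# Hypothetical data: both frames have a "gre" column, and the indexes
# overlap only partly ("b" and "c" are shared).
df1 = pd.DataFrame({"gre": [2.0, 3.0, 4.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"gre": [10.0, 20.0, 30.0]}, index=["b", "c", "d"])

# The product is aligned on index labels, not on row position.
df2 = df2.assign(gre_multiply=lambda x: x.gre * df1.gre)

# "d" has no partner in df1, so its product is NaN; replace it with 0
# before converting to int.
df2.gre_multiply = df2.gre_multiply.fillna(0).astype(int)

print(df2)
```

Rows "b" and "c" are multiplied by their matching df1 values, while "d" falls back to 0.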

Answer 2

import pandas as pd

# Creating dummy data
# The column MultiIndex must exist before it is used to build the frame
mindex = pd.MultiIndex.from_product([["AT", "DK"], ["Food", "Energy"]])

data = pd.DataFrame([
[2.0, 1.1, 6.7, 4.5],
[4.3, 5.7, 8.6, 9.0],
[5.5, 6.8, 9.0, 4.7],
[5.5, 6.8, 9.0, 4.7],
], index = ["S1", "S1", "S2", "S2"], columns = mindex)

mul_factor = pd.DataFrame({"Country": ['AT', 'DK', 'AT', 'DK'],
          "Value": [1.0, 0.8, 0.9, 0.6],
         }, index = ['S1', 'S1', 'S2', 'S2'])


new_data = data.copy()
new_data.columns = data.columns.to_frame()[0].to_list()

# Reshaping the second Dataframe
mat = mul_factor.reset_index().pivot(index = 'Country', columns='index')
mat.index.name = None
mat = mat.T.reset_index(0, drop = True)
mat.index.name = None

new_data.multiply(mat) # Required result

Let me know if I misunderstood your question. You may need to modify the code slightly to handle missing country values.
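One possible way to handle missing country/ID pairs is sketched below. It is not the answerer's code: it assumes the factor table is in long form with hypothetical column names ID, Country and Factor, that F_frame has a plain (single-level) ID index, and that each (ID, country) pair occurs at most once. The idea is to pivot the factors into an ID by country table, stretch it to the shape of F_frame with reindex, and multiply the underlying arrays once instead of looping.

```python
import pandas as pd

# Hypothetical stand-in for F_frame: IDs in rows, (country, sector) in columns.
columns = pd.MultiIndex.from_product([["AT", "BE"], ["Food", "Energy"]])
F_frame = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]],
    index=["ID1", "ID2", "ID3"],
    columns=columns,
)

# Hypothetical stand-in for CF_LO in long form: one factor per (ID, country).
# (ID2, BE) and every ID3 pair are deliberately missing.
CF_LO = pd.DataFrame({
    "ID": ["ID1", "ID1", "ID2"],
    "Country": ["AT", "BE", "AT"],
    "Factor": [2.0, 0.5, 10.0],
})

# One row per ID, one column per country; absent pairs become NaN.
factors = CF_LO.pivot(index="ID", columns="Country", values="Factor")

# Line the rows up with F_frame and repeat each country's factor for
# every sector of that country; anything unmatched becomes 0.
factors = factors.reindex(
    index=F_frame.index,
    columns=F_frame.columns.get_level_values(0),
).fillna(0.0)

# Both arrays now have the same shape, so a single elementwise product
# replaces the three nested loops.
LO_impacts = pd.DataFrame(
    F_frame.to_numpy() * factors.to_numpy(),
    index=F_frame.index,
    columns=F_frame.columns,
)

print(LO_impacts)
```

If the real F_frame has a multi-level row index, the ID level would need to be extracted first (for example with get_level_values) before reindexing.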

python

loops

match

dataframe

pandas