KeyError: ('1', 'occurred at index 0')

I'm really new to Python, and I'm working with the following DataFrames:

    import pandas as pd

    data1 = {'Store_ID':['1','1','1','1','2','2','2','3','3'],
             'YearMonth':[201801,201802,201805,201904,201812,201902,201906,201904,201907],
             'AVG_Rating':[5.0,4.5,4.0,3.5,3.0,4.5,4.0,2.5,4.0]}

    df1 = pd.DataFrame(data1)

                        AVG_Rating
    Store_ID    AnoMes
    1           201801  5.0
                201802  4.5
                201805  4.0
                201904  3.5
    2           201812  3.0
                201902  4.5
                201906  4.0
    3           201904  2.5
                201907  4.0

    data2 = {'Client_ID':['1212','1234','1122','1230'],
             'Store_ID':['1','1','2','3'],
             'YearMonth':[201804,201906,201904,201906]}

    df2 = pd.DataFrame(data2)

                Client_ID   YearMonth
    Store_ID
    1           1212        201804
    1           1234        201906
    2           1122        201904
    3           1230        201906

I set the Store_ID column as the index of both DataFrames.

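For each client, I need to merge in the most recent AVG_Rating from df1 based on the YearMonth column, which is the year and month in which the client bought at the store. My final DataFrame should look like this: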

                Client_ID   YearMonth   AVG_Rating
    Store_ID
    1           1212        201804      4.5   (rating from 201802)

To do this, I tried using the function below with apply, but I get an error:

    def get_previous_loja_rating(row):
        loja = df1[row['Loja_ID']]
        lst = loja[loja['AnoMes']] < df2[row['AnoMes']]
        return lst[-1]

    df2['PREVIOUS_RATING_MEAN'] = df1['AnoMes'].apply(get_previous_loja_rating,axis=1)

KeyError: ('Loja_ID', 'occurred at index 1')

Can anyone help me?

You seem to be using Portuguese key names (Loja_ID, AnoMes, etc.) in your code, while your data uses English ones. You'll want to change them to Store_ID and YearMonth.
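
A minimal sketch of that mismatch, assuming the client frame from the question with Store_ID kept as an ordinary column (the flag name `missing_label_raises` is only illustrative). It shows that a row passed in by `apply(..., axis=1)` can only be indexed with labels that really exist in the frame:

```python
import pandas as pd

df2 = pd.DataFrame({'Client_ID': ['1212', '1234'],
                    'Store_ID': ['1', '1'],
                    'YearMonth': [201804, 201906]})

# The labels that actually exist in the frame
print(list(df2.columns))

# A single row, the same kind of Series that apply(..., axis=1) hands to the function
row = df2.iloc[0]

try:
    row['Loja_ID']               # this label is not in the data
    missing_label_raises = False
except KeyError:
    missing_label_raises = True  # pandas raises KeyError for an unknown label

store = row['Store_ID']          # the label used in the data works
```

Printing `df2.columns` before writing the function is a quick way to confirm which names are available.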

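I'll use YearMonth as the column name instead of AnoMes. Your function fails for several reasons: among others, it refers to column names that are not in your data, and it indexes df1 with a store value where a row filter is needed. As I understand it, you want to add an average rating column that holds, for the corresponding store, the rating from the most recent YearMonth at or before the client's YearMonth.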
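
One of those reasons can be reproduced in isolation. The sketch below uses a shortened copy of the ratings frame (an assumption for brevity) and illustrative variable names. Indexing a DataFrame with a store value such as '1' asks for a column with that label, which is the likely source of the KeyError on '1' in the title, whereas a boolean mask selects the matching rows:

```python
import pandas as pd

df1 = pd.DataFrame({'Store_ID': ['1', '1', '2'],
                    'YearMonth': [201801, 201802, 201812],
                    'AVG_Rating': [5.0, 4.5, 3.0]})

store_id = '1'

# df1[store_id] looks for a COLUMN labelled '1', which does not exist
try:
    df1[store_id]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# A boolean mask selects the ROWS whose Store_ID equals '1'
store_rows = df1[df1['Store_ID'] == store_id]
print(store_rows)
```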

df1

      Store_ID  YearMonth  AVG_Rating
    0        1     201801         5.0
    1        1     201802         4.5
    2        1     201805         4.0
    3        1     201904         3.5
    4        2     201812         3.0

df2

      Client_ID Store_ID  YearMonth
    0      1212        1     201804
    1      1234        1     201906
    2      1122        2     201904
    3      1230        3     201906


    def get_previous_loja_rating(row):
        # rows of df1 that belong to the same store as this client
        loja = df1[df1['Store_ID'] == row['Store_ID']]
        # all YearMonth values less than or equal to the client's YearMonth
        lst = [i for i in loja['YearMonth'] if i <= row['YearMonth']]
        # AVG_Rating of the most recent of those YearMonth values
        return loja[loja['YearMonth'] == max(lst)]['AVG_Rating'].iloc[0]

    df2['AVG_Rating'] = df2.apply(get_previous_loja_rating, axis=1)

df2

      Client_ID Store_ID  YearMonth  AVG_Rating
    0      1212        1     201804         4.5
    1      1234        1     201906         3.5
    2      1122        2     201904         4.5
    3      1230        3     201906         2.5

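This adds to your client DataFrame the average rating from the closest YearMonth that is not later than the client's own YearMonth.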
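
As an alternative to the row-wise apply, the same "latest rating at or before the purchase month" lookup can probably be written with `pd.merge_asof`. This is a sketch under assumptions rather than the approach used above: it assumes Store_ID and YearMonth are ordinary columns (call `reset_index()` first if they were set as the index), and the name `merged` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
                    'YearMonth': [201801, 201802, 201805, 201904, 201812,
                                  201902, 201906, 201904, 201907],
                    'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0]})

df2 = pd.DataFrame({'Client_ID': ['1212', '1234', '1122', '1230'],
                    'Store_ID': ['1', '1', '2', '3'],
                    'YearMonth': [201804, 201906, 201904, 201906]})

# merge_asof requires both frames to be sorted by the "on" key
merged = pd.merge_asof(
    df2.sort_values('YearMonth'),
    df1.sort_values('YearMonth'),
    on='YearMonth',         # match on the nearest YearMonth ...
    by='Store_ID',          # ... within the same store only
    direction='backward',   # take the last rating at or before the purchase
)

print(merged)
```

Note that the result comes back ordered by YearMonth, not in the original row order of df2. A client whose store has no rating at or before the purchase month gets NaN in AVG_Rating here, whereas the apply version would raise an error on `max()` of an empty list.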