Iterate over MultiIndex DataFrame groups with unequal second index level

Question

I have a MultiIndex (Name, Date) DataFrame df that I need to process iteratively by Date, assigning values based on the groups for the current and the previous date.

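I can't figure out how to do this when different dates have different sets of Name values. I have marked the problematic line with the comment HELP!!! in the loop below: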

    import pandas as pd

    df = pd.DataFrame(data=[('A', '20210101', 5.0),  # 1 Jan
                            ('B', '20210101', 3.0),
                            ('C', '20210101', 2.0),
                            ('A', '20210102', 0.0),  # 2 Jan
                            ('C', '20210102', 0.0),
                            ('A', '20210103', 0.0),  # 3 Jan
                            ('C', '20210103', 0.0),
                            ('D', '20210103', 0.0)],
                      columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

    # Logic: Each day total wealth sums to 10.0.
    # Each day: Each person starts with what he had the previous day;
    #           excess wealth gets allocated evenly to everyone
    
    dft = df.groupby(df.index.get_level_values('Date'))
    dates = list(dft.groups.keys())
    # Initialize first group:
    previous = dft.get_group(dates[0])

    # Loop over groups in order:
    for date in dates[1:]:
        current = dft.get_group(date)
        current.Dollars = previous.Dollars  # << HELP!!!
        excess = 10.0 - current.Dollars.sum()
        current.Dollars = current.Dollars + excess / current.Dollars.count()
        # Assign the calculated values back to the DataFrame:
        df.loc[current.index] = current
        # Prepare for next iteration:
        previous = current

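At the HELP!!! point, I need to somehow accomplish the following:

Do a left-join-like assignment of previous.Dollars into current.Dollars, ignoring the Date index level of both current and previous. Even if I do .reset_index(level=1), I can't see how to accomplish this given that the Name index changes from day to day; note that in the example df, row name B drops out on the second day and a new row name D appears on the third day.
Having ignored the Date level while doing the logic in the loop, I need to somehow restore it so that the results can be assigned back to the corresponding Date in the master df.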

In this example, the values of df after the loop finishes should be:

                   Dollars
    Name Date
    A    20210101      5.0
    B    20210101      3.0
    C    20210101      2.0
    A    20210102      6.5
    C    20210102      3.5
    A    20210103      6.5
    C    20210103      3.5
    D    20210103      0.0
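For reference, the arithmetic behind these numbers can be checked by hand. The sketch below uses plain dictionaries rather than pandas; the names `TOTAL`, `allocate`, `day2`, and `day3` are illustrative only, and the 10.0 total is taken from the loop above.

```python
TOTAL = 10.0  # daily total, as used in the loop above


def allocate(carried):
    """Spread the unallocated excess evenly over everyone present that day."""
    share = (TOTAL - sum(carried.values())) / len(carried)
    return {name: value + share for name, value in carried.items()}


# 2 Jan: A and C carry over 5.0 and 2.0 from 1 Jan; B is gone.
day2 = allocate({'A': 5.0, 'C': 2.0})
# 3 Jan: A and C carry over their 2 Jan values; D is new and starts at 0.0.
day3 = allocate({**day2, 'D': 0.0})

print(day2)  # {'A': 6.5, 'C': 3.5}
print(day3)  # {'A': 6.5, 'C': 3.5, 'D': 0.0}
```

On 2 Jan the carried-over total is 7.0, so the excess of 3.0 is split into 1.5 each. On 3 Jan the carried-over total is already 10.0, so nothing is left to hand out and D stays at 0.0.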

Answer 1

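As your simplified case suggests, you have to manipulate the index to do the assignment and then put the values back into the original DataFrame: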

    # Initialize first group, but take Date out of its index:
    previous = dft.get_group(dates[0]).reset_index(level=1)

    for date in dates[1:]:
        # Take the Date out of the index:
        current = dft.get_group(date).reset_index(level=1)

        # Get the common keys
        commonKeys = current.index.intersection(previous.index)

        # Do the "left assignment"
        current.loc[commonKeys, 'Dollars'] = previous.loc[commonKeys].Dollars

        # Do your calculations:
        excess = 10.0 - current.Dollars.sum()
        current.Dollars = current.Dollars + excess / current.Dollars.count()

        # Restore the Date column to the index:
        current.set_index('Date', append=True, inplace=True)

        # Assign the calculated values back to the DataFrame:
        df.loc[current.index] = current

        # Prepare for next iteration by removing the Date index again:
        previous = current.reset_index(level=1)
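If it helps, here is a self-contained variant of the same idea that you can run end to end. It is only a sketch and it swaps in a different technique for the "left assignment": instead of `index.intersection`, it aligns the previous day's Series onto the current day's names with `Series.reindex`, then uses `fillna` so that new names keep their current value. The names `TOTAL`, `date_level`, `mask`, `carried`, and `result` are my own, and I am assuming each Name appears at most once per Date.

```python
import pandas as pd

df = pd.DataFrame(
    data=[('A', '20210101', 5.0), ('B', '20210101', 3.0), ('C', '20210101', 2.0),
          ('A', '20210102', 0.0), ('C', '20210102', 0.0),
          ('A', '20210103', 0.0), ('C', '20210103', 0.0), ('D', '20210103', 0.0)],
    columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

TOTAL = 10.0  # assumed daily total, taken from the question's loop

date_level = df.index.get_level_values('Date')
dates = sorted(date_level.unique())

# Dollars for the first date, indexed by Name only.
previous = df.loc[date_level == dates[0], 'Dollars'].droplevel('Date')

for date in dates[1:]:
    mask = date_level == date  # rows of df belonging to this date
    current = df.loc[mask, 'Dollars'].droplevel('Date')

    # Left-join-like step: keep only current's names. Names that left (B)
    # drop out; names that are new (D) come back as NaN and are filled
    # with their current value.
    carried = previous.reindex(current.index).fillna(current)

    # Spread the excess evenly over everyone present on this date.
    result = carried + (TOTAL - carried.sum()) / len(carried)

    # Write back positionally; the rows under mask are in the same order
    # as result because both come from df's own row order.
    df.loc[mask, 'Dollars'] = result.to_numpy()

    previous = result

print(df)
```

Because the boolean mask selects the rows for one date directly, the Date level never has to be moved out of the index and restored; only the per-day Series is reduced to a Name index for the alignment.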
