Iterate over multiindex dataframe groups with unequal second index level

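I have a DataFrame df with a MultiIndex (Name, Date) that I need to process iteratively by Date, in order to assign values based on the groups for the current and the previous date.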

I don't know how to do this when different dates have different sets of Names. I've marked the problematic line with the comment HELP!!! in the loop below:

    import pandas as pd

    df = pd.DataFrame(data=[('A', '20210101', 5.0),  # 1 Jan
                            ('B', '20210101', 3.0),
                            ('C', '20210101', 2.0),
                            ('A', '20210102', 0.0),  # 2 Jan
                            ('C', '20210102', 0.0),
                            ('A', '20210103', 0.0),  # 3 Jan
                            ('C', '20210103', 0.0),
                            ('D', '20210103', 0.0)],
                      columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

    # Logic: Each day total wealth sums to 10.
    # Each day: Each person starts with what he had the previous day; 
    #           excess wealth gets allocated evenly to everyone
    
    dft = df.groupby(df.index.get_level_values('Date'))
    dates = list(dft.groups.keys())
    # Initialize first group:
    previous = dft.get_group(dates[0])

    # Loop over groups in order:
    for date in dates[1:]:
        current = dft.get_group(date)
        current.Dollars = previous.Dollars  # << HELP!!!
        excess = 10.0 - current.Dollars.sum()
        current.Dollars = current.Dollars + excess / current.Dollars.count()
        # Assign the calculated values back to the DataFrame:
        df.loc[current.index] = current
        # Prepare for next iteration:
        previous = current

At the HELP!!! point, I need to somehow accomplish the following:

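  1. Do a left-join-like assignment of previous.Dollars to current.Dollars, ignoring the Date index of both current and previous. Even if I do .reset_index(level=1), I can't see how to accomplish this given that the Name index changes every day. Note that in the example df, row name B is missing on the second day and row name D is added on the third day.

  2. After ignoring the Date level to carry out the logic in the loop, I need to somehow restore it in order to assign the results back to the corresponding Date in the master df.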
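
To make the obstacle in point 1 concrete, here is a minimal sketch using two small frames that mirror the first two days of the example. The names `naive` and `by_name` are illustrative only. Assigning one column to another aligns on the full (Name, Date) index, so no label matches across days; dropping the Date level lets the rows align on Name alone.

```python
import pandas as pd

# Two single-day frames shaped like the groups in the example.
previous = pd.DataFrame(
    {"Dollars": [5.0, 3.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "20210101"), ("B", "20210101"), ("C", "20210101")],
        names=["Name", "Date"],
    ),
)
current = pd.DataFrame(
    {"Dollars": [0.0, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "20210102"), ("C", "20210102")],
        names=["Name", "Date"],
    ),
)

# Direct assignment aligns on (Name, Date); the dates differ,
# so nothing matches and every value ends up as NaN.
naive = current.copy()
naive["Dollars"] = previous["Dollars"]
print(naive)

# With the Date level dropped, alignment happens on Name only.
by_name = previous["Dollars"].droplevel("Date").reindex(
    current.index.get_level_values("Name")
)
print(by_name)
```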


In this example, the values of df after the loop finishes should be:
               Dollars
Name Date             
A    20210101      5.0
B    20210101      3.0
C    20210101      2.0
A    20210102      6.5
C    20210102      3.5
A    20210103      6.5
C    20210103      3.5
D    20210103      0.0

As your simplified case shows, you have to manipulate the index to do the assignment and then put the values back into the original DataFrame:

# Assumes df, dft and dates are defined as in the question.
# Initialize first group, but take Date out of its index:
previous = dft.get_group(dates[0]).reset_index(level=1)

for date in dates[1:]:
    # Take the Date out of the index:
    current = dft.get_group(date).reset_index(level=1)

    # Get the common keys
    commonKeys = current.index.intersection(previous.index)

    # Do the "left assignment"
    current.loc[commonKeys, 'Dollars'] = previous.loc[commonKeys].Dollars

    # Do your calculations:
    excess = 10.0 - current.Dollars.sum()
    current.Dollars = current.Dollars + excess / current.Dollars.count()

    # Restore the Date column to the index:
    current.set_index('Date', append=True, inplace=True)

    # Assign the calculated values back to the DataFrame:
    df.loc[current.index] = current

    # Prepare for next iteration by removing the Date index again:
    previous = current.reset_index(level=1)
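
For reference, here is a self-contained sketch of the same idea that can be run on its own. It is a variant rather than the exact loop above: it keeps each day as a Series keyed by Name (via `xs`) and uses `reindex` plus `fillna` for the left-join-like step instead of an index intersection. The constant `TOTAL` is a name introduced here for the 10.0 used in the question's code.

```python
import pandas as pd

df = pd.DataFrame(
    data=[("A", "20210101", 5.0), ("B", "20210101", 3.0), ("C", "20210101", 2.0),
          ("A", "20210102", 0.0), ("C", "20210102", 0.0),
          ("A", "20210103", 0.0), ("C", "20210103", 0.0), ("D", "20210103", 0.0)],
    columns=("Name", "Date", "Dollars"),
).set_index(["Name", "Date"])

TOTAL = 10.0  # daily total, taken from the question's code

dates = sorted(df.index.get_level_values("Date").unique())

# Previous day's Dollars, indexed by Name only (xs drops the Date level).
previous = df.xs(dates[0], level="Date")["Dollars"]

for date in dates[1:]:
    current = df.xs(date, level="Date")["Dollars"]

    # Left-join-like step: use yesterday's value where the Name existed,
    # otherwise keep today's starting value.
    current = previous.reindex(current.index).fillna(current)

    # Spread the excess evenly over today's Names.
    current = current + (TOTAL - current.sum()) / current.count()

    # Write back row by row, restoring the Date level in the key.
    for name, value in current.items():
        df.loc[(name, date), "Dollars"] = value

    previous = current

print(df)
```

On the example data this should reproduce the expected table from the question. The row-by-row write-back favours clarity over speed; for large frames a vectorised assignment would likely be preferable.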