Iterate over MultiIndex DataFrame groups with unequal second index level
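I have a DataFrame df with a MultiIndex (Name, Date). I need to iterate over it by Date in order to assign values that depend on both the current date's group and the previous date's group.
I don't know how to do this when the set of Name values differs from one date to the next. I have marked the problematic line in the loop below with the comment HELP!!!: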
import pandas as pd

df = pd.DataFrame(data=[('A', '20210101', 5.0),  # 1 Jan
                        ('B', '20210101', 3.0),
                        ('C', '20210101', 2.0),
                        ('A', '20210102', 0.0),  # 2 Jan
                        ('C', '20210102', 0.0),
                        ('A', '20210103', 0.0),  # 3 Jan
                        ('C', '20210103', 0.0),
                        ('D', '20210103', 0.0)],
                  columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

# Logic: Each day total wealth sums to 10.0 (the constant used in the loop).
# Each day: Each person starts with what he had the previous day;
# excess wealth gets allocated evenly to everyone
dft = df.groupby(df.index.get_level_values('Date'))
dates = list(dft.groups.keys())

# Initialize first group:
previous = dft.get_group(dates[0])

# Loop over groups in order:
for date in dates[1:]:
    current = dft.get_group(date)
    current.Dollars = previous.Dollars  # << HELP!!!
    excess = 10.0 - current.Dollars.sum()
    current.Dollars = current.Dollars + excess / current.Dollars.count()
    # Assign the calculated values back to the DataFrame:
    df.loc[current.index] = current
    # Prepare for next iteration:
    previous = current
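At the HELP!!! point, I need to somehow accomplish the following:

Do a left-join-like assignment of previous.Dollars to current.Dollars, ignoring the Date index level of current and previous. Even if I call .reset_index(level=1), I don't see how to do this given that the Name index changes from day to day. Note that in the example df, row name B disappears on the second day and a new row name D appears on the third day.

Having ignored the Date level while running the logic in the loop, I need to somehow restore it so that the results can be assigned back to the corresponding Date in the master df.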
In this example, the value of df after the loop should be:

               Dollars
Name Date
A    20210101      5.0
B    20210101      3.0
C    20210101      2.0
A    20210102      6.5
C    20210102      3.5
A    20210103      6.5
C    20210103      3.5
D    20210103      0.0
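
To make the expected numbers concrete, the 2 Jan step can be worked through on plain Series indexed by Name only. This is just a sketch of the arithmetic, not part of the original code; the names current_names, carried and day2 are introduced here for illustration.

```python
import pandas as pd

# Balances on 1 Jan and the names present on 2 Jan, indexed by Name only.
previous = pd.Series({'A': 5.0, 'B': 3.0, 'C': 2.0}, name='Dollars')
current_names = pd.Index(['A', 'C'], name='Name')

# Carry yesterday's balances forward for the names present today;
# B is absent on 2 Jan, so its balance is not carried over.
carried = previous.reindex(current_names)

# Whatever is missing from the 10.0 daily total is split evenly.
excess = 10.0 - carried.sum()           # 10.0 - 7.0 = 3.0
day2 = carried + excess / len(carried)  # A: 6.5, C: 3.5
print(day2)
```

On 3 Jan the carried balances of A and C already sum to 10.0, so the excess is zero and the new name D keeps 0.0.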
As your simplified case shows, you have to manipulate the index to make the assignment and then put the values back into the original DataFrame:
# Initialize first group, but take Date out of its index:
previous = dft.get_group(dates[0]).reset_index(level=1)

for date in dates[1:]:
    # Take the Date out of the index:
    current = dft.get_group(date).reset_index(level=1)
    # Get the common keys
    commonKeys = current.index.intersection(previous.index)
    # Do the "left assignment"
    current.loc[commonKeys, 'Dollars'] = previous.loc[commonKeys].Dollars
    # Do your calculations:
    excess = 10.0 - current.Dollars.sum()
    current.Dollars = current.Dollars + excess / current.Dollars.count()
    # Restore the Date column to the index:
    current.set_index('Date', append=True, inplace=True)
    # Assign the calculated values back to the DataFrame:
    df.loc[current.index] = current
    # Prepare for next iteration by removing the Date index again:
    previous = current.reset_index(level=1)
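
A possible variant of the same idea, offered only as a sketch: work on the Dollars column as a Series indexed by Name, and let Series.reindex plus fillna do the left-join-like step instead of an explicit index intersection. It assumes each (Name, Date) pair is unique and that the Date strings sort chronologically, as they do in the example. The names TOTAL, date_level, mask and carried are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[('A', '20210101', 5.0), ('B', '20210101', 3.0), ('C', '20210101', 2.0),
          ('A', '20210102', 0.0), ('C', '20210102', 0.0),
          ('A', '20210103', 0.0), ('C', '20210103', 0.0), ('D', '20210103', 0.0)],
    columns=('Name', 'Date', 'Dollars')).set_index(['Name', 'Date'])

TOTAL = 10.0  # daily total wealth, taken from the question's loop

date_level = df.index.get_level_values('Date')
dates = sorted(date_level.unique())

# Balances of the first date, indexed by Name only.
previous = df.loc[date_level == dates[0], 'Dollars'].droplevel('Date')

for date in dates[1:]:
    mask = date_level == date
    # Today's rows, indexed by Name only.
    current = df.loc[mask, 'Dollars'].droplevel('Date')
    # Left-join-like carry-over: names seen yesterday take yesterday's
    # balance; new names keep the value they already have in df.
    carried = previous.reindex(current.index).fillna(current)
    # Distribute the excess evenly over today's names.
    carried = carried + (TOTAL - carried.sum()) / len(carried)
    # Write back positionally; carried is in the same row order as df[mask].
    df.loc[mask, 'Dollars'] = carried.to_numpy()
    previous = carried

print(df)
```

Because the Date level is dropped only on the temporary Series and the write-back goes through a boolean mask on df, nothing has to be restored to the index afterwards.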