Create a pandas dataframe column of variable sized lists

Question

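I need to create a dataframe containing all possible start times for a scheduler for some machines. My initial dataframe (msDF) contains three simple columns: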

MachID - the ID of each machine
Start - the datetime from which the machine is available for scheduling
slots - the number of slots available from that time

msDF is copied from a master dataframe, but for illustration it might look like this:

import pandas as pd

msDF = pd.DataFrame({'MachID': [1, 2, 3, 4, 5],
                     'Start': ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
                               "12/04/2021 11:00", "15/04/2021 08:00"],
                     'slots': [2, 3, 4, 3, 1]})

	MachID	Start	slots
0	1	02/04/2021 9:00	2
1	2	06/04/2021 12:30	3
2	3	09/04/2021 10:00	4
3	4	12/04/2021 11:00	3
4	5	15/04/2021 08:00	1

I need to explode this dataframe so that each row is repeated "slots" times, with a SlotIndex numbering the repeats. The desired output is:

	MachID	Start	slots	SlotIndex
0	1	02/04/2021 9:00	2	0
0	1	02/04/2021 9:00	2	1
1	2	06/04/2021 12:30	3	0
1	2	06/04/2021 12:30	3	1
1	2	06/04/2021 12:30	3	2

My approach has a problem. I create variable-length lists in SlotIndex and then explode them, but this produces a warning.

To do this, I use:

msDF['SlotIndex'] = None
for x in msDF.index:
    msDF.SlotIndex.loc[x] = list(range(msDF.loc[x,'slots']))

It works, but with a warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I then explode msDF to get the result I want:

msDF = msDF.explode('SlotIndex')
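The warning most likely comes from the chained assignment msDF.SlotIndex.loc[x] = ..., which first selects the column and then writes into that intermediate object. A minimal sketch of the same list-then-explode idea without chained assignment is below; the sample values mirror the illustration above, and the name exploded is only illustrative.

```python
import pandas as pd

# Sample data mirroring the illustrative msDF from the question.
msDF = pd.DataFrame({
    'MachID': [1, 2, 3, 4, 5],
    'Start': ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
              "12/04/2021 11:00", "15/04/2021 08:00"],
    'slots': [2, 3, 4, 3, 1],
})

# Build the list column with a single assignment on the frame itself,
# instead of writing through msDF.SlotIndex.loc[x] row by row.
msDF['SlotIndex'] = msDF['slots'].apply(lambda n: list(range(n)))

# One output row per list element; the original index labels are repeated.
exploded = msDF.explode('SlotIndex')
print(exploded)
```

If msDF was itself taken as a slice of the master dataframe, calling .copy() when creating it is a common way to make later column assignments unambiguous.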

How can this be improved?

Answer 1

Use repeat on the index.

# Repeat each index label "slots" times and keep the expanded frame
df = df.loc[df.index.repeat(df['slots'])]
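To make the mechanics concrete, here is a small sketch on a three-row frame. The frame and the names repeated_index and expanded are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Tiny illustrative frame: one row per machine with a slot count.
df = pd.DataFrame({'MachID': [1, 2, 3], 'slots': [2, 3, 1]})

# Index.repeat takes per-element repeat counts and returns a new Index.
repeated_index = df.index.repeat(df['slots'])

# Selecting with repeated labels returns one row per label occurrence.
expanded = df.loc[repeated_index]
print(expanded)
```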

The index labels are repeated, so you can group on them to set the slot ID.

df['slot_id'] = 1
# Running total of ones within each original index label; subtract 1 for a 0-based slot index
df['slot_id'] = df.groupby(df.index)['slot_id'].transform('cumsum') - 1
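Putting the pieces together on the question's sample data, one possible end-to-end sketch follows. It swaps the helper column and cumsum for groupby(...).cumcount(), which numbers rows within each group starting at 0; the variable name out is an assumption for illustration.

```python
import pandas as pd

# Sample data mirroring the illustrative msDF from the question.
msDF = pd.DataFrame({
    'MachID': [1, 2, 3, 4, 5],
    'Start': ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
              "12/04/2021 11:00", "15/04/2021 08:00"],
    'slots': [2, 3, 4, 3, 1],
})

# Repeat each row "slots" times by repeating its index label.
# .copy() gives an independent frame to add columns to.
out = msDF.loc[msDF.index.repeat(msDF['slots'])].copy()

# Number the repeats of each original row: 0, 1, 2, ... per index label.
out['SlotIndex'] = out.groupby(level=0).cumcount()
print(out)
```

The repeated index labels are kept, as in the desired output; out.reset_index(drop=True) would replace them with a fresh 0..n-1 index if that is preferred.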
