How do I split a Pandas DataFrame into sub-arrays (specific use case outlined in detail)?

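Apologies for the title; I don't yet know enough to condense my question into a single line.

Here is the use case:

I have already produced the list of DataFrames by chunking the original file with a while loop. However, that feels inefficient, and I suspect there is a more elegant and efficient way to do it using np.split() or df.groupby() together with the list of timestamps. Then again, I may be wrong.

So I guess my question boils down to: "What is the most time-efficient and most elegant way to achieve the above?"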

@KU99 asked for an example and the expected output:

df=

colA colB timestamp
First row 1
Second row 2
First row 3
Second row 4
First row 5
Second row 6
First row 7
Second row 8
First row 9
Second row 10
First row 11
Second row 12

list = [3, 7, 8, 9]

output =

colA colB timestamp
First row 1
Second row 2
colA colB timestamp
First row 3
Second row 4
First row 5
Second row 6
colA colB timestamp
First row 7
colA colB timestamp
Second row 8
colA colB timestamp
First row 9
Second row 10
First row 11
Second row 12

The output type will depend on the method, but I don't mind whether it is a list, a dict, or some other indexable type.

Try pd.cut + .groupby:

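import pandas as pd

bins = [3, 7, 8, 9]

# right=False makes the intervals left-closed:
# (-inf, 3), [3, 7), [7, 8), [8, 9), [9, +inf)
edges = [float("-inf")] + bins + [float("+inf")]

for _, g in df.groupby(pd.cut(df.timestamp, edges, right=False)):
    print(g)
    print("-" * 80)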

Prints:

     colA colB  timestamp
0   First  row          1
1  Second  row          2
--------------------------------------------------------------------------------
     colA colB  timestamp
2   First  row          3
3  Second  row          4
4   First  row          5
5  Second  row          6
--------------------------------------------------------------------------------
    colA colB  timestamp
6  First  row          7
--------------------------------------------------------------------------------
     colA colB  timestamp
7  Second  row          8
--------------------------------------------------------------------------------
      colA colB  timestamp
8    First  row          9
9   Second  row         10
10   First  row         11
11  Second  row         12
--------------------------------------------------------------------------------
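
Since the question asks for an indexable result rather than printed output, here is a minimal sketch of two ways to collect the pieces into a list. It rebuilds the example frame from the question. The names `edges`, `chunks`, and `chunks_by_position` are only illustrative, and the second option assumes the frame is already sorted by timestamp, which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "colA": ["First", "Second"] * 6,
        "colB": ["row"] * 12,
        "timestamp": range(1, 13),
    }
)
bins = [3, 7, 8, 9]

# Option 1: same pd.cut + groupby idea, collected into a list.
edges = [float("-inf")] + bins + [float("+inf")]
labels = pd.cut(df["timestamp"], edges, right=False)
# observed=True skips intervals that contain no rows.
chunks = [g for _, g in df.groupby(labels, observed=True)]

# Option 2 (assumes df is sorted by timestamp): find the row position
# of the first timestamp >= each boundary, then slice by position.
positions = np.searchsorted(df["timestamp"].to_numpy(), bins, side="left")
starts = [0, *positions]
stops = [*positions, len(df)]
chunks_by_position = [df.iloc[a:b] for a, b in zip(starts, stops)]
```

Either list can then be indexed directly, for example `chunks[2]` for the piece covering [7, 8). Note that the second option may return empty DataFrames when two boundaries fall between the same pair of rows, whereas the first skips empty intervals.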