How to create subsets of a DataFrame in Python?

I have a large dataset (a pandas DataFrame) with the following headers:
RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ['RUT1_Cycle_Counter']

Now I want to create several subsets from the original DataFrame, as shown below:

subset_0
index, RUT1_Cycle_Counter, RUT1_Azi_0, RUT1_Dtctn_Probb_0, RUT1_Dtctn_ID_0, RUT1_Elev_0

subset_1
index, RUT1_Cycle_Counter, RUT1_Azi_1, RUT1_Dtctn_Probb_1, RUT1_Dtctn_ID_1, RUT1_Elev_1
.
.
.
subset_9
index, RUT1_Cycle_Counter, RUT1_Azi_9, RUT1_Dtctn_Probb_9, RUT1_Dtctn_ID_9, RUT1_Elev_9

How can I do this in Python? I'm a beginner in Python.

Thank you very much!

With pandas you can select a subset of a DataFrame's columns directly. As long as list_of_subset_headers contains only column names of the DataFrame, just write:

sub_df=df[list_of_subset_headers]
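A minimal sketch of this selection on a small made-up DataFrame (the columns a, b, c and their values are hypothetical) shows that the result keeps the order of the list, and that a header missing from the DataFrame raises a KeyError:

```python
import pandas as pd

# Small made-up DataFrame (hypothetical data, for illustration only)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

list_of_subset_headers = ["c", "a"]  # hypothetical header list
sub_df = df[list_of_subset_headers]  # new DataFrame, columns in list order
print(sub_df.columns.tolist())  # ['c', 'a']

# If any header is not a column of df, the selection raises KeyError
try:
    df[["a", "missing"]]
except KeyError as exc:
    print("KeyError:", exc)

# To keep only the headers that actually exist, filter the list first
existing = [c for c in ["a", "missing"] if c in df.columns]
print(df[existing].columns.tolist())  # ['a']
```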

Or, in this case:

sub_df0=df[['RUT1_Azi_0', 'RUT1_Dtctn_Probb_0', 'RUT1_Dtctn_ID_0', 'RUT1_Elev_0']]
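Rather than typing each header by hand, the list for subset i can be built by indexing into the header lists from the question, which also makes it easy to put RUT1_Cycle_Counter first. This is a sketch on made-up data; subset_columns is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

# Header lists as defined in the question
RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ["RUT1_Cycle_Counter"]

# Made-up data: 3 rows of zeros, one column per header
cols = RCC + RAM + RDP + RDI + REM
df = pd.DataFrame(np.zeros((3, len(cols))), columns=cols)


def subset_columns(i):
    """Hypothetical helper: the headers that make up subset_i."""
    return RCC + [RAM[i], RDP[i], RDI[i], REM[i]]


sub_df0 = df[subset_columns(0)]
print(sub_df0.columns.tolist())
```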

Here is an example:

import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]

# made up example with the columns above
cols = RAM + RDP + RDI + REM
nrows = 10
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

Now:

subsets = [df[list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]
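zip pairs up the i-th header of each list, so each tuple holds exactly the columns of one subset. The sketch below rebuilds the made-up df from the example and, as a variation not in the original answer, stores the subsets in a dictionary keyed by the names used in the question (subset_0 ... subset_9) instead of separate variables:

```python
import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]

# Same made-up data as in the example above
cols = RAM + RDP + RDI + REM
nrows = 10
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

# One 4-tuple of headers per subset index
groups = list(zip(RAM, RDP, RDI, REM))
print(groups[5])
# ('RUT1_Azi_5', 'RUT1_Dtctn_Probb_5', 'RUT1_Dtctn_ID_5', 'RUT1_Elev_5')

# Dictionary keyed by the names used in the question
subsets = {f"subset_{i}": df[list(subcols)] for i, subcols in enumerate(groups)}
print(subsets["subset_5"].iloc[0].tolist())  # [5, 15, 25, 35]
```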

For example:

>>> subsets[5]
   RUT1_Azi_5  RUT1_Dtctn_Probb_5  RUT1_Dtctn_ID_5  RUT1_Elev_5
0           5                  15               25           35
1          45                  55               65           75
2          85                  95              105          115
3         125                 135              145          155
4         165                 175              185          195
5         205                 215              225          235
6         245                 255              265          275
7         285                 295              305          315
8         325                 335              345          355
9         365                 375              385          395

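Edit: modified the answer to include the list of columns common to all subsets (RCC = ['RUT1_Cycle_Counter']). This requires df to contain that column, which the real dataset has but the made-up example above does not: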

subsets = [df[RCC + list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]
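To check the edited version, the sketch below uses made-up data that, unlike the earlier example, does include RUT1_Cycle_Counter. It also shows an alternative that is not part of the original answer: DataFrame.filter with a regular expression, which assumes every per-subset header ends in _{i} as in the question's naming scheme:

```python
import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ["RUT1_Cycle_Counter"]

# Made-up data that also contains the cycle counter column
cols = RCC + RAM + RDP + RDI + REM
nrows = 4
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

subsets = [df[RCC + list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]
print(subsets[5].columns.tolist())
# ['RUT1_Cycle_Counter', 'RUT1_Azi_5', 'RUT1_Dtctn_Probb_5', 'RUT1_Dtctn_ID_5', 'RUT1_Elev_5']

# Alternative (assumes the "_{i}" suffix naming): match column names by regex
i = 5
alt = df.filter(regex=rf"^(RUT1_Cycle_Counter|.*_{i})$")
print(alt.columns.equals(subsets[i].columns))  # True
```

The list comprehension depends only on the header lists, while the regex version depends on the naming convention, so the former is the safer choice if names may change.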