How to create a subset of a dataframe in Python?
I have a large dataset (a pandas DataFrame) with the following headers:
RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ['RUT1_Cycle_Counter']
Now I want to create many subsets from the original DataFrame, as shown below.
subset_0
index,RUT1_Cycle_Counter, RUT1_Azi_0, RUT1_Dtctn_Probb_0, RUT1_Dtctn_ID_0, RUT1_Elev_0
subset_1
index,RUT1_Cycle_Counter, RUT1_Azi_1, RUT1_Dtctn_Probb_1, RUT1_Dtctn_ID_1, RUT1_Elev_1
.
.
.
subset_9
index,RUT1_Cycle_Counter, RUT1_Azi_9, RUT1_Dtctn_Probb_9, RUT1_Dtctn_ID_9, RUT1_Elev_9
How can I do this in Python? I am a beginner in Python.
Thank you very much.
With pandas you can select a subset of a DataFrame's columns directly.
As long as list_of_subset_headers is a subset of the DataFrame's columns, just write:
sub_df=df[list_of_subset_headers]
Or, in this case:
sub_df0=df[['RUT1_Azi_0', 'RUT1_Dtctn_Probb_0', 'RUT1_Dtctn_ID_0', 'RUT1_Elev_0']]
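As a minimal sketch of the column-list selection described above: the toy frame and its values below are made up for illustration, and only the column names follow the question. It also shows what happens when a requested label is not a column.

```python
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration only.
df = pd.DataFrame({
    "RUT1_Cycle_Counter": [100, 101, 102],
    "RUT1_Azi_0": [0.1, 0.2, 0.3],
    "RUT1_Elev_0": [1.0, 1.1, 1.2],
    "RUT1_Azi_1": [0.4, 0.5, 0.6],
})

list_of_subset_headers = ["RUT1_Cycle_Counter", "RUT1_Azi_0", "RUT1_Elev_0"]

# Indexing with a list of labels returns a new DataFrame
# containing those columns, in the order given.
sub_df = df[list_of_subset_headers]

# Asking for a label that is not a column raises KeyError.
try:
    df[["RUT1_Azi_0", "RUT1_Azi_99"]]
    missing_raises = False
except KeyError:
    missing_raises = True
```

The row index is carried over unchanged, so each subset still lines up row by row with the original frame.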
Here is an example:
import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
# made up example with the columns above
cols = RAM + RDP + RDI + REM
nrows = 10
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)
Now:
subsets = [df[list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]
For example:
>>> subsets[5]
   RUT1_Azi_5  RUT1_Dtctn_Probb_5  RUT1_Dtctn_ID_5  RUT1_Elev_5
0           5                  15               25           35
1          45                  55               65           75
2          85                  95              105          115
3         125                 135              145          155
4         165                 175              185          195
5         205                 215              225          235
6         245                 255              265          275
7         285                 295              305          315
8         325                 335              345          355
9         365                 375              385          395
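To see why the comprehension works, it may help to look at what zip yields on its own, without pandas. This is only a sketch; the name groups is introduced here for illustration and is not part of the answer.

```python
RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]

# zip pairs up the i-th element of each list, so every tuple holds
# the four column names that share the same numeric suffix.
# 'groups' is a hypothetical name used only in this sketch.
groups = [list(subcols) for subcols in zip(RAM, RDP, RDI, REM)]

print(groups[5])
```

Each tuple is converted with list(...) because pandas treats a tuple inside df[...] as a single key rather than as a list of column labels.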
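Edit: modified the answer to include the list of columns common to all subsets (RCC = ['RUT1_Cycle_Counter']). This requires df to contain that column, as the original data does; the made-up df above does not.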
subsets = [df[RCC + list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]
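A runnable sketch of the edited version, assuming a made-up frame that, unlike the earlier example, also contains the RUT1_Cycle_Counter column. Storing the results in a dict keyed by subset_0 ... subset_9 is one possible choice to match the names in the question, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ["RUT1_Cycle_Counter"]

# Made-up data: the common column comes first, then the four groups.
cols = RCC + RAM + RDP + RDI + REM
nrows = 10
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

# One DataFrame per suffix, each starting with the common column.
# The dict keys are an assumption made here to mirror the question's naming.
subsets = {
    f"subset_{i}": df[RCC + list(subcols)]
    for i, subcols in enumerate(zip(RAM, RDP, RDI, REM))
}

print(subsets["subset_0"].columns.tolist())
```

A dict avoids creating ten separate variables and lets you look up a subset by name, for example subsets["subset_3"].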