Adding rows after groupby condition is met
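I'm trying to find runs of 20 or more consecutive negative values in one column of a DataFrame. Once the values are grouped into chunks of 20 or more, I want to append the corresponding next 30 rows of the original DataFrame after each chunk.
Here is my attempt (with help from questions posted here):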
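import pandas as pd

n = df['Slope'].lt(0)  # True where Slope is negative
mask = n.ne(n.shift()).cumsum()[n]  # run id for each row, kept only for the negative rows
# keep only the runs that contain 20 or more negative values
dfL = [g for i, g in df.groupby(mask) if len(g[g['Slope'] < 0]) >= 20]
df_cn = pd.concat(dfL)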
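I get the chunks of consecutive negative values, but I don't know how to add the corresponding 30 rows after each chunk.
Next time, please try to provide a minimal reproducible example and a small sample of the desired output.
I created a random df, and this works fine: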
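The answer doesn't show the random data it was tested on. A reproducible stand-in could look like the following; the sine shape, noise level, seed, and 300-row length are all assumptions, chosen only so that several runs of 20+ negative values exist:

```python
import numpy as np
import pandas as pd

# Hypothetical test data, not the answerer's: a sine wave plus a little noise,
# so the 'Slope' column contains several long runs of negative values.
rng = np.random.default_rng(0)  # fixed seed so the frame is reproducible
t = np.arange(300)
df = pd.DataFrame({'Slope': np.sin(t / 15) + rng.normal(0, 0.1, size=t.size)})
```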
n = df['Slope'].lt(0)
mask = n.ne(n.shift()).cumsum()[n]
dfL = [g for i, g in df.groupby(mask) if (len(g[g['Slope'] < 0]) >= 20)]
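To see what `mask` holds, here is the same run labelling on a tiny, made-up frame, with a threshold of 2 instead of 20 so the result stays readable. `n.ne(n.shift())` is True wherever the sign changes, so its cumulative sum gives every run of equal sign its own id; indexing with `[n]` keeps only the ids of negative rows, and `groupby` drops the rows whose key is missing:

```python
import pandas as pd

# Toy data (hypothetical); threshold 2 instead of 20 to keep it small.
df = pd.DataFrame({'Slope': [1, -1, -2, 3, -4, 5, -6, -7, -8, 9]})

n = df['Slope'].lt(0)              # True where Slope is negative
run_id = n.ne(n.shift()).cumsum()  # increments every time the sign flips
mask = run_id[n]                   # keep run ids only for negative rows

print(mask.tolist())  # [2, 2, 4, 6, 6, 6]

# Every row in a group is negative, so len(g) equals the count of negatives.
dfL = [g for _, g in df.groupby(mask) if len(g) >= 2]
print([g.index.tolist() for g in dfL])  # [[1, 2], [6, 7, 8]]
```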
Building on that dfL, I wrote the following code:
import pandas as pd

dfs = {}  # one DataFrame per chunk + following 30 rows, keyed by name
if len(dfL) == 0:  # make sure dfL contains at least one chunk
    print('no chunks in the df found')
else:
    for x, df_cn in enumerate(dfL):  # dfL is a list of DataFrames, one per chunk
        print(f'Chunk: df_cn_{x} created')  # feedback for testing
        idx = df_cn.index  # the chunk has >= 20 rows, so max(idx) is its last index
        print(f'Chunk from {min(idx)} to {max(idx)} total {len(df_cn)} indexes in the chunk')  # feedback with size of chunk
        # .loc label slicing includes both ends, so +1 .. +30 selects the next 30 rows
        # (assumes df has a default integer RangeIndex); near the end of df it returns fewer rows
        df_rest = df.loc[max(idx) + 1:max(idx) + 30]
        df_cn_ext = pd.concat([df_cn, df_rest])  # stack the chunk and the following rows vertically (along the y-axis)
        dfs[f'df_cn_ext_{x}'] = df_cn_ext  # store each extended chunk under its own suffixed name instead of using exec()
        print(f'Dataframe df_cn_ext_{x} created from index {min(idx)} to {max(idx) + 30}')
Please note:
1- I stored each chunk + 30 rows as a separate DataFrame named with a suffix (df_cn_ext_suffix).
2- If the last value of a chunk is close to the end of df, fewer than 30 rows are added: only as many as are available.
Here is some of the output from my code:
Chunk: df_cn_0 created
Chunk from 3 to 39 total 37 indexes in the chunk
Dataframe df_cn_ext_0 created from index 3 to 69
Chunk: df_cn_1 created
Chunk from 41 to 66 total 26 indexes in the chunk
Dataframe df_cn_ext_1 created from index 41 to 96
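Collecting the extended chunks in a list or dict avoids exec() entirely. A more compact variant is sketched below; `chunks_with_tail` and its parameters are hypothetical names, not part of pandas. It locates the runs by position and slices with `.iloc`, so it does not assume a default integer index, and because `.iloc` slicing stops at the end of the frame, a run close to the end simply gets fewer than 30 extra rows:

```python
import pandas as pd


def chunks_with_tail(df, col='Slope', min_len=20, tail=30):
    """Hypothetical helper: for every run of at least `min_len` consecutive
    negative values in `col`, return the run followed by up to `tail` rows.

    Works on positions (iloc), so df does not need a default RangeIndex.
    """
    neg = df[col].lt(0).reset_index(drop=True)  # positional boolean Series
    run_id = neg.ne(neg.shift()).cumsum()       # new id whenever the sign flips
    neg_runs = run_id[neg]                      # ids of negative rows only
    out = []
    for _, run in neg_runs.groupby(neg_runs):
        if len(run) >= min_len:
            start = run.index[0]                # first position of the run
            stop = run.index[-1] + 1            # one past the last position
            # iloc stops at the end of df, so a late run just gets fewer tail rows
            out.append(df.iloc[start:stop + tail])
    return out


# Usage on made-up data: runs of 25 and 22 negative values
df = pd.DataFrame({'Slope': [1] * 5 + [-1] * 25 + [1] * 10 + [-1] * 22 + [1] * 3})
extended = chunks_with_tail(df)
print([len(c) for c in extended])  # [55, 25]
```

Each element of `extended` is a run plus its tail, with the original index labels kept. The tail is taken from df regardless of what follows, so it can overlap the next run, as chunks 0 and 1 do in the output above.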