Conditions among different dataframes
            A   B    C
0  2002-01-13  18  120
1  2002-01-13   7  150
2  2002-01-13  11  130
3  2002-01-13  26  140
4  2002-01-14  13  180
5  2002-01-14  25  165
6  2002-01-14   9  150
7  2002-01-14   4  190
The table above is my DataFrame, df.
I apply this code to it:
df2 = df.loc[df['B'].sub(10).abs().groupby(df['A']).idxmin()]
This produces df2:
            A   B    C
2  2002-01-13  11  130
6  2002-01-14   9  150
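The one-liner above can be read as three steps. The sketch below rebuilds the sample data and splits the expression apart; the intermediate names `distance` and `closest_idx` are introduced here for illustration only and are not part of the original question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})

# Step 1: absolute distance of B from 10, row by row.
distance = df["B"].sub(10).abs()

# Step 2: within each A group, the index label of the smallest distance.
closest_idx = distance.groupby(df["A"]).idxmin()

# Step 3: select those rows from df.
df2 = df.loc[closest_idx]
print(df2)
```

In other words, df2 holds, for each date in A, the row whose B is closest to 10.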
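Now I want to create a new DataFrame, df3, by selecting the rows of df that satisfy the following condition within each A group:
- df["C"] == df2["C"] + 20 (for the 2002-01-13 group, 130 + 20 = 150).
- If no row of df satisfies df["C"] == df2["C"] + 20, take the next lower value (for the 2002-01-14 group, 150 + 20 = 170; since 170 does not exist, select the next lower value, which is 165).
The expected df3 output is: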
            A   B    C
1  2002-01-13   7  150
5  2002-01-14  25  165
You can use merge_asof:
pd.merge_asof(df.sort_values('C'), df2.assign(C=df2.C+20).sort_values('C'), on='C', by='A', direction='forward').dropna().drop_duplicates('A', keep='last')
Out[553]:
            A  B_x    C   B_y
3  2002-01-13    7  150  11.0
5  2002-01-14   25  165   9.0
Update, to keep the original index of df:
pd.merge_asof(df.sort_values('C').reset_index(), df2.assign(C=df2.C+20).sort_values('C'), on='C', by='A', direction='forward').dropna().drop_duplicates('A', keep='last').set_index('index')
Out[606]:
                A  B_x    C   B_y
index
1      2002-01-13    7  150  11.0
5      2002-01-14   25  165   9.0
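Use a lambda with an if statement to get the matching index labels, then extract the rows. If there is no exact match for C+20, take the largest value below C+20.
Full code to reproduce the example, with improvements:
import pandas as pd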
# build op data frame
df = pd.DataFrame(columns=['A', 'B', 'C'])
A = [pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'), pd.Timestamp('2002-01-13'),
pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14'), pd.Timestamp('2002-01-14')]
B = [18, 7, 11, 26, 13, 25, 9, 4]
C = [120, 150, 130, 140, 180, 165, 150, 190]
df['A'] = A
df['B'] = B
df['C'] = C
print(df)
# build df2
df2 = df.loc[df['B'].sub(10).abs().groupby(df['A']).idxmin()]
print(df2)
# find indices in df that meet op criteria
df_ind = df2.apply(lambda row: ((df.A == row.A) & (df.C == row.C+20)).idxmax() if ((df.A == row.A) & (df.C == row.C+20)).sum() > 0 else (df.C.loc[(df.C < row.C+20) & (df.A == row.A)]).idxmax(), axis=1)
print(df_ind)
# prints the df2 index labels mapped to the matching df labels:
# 2    1
# 6    5
# Build df3
df3 = df.loc[df_ind.tolist(), :]
print(df3)
Result:
            A   B    C
1  2002-01-13   7  150
5  2002-01-14  25  165
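The two branches of the lambda (exact match, otherwise the largest lower value) amount to "the largest C that does not exceed the target" within each group. Under that reading, a vectorised variant is possible. This is a sketch, not the answerer's code; `target` and `candidates` are names introduced here for illustration.

```python
import pandas as pd

# Sample data and df2 from the question.
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-13"] * 4 + ["2002-01-14"] * 4),
    "B": [18, 7, 11, 26, 13, 25, 9, 4],
    "C": [120, 150, 130, 140, 180, 165, 150, 190],
})
df2 = df.loc[df["B"].sub(10).abs().groupby(df["A"]).idxmin()]

# Per-row target: the group's df2 value of C plus 20, looked up through A.
target = df["A"].map(df2.set_index("A")["C"] + 20)

# Keep only rows whose C is at or below the group's target.
candidates = df[df["C"] <= target]

# Within each group, the label of the largest remaining C.
df3 = df.loc[candidates.groupby("A")["C"].idxmax()]
print(df3)
```

One difference to be aware of: a group with no row at or below its target is simply absent from this df3, whereas the lambda version would raise an error on the empty selection.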