How to drop rows in pandas matching on one column and satisfying an equation on another column?
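I'm working with some financial data that contains reversals. A reversal is basically a correction in the table: it cancels out another value in the table by adding an equal amount with the opposite sign. My job is to clean out these values.
Take this dataframe as an example: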
df = pd.DataFrame({"a":["a","b","c","a","a"],
"b":[-2,5,2,2,7],
"xtra_col":["X","X","X","X","X"]})
a b xtra_col
0 a -2 X
1 b 5 X
2 c 2 X
3 a 2 X
4 a 7 X
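In this case, row 3 is a reversal of row 0, and both must be removed. Row 2, on the other hand, is not a reversal of row 0, even though the values are opposite, because the rows do not match on column a.
The result must look like this: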
a b xtra_col
0 b 5 X
1 c 2 X
2 a 7 X
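The question is, how do I remove such reversals from my table? I've looked at drop_duplicates() with a and b as the subset, but that won't work because it only matches identical values, not opposite ones.
I feel I could achieve something with groupby, but I'm not sure how to organize it.
One additional note: it should also work when the number of negative values is odd. Consider the following case: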
df = pd.DataFrame({"a":["a","b","c","a","a"],
"b":[-2,5,2,2.0,-2],
"xtra_col":["X","X","X","X","X"]})
a b xtra_col
0 a -2.0 X
1 b 5.0 X
2 c 2.0 X
3 a 2.0 X
4 a -2.0 X
The output should be:
a b xtra_col
1 b 5.0 X
2 c 2.0 X
4 a -2.0 X
If b is the only numeric column, create a filtered DataFrame of the negative rows, invert b by multiplying by -1, find the matching rows with DataFrame.merge, and finally filter out the matched index values with isin and boolean indexing:
df1 = df[df['b'].lt(0)].copy()
df1['b'] *= -1
df2 = df1.reset_index().merge(df.reset_index(), on=['a','b']).filter(like='index_')
print (df2)
index_x index_y
0 0 3
df = df[~df.index.isin(df2.values.ravel())]
print (df)
a b xtra_col
1 b 5 X
2 c 2 X
4 a 7 X
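If there may be another a 2 row that you need to avoid removing (because it is not paired with another a -2), add a counter column with GroupBy.cumcount to both the filtered and the original DataFrame: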
import numpy as np  # needed for np.number below
import pandas as pd
df = pd.DataFrame({"a":["a","b","c","a","a",'a'],
"b":[-2,5,2,2,7,2],
"xtra_col":["X","X","X","X","X",'X']})
df1 = df[df['b'].lt(0)].copy()
c = df1.select_dtypes(np.number).columns
df1[c] *= -1
df1['g'] = df1.groupby(['a','b']).cumcount()
df['g'] = df.groupby(['a','b']).cumcount()
df2 = df1.reset_index().merge(df.reset_index(), on=['a','b','g']).filter(like='index_')
print (df2)
df = df[~df.index.isin(df2.values.ravel())]
print (df)
a b xtra_col g
1 b 5 X 0
2 c 2 X 0
4 a 7 X 0
5 a 2 X 1
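The counter-column idea above can be wrapped in a small helper that leaves the input DataFrame untouched. This is only a sketch under a few assumptions: the name drop_reversals and its key/value parameters are hypothetical (not part of pandas), and rows are paired on the absolute value rather than by negating the negative rows, which should be equivalent for this data.

```python
import pandas as pd


def drop_reversals(df, key="a", value="b"):
    """Hypothetical helper: drop rows that pair off one-to-one with an
    opposite-signed row sharing the same key."""
    work = pd.DataFrame({
        "pos": range(len(df)),                    # positional row id
        "key": df[key].to_numpy(),
        "abs_val": df[value].abs().to_numpy(),    # pair on |value|
        "is_neg": (df[value] < 0).to_numpy(),
    })
    # Number the rows inside each (key, |value|, sign) group so that the
    # n-th negative row can only pair with the n-th non-negative row.
    work["g"] = work.groupby(["key", "abs_val", "is_neg"]).cumcount()

    neg = work[work["is_neg"]]
    non_neg = work[~work["is_neg"]]
    pairs = neg.merge(non_neg, on=["key", "abs_val", "g"],
                      suffixes=("_neg", "_pos"))

    # Positions of every row that found a partner, on either side.
    matched = pd.concat([pairs["pos_neg"], pairs["pos_pos"]])
    keep = ~work["pos"].isin(matched).to_numpy()
    return df[keep]


# The "odd number of negatives" case from the question.
df_odd = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                       "b": [-2, 5, 2, 2.0, -2],
                       "xtra_col": ["X", "X", "X", "X", "X"]})
remaining = drop_reversals(df_odd)
print(remaining)  # rows 1, 2 and 4 remain; row 4 is the unpaired a -2.0
```

Because the pairing is one-to-one, an unmatched extra row on either side (a second a 2 or a second a -2) survives, and no helper column such as g is left behind in the result.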
Here is another way, using apply to find the invalid rows and remove them:
# Import module
import pandas as pd
# Your data
df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
"b": [-2, 5, 2, 2, 7],
"xtra_col": ["X", "X", "X", "X", "X"]})
# Filtering function
def filter_row(row):
    # Your condition comparing the current row with the whole dataframe
    if sum((df.a == row.a) & (df.b == -row.b)) == 1:
        return row
# Apply the filter method
row_to_remove = df.apply(filter_row, axis=1)
print(row_to_remove) # You can use drop NA to remove NA rows
# a b xtra_col
# 0 a -2.0 X
# 1 None NaN None
# 2 None NaN None
# 3 a 2.0 X
# 4 None NaN None
# Drop invalid rows
result = df[(df != row_to_remove).any(axis=1)]
print(result)
# a b xtra_col
# 1 b 5 X
# 2 c 2 X
# 4 a 7 X
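The apply approach above rescans the whole dataframe for every row. A similar check can be sketched with a set lookup, which is a swapped-in technique rather than the answer's own method. Note one difference, stated as an assumption: this version flags a row when at least one counterpart exists, whereas the condition above requires exactly one. The names existing and has_counterpart are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                   "b": [-2, 5, 2, 2, 7],
                   "xtra_col": ["X", "X", "X", "X", "X"]})

# Every (a, b) combination present in the data.
existing = set(zip(df["a"], df["b"]))

# A row has a counterpart when (a, -b) is also present.
# Caveat: a row with b == 0 counts as its own counterpart here.
has_counterpart = pd.Series(
    [(a, -b) in existing for a, b in zip(df["a"], df["b"])],
    index=df.index,
)

result = df[~has_counterpart]
print(result)  # rows 1, 2 and 4 remain
```

Like the apply version, this removes every row that has an opposite, so it does not keep an unpaired extra a 2 row.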
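Use the power of SQL from within Python. Here you join the table (dataframe) to itself, matching rows where column a is the same and column b is negated. The where clause then keeps only the rows that found no such match.
See the mock-up below: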
import sqlite3
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["a","b","c","a","a"],
"b":[-2,5,2,2,7],
"xtra_col":["X","X","X","X","X"]})
#Make the db in memory
conn = sqlite3.connect(':memory:')
df.to_sql('tab', conn, index=False)
qry = '''
select
tab1.a,tab1.b,tab1.xtra_col
from
tab as tab1
left join tab as tab2 on
tab1.a =tab2.a
and
tab1.b = -tab2.b
where tab2.a is null
'''
dfres = pd.read_sql_query(qry, conn)
dfres
The result is here:
a b xtra_col
0 b 5 X
1 c 2 X
2 a 7 X
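For comparison, the same anti-join can be sketched in pandas alone, without going through sqlite3. This is a rough equivalent of the query above, not the answer's own code; the names negated and merged are illustrative. It relies on the indicator parameter of DataFrame.merge, which adds a _merge column.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                   "b": [-2, 5, 2, 2, 7],
                   "xtra_col": ["X", "X", "X", "X", "X"]})

# Negated copy of the matching columns: a row (a, b) in here means that
# (a, -b) exists in df. Duplicates are dropped so the left join below
# cannot multiply rows.
negated = df[["a", "b"]].assign(b=lambda d: -d["b"]).drop_duplicates()

# Left join; rows marked "left_only" found no opposite-signed partner,
# which corresponds to "where tab2.a is null" in the SQL.
merged = df.merge(negated, on=["a", "b"], how="left", indicator=True)
result = (merged.loc[merged["_merge"] == "left_only"]
                .drop(columns="_merge")
                .reset_index(drop=True))
print(result)
```

As with the SQL version, every row that has any opposite is removed, so this variant does not keep an unpaired extra row.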