How to drop rows in pandas that match on one column and satisfy an equation on another column?

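I'm working with financial data that contains reversals. A reversal is essentially a correction in a table: it cancels out another value in the table by adding an equal amount with the opposite sign. My job is to clean out these values. Take this DataFrame as an example: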

df = pd.DataFrame({"a":["a","b","c","a","a"],
                  "b":[-2,5,2,2,7],
                 "xtra_col":["X","X","X","X","X"]})

    a   b   xtra_col
0   a   -2  X
1   b   5   X
2   c   2   X
3   a   2   X
4   a   7   X

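In this case, row 3 is a reversal of row 0, so both must be dropped. Row 2, however, is not a reversal of row 0 even though the values are opposite, because the two rows do not match on column a. The result must look like this: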

    a   b   xtra_col
0   b   5   X
1   c   2   X
2   a   7   X

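The question is: how do I remove such reversals from my table? I have looked at drop_duplicates() with a and b as the subset, but that won't work because it only matches identical values, not opposite ones. I feel I could achieve something with groupby, but I'm not sure how to organize it.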

Additional note: this should also work when the number of negative values is odd. Consider the following case:

df = pd.DataFrame({"a":["a","b","c","a","a"],
                  "b":[-2,5,2,2.0,-2],
                 "xtra_col":["X","X","X","X","X"]})


   a    b xtra_col
0  a -2.0        X
1  b  5.0        X
2  c  2.0        X
3  a  2.0        X
4  a -2.0        X

The output should be:

   a    b xtra_col
1  b  5.0        X
2  c  2.0        X
4  a -2.0        X

If b is the only numeric column, create a filtered DataFrame of the negative rows, invert b by multiplying by -1, match rows with DataFrame.merge, and finally filter out the matched index values with Index.isin and boolean indexing:
df1 = df[df['b'].lt(0)].copy()
df1['b'] *= -1

df2 = df1.reset_index().merge(df.reset_index(), on=['a','b']).filter(like='index_')
print (df2)
   index_x  index_y
0        0        3

df = df[~df.index.isin(df2.values.ravel())]
print (df)
   a  b xtra_col
1  b  5        X
2  c  2        X
4  a  7        X

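If another a 2 row is possible and must not be removed (because there is no second a -2 to pair it with), add a counter column created by GroupBy.cumcount to both the filtered and the original DataFrame, and merge on it as well: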

import numpy as np
import pandas as pd

df = pd.DataFrame({"a":["a","b","c","a","a",'a'],
                  "b":[-2,5,2,2,7,2],
                 "xtra_col":["X","X","X","X","X",'X']})


df1 = df[df['b'].lt(0)].copy()
c = df1.select_dtypes(np.number).columns
df1[c] *= -1

df1['g'] = df1.groupby(['a','b']).cumcount()
df['g'] = df.groupby(['a','b']).cumcount()
df2 = df1.reset_index().merge(df.reset_index(), on=['a','b','g']).filter(like='index_')
print (df2)


df = df[~df.index.isin(df2.values.ravel())]
print (df)
   a  b xtra_col  g
1  b  5        X  0
2  c  2        X  0
4  a  7        X  0
5  a  2        X  1
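
The counter idea above can also be wrapped in a small helper that leaves no g column in the result. This is only a sketch under some assumptions: the function name drop_reversals and the temporary columns _abs and _occ are made up for illustration, and it assumes the frame has no columns with those names already. It pairs the n-th occurrence of a value with the n-th occurrence of its opposite, so an unpaired row survives, including the case with an odd number of negative values from the question.

```python
import pandas as pd

def drop_reversals(df, key="a", value="b"):
    """Return df without rows that cancel pairwise within the same key."""
    # Occurrence counter per (key, value): the n-th -2 may only pair with the n-th 2.
    occ = df.groupby([key, value]).cumcount()
    # Hypothetical helper columns; they exist only on this temporary copy.
    tagged = df.assign(_abs=df[value].abs(), _occ=occ)
    # A (key, |value|, occurrence) group with two distinct signed values is a reversal pair.
    n_signs = tagged.groupby([key, "_abs", "_occ"])[value].transform("nunique")
    return df[(n_signs != 2).to_numpy()]

# Example with an extra, unpaired "a 2" row (index 5).
df = pd.DataFrame({"a": ["a", "b", "c", "a", "a", "a"],
                   "b": [-2, 5, 2, 2, 7, 2],
                   "xtra_col": ["X"] * 6})
print(drop_reversals(df))

# Example from the question with an odd number of negative values.
odd = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                    "b": [-2, 5, 2, 2.0, -2],
                    "xtra_col": ["X"] * 5})
print(drop_reversals(odd))
```

With this pairing rule the first example keeps rows 1, 2, 4 and 5, and the second keeps rows 1, 2 and 4. Which of the two -2 rows survives in the second example is a consequence of pairing by occurrence order, not something the question specifies.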

Here is another approach that uses apply to find the invalid rows and remove them:

# Import module
import pandas as pd

# Your data
df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                   "b": [-2, 5, 2, 2, 7],
                   "xtra_col": ["X", "X", "X", "X", "X"]})

# Filtering function
def filter_row(row):
    # Your condition comparing the current row with the whole dataframe
    if sum((df.a == row.a) & (df.b == -row.b)) == 1:
        return row

# Apply the filter method
row_to_remove = df.apply(filter_row, axis=1)
print(row_to_remove)  # You can use drop NA to remove NA rows
#       a    b xtra_col
# 0     a -2.0        X
# 1  None  NaN     None
# 2  None  NaN     None
# 3     a  2.0        X
# 4  None  NaN     None

# Drop invalid rows
result = df[(df != row_to_remove).any(axis=1)]
print(result)
#    a  b xtra_col
# 1  b  5        X
# 2  c  2        X
# 4  a  7        X
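
A possibly simpler variant of the same idea is to have apply return a boolean per row and use it directly as a mask, which avoids comparing two DataFrames. This is a sketch, not a drop-in replacement: the name has_reversal is made up for illustration, and it keeps the same condition as above (exactly one matching opposite row), so it inherits the same limitations, including quadratic cost on large frames.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                   "b": [-2, 5, 2, 2, 7],
                   "xtra_col": ["X", "X", "X", "X", "X"]})

# True for a row when exactly one row shares its "a" and has the opposite "b".
has_reversal = df.apply(
    lambda row: ((df["a"] == row["a"]) & (df["b"] == -row["b"])).sum() == 1,
    axis=1,
)

# Keep only the rows that have no reversal partner.
result = df[~has_reversal]
print(result)
```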

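Use the power of SQL from Python. Here you left-join the table (the DataFrame) to itself on the condition that column a is the same and column b is inverted. The where clause then keeps only the rows that found no such partner.

See the mock-up below: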

import sqlite3
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":["a","b","c","a","a"],
                  "b":[-2,5,2,2,7],
                 "xtra_col":["X","X","X","X","X"]})

#Make the db in memory
conn = sqlite3.connect(':memory:')
df.to_sql('tab', conn, index=False)

qry = '''
    select  
       tab1.a,tab1.b,tab1.xtra_col
    from
        tab as tab1 

        left join tab as tab2 on
            tab1.a =tab2.a
            and
            tab1.b = -tab2.b
        where tab2.a is null
    '''
dfres = pd.read_sql_query(qry, conn)
dfres

The result:

    a   b   xtra_col
0   b   5   X
1   c   2   X
2   a   7   X
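
The same anti-join can be sketched in pandas alone, without going through SQLite. This is an illustrative equivalent rather than part of the answer above: the names flipped and marked are made up, and like the SQL query it removes every row that has any opposite partner, so it does not handle the unpaired-duplicate case.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c", "a", "a"],
                   "b": [-2, 5, 2, 2, 7],
                   "xtra_col": ["X", "X", "X", "X", "X"]})

# Lookup table of (a, -b) pairs; duplicates are dropped so the join cannot multiply rows.
flipped = df[["a", "b"]].assign(b=-df["b"]).drop_duplicates()

# Left join with an indicator column: "left_only" marks rows with no opposite partner.
marked = df.merge(flipped, on=["a", "b"], how="left", indicator=True)

result = (marked[marked["_merge"] == "left_only"]
          .drop(columns="_merge")
          .reset_index(drop=True))
print(result)
```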