DataFrame Logic To Remove Rows Not Working

Question

Background - I have a pandas DataFrame on which I perform some maths, grouping by Entity ID and summing % Ownership, to populate two new columns:

df['Entity ID %'] = df.groupby('Entity ID')['% Ownership'].transform('sum')
df['Account # %'] = df.groupby('Entity ID')['% Ownership'].transform('sum')
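For readers unfamiliar with `transform`, here is a small sketch of what the two lines above do. The sample data is made up. `transform("sum")` computes one total per group and broadcasts it back to every row of that group, so the result lines up with the original index and can be assigned as a new column. The string `"sum"` is used rather than the built-in `sum`, since recent pandas versions may warn about passing the built-in.

```python
import pandas as pd

# Hypothetical sample data: two entities with made-up ownership fractions.
df = pd.DataFrame({
    "Entity ID": ["A", "A", "B"],
    "% Ownership": [0.25, 0.75, 0.5],
})

# One total per 'Entity ID' group, repeated on every row of that group.
df["Entity ID %"] = df.groupby("Entity ID")["% Ownership"].transform("sum")
print(df["Entity ID %"].tolist())  # [1.0, 1.0, 0.5]
```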

I then re-save the DataFrame, keeping only the rows where neither of the 2 columns equals 1 or 0, as specified by this messy line of code. This forms my 'Exception Report':

df = df[(df['Entity ID %'] != 1.0000000) & (df['Entity ID %'] != 0.0000000) & (df['Account # %'] != 1.0000000) & (df['Account # %'] != 0.0000000)]
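The four chained `!=` conditions can be written more compactly. A sketch with invented values, showing that the long form and an `isin` form select the same rows; both still compare floats exactly, so neither changes the behaviour described in this question.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Entity ID %": [1.0, 0.0, 0.5, 0.8],
    "Account # %": [1.0, 0.3, 0.5, 1.0],
})
cols = ["Entity ID %", "Account # %"]

# Long form, as in the question: keep a row only if neither column is 0 or 1.
long_form = df[(df[cols[0]] != 1.0) & (df[cols[0]] != 0.0)
               & (df[cols[1]] != 1.0) & (df[cols[1]] != 0.0)]

# Short form: drop any row where either column is exactly 0 or 1.
short_form = df[~df[cols].isin([0.0, 1.0]).any(axis=1)]

print(short_form.index.tolist())  # [2]
```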

I then round the values in both columns to 7 decimal places:
df = df.round({'Entity ID %': 7, 'Account # %': 7})

Finally, the DataFrame is written to an .xlsx file:

df.to_excel(filename+'.xlsx', sheet_name='Ownership Exception Report')

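Issue - Although the code that re-saves the DataFrame should keep only rows whose values are != 1 and != 0, some rows with a value of 1 in both columns are kept and written to the .xlsx, even though they meet the criteria for removal.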
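A sketch of one possible cause, assuming the real sums carry floating-point noise (the actual data isn't shown): the exact `!=` comparisons run before the rounding step, so a sum such as 0.9999999999999999 passes the filter and only becomes 1.0 when it is rounded afterwards. The values below are made up for illustration.

```python
import pandas as pd

# Ten shares of 0.1 "should" total 1.0, but binary floating point
# accumulates error: Python's built-in sum gives 0.9999999999999999.
total = sum([0.1] * 10)

df = pd.DataFrame({"Entity ID %": [total, 0.5]})

# Filter first (as in the question): total != 1.0 is True, so the row survives.
kept = df[(df["Entity ID %"] != 1.0) & (df["Entity ID %"] != 0.0)]

# Round afterwards: 0.9999999999999999 becomes 1.0, so a surviving row
# now looks like one that should have been removed.
kept = kept.round({"Entity ID %": 7})
print(kept["Entity ID %"].tolist())  # [1.0, 0.5]
```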

Function 1 - This creates the Entity ID % and Account # % columns and performs the maths that populates them:

def ownership_qc():
    df = unpack_response()
    df['Entity ID %'] = '-'
    df['Account # %'] = '-'
    df['Reason to Dismiss'] = '-'
    df['Entity ID %'] = df.groupby('Entity ID')['% Ownership'].transform('sum')
    df['Account # %'] = df.groupby('Entity ID')['% Ownership'].transform('sum')
    return df

Function 2 - This re-saves the DataFrame using the criteria above (!= 0 and != 1), then rounds to 7 decimal places, and finally writes to .xlsx.

import datetime

def ownership_exceptions():
    df = ownership_qc()
    df = df[(df['Entity ID %'] != 1.0000000) & (df['Entity ID %'] != 0.0000000) & (df['Account # %'] != 1.0000000) & (df['Account # %'] != 0.0000000)]
    df = df.round({'Entity ID %': 7, 'Account # %': 7})
    #   Counting rows in df
    index = df.index
    number_of_rows = len(index)
    timestr = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")
    filename = 'ownership_exceptions_'+timestr
    # to_excel opens and writes the file itself; no open() wrapper is needed
    df.to_excel(filename + '.xlsx', sheet_name='Ownership Exception Report')
    print("---------------------------\n","EXCEPTION REPORT:", number_of_rows, "rows", "\n---------------------------")
    return df
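A minimal sketch of a reordered filter, assuming the goal is to drop every row that would show as 0 or 1 once rounded to 7 decimal places: round first, then filter. `exception_rows` is a hypothetical helper, not part of the original code, and the sample data is invented.

```python
import pandas as pd

def exception_rows(df, cols, decimals=7):
    """Hypothetical helper: round first, then drop rows where any column is 0 or 1.

    Rounding before comparing means sums such as 0.9999999999999999
    (floating-point noise around 1.0) are treated as exactly 1.0 and removed.
    """
    rounded = df.round({c: decimals for c in cols})
    is_zero_or_one = rounded[cols].isin([0.0, 1.0]).any(axis=1)
    return rounded[~is_zero_or_one]

# Usage with made-up values; the first row is float noise around 1.0.
sample = pd.DataFrame({
    "Entity ID %": [sum([0.1] * 10), 0.5, 0.0],
    "Account # %": [sum([0.1] * 10), 0.5, 0.4],
})
print(exception_rows(sample, ["Entity ID %", "Account # %"]).index.tolist())  # [1]
```

Rounding before filtering keeps the comparison consistent with what ends up in the spreadsheet; `numpy.isclose` with an explicit tolerance is an alternative if the unrounded values should be kept.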

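Can anyone help me work out why some rows that meet the criteria for removal are still kept in my DataFrame, and are therefore still written to the .xlsx file?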

I'm hoping my logic is flawed and that it's an easy fix, but I've hit a dead end trying to solve this.

Answer 1

After looking at your code more closely, I see that you are initialising

 df['Entity ID %'] = '-'
 df['Account # %'] = '-'

This makes their dtype object. You cannot compare an object with an integer.

Make the following change in ownership_qc():

 df['Entity ID %'] = 0
 df['Account # %'] = 0

This should help!
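To see which dtype the columns actually end up with, a quick check with `dtypes` helps. A sketch with made-up data: in this example the later `transform` assignment replaces the placeholder column entirely, dtype included, so it is worth inspecting `df.dtypes` on the real data at each step.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"Entity ID": ["A", "A"], "% Ownership": [0.25, 0.75]})

# Placeholder column, as in ownership_qc().
df["Entity ID %"] = "-"

# Assigning the groupby result replaces the whole column, dtype included.
df["Entity ID %"] = df.groupby("Entity ID")["% Ownership"].transform("sum")
print(df["Entity ID %"].dtype)  # float64
```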

Tags: python, rounding, dataframe, pandas