Pandas count matches across multiple columns

Question

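I have a DataFrame with columns A through Z. The values are 0, 1, or NA. I need to iteratively compare column A with N, A with O, and so on through Z, then loop back and start comparing B with N, B with O, and so on, and then again from C. I only need the number of rows that have 1 in both of the columns being compared. How can I do this?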

Answer 1

SQL makes set operations easier, so the example below uses pandasql to perform the comparison you asked for:

import pandas as pd
import pandasql as ps
import string

# Column names: the uppercase letters A-Z in alphabetical order
alphabet_string = string.ascii_uppercase

# To approximate your data, build one row each of 0, 1, and None (~null)
data = []
data.append([0] * len(alphabet_string))
data.append([1] * len(alphabet_string))
data.append([None] * len(alphabet_string))

# Create the pandas DataFrame with one column per letter
df = pd.DataFrame(data, columns=list(alphabet_string))


# Create a list of the letters from A to N
a_to_n = [letter for letter in alphabet_string if letter < "O"]

print(a_to_n)

# And a list of the letters from N to Z
n_to_z = [letter for letter in alphabet_string if letter > "M"]

print(n_to_z)

# Then perform the comparison in a nested loop over the two lists
for ll in a_to_n:
    for rl in n_to_z:
        cnt = ps.sqldf(f"select count(*) cnt from df where {ll} = 1 and {rl} = 1")["cnt"].iloc[0]
        print(f"Comparing {ll} to {rl}, there were {cnt} rows where the values matched.")

The last lines printed are:

Comparing N to U, there were 1 rows where the values matched.
Comparing N to V, there were 1 rows where the values matched.
Comparing N to W, there were 1 rows where the values matched.
Comparing N to X, there were 1 rows where the values matched.
Comparing N to Y, there were 1 rows where the values matched.
Comparing N to Z, there were 1 rows where the values matched.
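
As a possible alternative that is not part of the answer above, the same counts can be computed without pandasql or an explicit loop. The sketch below assumes the left-hand columns are A to M and the right-hand columns are N to Z (the question does not say where the left side stops), and it rebuilds the same three-row toy frame. The names left_cols, right_cols, is_one, and pair_counts are illustrative. It turns each cell into a 0/1 indicator of "equals 1" (NA counts as 0) and takes a matrix product, so each entry of the result is the number of rows where both columns are 1.

```python
import string

import pandas as pd

# Hypothetical sample data: one row each of 0, 1, and None across columns A..Z,
# mirroring the toy frame built in the answer above.
columns = list(string.ascii_uppercase)
df = pd.DataFrame([[0] * 26, [1] * 26, [None] * 26], columns=columns)

# Assumption: compare A..M (left) against N..Z (right); adjust the slices as needed.
left_cols = columns[:13]
right_cols = columns[13:]

# Indicator frame: 1 where the cell equals 1, else 0 (NA compares as False).
is_one = df.eq(1).astype(int)

# Entry (left, right) is the number of rows where both columns are 1.
pair_counts = is_one[left_cols].T.dot(is_one[right_cols])

print(pair_counts.loc["A", "N"])
```

pair_counts is a DataFrame indexed by the left-hand letters with the right-hand letters as columns, so a single pair can be looked up with pair_counts.loc["B", "O"], and pair_counts.stack() gives one row per pair if a long format is preferred.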

Tags: python, pandas, iterator, comparison