Comparing columns in a pandas DataFrame raises a ValueError I cannot resolve

Question

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4, 5, 6],
                   "from": ["A", "B", "B", "D", "B", "C", "B"],
                   "to": ["B", "C", "D", "F", "G", "F", "E"],
                   "cases": [[1, 2, 44], [2, 4, 3], [5, 2], [5], [1, 7], [4], [44, 7]],
                   "start1": [1, 5, 4, 4, 23, 12, 8],
                   "start2": [4, 7, 9, 30, 26, 15, 18],
                   "end1": [5, 7, 11, 32, 15, 17, 21],
                   "end2": [9, 12, 15, 35, 17, 20, 25],})

It looks like this:

   id from to       cases  start1  start2  end1  end2
0   0    A  B  [1, 2, 44]       1       4     5     9
1   1    B  C   [2, 4, 3]       5       7     7    12
2   2    B  D      [5, 2]       4       9    11    15
3   3    D  F         [5]       4      30    32    35
4   4    B  G      [1, 7]      23      26    15    17
5   5    C  F         [4]      12      15    17    20
6   6    B  E     [44, 7]       8      18    21    25

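I am trying to create a column adjacency_list that holds, for each row i, the id values of all rows j for which:

i["to"] == j["from"]
i["cases"] and j["cases"] share at least one element
the intervals (i["end1"], i["end2"]) and (j["start1"], j["start2"]) overlap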

I am trying to do this with the following code:

data["adjacency_list"] = data.apply(
        lambda x: [
            row["id"]
            for i, row in data[(x["to"] == data["from"])].iterrows()
            if ((not set(row["cases"]).isdisjoint(x["cases"])) and ((x["end1"] <= test["start1"] <= x["end2"]) or (test["start1"] <= x["end1"] <= test["start2"])))
        ],
        axis=1,
    )

The output should look like this:

   id from to       cases  start1  start2  end1  end2 adjacency_list
0   0    A  B  [1, 2, 44]       1       4     5     9      [1, 2, 6]
1   1    B  C   [2, 4, 3]       5       7     7    12            [5]
2   2    B  D      [5, 2]       4       9    11    15            [3]
3   3    D  F         [5]       4      30    32    35             []
4   4    B  G      [1, 7]      23      26    15    17             []
5   5    C  F         [4]      12      15    17    20             []
6   6    B  E     [44, 7]       8      18    21    25             []

But it gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

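I have read many other answers from users who hit this error in different contexts, and I tried replacing and and or with & and |, but that did not help. Splitting each chained <= comparison into two single <= comparisons did not help either.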

How can I solve this?

Answer 1

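The comparison (test["start1"] <= x["end1"] <= test["start2"]) is the problem. test["start1"] is a Series, so each <= is evaluated element by element and returns a Series of booleans. Python evaluates a chained comparison a <= b <= c as (a <= b) and (b <= c), and the and needs a single truth value, which a Series cannot supply, hence the ValueError. Switching to & and | does not help on its own, because the if clause of the list comprehension still has to reduce the resulting Series to one boolean.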
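The behaviour can be reproduced in isolation. This is a minimal sketch: the small frame named test and the scalar end1 are made up for illustration (the question never shows what test contains), with end1 standing in for x["end1"].

```python
import pandas as pd

# Hypothetical stand-in for the `test` frame referenced in the question.
test = pd.DataFrame({"start1": [1, 5, 4], "start2": [4, 7, 9]})
end1 = 5  # plays the role of the scalar x["end1"]

# A single comparison is fine: it returns an element-wise boolean Series.
mask = test["start1"] <= end1
print(mask.tolist())

# The chained form expands to (a <= b) and (b <= c); `and` needs one bool.
try:
    test["start1"] <= end1 <= test["start2"]
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)
print(chained_error)

# With & the expression itself works, but `if` still needs one bool.
both = (test["start1"] <= end1) & (end1 <= test["start2"])
try:
    if both:
        pass
    if_error = None
except ValueError as exc:
    if_error = str(exc)
print(both.tolist())
print(if_error)
```

Both failures come from asking a whole Series for a single truth value. Comparing scalars taken from one row at a time avoids the issue entirely.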

Try comparing each row against x instead:

df["adjacency_list"] = df.apply(
    lambda x: [
        row["id"]
        for _, row in df[(x["to"] == df["from"])].iterrows()
        if (
                (
                    not set(row["cases"]).isdisjoint(x["cases"])
                ) and (
                        (x["end1"] <= row["start1"] <= x["end2"])
                        or
                        (row["start1"] <= x["end1"] <= row["start2"])
                )
        )
    ],
    axis=1,
)

Output:

   id from to       cases  start1  start2  end1  end2 adjacency_list
0   0    A  B  [1, 2, 44]       1       4     5     9      [1, 2, 6]
1   1    B  C   [2, 4, 3]       5       7     7    12            [5]
2   2    B  D      [5, 2]       4       9    11    15            [3]
3   3    D  F         [5]       4      30    32    35             []
4   4    B  G      [1, 7]      23      26    15    17             []
5   5    C  F         [4]      12      15    17    20             []
6   6    B  E     [44, 7]       8      18    21    25             []
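For anyone who wants to check this end to end, here is a self-contained sketch of the same logic with the conditions split into named helpers. The function names intervals_overlap and successors are illustrative and not part of the original answer; the conditions themselves are copied from it unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "from": ["A", "B", "B", "D", "B", "C", "B"],
    "to": ["B", "C", "D", "F", "G", "F", "E"],
    "cases": [[1, 2, 44], [2, 4, 3], [5, 2], [5], [1, 7], [4], [44, 7]],
    "start1": [1, 5, 4, 4, 23, 12, 8],
    "start2": [4, 7, 9, 30, 26, 15, 18],
    "end1": [5, 7, 11, 32, 15, 17, 21],
    "end2": [9, 12, 15, 35, 17, 20, 25],
})


def intervals_overlap(x, row):
    """Interval test from the answer; every operand is a scalar."""
    return (x["end1"] <= row["start1"] <= x["end2"]) or (
        row["start1"] <= x["end1"] <= row["start2"]
    )


def successors(x, frame):
    """Ids of rows in `frame` that may follow row `x`."""
    # Only rows whose "from" matches this row's "to" are candidates.
    candidates = frame[frame["from"] == x["to"]]
    return [
        row["id"]
        for _, row in candidates.iterrows()
        if not set(row["cases"]).isdisjoint(x["cases"])
        and intervals_overlap(x, row)
    ]


df["adjacency_list"] = df.apply(lambda x: successors(x, df), axis=1)
print(df["adjacency_list"].tolist())
```

Since x and row are both single rows here, each comparison yields a plain boolean, so and, or, and chained <= all behave as they would on ordinary numbers.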
