Comparing columns in pandas DataFrame gives unsolvable ValueError

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4, 5, 6],
                   "from": ["A", "B", "B", "D", "B", "C", "B"],
                   "to": ["B", "C", "D", "F", "G", "F", "E"],
                   "cases": [[1, 2, 44], [2, 4, 3], [5, 2], [5], [1, 7], [4], [44, 7]],
                   "start1": [1, 5, 4, 4, 23, 12, 8],
                   "start2": [4, 7, 9, 30, 26, 15, 18],
                   "end1": [5, 7, 11, 32, 15, 17, 21],
                   "end2": [9, 12, 15, 35, 17, 20, 25]})

It looks like this:

    id  from    to  cases       start1  start2  end1    end2
0   0   A       B   [1, 2, 44]  1       4       5       9     
1   1   B       C   [2, 4, 3]   5       7       7       12    
2   2   B       D   [5, 2]      4       9       11      15   
3   3   D       F   [5]         4       30      32      35    
4   4   B       G   [1, 7]      23      26      15      17     
5   5   C       F   [4]         12      15      17      20    
6   6   B       E   [44, 7]     8       18      21      25    

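I am trying to create a column adjacency_list that contains, for each row i, the id values of the rows j that satisfy the conditions expressed in the code below.

I am running the following code to achieve this: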

data["adjacency_list"] = data.apply(
        lambda x: [
            row["id"]
            for i, row in data[(x["to"] == data["from"])].iterrows()
            if ((not set(row["cases"]).isdisjoint(x["cases"])) and ((x["end1"] <= test["start1"] <= x["end2"]) or (test["start1"] <= x["end1"] <= test["start2"])))
        ],
        axis=1,
    )

The output should look like this:

    id  from    to  cases       start1  start2  end1    end2    adjacency_list
0   0   A       B   [1, 2, 44]  1       4       5       9       [1, 2, 6]
1   1   B       C   [2, 4, 3]   5       7       7       12      [5]
2   2   B       D   [5, 2]      4       9       11      15      [3]
3   3   D       F   [5]         4       30      32      35      []
4   4   B       G   [1, 7]      23      26      15      17      []
5   5   C       F   [4]         12      15      17      20      []
6   6   B       E   [44, 7]     8       18      21      25      []

But it gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

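I have read many other answers from users who hit this error in different contexts, and I tried replacing and / or with & / |, but that did not work. Splitting each chained <= comparison into two separate <= comparisons did not help either.

How can I solve this?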

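(test["start1"] <= x["end1"] <= test["start2"]) produces a Series of booleans: test["start1"] is a Series, so the comparison is applied to every element. The chained comparison (and the surrounding and / or) then needs a single True or False from that Series, which is what raises the ValueError.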
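To see the failure in isolation, here is a minimal sketch. The Series s is a made-up stand-in for test["start1"] and the bounds 2 and 6 are arbitrary; only the behaviour of the comparison matters.

```python
import pandas as pd

# Hypothetical stand-in for test["start1"]: a whole column, not one value.
s = pd.Series([1, 5, 4])

# A single comparison against a Series is element-wise and returns a Series.
elementwise = s <= 4

# A chained comparison is evaluated as (2 <= s) and (s <= 6).
# `and` asks the intermediate Series for one truth value, which pandas refuses.
try:
    chained = 2 <= s <= 6
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Comparing scalars (one value taken from the Series) works as expected.
scalar_ok = 2 <= int(s.iloc[1]) <= 6
```

This is why swapping and / or for & / | alone does not help here: the chained <= still hides an implicit and.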

Try comparing each row with x instead:

df["adjacency_list"] = df.apply(
    lambda x: [
        row["id"]
        for _, row in df[(x["to"] == df["from"])].iterrows()
        if (
                (
                    not set(row["cases"]).isdisjoint(x["cases"])
                ) and (
                        (x["end1"] <= row["start1"] <= x["end2"])
                        or
                        (row["start1"] <= x["end1"] <= row["start2"])
                )
        )
    ],
    axis=1,
)

Output:

   id from to       cases  start1  start2  end1  end2 adjacency_list
0   0    A  B  [1, 2, 44]       1       4     5     9      [1, 2, 6]
1   1    B  C   [2, 4, 3]       5       7     7    12            [5]
2   2    B  D      [5, 2]       4       9    11    15            [3]
3   3    D  F         [5]       4      30    32    35             []
4   4    B  G      [1, 7]      23      26    15    17             []
5   5    C  F         [4]      12      15    17    20             []
6   6    B  E     [44, 7]       8      18    21    25             []
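
For readers who want to run this end to end, the same logic can be sketched with the conditions pulled into a named function. The helper names is_adjacent and adjacency are my own, not part of pandas or of the original question, and the condition names are only my reading of what each comparison checks.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4, 5, 6],
                   "from": ["A", "B", "B", "D", "B", "C", "B"],
                   "to": ["B", "C", "D", "F", "G", "F", "E"],
                   "cases": [[1, 2, 44], [2, 4, 3], [5, 2], [5], [1, 7], [4], [44, 7]],
                   "start1": [1, 5, 4, 4, 23, 12, 8],
                   "start2": [4, 7, 9, 30, 26, 15, 18],
                   "end1": [5, 7, 11, 32, 15, 17, 21],
                   "end2": [9, 12, 15, 35, 17, 20, 25]})


def is_adjacent(x, row):
    """Hypothetical helper: compare two single rows, so every operand is a scalar."""
    shares_case = not set(row["cases"]).isdisjoint(x["cases"])
    row_starts_in_x_end = x["end1"] <= row["start1"] <= x["end2"]
    x_ends_in_row_start = row["start1"] <= x["end1"] <= row["start2"]
    return shares_case and (row_starts_in_x_end or x_ends_in_row_start)


def adjacency(x):
    """Hypothetical helper: ids of rows whose "from" equals x["to"] and pass is_adjacent."""
    candidates = df[df["from"] == x["to"]]
    return [row["id"] for _, row in candidates.iterrows() if is_adjacent(x, row)]


df["adjacency_list"] = df.apply(adjacency, axis=1)
result = df["adjacency_list"].tolist()
```

Because is_adjacent only ever sees one row on each side, the chained comparisons operate on plain numbers and the ambiguity cannot arise.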