IF ... THEN structure and how to apply it across multiple rows

Question

I have a data frame with 3 columns: Postcode, Borough and Neighbourhood, and 257 rows. You can ignore Postcode for now.

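In Borough and Neighbourhood, either column may hold a valid location or the value "Not assigned". I am trying to work out how to do the following: if a row has a valid Borough (which can be anything) and its Neighbourhood is "Not assigned", then Neighbourhood should be set to the same value as Borough.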

So the logic would be something like this:

 If Neighbourhood="Not assigned" AND Borough<>"Not assigned" then Neighbourhood=Borough
    Repeat for all rows

Answer 1

Using the pandas library, we can do this by subsetting the DataFrame.

Using the subsetting technique

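First, for testing purposes, I recreated the data frame with only 2 columns: Borough and Neighbourhood. I also added one more row, because none of the rows in the data provided satisfies the condition.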

import pandas as pd

# Sample data; the last row has a valid Borough but an unassigned Neighbourhood
borough = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Not assigned", "Etobicoke"]
neighbourhood = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Kingsway Park South West", "Mimico NW", "The Queensway West", "Royal York South West", "South of Bloor", "Not assigned", "Not assigned"]

df = pd.DataFrame({"Borough": borough,
                   "Neighbourhood": neighbourhood})
print(df)

Then we build the condition: the row has a valid Borough (which can be anything) and its Neighbourhood is "Not assigned". Rows that match will have Neighbourhood set to the same value as Borough.

condition = (df["Borough"] != "Not assigned") & (df["Neighbourhood"] == "Not assigned")
print(condition)

condition is a boolean Series: it contains only True and False, which makes it useful for subsetting the data frame.
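Before assigning anything, it can help to look at what the mask selects. The following is a minimal sketch on a small made-up frame (not the asker's data); the names matching_rows and n_matching are only for illustration.

```python
import pandas as pd

# Small hypothetical frame, just to show the mask at work.
df = pd.DataFrame({
    "Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
    "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"],
})

condition = (df["Borough"] != "Not assigned") & (df["Neighbourhood"] == "Not assigned")

# Indexing with the mask keeps only the rows where it is True.
matching_rows = df[condition]

# True counts as 1, so summing the mask counts the matching rows.
n_matching = int(condition.sum())

print(matching_rows)
print(n_matching)
```

If n_matching is 0, the assignment step will simply change nothing, which is a quick way to spot a typo in the "Not assigned" string.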

Finally, for every row that satisfies condition, we replace the value in the Neighbourhood column with the value from the Borough column:

df.loc[condition, "Neighbourhood"] = df.loc[condition, "Borough"]
print(df)
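The same replacement can probably also be written with Series.mask, which returns a new Series instead of modifying the frame in place. This is a sketch of an alternative, not part of the approach above; the sample data and the name result are made up for illustration.

```python
import pandas as pd

# Small hypothetical frame, not the asker's data.
df = pd.DataFrame({
    "Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
    "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"],
})

condition = (df["Borough"] != "Not assigned") & (df["Neighbourhood"] == "Not assigned")

# Series.mask replaces values where the condition is True,
# taking the replacement from the index-aligned Borough column.
new_neighbourhood = df["Neighbourhood"].mask(condition, df["Borough"])

# assign returns a new frame; the original df is left untouched.
result = df.assign(Neighbourhood=new_neighbourhood)
print(result)
```

This form may be convenient when you want to keep the original frame around for comparison.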

Alternatively, you can loop over the rows, but this is not good practice, because it can be considerably slower on larger data:

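for idx, row in df.iterrows():
    # row holds plain Python values here, so use `and` rather than `&`
    if row["Borough"] != "Not assigned" and row["Neighbourhood"] == "Not assigned":
        # iterrows may yield copies, so writing to row is not guaranteed to
        # update df; write back through the frame by label instead
        df.at[idx, "Neighbourhood"] = row["Borough"]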
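As a sanity check, the loop and the subsetting approach can be run on two copies of the same data and compared. The sketch below assumes a small made-up frame; make_df, vectorised and looped are hypothetical names, and the loop writes back with df.at.

```python
import pandas as pd

def make_df():
    # Hypothetical sample data, not the asker's 257-row frame.
    return pd.DataFrame({
        "Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
        "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"],
    })

# Subsetting version: one assignment over all matching rows.
vectorised = make_df()
condition = (vectorised["Borough"] != "Not assigned") & (vectorised["Neighbourhood"] == "Not assigned")
vectorised.loc[condition, "Neighbourhood"] = vectorised.loc[condition, "Borough"]

# Loop version: check each row and write back by label.
looped = make_df()
for idx, row in looped.iterrows():
    if row["Borough"] != "Not assigned" and row["Neighbourhood"] == "Not assigned":
        looped.at[idx, "Neighbourhood"] = row["Borough"]

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(vectorised, looped)
print(looped)
```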


python

dataframe

pandas

python-3.6