IF ... THEN structure and how to apply it across multiple rows

I have a DataFrame with 3 columns (Postcode, Borough, Neighbourhood) and 257 rows. You can ignore Postcode for now.

So the logic should be something like this:

 If Neighbourhood="Not Assigned" AND Borough<>"Not Assigned" then Neighbourhood=Borough
    Repeat for all rows

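With the pandas library, we can do this on the DataFrame using a subsetting technique.

First, for testing purposes, I recreated the DataFrame with only 2 columns: Borough and Neighbourhood. I also added one more row, because none of the rows in the provided data satisfied the condition.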

import pandas as pd

borough = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Not assigned", "Etobicoke"]
neighbourhood = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Kingsway Park South West", "Mimico NW", "The Queensway West", "Royal York South West", "South of Bloor", "Not assigned", "Not assigned"]

df = pd.DataFrame({"Borough": borough,
                   "Neighbourhood": neighbourhood})
print(df)

Then we create the condition: if a row has a valid Borough (anything other than "Not assigned") and its Neighbourhood is "Not assigned", then Neighbourhood will be set to the same value as Borough.

condition = (df["Borough"] != "Not assigned") & (df["Neighbourhood"] == "Not assigned")
print(condition)

condition is a boolean Series: it contains only True and False values, which makes it useful for subsetting the DataFrame.
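
As a small illustration of how a boolean Series like this subsets a DataFrame, here is a minimal sketch on a three-row toy frame. The data and the names toy, mask and matching are made up for this example and are not part of the question's data:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"Borough": ["Etobicoke", "Not assigned", "Etobicoke"],
                    "Neighbourhood": ["Not assigned", "Not assigned", "Mimico NW"]})

# Element-wise comparisons give boolean Series; & combines them row by row
mask = (toy["Borough"] != "Not assigned") & (toy["Neighbourhood"] == "Not assigned")

# Indexing with the mask keeps only the rows where it is True
matching = toy[mask]
print(mask.tolist())
print(matching)
```

Only the first row has a real Borough together with an unassigned Neighbourhood, so it is the only row selected.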

Finally, for every row that satisfies condition, we replace the value in the Neighbourhood column with the value in the Borough column:
df.loc[condition, "Neighbourhood"] = df.loc[condition, "Borough"]
print(df)
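
If you prefer a single expression, the same replacement can probably also be written with Series.mask, which swaps in values from another Series wherever a condition is True. This is a sketch of an alternative, not the approach used above, and the frame demo and the name cond are invented for the example:

```python
import pandas as pd

# Toy data, invented for illustration only
demo = pd.DataFrame({"Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
                     "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"]})

cond = (demo["Borough"] != "Not assigned") & (demo["Neighbourhood"] == "Not assigned")

# Where cond is True, take the value from Borough; elsewhere keep Neighbourhood
demo["Neighbourhood"] = demo["Neighbourhood"].mask(cond, demo["Borough"])
print(demo)
```

Rows where both columns are "Not assigned" are left untouched, matching the logic in the question.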

Alternatively, you can loop over the rows, but this is not good practice because it can be much slower on larger data:

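for idx, row in df.iterrows():
    # row is a copy of the data, so assigning to it would not change df;
    # write the new value back through df.at instead
    if row["Borough"] != "Not assigned" and row["Neighbourhood"] == "Not assigned":
        df.at[idx, "Neighbourhood"] = row["Borough"]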