IF ... THEN structure and how to apply it across multiple rows
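I have a dataframe with 3 columns, Postcode, Borough and Neighbourhood, and 257 rows in total. You can ignore Postcode for now.
In Borough and Neighbourhood, either column may already hold a valid location or be "Not assigned". I am trying to work out how to do the following: if a row has a valid Borough (it can be anything) and its Neighbourhood is "Not assigned", then Neighbourhood should be set to the same value as Borough.
So the logic should be something like this: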
If Neighbourhood = "Not assigned" AND Borough <> "Not assigned" then Neighbourhood = Borough
Repeat for all rows
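Using the pandas library, we can do this with subsetting techniques on the DataFrame.
First, for testing purposes, I recreated the dataframe with only 2 columns: Borough and Neighbourhood. I also added one more row, because none of the rows in the data provided satisfy the condition.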
import pandas as pd

borough = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Etobicoke", "Not assigned", "Etobicoke"]
# The last entry belongs to the extra row added so that one row satisfies the condition
neighbourhood = ["Not assigned", "Not assigned", "Not assigned", "Not assigned", "Kingsway Park South West", "Mimico NW", "The Queensway West", "Royal York South West", "South of Bloor", "Not assigned", "Not assigned"]
df = pd.DataFrame({"Borough": borough,
                   "Neighbourhood": neighbourhood})
print(df)
Then we create the conditional statement: if a row has a valid Borough (it can be anything) and its Neighbourhood is "Not assigned", then Neighbourhood will be set to the same value as Borough.
condition = (df["Borough"] != "Not assigned") & (df["Neighbourhood"] == "Not assigned")
print(condition)
condition is a boolean Series: it contains only True and False values, which makes it useful for subsetting the dataframe.
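To see what such a boolean Series does, here is a minimal sketch on a smaller, made-up frame (the three rows and the names `demo` and `mask` are illustrative only, not the data above). The mask can be summed to count the matching rows and passed to `[]` to keep only those rows.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate boolean masks
demo = pd.DataFrame({
    "Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
    "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"],
})

mask = (demo["Borough"] != "Not assigned") & (demo["Neighbourhood"] == "Not assigned")

print(mask.tolist())     # one True/False per row
print(int(mask.sum()))   # True counts as 1, so this is the number of matching rows
print(demo[mask])        # only the rows where the mask is True
```

Only the third row has a valid Borough together with an unassigned Neighbourhood, so it is the only row the mask selects.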
Finally, for the rows that satisfy condition, we replace the value in the Neighbourhood column with the value in the Borough column:
df.loc[condition, "Neighbourhood"] = df.loc[condition, "Borough"]
print(df)
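The same update can also be written without `.loc`, using `Series.mask`, which replaces values where a condition is True. This is only a sketch of an alternative, not part of the answer above; the helper name `fill_neighbourhood`, its `placeholder` parameter and the `sample` frame are made up for illustration. It returns a new frame rather than modifying its input.

```python
import pandas as pd

def fill_neighbourhood(frame, placeholder="Not assigned"):
    """Return a copy where an unassigned Neighbourhood takes the Borough value.

    Hypothetical helper: the name and the `placeholder` parameter are
    assumptions made for this sketch.
    """
    condition = (frame["Borough"] != placeholder) & (frame["Neighbourhood"] == placeholder)
    result = frame.copy()
    # Series.mask replaces values where the condition is True
    result["Neighbourhood"] = frame["Neighbourhood"].mask(condition, frame["Borough"])
    return result

# Made-up sample data, not the 257-row frame from the question
sample = pd.DataFrame({
    "Borough": ["Not assigned", "Etobicoke", "Etobicoke"],
    "Neighbourhood": ["Not assigned", "Mimico NW", "Not assigned"],
})
filled = fill_neighbourhood(sample)
print(filled)
```

Passing the placeholder as a parameter may help if the real data spells it differently, for example "Not Assigned" with a capital A, since the comparison is case-sensitive.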
Alternatively, you can loop over the rows, but this is not good practice, because it can be considerably slower on larger data:
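for idx, row in df.iterrows():
    # These are plain Python booleans, so use `and` rather than the element-wise `&`
    condition = row["Borough"] != "Not assigned" and row["Neighbourhood"] == "Not assigned"
    if condition:
        # Write through df.at: assigning to `row` is not guaranteed to update df
        df.at[idx, "Neighbourhood"] = row["Borough"]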