How do I index a pandas DataFrame using boolean indexing?

Question

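I'm starting a new pandas practice module where we work with indexing and filtering data. I've run into a method-chaining format that the course never explained, and I'm hoping someone can help me understand it. The dataset is the Fortune 500 list of companies.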

import pandas as pd
df = pd.read_csv('f500.csv', index_col=0)
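Since f500.csv isn't included here, this is a minimal sketch of what index_col=0 does, using a hypothetical two-row CSV held in memory via io.StringIO. The column names and company names are placeholders, not the real file's contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for f500.csv: the first column holds company names
csv_text = """company,industry,country
Company A,General Merchandisers,USA
Company B,Motor Vehicles and Parts,Japan
"""

# index_col=0 makes the first column ("company") the row index
# instead of leaving it as an ordinary data column
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.index.tolist())    # ['Company A', 'Company B']
print(df.columns.tolist())  # ['industry', 'country']
```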

The thing is, we were taught to use boolean indexing by passing a boolean condition to the DataFrame, like this:

motor_bool = df["industry"] == "Motor Vehicles and Parts"
motor_countries = df.loc[motor_bool, "country"]
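The two-step pattern above can be run end to end on a tiny hypothetical DataFrame (the rows below are invented for illustration). It shows that the comparison produces a boolean Series aligned on the row index, and that .loc[mask, column] keeps only the True rows of one column.

```python
import pandas as pd

# Hypothetical toy data, not the real Fortune 500 file
df = pd.DataFrame(
    {
        "industry": ["Motor Vehicles and Parts", "Banks", "Motor Vehicles and Parts"],
        "country": ["Japan", "USA", "Germany"],
    },
    index=["Company A", "Company B", "Company C"],
)

# Step 1: the comparison returns a boolean Series with the same index as df
motor_bool = df["industry"] == "Motor Vehicles and Parts"

# Step 2: .loc[row_selector, column_label] keeps the True rows of "country"
motor_countries = df.loc[motor_bool, "country"]
print(motor_countries.tolist())  # ['Japan', 'Germany']
```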

The code above finds the countries of the companies whose industry is "Motor Vehicles and Parts". The last exercise in this module asks us to

"Create a series, industry_usa, containing counts of the two most common values in the industry column for companies headquartered in the USA."

The answer code (which names the DataFrame f500 rather than df) is

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)
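One way to read the chained answer is to split it into one step per bracket or method call. This sketch uses a small hypothetical f500 DataFrame (invented rows, not the real data) so the intermediate results can be inspected.

```python
import pandas as pd

# Hypothetical toy data: USA has Banks x3, Retail x2, Energy x1
f500 = pd.DataFrame(
    {
        "industry": ["Banks", "Banks", "Retail", "Energy",
                     "Banks", "Retail", "Energy", "Energy"],
        "country": ["USA", "USA", "USA", "UK",
                    "USA", "USA", "Japan", "USA"],
    }
)

# 1. f500["industry"] selects one column and returns a Series
industry = f500["industry"]

# 2. The comparison returns a boolean Series with the same index as f500
is_usa = f500["country"] == "USA"

# 3. Indexing a Series with a boolean Series keeps the rows where it is True
usa_industries = industry[is_usa]

# 4. value_counts() counts each distinct value, largest count first
counts = usa_industries.value_counts()

# 5. head(2) keeps the first two entries: the two most common industries
industry_usa = counts.head(2)
print(industry_usa.to_dict())  # {'Banks': 3, 'Retail': 2}
```

So the second pair of brackets is not applied to the DataFrame at all; it is applied to the Series that the first pair returned.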

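I don't understand how we can suddenly put two bracket lookups back to back, as in df[col][df[col] == ...]. Shouldn't I pass the boolean condition first and then use .loc to specify which column I want? This chained style is very different from what we practiced.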

Please help. I'm really confused.

As always, thank you, Stack community.

Answer 1

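The chained version works because f500["industry"] returns a Series, and indexing that Series with the boolean mask keeps only the rows where the mask is True. However, it is not the recommended style: pandas discourages chained indexing because it can behave unpredictably when you assign through it. Here it is better to use DataFrame.loc to select the industry column by the mask in a single step, and then get the counts: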

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().head(2)
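For reading data, the chained version and the .loc version select the same values. A quick check on a hypothetical toy f500 DataFrame (invented rows) might look like this:

```python
import pandas as pd

# Hypothetical toy data, not the real Fortune 500 file
f500 = pd.DataFrame(
    {
        "industry": ["Banks", "Banks", "Retail", "Energy",
                     "Banks", "Retail", "Energy", "Energy"],
        "country": ["USA", "USA", "USA", "UK",
                    "USA", "USA", "Japan", "USA"],
    }
)

mask = f500["country"] == "USA"

# Chained: column first, then filter the resulting Series by the mask
chained = f500["industry"][mask].value_counts().head(2)

# .loc: rows and column selected together in one indexing call
with_loc = f500.loc[mask, "industry"].value_counts().head(2)

print(chained.equals(with_loc))  # True
```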

Another solution uses Series.nlargest on the counts returned by value_counts:

industry_usa = f500.loc[f500["country"] == "USA", "industry"].value_counts().nlargest(2)
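A minimal sketch of the nlargest variant on a hypothetical toy f500 DataFrame (invented rows). nlargest ranks numeric values, so it has to be applied to the counts; calling it directly on the text industry column raises a TypeError.

```python
import pandas as pd

# Hypothetical toy data, not the real Fortune 500 file
f500 = pd.DataFrame(
    {
        "industry": ["Banks", "Banks", "Retail", "Energy",
                     "Banks", "Retail", "Energy", "Energy"],
        "country": ["USA", "USA", "USA", "UK",
                    "USA", "USA", "Japan", "USA"],
    }
)

# Numeric counts per industry for USA companies
usa_counts = f500.loc[f500["country"] == "USA", "industry"].value_counts()

# nlargest(2) returns the 2 largest counts, largest first
industry_usa = usa_counts.nlargest(2)
print(industry_usa.to_dict())  # {'Banks': 3, 'Retail': 2}

# Applying nlargest to the string column itself is not allowed
try:
    f500.loc[f500["country"] == "USA", "industry"].nlargest(2)
except TypeError as exc:
    print("TypeError:", exc)
```

Because value_counts already sorts by count in descending order, head(2) and nlargest(2) give the same result here; nlargest simply states the intent more explicitly.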
