Change value in pandas after chained loc and iloc

Question

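I have the following problem: in a df, I want to select specific rows and a specific column, take the first n elements of that selection, and assign them a new value. Naively, I thought the following code should do the job: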

import seaborn as sns
import pandas as pd

df = sns.load_dataset('tips')
df.loc[df.day=="Sun", "smoker"].iloc[:4] = "Yes"

Both loc and iloc should return a view into the df, so the values should be overwritten. However, the dataframe does not change. Why?

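I know how to work around it: first create a new object with loc, then change the values with iloc, and write the result back to the original df (as shown below).

But a) I do not think this is optimal, and b) I would like to know why the preferred solution does not work. Why does it return a copy rather than a view of a view?

Alternative solution: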

df = sns.load_dataset('tips')
tmp = df.loc[df.day=="Sun", "smoker"]
tmp.iloc[:4] = "Yes"
df.loc[df.day=="Sun", "smoker"] = tmp

Note: I have read the docs, this really great post and this question, but they do not explain this. They are concerned with the difference between df.loc[mask, "z"] and the chained df["z"][mask].

Answer 1

I believe df.loc[].iloc[] is a case of chained assignment, and pandas does not guarantee that you end up with a view. From the docs:

Whether a copy or a reference is returned for a setting operation, may depend on the context. This is sometimes called chained assignment and should be avoided.

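Because you have a filter condition in loc, pandas creates a new pd.Series and then applies the assignment to it. For example, the following will work (in pandas versions where Copy-on-Write is not enabled), because you get the same series as df["smoker"]: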

df.loc[:, "smoker"].iloc[:4] = 'Yes'

But you will get a SettingWithCopyWarning.

You need to rewrite the code so that pandas handles it as a single loc operation.

Another possible workaround:
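To make the copy behaviour concrete, here is a minimal sketch. It uses a small hand-built frame (hypothetical data standing in for the tips dataset, so that it runs without seaborn) and splits the chained expression into two steps to show where the assignment actually lands.

```python
import pandas as pd

# Hypothetical toy data standing in for the tips dataset.
df = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun", "Sun", "Sat", "Sun", "Sun"],
    "smoker": ["No"] * 7,
})

# Boolean-mask selection: pandas builds a new Series for the matching rows.
selected = df.loc[df["day"] == "Sun", "smoker"]

# This writes into `selected` only; `df` is left untouched.
selected.iloc[:4] = "Yes"

n_changed_in_selection = int((selected == "Yes").sum())
n_changed_in_df = int((df["smoker"] == "Yes").sum())
print(n_changed_in_selection, n_changed_in_df)
```

The chained one-liner does the same thing, except that the intermediate Series is thrown away immediately, which is why nothing appears to happen.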

df.loc[df[df.day=="Sun"].index[:4], "smoker"] = 'Yes'
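Note that the label-based workaround above relies on the index labels being unique: with duplicate labels, loc would select every row carrying one of those labels. A purely positional variant is sketched below as one possible alternative; the toy frame and the variable names (first_four, col_pos) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with duplicate index labels on purpose.
df = pd.DataFrame(
    {
        "day": ["Sun", "Sat", "Sun", "Sun", "Sat", "Sun", "Sun"],
        "smoker": ["No"] * 7,
    },
    index=[0, 0, 1, 1, 2, 2, 3],
)

# Integer positions of the rows matching the condition, limited to the first 4.
mask = (df["day"] == "Sun").to_numpy()
first_four = np.flatnonzero(mask)[:4]

# Integer position of the target column.
col_pos = df.columns.get_loc("smoker")

# A single iloc assignment on the original frame, so there is no intermediate copy.
df.iloc[first_four, col_pos] = "Yes"
```

Because everything is expressed as positions, the result does not depend on what the index looks like.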

Answer 2

In your case, you can define the columns to be imputed.

Let us assume the following dataset:

df = pd.DataFrame(data={'State':[1,2,3,4,5,6, 7, 8, 9, 10], 
                         'Sno Center': ["Guntur", "Nellore", "Visakhapatnam", "Biswanath", "Nellore", "Guwahati", "Nellore", "Numaligarh", "Sibsagar", "Munger-Jamalpu"], 
                         'Mar-21': [121, 118.8, 131.6, 123.7, 127.8, 125.9, 114.2, 114.2, 117.7, 117.7],
                         'Apr-21': [121.1, 118.3, 131.5, 124.5, 128.2, 128.2, 115.4, 115.1, 117.3, 118.3]})
df
    State   Sno Center      Mar-21  Apr-21
0   1       Guntur          121.0   121.1
1   2       Nellore         118.8   118.3
2   3       Visakhapatnam   131.6   131.5
3   4       Biswanath       123.7   124.5
4   5       Nellore         127.8   128.2
5   6       Guwahati        125.9   128.2
6   7       Nellore         114.2   115.4
7   8       Numaligarh      114.2   115.1
8   9       Sibsagar        117.7   117.3
9   10      Munger-Jamalpu  117.7   118.3

So, I want to set all date columns to 0 for the rows where Sno Center equals Nellore:

mask = df["Sno Center"] == "Nellore"
df.loc[mask, ["Mar-21", "Apr-21"]] = 0

Result:

df
    State   Sno Center      Mar-21  Apr-21
0   1       Guntur          121.0   121.1
1   2       Nellore         0.0     0.0
2   3       Visakhapatnam   131.6   131.5
3   4       Biswanath       123.7   124.5
4   5       Nellore         0.0     0.0
5   6       Guwahati        125.9   128.2
6   7       Nellore         0.0     0.0
7   8       Numaligarh      114.2   115.1
8   9       Sibsagar        117.7   117.3
9   10      Munger-Jamalpu  117.7   118.3

Another option is to define the columns as a list:

COLS = ["Mar-21", "Apr-21"]
df.loc[mask, COLS] = 0

Another option uses iloc to pick the columns by position:

COLS = df.iloc[:, 2:4].columns.tolist()
df.loc[mask, COLS] = 0

Or, in one line:

df.loc[mask, df.iloc[:, 2:4].columns.tolist()] = 0
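As a slightly shorter variant of the positional column selection, the column labels can be sliced directly from df.columns, without going through iloc. This is only a sketch on a reduced, hypothetical version of the dataset above; date_cols is an illustrative name.

```python
import pandas as pd

# Reduced, hypothetical version of the dataset used in this answer.
df = pd.DataFrame({
    "State": [1, 2, 3],
    "Sno Center": ["Guntur", "Nellore", "Visakhapatnam"],
    "Mar-21": [121.0, 118.8, 131.6],
    "Apr-21": [121.1, 118.3, 131.5],
})

mask = df["Sno Center"] == "Nellore"

# Slice the column Index by position to get the labels of columns 2 and 3.
date_cols = df.columns[2:4]

# Single loc assignment: rows by boolean mask, columns by label.
df.loc[mask, date_cols] = 0
```

The row selection and the assignment still happen in one loc call, so the original frame is modified directly.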

