SettingWithCopyWarning message in Pandas/Python with df.loc

Question

Note: I have spent hours searching SO, the Pandas documentation and a few other websites, but I can't figure out where my code goes wrong.

My UDF:

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})  # cast the new index column to integer
    return dfb

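Important:

The isOutlier column does not exist yet; I create it in this function.
The indice column does not exist yet; I create it in this function.
valor_unitario exists and is a float.
lb and ub are defined earlier.
This function runs inside a loop in the main code (but the warning is already raised at n=0).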

Warning raised:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I found some articles and questions online and on Stack Overflow saying that using loc solves the problem. I tried it, but it didn't work.

Attempt 1 - using loc

def indice(dfb, lb, ub):
    dfb.loc[:,'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)  # <- changed to .loc
    dfb = dfb[~dfb.isOutlier]

    dfb.loc[:,'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000  # <- changed to .loc
    dfb = dfb.astype({'indice': 'int64'})  # cast the new index column to integer
    return dfb

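I also tried using loc everywhere. In fact I tried lots of possible combinations... I tried df.loc on dfb['valor_unitario'], and so on.

Now I get the same warning twice, but slightly different:

self._setitem_single_column(ilocs[0], value, pi) and self.obj[key] = value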

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)

and

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value

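I also tried copy(). The first time this warning appeared, simply using copy() solved the problem; I don't know why it no longer works (I have just loaded more data).

Attempt 2 - using copy()

I tried putting copy() in three places, without success: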

dfb = dfb[~dfb.isOutlier].copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I'm out of ideas. Thanks a lot for your support.

-------- Minimal reproducible example --------

Main_testing.py

import pandas as pd
import calculoindice_support as indice # module 01
import getitemsid_support as getitems # module 02

df = pd.DataFrame({'loja':[1,4,6,6,4,5,7,8],
                   'cod_produto':[21,21,21,55,55,43,26,30],
                   'valor_unitario':[332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento':['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa':['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })

nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)

# initializing main DF with no data 
df_nf = pd.DataFrame(columns=list(df.columns.values))

n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux],ignore_index=True)
    n += 1

calculoindice_support.py (module 01)

import pandas as pd

def limites(dfa,n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)


def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]

        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})

    return dfb

getitemsid_support.py (module 02)

def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))

Warning output:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

Answer 1

The problem is in your Main_testing.py:

while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]

    df_nf_aux = indice.indice(df_item, lb, ub)

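First, you slice your df with the condition df[nome_coluna] == item, which returns a copy of the DataFrame; pandas also remembers that this copy was taken from df (you can check this through the _is_view or _is_copy attributes). You then pass the filtered DataFrame to the indice method.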
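A small sketch of the point above, using a few rows of the question's data (NumPy is assumed to be available, as it is a pandas dependency). It checks that boolean-mask selection gathers the selected rows into new memory, and that a column added to the subset leaves the original frame untouched. The private _is_copy flag is deliberately not inspected, since it is an internal detail; any warning raised by the assignment is silenced only to keep the demo quiet.

```python
import warnings

import numpy as np
import pandas as pd

# A few rows of the question's data
df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55, 55],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40, 220.40],
})

# Same kind of selection as df[df[nome_coluna] == item] in Main_testing.py
df_item = df[df["cod_produto"] == 21]

# Boolean-mask selection gathers the matching rows into a new array,
# so the subset's values do not live in df's memory
shares = np.shares_memory(df_item["valor_unitario"].to_numpy(),
                          df["valor_unitario"].to_numpy())
print("subset shares memory with df:", shares)

# Adding a column to the subset only changes df_item; df keeps its columns.
# Any warning pandas raises here is silenced only to keep the demo quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df_item["isOutlier"] = False

print("isOutlier in df_item:", "isOutlier" in df_item.columns)
print("isOutlier in df:", "isOutlier" in df.columns)
```

This should print False, True and False: the subset is a separate object, which is exactly why pandas cannot know which of the two frames you meant to modify.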

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

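Inside the indice method you assign a new column to the filtered DataFrame. This is a hidden (implicit) chained assignment: pandas cannot tell whether you want the new column added to the original DataFrame or only to the filtered one, so it warns you.

To silence this warning, tell pandas explicitly what you intend: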
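To see the warning itself, and why switching to .loc did not help in the question's first attempt, here is a hedged sketch. It assumes pandas 1.x or 2.x with default options; under Copy-on-Write (opt-in in pandas 2.x, the default in pandas 3.0) SettingWithCopyWarning is no longer emitted. The helper setting_with_copy_warned and the bounds 300/400 are hypothetical, chosen only for the demo.

```python
import warnings

import pandas as pd


def setting_with_copy_warned(action):
    """Run action() and report whether a SettingWithCopyWarning was emitted.

    The category is matched by name so the helper also runs on pandas builds
    that no longer define the class (there it simply returns False).
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings too
        action()
    return any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55, 55],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40, 220.40],
})
lb, ub = 300.0, 400.0  # hypothetical bounds, only for this demo


def plain_setitem():
    dfb = df[df["cod_produto"] == 21]  # slice that pandas links back to df
    dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)


def loc_setitem():
    dfb = df[df["cod_produto"] == 21]  # same slice, different assignment syntax
    dfb.loc[:, "isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)


def explicit_copy():
    dfb = df[df["cod_produto"] == 21].copy()  # independent frame, no link to df
    dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)


for label, fn in [("dfb[col] = ...", plain_setitem),
                  ("dfb.loc[:, col] = ...", loc_setitem),
                  ("copy() first", explicit_copy)]:
    print(f"{label:24} warned: {setting_with_copy_warned(fn)}")
```

With default options on pandas 1.x/2.x, the first two cases should report True and the last one False. The warning is tied to the object dfb (a slice that pandas remembers came from df), not to the syntax of the assignment, so .loc[:, 'isOutlier'] on the same slice still warns.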

def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

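In the example above I create a copy of the filtered DataFrame, which tells pandas that the new column belongs to the filtered DataFrame, not to the original one.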
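Applying the answer's fix to the question's module, a possible warning-free version of indice could look like the sketch below. It copies once at the top, as suggested above, and once more after the boolean filter, because that filter produces another derived frame before the indice column is assigned. Assumptions not taken from the original: the commented-out int64 cast is applied to dfb, and the driver loop iterates over unique() and concatenates a list of pieces instead of growing df_nf inside the loop.

```python
import warnings

import pandas as pd


def limites(dfa, n):
    # Unchanged from the question: mean +/- n standard deviations
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    return mean - n_sigma, mean + n_sigma


def indice(dfb, lb, ub):
    # Work on an independent copy so dfb is no longer linked to the caller's df
    dfb = dfb.copy()
    if lb == ub:
        dfb.loc[:, "isOutlier"] = False  # .loc is fine now that dfb is a real copy
        dfb.loc[:, "indice"] = 1
    else:
        dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)
        # The boolean filter creates another derived frame; copy it explicitly
        # so the next assignment cannot trigger the warning either
        dfb = dfb[~dfb["isOutlier"]].copy()
        dfb["indice"] = (dfb["valor_unitario"] - lb) / (ub - lb) * 2000
    # Assumption: the commented-out cast in the question was meant for dfb
    return dfb.astype({"indice": "int64"})


df = pd.DataFrame({
    "loja": [1, 4, 6, 6, 4, 5, 7, 8],
    "cod_produto": [21, 21, 21, 55, 55, 43, 26, 30],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40, 220.40, 104.66, 65.00, 14.00],
    "documento": ["324234", "434144", "532552", "524523", "524525", "423844", "529585", "239484"],
    "empresa": ["ABC", "ABC", "ABC", "ABC", "ABC", "CDE", "CDE", "CDE"],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    partes = []
    for item in df["cod_produto"].unique():
        df_item = df[df["cod_produto"] == item]  # still a linked slice here
        lb, ub = limites(df_item, 10)
        partes.append(indice(df_item, lb, ub))
    df_nf = pd.concat(partes, ignore_index=True)

swc = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(df_nf), "rows,", len(swc), "SettingWithCopyWarning(s)")
```

Copying inside indice keeps the function safe however the caller slices its input. The alternative is to copy at the call site, df_item = df[df[nome_coluna] == item].copy(), which silences the warning just as well but relies on every caller remembering to do it.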


Tags: python, indexing, warnings, dataframe, pandas