SettingWithCopyWarning message in Pandas/Python with df.loc
Note: I have spent hours searching SO, the Pandas docs, and a few other sites, but I still can't work out where my code goes wrong.
My UDF:
def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]
    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    df = df.astype({'indice': 'int64'})
    return dfb
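Important:
- The isOutlier column does not exist yet; I am creating it inside this function.
- The indice column does not exist yet; I am creating it inside this function.
- valor_unitario exists and is a float.
- lb and ub are defined beforehand.
- This function is called inside a loop in the main code (but the warning is already raised at n=0).
Warning raised: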
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I found several articles and questions online and on Stack Overflow saying that using loc solves the problem. I tried it, without success.
1st attempt - using loc
def indice(dfb, lb, ub):
->  dfb.loc[:,'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]
->  dfb.loc[:,'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    df = df.astype({'indice': 'int64'})
    return dfb
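I also tried using loc everywhere. In fact, I tried many possible combinations, including using df.loc on dfb['valor_unitario'], and so on.
Now I get the same warning twice, but with slightly different source lines:
self._setitem_single_column(ilocs[0], value, pi)
and
self.obj[key] = value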
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)
and
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value
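I also tried copy(). The first time this warning appeared, simply using copy() solved the problem; I don't know why it no longer works (all I did was load more data).
2nd attempt - using copy()
I tried placing copy() in three places, without success: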
dfb = dfb[~dfb.isOutlier].copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I'm out of ideas and would really appreciate your help.
-------- Minimal reproducible example --------
Main_testing.py
import pandas as pd
import calculoindice_support as indice # module 01
import getitemsid_support as getitems # module 02

df = pd.DataFrame({'loja':[1,4,6,6,4,5,7,8],
                   'cod_produto':[21,21,21,55,55,43,26,30],
                   'valor_unitario':[332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento':['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa':['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })
nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)
# initializing main DF with no data
df_nf = pd.DataFrame(columns=list(df.columns.values))
n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux],ignore_index=True)
    n += 1
calculoindice_support.py (module 01)
import pandas as pd

def limites(dfa,n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)

def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]
        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})
    return dfb
getitemsid_support.py (module 02)
def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))
Warning output:
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(loc, value, pi)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
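The problem is in your Main_testing.py:
while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]
    df_nf_aux = indice.indice(df_item, lb, ub)
First you slice df with the condition df[nome_coluna] == item, which returns a copy of the DataFrame (you can check this by inspecting the private _is_view or _is_copy attributes). You then pass the filtered DataFrame to the indice function.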
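As a quick check of the point above, here is a minimal sketch of what the filter in the loop produces. The small frame is made up for illustration, and _is_copy is a private pandas attribute that may be missing in newer releases, so it is read defensively with getattr rather than relied on.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, same column names as the question).
df = pd.DataFrame({
    "cod_produto": [21, 21, 55],
    "valor_unitario": [332.21, 333.40, 220.40],
})

# Boolean-mask filtering builds a new DataFrame object from the selected rows.
df_item = df[df["cod_produto"] == 21]

# _is_copy is private and version dependent; when present and not None,
# pandas is tracking df_item as derived from another frame.
parent_ref = getattr(df_item, "_is_copy", None)
print("tracks a parent frame:", parent_ref is not None)

# The filter itself leaves the original frame untouched.
print(len(df), len(df_item))
```

When the flag is set, the next column assignment on df_item is what triggers the warning, which is why it shows up inside indice and not on the line that does the filtering.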
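def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
Inside indice, you assign a new column to the filtered DataFrame. This is an implicit chained assignment: pandas cannot tell whether you meant to add the new column to the original DataFrame or only to the filtered one, so it emits a warning.
To silence the warning, tell pandas explicitly what you intend:
def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
In the example above, I create a copy of the filtered DataFrame. That states explicitly that the new column should be added to the filtered DataFrame, not to the original one.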
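For completeness, here is a sketch of the whole function with that change applied. It also keeps the copy() after the outlier filter that the question already tried, on the assumption that the second boolean-mask slice can trigger the same warning on the indice assignment. The bounds below are picked by hand for illustration and are not the output of limites().

```python
import warnings

import pandas as pd


def indice(dfb, lb, ub):
    # Work on an explicit copy so new columns never target the caller's slice.
    dfb = dfb.copy()
    if lb == ub:
        dfb["isOutlier"] = False
        dfb["indice"] = 1
        return dfb
    dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)
    # The filtered result is a new slice, so copy it before adding a column.
    dfb = dfb[~dfb["isOutlier"]].copy()
    dfb["indice"] = (dfb["valor_unitario"] - lb) / (ub - lb) * 2000
    return dfb


df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55, 55],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40, 220.40],
})
df_item = df[df["cod_produto"] == 21]
lb, ub = 333.0, 334.0  # hand-picked bounds so that one row counts as an outlier

# Record every warning so the check does not depend on the default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = indice(df_item, lb, ub)

copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print("SettingWithCopyWarning count:", len(copy_warnings))
print(result)
```

An alternative with the same effect is to copy at the call site, i.e. df_item = df[df[nome_coluna] == item].copy() in the loop; copying inside the function just means callers cannot forget it.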