Returning a copy versus a view warning when using Python pandas dataframe
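My goal is to convert the date column in the DataFrame df from object dtype to datetime, but I ran into a lot of view-versus-copy warnings when running the program.
I found some useful information in this link and tested the following three solutions. All of them work as expected, but the warning messages differ. Can anyone explain the differences, and why I still get a warning about returning a view versus a copy? Thanks.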
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
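Changing the way you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger DataFrame. pandas is simply warning you that you are working on a slice rather than on the full data. Try creating a new column in df: you will get the warning, and the column will exist in your slice, but it will not appear in the original dataset.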
If you know what you are doing, you can turn these warnings off with pd.options.mode.chained_assignment = None # default='warn'
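A minimal sketch of the point about adding a column to a slice, using a small made-up DataFrame in place of the asker's larger dataset (the names `big`, `subset` and `flag` are hypothetical). Depending on the pandas version, the assignment may print a SettingWithCopyWarning, but in every case the new column shows up only in the slice:

```python
import pandas as pd

# Hypothetical "larger" DataFrame standing in for the original data.
big = pd.DataFrame({
    "date": ["2021-01-05", "2021-02-10", "2021-03-15"],
    "value": [1, 2, 3],
})

# Boolean row selection returns a new object that pandas remembers was
# derived from `big`; assigning into it is what can trigger the warning.
subset = big[big["value"] > 1]

# Add a column to the slice (may emit SettingWithCopyWarning).
subset["flag"] = True

print("flag" in subset.columns)  # True
print("flag" in big.columns)     # False: the original frame is unchanged
```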
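I recently got a similar warning. After a few attempts I found that, at least in my case, the problem had nothing to do with your three solutions. It is probably your df itself.
If your df is part of another pandas DataFrame, for example:
df = dfOrigin.loc[some_rows, :] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
then any assignment you make on df will trigger that warning. Try df = dfOrigin[[...]].copy() instead, with your column list inside the brackets.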
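Applied to the original question, the `.copy()` advice would look roughly like the sketch below. The data and the `value` column are made up for illustration. The sketch records warnings with the standard `warnings` module and checks by class name that no SettingWithCopyWarning was raised during the conversion:

```python
import warnings

import pandas as pd

# Hypothetical source frame; "date" holds strings, as in the question.
dfOrigin = pd.DataFrame({
    "date": ["2021-01-05", "2021-02-10", "2021-03-15"],
    "value": [10, 20, 30],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised below
    # .copy() makes df an independent DataFrame, so pandas has no parent
    # frame to warn about when the column is replaced.
    df = dfOrigin[["date", "value"]].copy()
    df["date"] = pd.to_datetime(df["date"])

setting_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # True
print(len(setting_warnings))                              # 0
```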
Code to reproduce the warning:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
#    A  B  C
# 0  4  5  9
# 1  0  6  5
# 2  8  6  6
# 3  6  6  1
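df = dfOrigin[['B', 'C']]  # New DataFrame derived from dfOrigin; pandas flags it as a possible copy of a slice
df.loc[:, 'B'] = df['B'].astype(str)  # Triggers SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy()  # Explicit, independent copy
df2['B'] = df2['B'].astype(str)  # OK, no warning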
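As a quick check that the `.copy()` variant is truly independent, the following sketch reruns the relevant lines of the reproduction and then inspects both frames. After the conversion, `df2['B']` holds strings, while `dfOrigin['B']` keeps its integer dtype:

```python
import numpy as np
import pandas as pd

np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list("ABC"))

df2 = dfOrigin[["B", "C"]].copy()  # explicit, independent copy
df2["B"] = df2["B"].astype(str)    # converts df2 only, no warning

# The conversion affected df2 only; dfOrigin still has integer values.
print(all(isinstance(v, str) for v in df2["B"]))    # True
print(pd.api.types.is_integer_dtype(dfOrigin["B"]))  # True
```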