What is an elegant way to slice a pandas DataFrame by a condition AND a date range at the same time?
I am looking for a simple way to change the contents of a DataFrame based on both a condition and a time range. See the code below:
import numpy as np
import pandas as pd
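# pd.date_range builds the index; the DatetimeIndex(start=..., end=..., freq=...) constructor form was removed in pandas 1.0
data = pd.DataFrame(data=np.random.rand(15,2), index = pd.date_range(start = "2018-01-01 00:00", end = "2018-01-01 00:14", freq="1min"), columns = ["A", "B"])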
data.loc[data["A"].between(0.2,0.3), :].loc[:"2018-01-01 00:02", "A"] = 4
# /Users/ap/anaconda/lib/python3.5/site-packages/pandas/core/indexing.py:189: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
# See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
# self._setitem_with_indexer(indexer, value)
# __main__:1: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
# See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
print(data)
# A B
# 2018-01-01 00:00:00 0.146793 0.198634
# 2018-01-01 00:01:00 0.284354 0.422438
# 2018-01-01 00:02:00 0.359768 0.199127
# 2018-01-01 00:03:00 0.306145 0.538669
# 2018-01-01 00:04:00 0.839377 0.299983
# 2018-01-01 00:05:00 0.236554 0.127450
# 2018-01-01 00:06:00 0.262167 0.304692
# 2018-01-01 00:07:00 0.341273 0.099983
# 2018-01-01 00:08:00 0.721702 0.763717
# 2018-01-01 00:09:00 0.196948 0.541878
# 2018-01-01 00:10:00 0.673248 0.421809
# 2018-01-01 00:11:00 0.892244 0.070801
# 2018-01-01 00:12:00 0.354958 0.184147
# 2018-01-01 00:13:00 0.062060 0.840900
# 2018-01-01 00:14:00 0.139046 0.742875
# ==> Nothing happened as indicated by the warning
# non-elegant way to solve the issue:
x = data.loc[data["A"].between(0.2,0.3), :]
x.loc[:"2018-01-01 00:02", "A"] = 4
data.loc[x.index,:] = x
print(data)
# A B
# 2018-01-01 00:00:00 0.146793 0.198634
# 2018-01-01 00:01:00 4.000000 0.422438
# 2018-01-01 00:02:00 0.359768 0.199127
# 2018-01-01 00:03:00 0.306145 0.538669
# 2018-01-01 00:04:00 0.839377 0.299983
# 2018-01-01 00:05:00 0.236554 0.127450
# 2018-01-01 00:06:00 0.262167 0.304692
# 2018-01-01 00:07:00 0.341273 0.099983
# 2018-01-01 00:08:00 0.721702 0.763717
# 2018-01-01 00:09:00 0.196948 0.541878
# 2018-01-01 00:10:00 0.673248 0.421809
# 2018-01-01 00:11:00 0.892244 0.070801
# 2018-01-01 00:12:00 0.354958 0.184147
# 2018-01-01 00:13:00 0.062060 0.840900
# 2018-01-01 00:14:00 0.139046 0.742875
I also know that I could combine the two into a single condition like this, but I don't consider it an "elegant" solution, because I am no longer using pandas:
from datetime import datetime
data.loc[(data["A"].between(0.2,0.3)) & (data.index < datetime.strptime("2018-01-01 00:02", "%Y-%m-%d %H:%M")), "A"] = 4
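This will do the job (note that it relies on chained indexing, so it may emit SettingWithCopyWarning and has no effect when pandas Copy-on-Write is enabled):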
data.loc[:"2018-01-01 00:02","A"][data.loc[:"2018-01-01 00:02", "A"].between(0.2,0.3)]=4
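As a possible alternative that stays within pandas and avoids chained indexing altogether, both conditions can be combined into one boolean mask and passed to a single .loc call. This is only a sketch under assumptions: the small fixed sample values and the names in_range and in_time are made up for illustration, and "<=" is used so that the end point matches the inclusive label slice :"2018-01-01 00:02" (the datetime version in the question uses "<", which excludes 00:02).

```python
import pandas as pd

# Small deterministic sample (illustrative values, not the question's random data).
idx = pd.date_range(start="2018-01-01 00:00", periods=5, freq="1min")
data = pd.DataFrame(
    {"A": [0.15, 0.28, 0.25, 0.22, 0.90],
     "B": [0.10, 0.20, 0.30, 0.40, 0.50]},
    index=idx,
)

# Value condition: rows where A lies in [0.2, 0.3].
in_range = data["A"].between(0.2, 0.3)
# Time condition: rows up to and including 00:02, compared against a pandas Timestamp.
in_time = data.index <= pd.Timestamp("2018-01-01 00:02")

# One .loc call with the combined mask assigns directly on the original frame.
data.loc[in_range & in_time, "A"] = 4
print(data)
```

Because the assignment happens in one indexing step on data itself, no intermediate copy is created, so it should not trigger SettingWithCopyWarning.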