Unable to slice pandas dataframe (with date as key) using date as string

Question

I am generating an empty dataframe with a series of dates as the index. Data will be added to the dataframe later.

cbd=pd.date_range(start=pd.datetime(2017,01,02),end=pd.datetime(2017,01,30),period=1)

df = pd.DataFrame(data=None,columns=['Test1','Test2'],index=cbd)

df.head()
           Test1 Test2
2017-01-02   NaN   NaN
2017-01-03   NaN   NaN
2017-01-04   NaN   NaN
2017-01-05   NaN   NaN
2017-01-06   NaN   NaN

A few slicing approaches do not seem to work. The following raises a KeyError:

df['2017-01-02']

But either of the following works:

df['2017-01-02':'2017-01-02']
df.loc['2017-01-02']

What am I missing here? Why does the first slice not return a result?

Answer 1

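There is a difference, because the two expressions use different selection methods.

To select a single row you need loc, so this raises a KeyError: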

df['2017-01-02']

Docs - partial string indexing:

Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one):

dft['2013-1-15 12:30:00']

To select a single row, use .loc

In [74]: dft.loc['2013-1-15 12:30:00']
Out[74]: 
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64

df['2017-01-02':'2017-01-02']

This is pure partial string indexing:

This type of slicing will work on a DataFrame with a DateTimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date.
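The inclusive-endpoint behavior quoted above is easiest to see on an index that is finer than one day. The following is a minimal sketch; the 12-hour index and the column name `A` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical index with two timestamps per day (00:00 and 12:00) over three days.
idx = pd.date_range("2017-01-02", periods=6, freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"A": range(6)}, index=idx)

# A day-level string is coarser than the index, so .loc returns every row on that date.
day = df.loc["2017-01-02"]

# A string slice is a label slice on rows: both endpoints are included,
# including all times that fall on the end date.
span = df["2017-01-02":"2017-01-03"]

print(day)
print(span)
```

Here `day` holds the two rows of 2017-01-02 and `span` runs through 2017-01-03 12:00, the last timestamp on the end date.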

Answer 2

Dual behavior of `[]` in `df[]`

If you do not use : inside [], the value inside it is treated as a column.
If you do use : inside [], the values inside it are treated as rows.

Why the duality?

Because most of the time people want to slice rows rather than columns.

So they decided that x and y in df[x:y] should correspond to rows, while x in df[x], or x and y in df[[x,y]], should correspond to columns.

Example:

import pandas as pd
df = pd.DataFrame(data=[[1, 2, 3], [1, 2, 3], [1, 2, 3]],
                  index=['A', 'B', 'C'], columns=['A', 'B', 'C'])
print(df)

Output:

   A  B  C
A  1  2  3
B  1  2  3
C  1  2  3

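Now, when you do df['B'], it could mean two things:

Take the second index label B and give you the second row, 1 2 3

OR

Take the second column B and give you the second column, 2 2 2.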

So, to resolve this conflict and keep it unambiguous, df['B'] always means that you want the column 'B'; if there is no such column, it raises an error.
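The rule above can be sketched with the same 3x3 frame. This is only an illustration of the behavior described; the variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 2, 3], [1, 2, 3]],
                  index=["A", "B", "C"], columns=["A", "B", "C"])

col_b = df["B"]            # no colon: 'B' is looked up among the columns
rows_ab = df["A":"B"]      # colon: 'A' and 'B' are row labels, both included
cols_ab = df[["A", "B"]]   # list: 'A' and 'B' are columns
row_b = df.loc["B"]        # .loc: 'B' is looked up among the row labels

print(col_b.tolist())      # [2, 2, 2]
print(row_b.tolist())      # [1, 2, 3]
```

Because the row and column labels are identical here, only the syntax decides which axis is used.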

Why does `df['2017-01-02']` fail?

It searches for a column named '2017-01-02', and since there is no such column, it raises an error.
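A short check makes this visible. The sketch below rebuilds a frame like the one in the question (with string dates instead of `pd.datetime`) and tests where the label lives; `has_column`, `has_row` and `raised` are names chosen for the illustration.

```python
import pandas as pd

idx = pd.date_range(start="2017-01-02", end="2017-01-30", freq="D")
df = pd.DataFrame(index=idx, columns=["Test1", "Test2"])

# The date string is a row label, not a column label.
has_column = "2017-01-02" in df.columns
has_row = "2017-01-02" in df.index

# df[...] without a colon looks at the columns only, so the lookup fails.
try:
    df["2017-01-02"]
    raised = False
except KeyError:
    raised = True

print(has_column, has_row, raised)
```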

Why does `df.loc['2017-01-02']` work?

Because the syntax of .loc[] is df.loc[row, column], and you can omit the column if you wish, as in your case, where it simply means df.loc[row].
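As a rough sketch of the df.loc[row, column] form, the frame below uses the question's index but is filled with made-up integers so the results are easy to read.

```python
import pandas as pd

idx = pd.date_range("2017-01-02", "2017-01-30", freq="D")
# Illustrative data only: each row holds its own position in both columns.
df = pd.DataFrame({"Test1": range(len(idx)), "Test2": range(len(idx))}, index=idx)

whole_row = df.loc["2017-01-03"]          # column omitted: the full row as a Series
same_row = df.loc["2017-01-03", :]        # the same row, with the column slice spelled out
one_cell = df.loc["2017-01-03", "Test1"]  # row and column given: a single value

print(whole_row.tolist())  # [1, 1]
print(one_cell)            # 1
```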

Answer 3

First, I updated your test data (just FYI) because it raises an 'invalid token' error. See the change here:

cbd = pd.date_range(start='2017-01-02', end='2017-01-30', freq='D')
df = pd.DataFrame(data=None, columns=['Test1', 'Test2'], index=cbd)

Now look at the first row:

In[1]:

df.head(1)

Out[1]:
          Test1 Test2
2017-01-02  NaN NaN

Then trying the initial slicing approach produces this error:

In[2]:    

df['2017-01-02']

Out[2]:

KeyError: '2017-01-02'

Now try the same thing with a column name:

In[3]:    

df.columns

Out[3]:

Index(['Test1', 'Test2'], dtype='object')

Let's try 'Test1':

In[4]:

df['Test1']

And we get the NaN output from that column:

Out[4]:

2017-01-02    NaN
2017-01-03    NaN
2017-01-04    NaN
2017-01-05    NaN

So the format you are using is meant for column names, unless you write it as a slice: df['2017-01-02':'2017-01-02'].

The pandas docs state: "The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)".

So, as you correctly identified, DataFrame.loc is a label-based indexer, and it produces the output you are looking for:

 In[5]:
df.loc['2017-01-02']

 Out[5]:

Test1    NaN
Test2    NaN
Name: 2017-01-02 00:00:00, dtype: object
