Selecting rows with specified days in datetimeindex dataframe - Pandas

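I have a dataframe with a DatetimeIndex. I need only those rows whose index falls on one of the days given in a list, e.g. [1,2] for Monday and Tuesday. Is this possible in pandas in a single line of code?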

IIUC, the following should work:

df[df.index.to_series().dt.dayofweek.isin([0,1])]

Example:

In [9]:
import datetime as dt
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end=dt.datetime(2015,2,1)))
df[df.index.to_series().dt.dayofweek.isin([0,1])]

Out[9]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

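So this converts the DatetimeIndex to a Series so that we can call isin to test for membership. .dt.dayofweek gives the day of the week as an integer, and passing 0,1 (which correspond to Monday and Tuesday) to isin produces a boolean mask that we use to filter the rows.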
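To make the chain above easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The variable names (as_series, weekday, mask, selected) are only illustrative, and the date strings are an assumed equivalent of the dt.datetime arguments used in the example.

```python
import pandas as pd

# Same date range as the example above: 1 Jan 2015 through 1 Feb 2015.
df = pd.DataFrame(index=pd.date_range(start="2015-01-01", end="2015-02-01"))

# Step 1: DatetimeIndex -> Series (the values and the index are the same dates).
as_series = df.index.to_series()

# Step 2: day of the week as an integer, Monday=0 ... Sunday=6.
weekday = as_series.dt.dayofweek

# Step 3: boolean Series, True where the day is Monday or Tuesday.
mask = weekday.isin([0, 1])

# Step 4: boolean indexing keeps only the rows where the mask is True.
selected = df[mask]
print(selected.index.day_name().unique().tolist())  # ['Monday', 'Tuesday']
```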

Another way is to construct a boolean mask without converting to a Series:

In [12]:
df[(df.index.dayofweek == 0) | (df.index.dayofweek == 1)]

Out[12]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

Or, in fact, this works:

In [13]:
df[df.index.dayofweek < 2]

Out[13]:
Empty DataFrame
Columns: []
Index: [2015-01-05 00:00:00, 2015-01-06 00:00:00, 2015-01-12 00:00:00, 2015-01-13 00:00:00, 2015-01-19 00:00:00, 2015-01-20 00:00:00, 2015-01-26 00:00:00, 2015-01-27 00:00:00]

Timings

In [14]:
%timeit df[df.index.dayofweek < 2]
%timeit df[np.in1d(df.index.dayofweek, [0, 1])]

1000 loops, best of 3: 464 µs per loop
1000 loops, best of 3: 521 µs per loop

So my last method is slightly faster than the np method here.
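The `< 2` comparison only works when the wanted days form a contiguous range starting at Monday. For an arbitrary list of days, a hedged sketch: the object returned by df.index.dayofweek is itself an Index, so its own isin method can be called without converting to a Series. np.isin is used here as the newer spelling of np.in1d, which is deprecated as of NumPy 2.0. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=pd.date_range(start="2015-01-01", end="2015-02-01"))
days = [0, 1]  # Monday and Tuesday in pandas' numbering

# Index.isin returns a boolean NumPy array usable as a mask.
by_index_isin = df[df.index.dayofweek.isin(days)]

# np.isin does the same membership test at the NumPy level.
by_np_isin = df[np.isin(df.index.dayofweek, days)]

# The comparison trick from above, for reference.
by_comparison = df[df.index.dayofweek < 2]

# All three approaches should select the same dates.
print(by_index_isin.index.equals(by_np_isin.index))
print(by_index_isin.index.equals(by_comparison.index))
```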

You could try this:

In [3]: import pandas as pd
In [4]: import numpy as np

In [5]: index = pd.date_range('11/23/2015', end = '11/30/2015', freq='d')
In [6]: df = pd.DataFrame(np.random.randn(len(index),2),columns=list('AB'),index=index)

In [7]: df
Out[7]:
                   A         B
2015-11-23 -0.673626 -1.009921
2015-11-24 -1.288852 -0.338795
2015-11-25 -1.414042 -0.767050
2015-11-26  0.018223 -0.726230
2015-11-27 -1.288709 -1.144437
2015-11-28  0.121093  1.396825
2015-11-29 -0.791611 -1.014375
2015-11-30  1.223220 -1.223499


In [8]: df[np.in1d(df.index.dayofweek, [1, 2])]
Out[8]:
                   A         B
2015-11-24 -1.288852 -0.338795
2015-11-25 -1.414042 -0.767050

Note that 1 is actually Tuesday, because dayofweek numbers Monday as 0. That should be easy to account for if needed.
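One way to account for the offset, sketched under the assumption that the question's list is 1-based (1 = Monday, 2 = Tuesday): shift dayofweek by one before the membership test, or sidestep the numbering entirely with day names. The frame below uses deterministic values instead of random ones so the result is reproducible; wanted, one_based and by_name are illustrative names.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2015-11-23", "2015-11-30", freq="D")
df = pd.DataFrame(
    np.arange(len(index) * 2).reshape(-1, 2), columns=list("AB"), index=index
)

# Assumed 1-based numbering from the question: 1 = Monday, 2 = Tuesday.
wanted = [1, 2]

# dayofweek is 0-based (Monday=0), so add 1 to match the 1-based list.
one_based = df[(df.index.dayofweek + 1).isin(wanted)]

# Alternative: compare day names, which avoids any numbering convention.
by_name = df[df.index.day_name().isin(["Monday", "Tuesday"])]

print(one_based.index.strftime("%Y-%m-%d").tolist())
```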

The previous answer was posted while I was writing this one. For comparison:

In [15]: %timeit df.loc[df.index.to_series().dt.dayofweek.isin([0,1]).values]
100 loops, best of 3: 2.01 ms per loop

In [16]: %timeit df[np.in1d(df.index.dayofweek, [0, 1])]
1000 loops, best of 3: 393 µs per loop

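Note that this comparison was done on the small test DataFrame I created. I don't know how it scales to larger DataFrames, but I would expect the relative performance to be similar.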
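To check how the approaches scale on your own machine, a rough sketch using the standard timeit module instead of the IPython %timeit magic. The frame size (50,000 daily rows), the column name A and the repeat count are arbitrary assumptions, and the timings will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 50,000 consecutive days with one integer column.
index = pd.date_range("2000-01-01", periods=50_000, freq="D")
big = pd.DataFrame({"A": np.arange(len(index))}, index=index)

# Each candidate selects the Monday and Tuesday rows in a different way.
candidates = {
    "to_series().dt.dayofweek.isin": lambda: big[
        big.index.to_series().dt.dayofweek.isin([0, 1])
    ],
    "np.isin": lambda: big[np.isin(big.index.dayofweek, [0, 1])],
    "dayofweek < 2": lambda: big[big.index.dayofweek < 2],
}

# Run each candidate once to capture its result, then time repeated calls.
results = {name: fn() for name, fn in candidates.items()}
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=20)
    print(f"{name}: {seconds / 20 * 1e3:.3f} ms per call")
```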