Can I use a date index to create dummies in pandas?

I have been searching for a way to create dummy variables from the date in the index in pandas, but I haven't found anything yet.

I have a df that is indexed by date:

                        dew    temp   
date
2010-01-02 00:00:00      129.0  -16     
2010-01-02 01:00:00      148.0  -15     
2010-01-02 02:00:00      159.0  -11     
2010-01-02 03:00:00      181.0   -7      
2010-01-02 04:00:00      138.0   -7   
...  

I know I can turn date into a column using

df.reset_index(level=0, inplace=True)

and then create the dummy with something like this:

df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00'), 1, 0)

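However, I would like to create the dummy variable on the fly from the index date, without turning date into a column. Is there a way to do that in pandas? Any suggestions would be appreciated.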

IIUC:

df['main_hours'] = \
    np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
             1,
             0)

Or:

In [8]: df['main_hours'] = \
            ((df.index >= '2010-01-02 03:00:00') & 
             (df.index <= '2010-01-02 05:00:00')).astype(int)

In [9]: df
Out[9]:
                       dew  temp  main_hours
date
2010-01-02 00:00:00  129.0   -16           0
2010-01-02 01:00:00  148.0   -15           0
2010-01-02 02:00:00  159.0   -11           0
2010-01-02 03:00:00  181.0    -7           1
2010-01-02 04:00:00  138.0    -7           1
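
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question and builds the dummy straight from the index. The names `start`, `end`, and `mask` are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question, indexed by "date".
idx = pd.DatetimeIndex(
    ["2010-01-02 00:00:00", "2010-01-02 01:00:00", "2010-01-02 02:00:00",
     "2010-01-02 03:00:00", "2010-01-02 04:00:00"],
    name="date",
)
df = pd.DataFrame(
    {"dew": [129.0, 148.0, 159.0, 181.0, 138.0],
     "temp": [-16, -15, -11, -7, -7]},
    index=idx,
)

# Illustrative window bounds (both ends inclusive, as in the answer).
start = pd.Timestamp("2010-01-02 03:00:00")
end = pd.Timestamp("2010-01-02 05:00:00")

# Comparing a DatetimeIndex yields a boolean array in the same order as the rows,
# so it can be assigned directly as a new column.
mask = (df.index >= start) & (df.index <= end)
df["main_hours"] = mask.astype(int)

print(df)
```

The index stays in place throughout, so no `reset_index` is needed.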

Timing for a 50,000-row DF:

In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date')

In [20]: pd.options.display.max_rows = 10

In [21]: df
Out[21]:
                       dew  temp
date
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7
...                    ...   ...
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7

[50000 rows x 2 columns]

In [22]: %timeit ((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int)
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [23]: %timeit np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0)
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [24]: df.shape
Out[24]: (50000, 2)
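
Outside IPython, the same comparison can be approximated with the standard-library `timeit` module. This is only a sketch: the helper names `via_astype` and `via_where` are made up for illustration, and the measured times will differ from the numbers above depending on machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Five base timestamps tiled 10**4 times, giving a 50,000-entry index.
base = pd.DatetimeIndex(
    ["2010-01-02 00:00:00", "2010-01-02 01:00:00", "2010-01-02 02:00:00",
     "2010-01-02 03:00:00", "2010-01-02 04:00:00"]
)
idx = pd.DatetimeIndex(np.tile(base.values, 10**4), name="date")

lo, hi = "2010-01-02 03:00:00", "2010-01-02 05:00:00"


def via_astype():
    # Boolean mask cast to integers.
    return ((idx >= lo) & (idx <= hi)).astype(int)


def via_where():
    # Same mask mapped to 1/0 with np.where.
    return np.where((idx >= lo) & (idx <= hi), 1, 0)


# Total seconds for 100 calls of each variant.
t_astype = timeit.timeit(via_astype, number=100)
t_where = timeit.timeit(via_where, number=100)
print(f"astype: {t_astype:.4f} s, np.where: {t_where:.4f} s (100 runs each)")
```

Both variants produce the same 0/1 array, so the choice between them is mostly a matter of taste.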

Or use between:

# pandas >= 1.3 takes inclusive='both'; older versions used inclusive=True
pd.Series(df.index).between('2010-01-02 03:00:00', '2010-01-02 05:00:00', inclusive='both').astype(int)

Out[1567]: 
0    0
1    0
2    0
3    1
4    1
Name: date, dtype: int32
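
One thing to watch with this approach: `pd.Series(df.index)` gets a fresh 0..n-1 index, so assigning the result straight to `df['main_hours']` would align on labels that do not exist in `df` and give NaN. A minimal sketch of one way around that, assuming the sample data from the question, is to assign the underlying array instead (the name `flags` is illustrative):

```python
import pandas as pd

# Sample frame from the question, indexed by "date".
idx = pd.DatetimeIndex(
    ["2010-01-02 00:00:00", "2010-01-02 01:00:00", "2010-01-02 02:00:00",
     "2010-01-02 03:00:00", "2010-01-02 04:00:00"],
    name="date",
)
df = pd.DataFrame({"dew": [129.0, 148.0, 159.0, 181.0, 138.0]}, index=idx)

# between() includes both endpoints by default, so the `inclusive` argument is omitted here.
flags = pd.Series(df.index).between("2010-01-02 03:00:00", "2010-01-02 05:00:00")

# `flags` is labelled 0..4, not by date, so hand pandas the raw values
# to assign by position rather than by label.
df["main_hours"] = flags.to_numpy().astype(int)

print(df)
```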

Or use between_time:

df = df.assign(main_hours=0)
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1
>>> df
                     dew  temp  main_hours
2010-01-02 00:00:00  129   -16           0
2010-01-02 01:00:00  148   -15           0
2010-01-02 02:00:00  159   -11           0
2010-01-02 03:00:00  181    -7           1
2010-01-02 04:00:00  138    -7           1
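
Note that `between_time` filters on time of day only, so on data spanning several dates it flags 03:00 to 05:00 on every day, whereas the timestamp comparisons flag a single absolute window. The sketch below illustrates the difference on a hypothetical two-day index (the second day and the `temp` values are invented for the example):

```python
import pandas as pd

# Hypothetical index: hours 00:00-04:00 on both 2010-01-02 and 2010-01-03.
idx = pd.DatetimeIndex(
    [f"2010-01-0{day} 0{hour}:00:00" for day in (2, 3) for hour in range(5)],
    name="date",
)
df = pd.DataFrame({"temp": range(10)}, index=idx)

# Time-of-day dummy: rows whose clock time falls in 03:00-05:00, on any date.
in_window = df.between_time("3:00", "5:00").index
df["by_time"] = df.index.isin(in_window).astype(int)

# Absolute dummy: rows inside one specific timestamp range.
df["by_stamp"] = (
    (df.index >= "2010-01-02 03:00:00") & (df.index <= "2010-01-02 05:00:00")
).astype(int)

print(df)
```

On the single-day sample in the question the two columns would be identical; they only diverge once more than one date is present.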