我可以使用日期索引在 pandas 中创建虚拟对象吗?
Can I use date index to create dummies in pandas?
我一直在搜索是否可以使用 pandas
中索引的 date
创建假人,但还没有找到任何东西。
我有一个 df
被 date
索引
dew temp
date
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
...
我知道我可以将date
设置为一个列,使用
df.reset_index(level=0, inplace=True)
然后使用类似这样的东西来创建假人,
df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00')1,0)
但是,我想在不使用 date
作为列的情况下使用索引 date
即时创建虚拟变量。 pandas
有这样的方法吗?
任何建议将不胜感激。
IIUC:
df['main_hours'] = \
np.where((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
1,
0)
或:
In [8]: df['main_hours'] = \
((df.index >= '2010-01-02 03:00:00') &
(df.index <= '2010-01-02 05:00:00')).astype(int)
In [9]: df
Out[9]:
dew temp main_hours
date
2010-01-02 00:00:00 129.0 -16 0
2010-01-02 01:00:00 148.0 -15 0
2010-01-02 02:00:00 159.0 -11 0
2010-01-02 03:00:00 181.0 -7 1
2010-01-02 04:00:00 138.0 -7 1
时间: 50.000 行 DF:
In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date')
In [20]: pd.options.display.max_rows = 10
In [21]: df
Out[21]:
dew temp
date
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
... ... ...
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
[50000 rows x 2 columns]
In [22]: %timeit ((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int)
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [23]: %timeit np.where((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0)
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [24]: df.shape
Out[24]: (50000, 2)
或使用between
;
pd.Series(df.index).between('2010-01-02 03:00:00', '2010-01-02 05:00:00', inclusive=True).astype(int)
Out[1567]:
0 0
1 0
2 0
3 1
4 1
Name: date, dtype: int32
df = df.assign(main_hours=0)
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1
>>> df
dew temp main_hours
2010-01-02 00:00:00 129 -16 0
2010-01-02 01:00:00 148 -15 0
2010-01-02 02:00:00 159 -11 0
2010-01-02 03:00:00 181 -7 1
2010-01-02 04:00:00 138 -7 1
我一直在搜索是否可以使用 pandas
中索引的 date
创建假人,但还没有找到任何东西。
我有一个 df
被 date
dew temp
date
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
...
我知道我可以将date
设置为一个列,使用
df.reset_index(level=0, inplace=True)
然后使用类似这样的东西来创建假人,
df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00')1,0)
但是,我想在不使用 date
作为列的情况下使用索引 date
即时创建虚拟变量。 pandas
有这样的方法吗?
任何建议将不胜感激。
IIUC:
df['main_hours'] = \
np.where((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
1,
0)
或:
In [8]: df['main_hours'] = \
((df.index >= '2010-01-02 03:00:00') &
(df.index <= '2010-01-02 05:00:00')).astype(int)
In [9]: df
Out[9]:
dew temp main_hours
date
2010-01-02 00:00:00 129.0 -16 0
2010-01-02 01:00:00 148.0 -15 0
2010-01-02 02:00:00 159.0 -11 0
2010-01-02 03:00:00 181.0 -7 1
2010-01-02 04:00:00 138.0 -7 1
时间: 50.000 行 DF:
In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date')
In [20]: pd.options.display.max_rows = 10
In [21]: df
Out[21]:
dew temp
date
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
... ... ...
2010-01-02 00:00:00 129.0 -16
2010-01-02 01:00:00 148.0 -15
2010-01-02 02:00:00 159.0 -11
2010-01-02 03:00:00 181.0 -7
2010-01-02 04:00:00 138.0 -7
[50000 rows x 2 columns]
In [22]: %timeit ((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int)
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [23]: %timeit np.where((df.index >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0)
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [24]: df.shape
Out[24]: (50000, 2)
或使用between
;
pd.Series(df.index).between('2010-01-02 03:00:00', '2010-01-02 05:00:00', inclusive=True).astype(int)
Out[1567]:
0 0
1 0
2 0
3 1
4 1
Name: date, dtype: int32
df = df.assign(main_hours=0)
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1
>>> df
dew temp main_hours
2010-01-02 00:00:00 129 -16 0
2010-01-02 01:00:00 148 -15 0
2010-01-02 02:00:00 159 -11 0
2010-01-02 03:00:00 181 -7 1
2010-01-02 04:00:00 138 -7 1