Pandas reindex converts all values to NaN
I have the following DataFrame:
>>> a = pd.DataFrame({'values':[random.randint(-10,10) for i in range(10)]})
>>> a
values
0 -3
1 -8
2 -2
3 3
4 8
5 6
6 -5
7 0
8 8
9 -4
I want to reindex it so that the index consists entirely of datetimes. I am doing this with the following code:
>>> times = [datetime.datetime(2018,1,2,12,40,0) + datetime.timedelta(seconds=i) for i in range(10)]
>>> times
[datetime.datetime(2018, 1, 2, 12, 40), datetime.datetime(2018, 1, 2, 12, 40, 1), datetime.datetime(2018, 1, 2, 12, 40, 2), datetime.datetime(2018, 1, 2, 12, 40, 3), datetime.datetime(2018, 1, 2, 12, 40, 4), datetime.datetime(2018, 1, 2, 12, 40, 5), datetime.datetime(2018, 1, 2, 12, 40, 6), datetime.datetime(2018, 1, 2, 12, 40, 7), datetime.datetime(2018, 1, 2, 12, 40, 8), datetime.datetime(2018, 1, 2, 12, 40, 9)]
>>> a.reindex(times)
values
2018-01-02 12:40:00 NaN
2018-01-02 12:40:01 NaN
2018-01-02 12:40:02 NaN
2018-01-02 12:40:03 NaN
2018-01-02 12:40:04 NaN
2018-01-02 12:40:05 NaN
2018-01-02 12:40:06 NaN
2018-01-02 12:40:07 NaN
2018-01-02 12:40:08 NaN
2018-01-02 12:40:09 NaN
As you can see, this discards the values I had and puts NaN in their place. How can I reindex this DataFrame so that it looks like this:
values
2018-01-02 12:40:00 -3
2018-01-02 12:40:01 -8
2018-01-02 12:40:02 -2
2018-01-02 12:40:03 3
2018-01-02 12:40:04 8
2018-01-02 12:40:05 6
2018-01-02 12:40:06 -5
2018-01-02 12:40:07 0
2018-01-02 12:40:08 8
2018-01-02 12:40:09 -4
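As long as times has the same length as the DataFrame (that is, len(df); df.size only happens to match here because there is a single column), you can pass it to set_index: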
In [64]: df.set_index([times])
Out[64]:
values
2018-01-02 12:40:00 -3
2018-01-02 12:40:01 -8
2018-01-02 12:40:02 -2
2018-01-02 12:40:03 3
2018-01-02 12:40:04 8
2018-01-02 12:40:05 6
2018-01-02 12:40:06 -5
2018-01-02 12:40:07 0
2018-01-02 12:40:08 8
2018-01-02 12:40:09 -4
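For anyone wondering why reindex returned NaN in the first place: reindex looks up each requested label in the existing index and returns the matching rows, so labels that are not already present come back as missing. The original index is 0..9, none of the datetimes exist in it, and every row is NaN. set_index, by contrast, attaches the new labels by position. A minimal sketch with fixed values (instead of random ones) so the result is reproducible; passing pd.DatetimeIndex(times) rather than [times] is my own choice here, not something from the question:

```python
import datetime

import pandas as pd

# Fixed values instead of random ones so the outcome is reproducible.
df = pd.DataFrame({"values": [-3, -8, -2, 3, 8]})
times = [
    datetime.datetime(2018, 1, 2, 12, 40, 0) + datetime.timedelta(seconds=i)
    for i in range(len(df))
]

# reindex aligns on labels: none of the datetimes exist in the old
# index (0..4), so every value comes back as NaN.
reindexed = df.reindex(times)

# set_index attaches the new labels by position and keeps the values.
relabeled = df.set_index(pd.DatetimeIndex(times))

print(reindexed)
print(relabeled)
```

So reindex is the tool for reordering or selecting by labels that already exist, while set_index (or assigning to index) is the tool for relabeling rows.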
Or assign directly to the index:
In [67]: df.index = times
In [68]: df
Out[68]:
values
2018-01-02 12:40:00 -3
2018-01-02 12:40:01 -8
2018-01-02 12:40:02 -2
2018-01-02 12:40:03 3
2018-01-02 12:40:04 8
2018-01-02 12:40:05 6
2018-01-02 12:40:06 -5
2018-01-02 12:40:07 0
2018-01-02 12:40:08 8
2018-01-02 12:40:09 -4
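One thing worth knowing about direct assignment: it modifies the DataFrame in place, and the new index must have exactly one label per row, otherwise pandas raises a ValueError. A small sketch of both behaviours, using made-up fixed values and a hypothetical mismatch_error variable just to capture the exception:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"values": [-3, -8, -2]})
times = [
    datetime.datetime(2018, 1, 2, 12, 40, 0) + datetime.timedelta(seconds=i)
    for i in range(len(df))
]

# Assignment relabels the rows in place; no new DataFrame is created.
df.index = times

# A label list of the wrong length is rejected with a ValueError.
mismatch_error = None
try:
    df.index = times[:2]
except ValueError as exc:
    mismatch_error = exc

print(df)
print(mismatch_error)
```

After the failed assignment the DataFrame still carries the three datetime labels from the first assignment.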
Code
import random
import datetime
import pandas as pd
a = pd.DataFrame({'values':[random.randint(-10,10) for i in range(10)]})
a['times'] = [datetime.datetime(2018,1,2,12,40,0) + datetime.timedelta(seconds=i) for i in range(10)]
a = a.set_index('times')
Result
times values
2018-01-02 12:40:00 -2
2018-01-02 12:40:01 -3
2018-01-02 12:40:02 5
2018-01-02 12:40:03 -9
2018-01-02 12:40:04 -6
2018-01-02 12:40:05 2
2018-01-02 12:40:06 1
2018-01-02 12:40:07 -1
2018-01-02 12:40:08 5
2018-01-02 12:40:09 3
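If the timestamps are evenly spaced, as they are here, a possible variation is to let pandas generate them with pd.date_range and hand them to the constructor through index=, which avoids the temporary column altogether. This is only a sketch: the fixed values list stands in for the random numbers above, and the name "times" is carried over from the column name used in the code.

```python
import datetime

import pandas as pd

# Fixed values stand in for the random ones used above.
values = [-2, -3, 5, -9, -6]

# One timestamp per value, spaced one second apart.
times = pd.date_range(
    start="2018-01-02 12:40:00",
    periods=len(values),
    freq=datetime.timedelta(seconds=1),
    name="times",
)

# Build the DataFrame with the datetime index from the start.
a = pd.DataFrame({"values": values}, index=times)
print(a)
```

The result has the same shape as the set_index('times') version, with a DatetimeIndex named times.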