pandas: resampling without computing statistics
I have a DataFrame sampled at 5-minute intervals:
import numpy as np
import pandas as pd
rng = pd.date_range('1/1/2011', periods=60, freq='5min')
df = pd.DataFrame(np.random.randn(60, 4), index=rng, columns=['A', 'B', 'C', 'D'])
                            A         B         C         D
2011-01-01 00:00:00  1.287045 -0.621473  0.482130  1.886648
2011-01-01 00:05:00  0.402645 -1.335942 -0.609894 -0.589782
2011-01-01 00:10:00 -0.311789  0.342995 -0.875089 -0.781499
2011-01-01 00:15:00  1.970683  0.471876  1.042425 -0.128274
2011-01-01 00:20:00 -1.900357 -0.718225 -3.168920 -0.355735
2011-01-01 00:25:00  1.128843 -0.097980  1.130860 -1.045019
2011-01-01 00:30:00 -0.261523  0.379652 -0.385604 -0.910902
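I want to resample the data at 15-minute intervals only, without aggregating it into statistics (I don't want the mean, median, or standard deviation). I want to subsample and get the actual data at each 15-minute interval. Is there a built-in method to do this?
My desired output is: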
                            A         B         C         D
2011-01-01 00:00:00  1.287045 -0.621473  0.482130  1.886648
2011-01-01 00:15:00  1.970683  0.471876  1.042425 -0.128274
2011-01-01 00:30:00 -0.261523  0.379652 -0.385604 -0.910902
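You can resample to 15 minutes and take the 'first' of each group (the values below differ from yours because the data is random):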
In [40]: df.resample('15min').first()
Out[40]:
                            A         B         C         D
2011-01-01 00:00:00 -0.415637 -1.345454  1.151189 -0.834548
2011-01-01 00:15:00  0.221777 -0.866306  0.932487 -1.243176
2011-01-01 00:30:00 -0.690039  0.778672 -0.527087 -0.156369
...
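As a quick sanity check: on a regular 5-minute grid with no missing values, taking the first row of each 15-minute bin should be the same as picking every third row. A minimal sketch, assuming a seeded generator (not in the original question) so that the run is reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for reproducibility;
# the question itself uses np.random.randn.
gen = np.random.default_rng(0)
rng = pd.date_range("2011-01-01", periods=60, freq="5min")
df = pd.DataFrame(gen.standard_normal((60, 4)), index=rng, columns=list("ABCD"))

# First row of each 15-minute bin.
sub = df.resample("15min").first()

# On this regular, gap-free grid that is simply every third row.
every_third = df.iloc[::3]
print(np.array_equal(sub.to_numpy(), every_third.to_numpy()))
```

The positional slice `df.iloc[::3]` only matches because the index is perfectly regular; with gaps or jitter in the timestamps the two would diverge.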
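Another way is to build the new desired index and reindex (this is a bit more work in this case, but for an irregular time series it ensures that you get the data at exactly every 15 minutes):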
In [42]: new_rng = pd.date_range('1/1/2011', periods=20, freq='15min')
In [43]: df.reindex(new_rng)
Out[43]:
                            A         B         C         D
2011-01-01 00:00:00 -0.415637 -1.345454  1.151189 -0.834548
2011-01-01 00:15:00  0.221777 -0.866306  0.932487 -1.243176
2011-01-01 00:30:00 -0.690039  0.778672 -0.527087 -0.156369
...
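To illustrate the irregular case, here is a small sketch with made-up timestamps (the data and the 2-minute tolerance are assumptions for illustration only). A plain reindex keeps only rows whose timestamps match the new grid exactly and fills the rest with NaN; passing `method="nearest"` with a `tolerance` is one way to accept rows that are slightly off the grid:

```python
import pandas as pd

# Hypothetical irregular series: 00:07 and 00:31 are off the 15-minute grid.
idx = pd.to_datetime([
    "2011-01-01 00:00",
    "2011-01-01 00:07",
    "2011-01-01 00:15",
    "2011-01-01 00:31",
])
df_irregular = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=idx)

new_rng = pd.date_range("2011-01-01", periods=3, freq="15min")

# Exact matches only: 00:30 has no row, so it becomes NaN.
exact = df_irregular.reindex(new_rng)

# Accept the nearest row if it lies within 2 minutes of the grid point.
nearest = df_irregular.reindex(
    new_rng, method="nearest", tolerance=pd.Timedelta("2min")
)

print(exact)
print(nearest)
```

Whether a tolerance is appropriate depends on the data; the exact reindex is the stricter reading of "the actual data at each 15-minute mark".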
The asfreq() function doesn't do any aggregation:
df.asfreq('15min')
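One practical difference between the approaches shows up when the data contains missing values: `resample(...).first()` returns the first non-null value per column within each bin, while `asfreq` returns the row at the exact timestamp, NaN included. A minimal sketch with a small hand-made frame (the values are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=6, freq="5min")
df_small = pd.DataFrame(
    {
        "A": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0],  # missing value at 00:00
        "B": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    },
    index=rng,
)

# First non-null value per column in each 15-minute bin.
first = df_small.resample("15min").first()

# Rows at the exact 15-minute timestamps, no aggregation.
exact = df_small.asfreq("15min")

print(first)
print(exact)
```

Here `first` reports 1.0 for column A at 00:00 (taken from the 00:05 row), whereas `exact` keeps the NaN that was actually recorded at 00:00. If the goal is strictly "the actual rows", `asfreq` or `reindex` is the closer match.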