Shortcut for filling missing dates
I have the following example:
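import numpy as np
import pandas as pd

# Two blocks of 1000 ten-minute periods, one year apart
# ('10min' is the same frequency as the older '10T' alias)
idx1 = pd.period_range('2015-01-01', freq='10min', periods=1000)
idx2 = pd.period_range('2016-01-01', freq='10min', periods=1000)
df1 = pd.DataFrame(np.random.randn(1000), index=idx1, columns=['A'])
df2 = pd.DataFrame(np.random.randn(1000), index=idx2, columns=['A'])

# Stack both blocks; the periods between them are absent from the index
frames = [df1, df2]
df_concat = pd.concat(frames)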
Now I want to know the number of missing dates in df_concat, so I filled in the dates and reindexed the DataFrame:
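start_total = df1.index[0]
end_total = df2.index[-1]
# Full range of 10-minute periods from the first to the last observation
idx_total = pd.period_range(start=start_total, end=end_total, freq='10min')
# reindex inserts NaN for every period that has no row in df_concat
df_total = df_concat.reindex(idx_total, fill_value=np.nan)
# Select by column: masking with the whole boolean DataFrame would keep every row
df_miss = df_total[df_total['A'].isnull()]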
Is there a shorter version of the last snippet? Something like df_concat.fill_missing_dates?
That is what the time series scikit provided:
scikits.timeseries.TimeSeries.fill_missing_dates
I think you can use resample:
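# On current pandas, resample() returns a Resampler, so asfreq() is needed to get a DataFrame
df_total = df_concat.resample('10min').asfreq()
# Note: a boolean DataFrame mask keeps every row; df_total[df_total['A'].isnull()] keeps only the gaps
print(df_total[df_total.isnull()])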
A
2015-01-01 00:00:00 NaN
2015-01-01 00:10:00 NaN
2015-01-01 00:20:00 NaN
2015-01-01 00:30:00 NaN
2015-01-01 00:40:00 NaN
2015-01-01 00:50:00 NaN
2015-01-01 01:00:00 NaN
2015-01-01 01:10:00 NaN
2015-01-01 01:20:00 NaN
2015-01-01 01:30:00 NaN
2015-01-01 01:40:00 NaN
2015-01-01 01:50:00 NaN
2015-01-01 02:00:00 NaN
2015-01-01 02:10:00 NaN
2015-01-01 02:20:00 NaN
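For counting the gaps rather than printing them, the reindex approach from the question can be wrapped in a small function. The following is only a sketch under a few assumptions: `fill_missing_periods` is a hypothetical helper name (pandas has no built-in `fill_missing_dates`), the index is a `PeriodIndex` with a fixed 10-minute frequency, and column `A` contains no NaN values other than those introduced by reindexing.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: two blocks of 1000 ten-minute periods.
idx1 = pd.period_range('2015-01-01', freq='10min', periods=1000)
idx2 = pd.period_range('2016-01-01', freq='10min', periods=1000)
df_concat = pd.concat([
    pd.DataFrame(np.random.randn(1000), index=idx1, columns=['A']),
    pd.DataFrame(np.random.randn(1000), index=idx2, columns=['A']),
])


def fill_missing_periods(df, freq):
    """Hypothetical helper: reindex df onto every period between its first
    and last index entry, leaving NaN where no row existed."""
    full_index = pd.period_range(start=df.index.min(), end=df.index.max(), freq=freq)
    return df.reindex(full_index)


df_total = fill_missing_periods(df_concat, freq='10min')

# Rows that exist only because of the reindex, i.e. the missing periods.
df_miss = df_total[df_total['A'].isnull()]

# Number of missing periods.
n_missing = int(df_total['A'].isnull().sum())
print(n_missing)
```

Selecting with `df_total['A'].isnull()` (a boolean Series) drops the rows that have data, whereas masking with the whole boolean DataFrame keeps every row and only blanks the cells.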