How can I create a DataFrame with an index of different Periods?

Question

I have a DataFrame in which each row is a record produced by PBS. I want to know how many cores are running in each 30-minute time period. The first 4 rows of my table:

    datetime             walltime  ncores
    2019-07-18 11:18:27  2:05:10   2
    2019-07-18 11:18:45  00:50:27  1
    2019-07-18 11:18:46  00:07:20  1
    2019-07-18 11:18:50  00:31:34  1
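To reproduce the problem, the four sample rows can be loaded as below. This is a minimal sketch that assumes the columns arrive as strings; `pd.to_timedelta` parses the `h:mm:ss` walltime strings, so each job's end time is simply `datetime + walltime`. The `end` column name is illustrative and not part of the original table.

```python
import pandas as pd

# The four sample rows from the question, as strings
df = pd.DataFrame({
    'datetime': ['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                 '2019-07-18 11:18:46', '2019-07-18 11:18:50'],
    'walltime': ['2:05:10', '00:50:27', '00:07:20', '00:31:34'],
    'ncores': [2, 1, 1, 1],
})

# Parse timestamps and "h:mm:ss" durations
df['datetime'] = pd.to_datetime(df['datetime'])
df['walltime'] = pd.to_timedelta(df['walltime'])

# Illustrative helper column: each job runs until datetime + walltime
df['end'] = df['datetime'] + df['walltime']
print(df)
```

With the end times available, the first job runs from 11:18:27 to 13:23:37, so it touches every 30-minute slot from 11:00 to 13:00.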

I found that it is not possible to build a PeriodIndex out of one Period per record, because the used walltime differs from record to record.

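I think I could create a PeriodIndex with a frequency of 30 minutes, and then assign the core counts of all records that fall in a certain Period to that Period. But I don't know how to do it.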
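The idea in the last paragraph (a 30-minute PeriodIndex, with each record's cores assigned to every Period it overlaps) can be sketched directly. This is a minimal sketch, assuming a job counts as running in a slot when its interval `[datetime, datetime + walltime)` overlaps that slot at all; the variable names `idx`, `overlap` and `running` are illustrative. It builds a boolean slots-by-jobs matrix with NumPy broadcasting, so memory grows with (number of slots x number of jobs).

```python
import numpy as np
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                                '2019-07-18 11:18:46', '2019-07-18 11:18:50']),
    'walltime': pd.to_timedelta(['2:05:10', '00:50:27', '00:07:20', '00:31:34']),
    'ncores': [2, 1, 1, 1],
})
df['end'] = df['datetime'] + df['walltime']

# One 30-minute Period per slot, from the slot holding the earliest start
# up to the slot holding the latest end (names here are illustrative)
idx = pd.period_range(df['datetime'].min().floor('30min'),
                      df['end'].max(), freq='30min')

# Slot boundaries as datetime64 arrays; each slot is [slot_start, slot_end)
slot_start = idx.start_time.values
slot_end = slot_start + np.timedelta64(30, 'm')

# overlap[i, j] is True when job j runs during at least part of slot i
job_start = df['datetime'].values
job_end = df['end'].values
overlap = ((job_start[None, :] < slot_end[:, None])
           & (job_end[None, :] > slot_start[:, None]))

# Assign each job's cores to every slot it overlaps and sum per slot
running = pd.Series(overlap.astype(int) @ df['ncores'].values,
                    index=idx, name='ncores')
print(running)
```

For the sample rows this gives 5, 4, 3, 2, 2 cores for the slots starting at 11:00, 11:30, 12:00, 12:30 and 13:00.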

What I expect is:

    datetime cputime     ncores
    2019-07-18 11:00:00  5
    2019-07-18 11:30:00  4
    2019-07-18 12:00:00  3
    2019-07-18 12:30:00  2

Answer 1

I think you need:

    import pandas as pd

    # convert to datetimes and timedeltas
    df['datetime'] = pd.to_datetime(df['datetime'])
    df['walltime'] = pd.to_timedelta(df['walltime'])

    # floor start times to 30min, so all periods are aligned to :00 and :30
    df['start'] = df['datetime'].dt.floor('30min')
    # real end time of each job
    df['end'] = df['datetime'] + df['walltime']

    # one (period, ncores) pair for every 30-minute period a job runs in
    zipped = zip(df['start'], df['end'], df['ncores'])
    L = [(i, n) for s, e, n in zipped for i in pd.period_range(s, e, freq='30min')]

    # DataFrame is aggregated by sum
    df1 = (pd.DataFrame(L, columns=['datetime cputime', 'summed'])
            .groupby('datetime cputime', as_index=False)['summed']
            .sum())
    print(df1)
       datetime cputime  summed
    0  2019-07-18 11:00       5
    1  2019-07-18 11:30       4
    2  2019-07-18 12:00       3
    3  2019-07-18 12:30       2
    4  2019-07-18 13:00       2
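For long PBS logs, building one `period_range` per record can get slow. A different technique, not the one used in the answer above, is a difference array: add `+ncores` in the slot where a job starts, `-ncores` in the first slot after the one where it ends, and take a running sum. This sketch assumes the same slot semantics as the answer (a job occupies every slot from the one containing its start to the one containing its end); `start`, `stop`, `delta` and `running` are illustrative names, and the result is indexed by slot start timestamps rather than Periods.

```python
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-07-18 11:18:27', '2019-07-18 11:18:45',
                                '2019-07-18 11:18:46', '2019-07-18 11:18:50']),
    'walltime': pd.to_timedelta(['2:05:10', '00:50:27', '00:07:20', '00:31:34']),
    'ncores': [2, 1, 1, 1],
})

# Slot in which each job starts, and the first slot after the one in which it ends
start = df['datetime'].dt.floor('30min')
stop = (df['datetime'] + df['walltime']).dt.floor('30min') + pd.Timedelta('30min')

# +ncores when a job enters its first slot, -ncores once it has left its last slot
delta = pd.concat([
    pd.Series(df['ncores'].values, index=pd.DatetimeIndex(start)),
    pd.Series(-df['ncores'].values, index=pd.DatetimeIndex(stop)),
]).groupby(level=0).sum()

# Insert empty slots, then the running total is the number of busy cores per slot;
# the final slot is always 0 (every job has stopped), so it is dropped
slots = pd.date_range(delta.index.min(), delta.index.max(), freq='30min')
running = delta.reindex(slots, fill_value=0).cumsum().iloc[:-1]
print(running)
```

The work here scales with the number of records plus the number of slots, instead of with the total number of (record, slot) pairs.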

Tags: timespan, numpy, time-series, dataframe, pandas