How to convert seconds to a custom interval format in a dynamic Pandas DataFrame?

tl;dr

Is it possible to convert all float values in this DataFrame pivot table from seconds to intervals in HH:MM:SS format?

Name        Student A  Tutor
Date
2021-04-12       52.0   86.0
2021-04-13       13.0  113.0
2021-04-14        NaN   34.0
Average          32.5   77.6

Desired end result:

Name         Student A         Tutor
Date
2021-04-12    00:00:52      00:01:26
2021-04-13    00:00:13      00:01:53
2021-04-14                  00:00:34
Average       00:00:32      00:01:17

Notes:


Long read

The pivot table is built like this:

import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-12', '2021-04-12', '2021-04-13', '2021-04-13', '2021-04-14'],
    'Name': ['Tutor', 'Student A', 'Student A', 'Tutor', 'Tutor'],
    'duration_seconds': [86, 52, 13, 113, 34]
})

df looks like this:

         Date       Name  duration_seconds
0  2021-04-12      Tutor                86
1  2021-04-12  Student A                52
2  2021-04-13  Student A                13
3  2021-04-13      Tutor               113
4  2021-04-14      Tutor                34

Grouping

grouped = df.groupby(['Name', 'Date']).sum()

Yields:

                       duration_seconds
Name        Date
Student A   2021-04-12               52
            2021-04-13               13
Tutor       2021-04-12               86
            2021-04-13              113
            2021-04-14               34

Pivoting the data into the desired format

pivoted = grouped.pivot_table(
    values='duration_seconds',
    index='Date',
    columns='Name'
)

Yields:

Name        Student A  Tutor
Date
2021-04-12       52.0   86.0
2021-04-13       13.0  113.0
2021-04-14        NaN   34.0
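As a side note, the grouping and pivoting steps can probably be collapsed into one call. This is a minimal sketch, assuming the goal is the summed duration per name and date; it passes aggfunc="sum" explicitly because pivot_table defaults to the mean.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-04-12", "2021-04-12", "2021-04-13", "2021-04-13", "2021-04-14"],
    "Name": ["Tutor", "Student A", "Student A", "Tutor", "Tutor"],
    "duration_seconds": [86, 52, 13, 113, 34],
})

# Sum the durations per (Date, Name) and spread the names into columns.
# Combinations that never occur (Student A on 2021-04-14) come out as NaN.
pivoted = df.pivot_table(
    values="duration_seconds",
    index="Date",
    columns="Name",
    aggfunc="sum",
)
print(pivoted)
```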

Adding an average row

pivoted.loc['Average'] = pivoted.mean()

Yields:

Name        Student A       Tutor
Date
2021-04-12       52.0   86.000000
2021-04-13       13.0  113.000000
2021-04-14        NaN   34.000000
Average          32.5   77.666667
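The Student A average of 32.5 follows from mean() skipping missing values by default, so it is (52 + 13) / 2 rather than a sum divided by three. A small sketch that rebuilds the pivot table by hand (the literal values are copied from the output above) to check this:

```python
import pandas as pd

pivoted = pd.DataFrame(
    {"Student A": [52.0, 13.0, float("nan")], "Tutor": [86.0, 113.0, 34.0]},
    index=pd.Index(["2021-04-12", "2021-04-13", "2021-04-14"], name="Date"),
)

# mean() uses skipna=True by default, so the NaN for Student A is ignored
# and that column is averaged over two days instead of three.
averages = pivoted.mean()
pivoted.loc["Average"] = averages
print(pivoted)
```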

That's as far as I've gotten. The last step is to convert all values under the student and tutor columns to intervals in HH:MM:SS format.

This would be a good way to do it, if it didn't prefix everything with 0 days:

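pivoted.iloc[:].apply(pd.to_timedelta, unit='s')

def strfdelta(tdelta, fmt):
    # Split a Timedelta into days, hours, minutes and seconds,
    # then fill those fields into the format string.
    d = {"days": tdelta.days}
    d["hours"], rem = divmod(tdelta.seconds, 3600)
    d["minutes"], d["seconds"] = divmod(rem, 60)
    return fmt.format(**d)


# Apply to the pivot table from the question (pivoted), not the raw input frame.
# Each column is rebuilt as a list of strings; missing values become "".
df = pivoted.apply(
    lambda x: [
        strfdelta(
            pd.Timedelta(seconds=v),
            "{hours:02d}:{minutes:02d}:{seconds:02d}",
        )
        if pd.notna(v)
        else ""
        for v in x
    ],
)
print(df)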

Prints:

           Student A     Tutor
2021-04-12  00:00:52  00:01:26
2021-04-13  00:00:13  00:01:53
2021-04-14            00:00:34
Average     00:00:32  00:01:17

The strfdelta function is taken from "Formatting timedelta objects".
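One thing to watch with the format string used above: it has no {days} field, and tdelta.seconds only counts seconds within the current day, so a duration of 24 hours or more would lose its days. The following is a minimal sketch of an alternative that skips Timedelta entirely and lets the hours run past 23. It assumes non-negative durations; format_seconds is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def format_seconds(value):
    """Format a number of seconds as HH:MM:SS; missing values become ''."""
    if pd.isna(value):
        return ""
    total = int(value)  # truncates fractional seconds, e.g. 77.67 -> 77
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Pivot table rebuilt by hand with the values shown in the question.
pivoted = pd.DataFrame(
    {
        "Student A": [52.0, 13.0, float("nan"), 32.5],
        "Tutor": [86.0, 113.0, 34.0, 77.666667],
    },
    index=["2021-04-12", "2021-04-13", "2021-04-14", "Average"],
)

# Map column by column, so any number of name columns is handled.
formatted = pivoted.apply(lambda col: col.map(format_seconds))
print(formatted)
```

Because the function is mapped over every column, it does not matter how many names end up in the pivot table. The result holds strings, so it is best applied as the very last step, after all numeric work such as the average row is done.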