How to aggregate summary information from multiple columns of a pandas DataFrame into lists of strings?

I have a DataFrame like this:

df = 
                               time_id gt_class  num_missed_base  num_missed_feature  num_objects_base  num_objects_feature
   5G21A6P00L4100023:1566617404450336      CAR               11                   4                27                   30
   5G21A6P00L4100023:1566617404450336  BICYCLE                4                   6                27                   30
   5G21A6P00L4100023:1566617404450336   PERSON                2                   3                27                   30
   5G21A6P00L4100023:1566617404450336    TRUCK                1                   0                27                   30
   5G21A6P00L4100023:1566617428450689      CAR               25                  14                60                   67
   5G21A6P00L4100023:1566617428450689   PERSON                7                   6                60                   67
   5G21A6P00L4100023:1566617515950900  BICYCLE                1                   1                59                   65
   5G21A6P00L4100023:1566617515950900      CAR               20                   9                59                   65
   5G21A6P00L4100023:1566617515950900   PERSON               10                   2                59                   65
   5G21A6P00L4100037:1567169649450046      CAR                8                   0                29                   32
   5G21A6P00L4100037:1567169649450046   PERSON                1                   0                29                   32
   5G21A6P00L4100037:1567169649450046    TRUCK                1                   0                29                   32

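For each time_id, it shows how many objects the base model missed (num_missed_base), how many the feature model missed (num_missed_feature), and how many objects were present at that time in the base and feature models (num_objects_base and num_objects_feature).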

I need to produce the following DataFrame:

    time_id                             gt_class                  num_missed_base  num_missed_feature  hover_base                                          hover_feature
0   5G21A6P00L4100023:1566617404450336  CAR,BICYCLE,PERSON,TRUCK  18               13                  ['CAR: 11', 'BICYCLE: 4', 'PERSON: 2', 'TRUCK: 1']  ['CAR: 4', 'BICYCLE: 6', 'PERSON: 3', 'TRUCK: 0']
1   5G21A6P00L4100023:1566617428450689  CAR,PERSON                32               20                  ['CAR: 25', 'PERSON: 7']                            ['CAR: 14', 'PERSON: 6']
2   5G21A6P00L4100023:1566617515950900  BICYCLE,CAR,PERSON        31               12                  ['BICYCLE: 1', 'CAR: 20', 'PERSON: 10']             ['BICYCLE: 1', 'CAR: 9', 'PERSON: 2']
3   5G21A6P00L4100037:1567169649450046  CAR,PERSON,TRUCK          10               0                   ['CAR: 8', 'PERSON: 1', 'TRUCK: 1']                 ['CAR: 0', 'PERSON: 0', 'TRUCK: 0']

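You can group by time_id and then apply a suitable aggregation function to each column. Reference: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html Note: the code below is a similar but simpler example, not the exact data from the question.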

import pandas as pd

df = pd.DataFrame(data={
    'time_id': ['2020-01-01','2020-01-01','2020-01-01','2020-01-02','2020-01-02','2020-01-02'],
    'val1': ['car', 'bicycle', 'person', 'truck', 'aeroplane', 'train'],
    'val2': [0,1,2,9,8,7],
    'val3': [9,2,3,4,5,6]
})

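def join_values(col):
    # Join one group's values into a single comma-separated string.
    return ','.join(col.tolist())

def label_with_val3(col):
    # col holds one group's val3 values. Its index still refers to the
    # matching rows of df, so the val1 label can be looked up by index.
    return ['{}: {}'.format(df.loc[i, 'val1'], v) for i, v in col.items()]

# One aggregation per column: joined labels, a sum, and a list of "label: value" strings.
ans = df.groupby('time_id').agg({'val1': join_values, 'val2': 'sum', 'val3': label_with_val3})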
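
Applied to the columns from the question, one way to avoid the index lookup inside the aggregation function is to build the "CLASS: count" strings as ordinary columns first and then collect them per group with list. This is only a sketch: it uses the first two time_id groups from the question as sample data, and the names labelled and result are made up for illustration.

```python
import pandas as pd

# Sample data: the first two time_id groups from the question.
df = pd.DataFrame({
    "time_id": ["5G21A6P00L4100023:1566617404450336"] * 4
               + ["5G21A6P00L4100023:1566617428450689"] * 2,
    "gt_class": ["CAR", "BICYCLE", "PERSON", "TRUCK", "CAR", "PERSON"],
    "num_missed_base": [11, 4, 2, 1, 25, 7],
    "num_missed_feature": [4, 6, 3, 0, 14, 6],
})

# Build the per-row "CLASS: count" labels before grouping
# (labelled is a hypothetical helper name).
labelled = df.assign(
    hover_base=df["gt_class"] + ": " + df["num_missed_base"].astype(str),
    hover_feature=df["gt_class"] + ": " + df["num_missed_feature"].astype(str),
)

# Named aggregation: output_column=(input_column, aggregation).
result = (
    labelled.groupby("time_id", sort=False)
    .agg(
        gt_class=("gt_class", ",".join),
        num_missed_base=("num_missed_base", "sum"),
        num_missed_feature=("num_missed_feature", "sum"),
        hover_base=("hover_base", list),
        hover_feature=("hover_feature", list),
    )
    .reset_index()
)

print(result)
```

Because the labels are computed per row before grouping, the aggregation functions never need to refer back to the original DataFrame, and sort=False keeps the groups in their order of first appearance.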