How to aggregate summary information from multiple columns of a pandas DataFrame into lists of strings?
I have a dataframe like this:
df =
time_id gt_class num_missed_base num_missed_feature num_objects_base num_objects_feature
5G21A6P00L4100023:1566617404450336 CAR 11 4 27 30
5G21A6P00L4100023:1566617404450336 BICYCLE 4 6 27 30
5G21A6P00L4100023:1566617404450336 PERSON 2 3 27 30
5G21A6P00L4100023:1566617404450336 TRUCK 1 0 27 30
5G21A6P00L4100023:1566617428450689 CAR 25 14 60 67
5G21A6P00L4100023:1566617428450689 PERSON 7 6 60 67
5G21A6P00L4100023:1566617515950900 BICYCLE 1 1 59 65
5G21A6P00L4100023:1566617515950900 CAR 20 9 59 65
5G21A6P00L4100023:1566617515950900 PERSON 10 2 59 65
5G21A6P00L4100037:1567169649450046 CAR 8 0 29 32
5G21A6P00L4100037:1567169649450046 PERSON 1 0 29 32
5G21A6P00L4100037:1567169649450046 TRUCK 1 0 29 32
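For each time_id, it shows how many objects the base model missed (num_missed_base), how many the feature model missed (num_missed_feature), and how many objects were present at that time for the base and feature models (num_objects_base, num_objects_feature).
I need to produce the following dataframe: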
time_id gt_class num_missed_base num_missed_feature hover_base hover_feature
0 5G21A6P00L4100023:1566617404450336 CAR,BICYCLE,PERSON,TRUCK 18 13 ['CAR: 11', 'BICYCLE: 4', 'PERSON: 2', 'TRUCK: 1'] ['CAR: 4', 'BICYCLE: 6', 'PERSON: 3', 'TRUCK: 0']
1 5G21A6P00L4100023:1566617428450689 CAR,PERSON 32 20 ['CAR: 25', 'PERSON: 7'] ['CAR: 14', 'PERSON: 6']
2 5G21A6P00L4100023:1566617515950900 BICYCLE,CAR,PERSON 31 12 ['BICYCLE: 1', 'CAR: 20', 'PERSON: 10'] ['BICYCLE: 1', 'CAR: 9', 'PERSON: 2']
3 5G21A6P00L4100037:1567169649450046 CAR,PERSON,TRUCK 10 0 ['CAR: 8', 'PERSON: 1', 'TRUCK: 1'] ['CAR: 0', 'PERSON: 0', 'TRUCK: 0']
You can group by time_id and then apply the relevant aggregation function to each column.
Reference: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
Note: this is a similar but simpler example.
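import pandas as pd

df = pd.DataFrame(data={
    'time_id': ['2020-01-01','2020-01-01','2020-01-01','2020-01-02','2020-01-02','2020-01-02'],
    'val1': ['car', 'bicycle', 'person', 'truck', 'aeroplane', 'train'],
    'val2': [0,1,2,9,8,7],
    'val3': [9,2,3,4,5,6]
})

def func(row):
    # Join every val1 string in the group into one comma-separated string.
    return ','.join(row.tolist())

def multi_column1(row):
    # agg passes a single column per group, so look up the sibling
    # column in the original df through the group's index labels.
    l = []
    for n in row.index:
        x = df.loc[n, 'val1']
        y = df.loc[n, 'val3']
        w = '{} : {}'.format(x, y)
        l.append(w)
    return l

# One aggregation per column: joined names, summed values, list of "name : value" strings.
ans = df.groupby('time_id').agg({'val1': func, 'val2': 'sum', 'val3': multi_column1})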
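Applied to the columns from the question, the same idea could look roughly like the sketch below. It is not taken from the answer above: it assumes pandas 0.25 or later (for named aggregation), uses only the first two time_id groups of the sample data, and builds the "CLASS: count" strings before grouping so that the aggregation only has to collect them into lists. The names labeled and result are just illustrative.

```python
import pandas as pd

# First two time_id groups from the question's sample data.
df = pd.DataFrame({
    "time_id": ["5G21A6P00L4100023:1566617404450336"] * 4
               + ["5G21A6P00L4100023:1566617428450689"] * 2,
    "gt_class": ["CAR", "BICYCLE", "PERSON", "TRUCK", "CAR", "PERSON"],
    "num_missed_base": [11, 4, 2, 1, 25, 7],
    "num_missed_feature": [4, 6, 3, 0, 14, 6],
})

# Build the per-row "CLASS: count" labels first (hypothetical helper columns).
labeled = df.assign(
    hover_base=df["gt_class"] + ": " + df["num_missed_base"].astype(str),
    hover_feature=df["gt_class"] + ": " + df["num_missed_feature"].astype(str),
)

# sort=False keeps the groups in order of first appearance.
result = (
    labeled.groupby("time_id", sort=False)
    .agg(
        gt_class=("gt_class", ",".join),
        num_missed_base=("num_missed_base", "sum"),
        num_missed_feature=("num_missed_feature", "sum"),
        hover_base=("hover_base", list),
        hover_feature=("hover_feature", list),
    )
    .reset_index()
)
print(result)
```

Building the label columns up front avoids reaching back into the global df from inside the aggregation function, which the simpler example relies on.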