How do I merge a list of dictionaries by an identical field and sum another field in the process?

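I'm trying to merge a list of dictionaries by the url field. If the list contains entries with the same url, they should be merged into a single entry, with another field summed across them.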

I tried using 'setdefault', but it doesn't always work as expected. After running the loop I still get duplicate results.

Here is the list of dictionaries I'm trying to condense, summing the second field wherever the same url appears:

[
  ['https://www.website.com/directory/link-1',
  21,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-1',
  185,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  'String 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  'String 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  'String 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]

This is the result I want to get:

[
 ['https://www.website.com/directory/link-1',
  206,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  'String 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  'String 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  'String 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]

This is what I'm trying:

for url, long_text, number_to_count, another_field, ..., ... in list:
    d = {}
    d.setdefault(url, {}).setdefault("long text", []).append(long_text)
    d[url].setdefault("number_to_count",[]).append(number_to_count)
    d[url].setdefault("another_field",[]).append(another_field)

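You can try the following. It groups the sublists from lst by their first element (the URL) into a defaultdict of lists, then builds the new result by summing only the second item across each group and taking the remaining fields from the first sublist in the group.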

from collections import defaultdict
from pprint import pprint

lst = ...

d = defaultdict(list)
for item in lst:
    d[item[0]].append(item)

result = [[v[0][0]] + [sum(x[1] for x in v)] + v[0][2:] for v in d.values()]

pprint(result)

Output:

[['https://www.website.com/directory/link-1',
  206,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],
 ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],
 ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],
 ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]]
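
The setdefault attempt from the question can probably be made to work as well. One likely cause of the trouble is that d = {} sits inside the loop, so the dictionary is recreated on every iteration; the loop variables also appear to be unpacked in a different order than the rows are laid out. Below is a minimal sketch, assuming every row starts with the URL followed by the number to sum. The function name merge_by_url and the shortened sample rows are hypothetical, used only for illustration.

```python
def merge_by_url(rows):
    """Merge rows that share the same first element (the URL), summing the second."""
    merged = {}  # created once, outside the loop
    for row in rows:
        url, count, *rest = row
        # The first time a URL is seen, store a fresh row with the count zeroed;
        # on later occurrences setdefault returns the row already stored.
        entry = merged.setdefault(url, [url, 0, *rest])
        entry[1] += count
    # Dicts keep insertion order (Python 3.7+), so first-seen order is preserved.
    return list(merged.values())


rows = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2', 'String 2',
     {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
]

result = merge_by_url(rows)
print(result[0][1])  # 206
```

Because each stored row is a new list, the input rows are left unmodified. Fields other than the summed one are taken from the first row seen for each URL, which matches what the defaultdict version above does.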

If you want to use pandas, you can get something like the following:

                                       Page  Count               Text    String                                         Url  Magic
0  https://www.website.com/directory/link-1     21  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
1  https://www.website.com/directory/link-1    185  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
2  https://www.website.com/directory/link-2    296  Long Text Field 2      None  https://www.website.com/images/image-2.jpg    303
3  https://www.website.com/directory/link-3    354  Long Text Field 3      None  https://www.website.com/images/image-3.jpg    388
4  https://www.website.com/directory/link-4    606  Long Text Field 4      None  https://www.website.com/images/image-4.jpg    624

----

                                       Page  Count  Magic    String                                         Url               Text
0  https://www.website.com/directory/link-1    206    255  String 1  https://www.website.com/images/image-1.jpg  Long Text Field 1
1  https://www.website.com/directory/link-2    296    303      None  https://www.website.com/images/image-2.jpg  Long Text Field 2
2  https://www.website.com/directory/link-3    354    388      None  https://www.website.com/images/image-3.jpg  Long Text Field 3
3  https://www.website.com/directory/link-4    606    624      None  https://www.website.com/images/image-4.jpg  Long Text Field 4

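You get this by running the code below. Note that I had to add dummy values for the missing strings, because your data format is somewhat inconsistent.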

import pandas as pd

data = [
  ['https://www.website.com/directory/link-1',
  21,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-1',
  185,
  'Long Text Field 1',
  'String 1',
  {'url': 'https://www.website.com/images/image-1.jpg'},
  255],

  ['https://www.website.com/directory/link-2',
  296,
  'Long Text Field 2',
  {'url': 'https://www.website.com/images/image-2.jpg'},
  303],

  ['https://www.website.com/directory/link-3',
  354,
  'Long Text Field 3',
  {'url': 'https://www.website.com/images/image-3.jpg'},
  388],

  ['https://www.website.com/directory/link-4',
  606,
  'Long Text Field 4',
  {'url': 'https://www.website.com/images/image-4.jpg'},
  624]
]
columns = ['Page', 'Count', 'Text', 'String', 'Url', 'Magic']

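for d in data:
    # Pad rows that lack the 'String' field so every row has six elements
    if len(d) != 6:
        d.insert(3, None)
    # Replace the nested {'url': ...} dict with the URL string itself
    d[4] = d[4]['url']
df = pd.DataFrame(data, columns=columns)


# Keep the first value of every column per page, except 'Count', which is summed
agg = dict.fromkeys(columns, 'first')
agg.update({'Count': 'sum'})
del agg['Page']  # 'Page' is the grouping key, so it is not aggregated
df2 = df.groupby(['Page'], as_index=False).agg(agg)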

pd.options.display.width = 0
print(df)
print('\n----\n')
print(df2)