Find duplicates in a column, return each unique item, and list its corresponding values from another column in Python

I want to remove the duplicates from column 1 and, for each unique item, return in column 2 the list of values associated with it, using Python.

The input is

1 2
Jack London 'Son of the Wolf'
Jack London 'Chris Farrington'
Jack London 'The God of His Fathers'
Jack London 'Children of the Frost'
William Shakespeare 'Venus and Adonis'
William Shakespeare 'The Rape of Lucrece'
Oscar Wilde 'Ravenna'
Oscar Wilde 'Poems'

and the output should be

1 2
Jack London 'Son of the Wolf, Chris Farrington, The God of His Fathers, Children of the Frost'
William Shakespeare 'Venus and Adonis, The Rape of Lucrece'
Oscar Wilde 'Ravenna, Poems'

where the second column contains all of the values associated with each item.

I tried the set() function on a dictionary:
dic={'Jack London': 'Son of the Wolf', 'Jack London': 'Chris Farrington', 'Jack London': 'The God of His Fathers'}
set(dic)

but it returned only the first key of the dictionary:

set(['Jack London'])

In Python, each key of a dictionary can hold only one value. That value, however, can be a collection of items:

>>> d = {'Jack London': ['Son of the Wolf', 'Chris Farrington']}
>>> d['Jack London']
['Son of the Wolf', 'Chris Farrington']
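To see why the attempt in the question ended up with a single key, here is a small sketch that reuses the question's own literal. Repeating a key in a dict literal is not an error; each later value simply replaces the earlier one, so by the time set() is called there is only one entry left.

```python
# Repeated keys in a dict literal are allowed, but each later value
# silently replaces the earlier one, so only one entry survives.
dic = {'Jack London': 'Son of the Wolf',
       'Jack London': 'Chris Farrington',
       'Jack London': 'The God of His Fathers'}

print(len(dic))            # 1
print(dic['Jack London'])  # The God of His Fathers
print(set(dic))            # {'Jack London'}
```

Iterating over a dictionary (which is what set() does) yields its keys only, which is why the titles have to be collected into a list under a single key.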

To build such a dictionary from a sequence of key-value pairs, you can do this:

dct = {}
for author, title in items:
    if author not in dct:
        # Create a new entry for the author
        dct[author] = [title]
    else:
        # Add another item to the existing entry
        dct[author].append(title)

The loop body can be written more concisely like this:

dct = {}
for author, title in items:
    dct.setdefault(author, []).append(title)
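For completeness, here is a minimal end-to-end sketch of the same idea applied to the data from the question. It assumes the input has already been parsed into (author, title) pairs; the names items, titles_by_author and joined are made up for this illustration. It uses collections.defaultdict, which behaves like the setdefault version above, and then joins each list into one string.

```python
from collections import defaultdict

# Assumed input: (author, title) pairs taken from the question's table.
items = [
    ('Jack London', 'Son of the Wolf'),
    ('Jack London', 'Chris Farrington'),
    ('Jack London', 'The God of His Fathers'),
    ('Jack London', 'Children of the Frost'),
    ('William Shakespeare', 'Venus and Adonis'),
    ('William Shakespeare', 'The Rape of Lucrece'),
    ('Oscar Wilde', 'Ravenna'),
    ('Oscar Wilde', 'Poems'),
]

# defaultdict(list) creates an empty list the first time an author is seen.
titles_by_author = defaultdict(list)
for author, title in items:
    titles_by_author[author].append(title)

# Join each author's titles into the single string the question asks for.
joined = {author: ', '.join(titles)
          for author, titles in titles_by_author.items()}

for author, titles in joined.items():
    print(author, repr(titles))
```

This approach does not require the input to be sorted, and on Python 3.7+ the authors come out in the order they were first seen.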

You should use itertools.groupby, since your list is already sorted (groupby only groups consecutive rows that share the same key).

rows = [('1', '2'),
        ('Jack London', 'Son of the Wolf'),
        ('Jack London', 'Chris Farrington'),
        ('Jack London', 'The God of His Fathers'),
        ('Jack London', 'Children of the Frost'),
        ('William Shakespeare', 'Venus and Adonis'),
        ('William Shakespeare', 'The Rape of Lucrece'),
        ('Oscar Wilde', 'Ravenna'),
        ('Oscar Wilde', 'Poems')]
# I'm not sure how you load your data, but assume you end up with rows like these

from itertools import groupby
from operator import itemgetter

grouped = groupby(rows, itemgetter(0))
result = {group:', '.join([value[1] for value in values]) for group, values in grouped}

This gives you the following result:

In [1]: pprint(result)
{'1': '2',
 'Jack London': 'Son of the Wolf, Chris Farrington, The God of His Fathers, '
                'Children of the Frost',
 'Oscar Wilde': 'Ravenna, Poems',
 'William Shakespeare': 'Venus and Adonis, The Rape of Lucrece'}
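Two caveats with the groupby approach are worth a quick sketch: the header row ('1', '2') ends up in the result, and groupby only merges adjacent rows. The variation below is a sketch with made-up rows in which the authors are deliberately interleaved; it drops the header and sorts by author first. The names header, data and data_sorted are introduced only for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: a header, then data whose authors are NOT contiguous.
rows = [('1', '2'),
        ('Jack London', 'Son of the Wolf'),
        ('Oscar Wilde', 'Ravenna'),
        ('Jack London', 'Chris Farrington'),
        ('Oscar Wilde', 'Poems')]

header, data = rows[0], rows[1:]

# Without sorting, each author forms several groups and the dict
# comprehension keeps only the last group seen for each key.
unsorted_result = {author: ', '.join(title for _, title in group)
                   for author, group in groupby(data, key=itemgetter(0))}
print(unsorted_result)
# {'Jack London': 'Chris Farrington', 'Oscar Wilde': 'Poems'}

# sorted() is stable, so each author's titles keep their original order.
data_sorted = sorted(data, key=itemgetter(0))
result = {author: ', '.join(title for _, title in group)
          for author, group in groupby(data_sorted, key=itemgetter(0))}
print(result)
# {'Jack London': 'Son of the Wolf, Chris Farrington', 'Oscar Wilde': 'Ravenna, Poems'}
```

If the rows are already grouped by author, as in the question, the sort can be skipped.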