How to group items in a list using the labels inside the items?

Question

I have the following list:

list_a = [('B-DATE', '07'),('I-DATE', '/'),('I-DATE', '08'),('I-DATE', '/'),('I-DATE', '20'),('B-LAW', 'Abc'),('I-LAW', 'def'),('I-LAW', 'ghj'),('I-LAW', 'klm')]

I need to join the list_a[x][1] items based on the list_a[x][0] tags. Starting at each tag that begins with the letter "B", concatenate everything up to the next "B"-started tag (list_a[x][0]):

list_b = ['07/08/20','Abcdefghjklm']

Just like using stringagg + groupby in Oracle :)
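Since the question frames this as "stringagg + groupby", here is a sketch of that analogy using the standard library's itertools.groupby and itertools.accumulate. The running group id is an assumption of this sketch, not something from the question: it counts how many "B" tags have been seen so far, so each "I" item inherits the id of the "B" item before it.

```python
from itertools import accumulate, groupby

list_a = [('B-DATE', '07'), ('I-DATE', '/'), ('I-DATE', '08'), ('I-DATE', '/'),
          ('I-DATE', '20'), ('B-LAW', 'Abc'), ('I-LAW', 'def'), ('I-LAW', 'ghj'),
          ('I-LAW', 'klm')]

# Running group id: +1 at every tag starting with "B", +0 otherwise,
# so each "I" item shares the id of the "B" item that precedes it.
group_ids = accumulate(1 if tag.startswith("B") else 0 for tag, _ in list_a)

# groupby acts like GROUP BY over consecutive items with equal ids;
# "".join plays the role of the string aggregation.
list_b = [
    "".join(value for _, (_, value) in group)
    for _, group in groupby(zip(group_ids, list_a), key=lambda pair: pair[0])
]
print(list_b)  # ['07/08/20', 'Abcdefghjklm']
```

Because the key is a counter rather than the label itself, two adjacent entities with the same label (for example two consecutive B-DATE groups) still come out as separate strings.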

Answer 1

Here is a simple iterative approach using str.startswith.

For example:

list_a = [('B-DATE', '07'),('I-DATE', '/'),('I-DATE', '08'),('I-DATE', '/'),('I-DATE', '20'),('B-LAW', 'Abc'),('I-LAW', 'def'),('I-LAW', 'ghj'),('I-LAW', 'klm')]
res = []
for k, v in list_a:
    if k.startswith("B"):   # a "B" tag starts a new group
        res.append(v)
    else:                   # any other tag extends the current group
        res[-1] += v
print(res)

Output:

['07/08/20', 'Abcdefghjklm']
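One caveat with this loop: if the very first tag does not start with "B", res[-1] is evaluated on an empty list and raises IndexError. Below is a minimal sketch of a guarded version, wrapped in a hypothetical helper join_by_b_tags, assuming that such a stray leading item should simply open its own group.

```python
def join_by_b_tags(pairs):
    """Concatenate values, starting a new string at each tag beginning with "B".

    Assumption: an item that arrives before any "B" item opens its own
    group instead of raising IndexError.
    """
    res = []
    for tag, value in pairs:
        if tag.startswith("B") or not res:
            res.append(value)
        else:
            res[-1] += value
    return res


list_a = [('B-DATE', '07'), ('I-DATE', '/'), ('I-DATE', '08'), ('I-DATE', '/'),
          ('I-DATE', '20'), ('B-LAW', 'Abc'), ('I-LAW', 'def'), ('I-LAW', 'ghj'),
          ('I-LAW', 'klm')]
print(join_by_b_tags(list_a))  # ['07/08/20', 'Abcdefghjklm']
print(join_by_b_tags([('I-LAW', 'x'), ('I-LAW', 'y'), ('B-DATE', '07')]))  # ['xy', '07']
```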

Answer 2

You can try the following:

output = []
for obj in list_a:
    if obj[0].startswith('B'):
        output.append(obj[1])
    else:
        output[-1] += obj[1]
print(output)
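If the entity label is also needed (the B-/I- prefixes are the BIO tagging scheme commonly used in named-entity recognition), the same loop can keep the part after the prefix. This is a sketch that assumes every "B" tag has the form "B-LABEL".

```python
list_a = [('B-DATE', '07'), ('I-DATE', '/'), ('I-DATE', '08'), ('I-DATE', '/'),
          ('I-DATE', '20'), ('B-LAW', 'Abc'), ('I-LAW', 'def'), ('I-LAW', 'ghj'),
          ('I-LAW', 'klm')]

output = []
for tag, value in list_a:
    if tag.startswith("B"):
        # Assumption: "B" tags look like "B-LABEL"; keep LABEL next to the text.
        label = tag.split("-", 1)[1]
        output.append([label, value])
    else:
        # "I" tags extend the text of the most recent group.
        output[-1][1] += value
output = [tuple(item) for item in output]
print(output)  # [('DATE', '07/08/20'), ('LAW', 'Abcdefghjklm')]
```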

Answer 3

This is my variant, but I'd like a more "modern" approach in Python:

list_b = []
for i in range(len(list_a)):
    if list_a[i][0][0] == 'B':
        list_b += [list_a[i][1]]
    else:
        list_b[len(list_b)-1] += list_a[i][1]
print(list_b)
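As one possible "modern" variant, a generator function avoids index arithmetic entirely and yields each joined string as soon as the next "B" tag arrives. This is only a sketch: the name merge_chunks is hypothetical, the type hints assume Python 3.9+, and a leading non-"B" item is assumed to start a group rather than raise.

```python
from collections.abc import Iterable, Iterator


def merge_chunks(pairs: Iterable[tuple[str, str]]) -> Iterator[str]:
    """Yield one joined string per group; a group starts at each "B" tag."""
    current = None  # text of the group being built, None before the first item
    for tag, value in pairs:
        if tag.startswith("B"):
            if current is not None:
                yield current  # the previous group is complete
            current = value
        else:
            # Assumption: a leading non-"B" item starts a group instead of failing.
            current = (current or "") + value
    if current is not None:
        yield current  # flush the last group


list_a = [('B-DATE', '07'), ('I-DATE', '/'), ('I-DATE', '08'), ('I-DATE', '/'),
          ('I-DATE', '20'), ('B-LAW', 'Abc'), ('I-LAW', 'def'), ('I-LAW', 'ghj'),
          ('I-LAW', 'klm')]
list_b = list(merge_chunks(list_a))
print(list_b)  # ['07/08/20', 'Abcdefghjklm']
```

Because it is lazy, the same function also works on a stream of (tag, value) pairs that never exists as a full list in memory.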

Answer 4

One-line solution

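Here is a one-line answer using a list comprehension. The trick is to prepend a clearly identifiable separator (I used '|||') to the value at each new occurrence of 'B'.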

str(''.join([f'|||{v}' if k.startswith("B") else v for (k, v) in list_a])).split('|||')[1:]

Output:

['07/08/20', 'Abcdefghjklm']

Algorithm

Create a list of values where the values corresponding to each new occurrence of 'B' are preceded by '|||'.

Join all the items in the list into a single string.

Split the string by the separator, '|||'.

Keep all but the first element of the str.split() result (it is the empty string before the first separator).
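The four steps can be traced one at a time to see the intermediate values. The SEP constant is an assumption introduced here; the approach only works if that separator never occurs inside any value, otherwise split() will cut that value in two.

```python
list_a = [('B-DATE', '07'), ('I-DATE', '/'), ('I-DATE', '08'), ('I-DATE', '/'),
          ('I-DATE', '20'), ('B-LAW', 'Abc'), ('I-LAW', 'def'), ('I-LAW', 'ghj'),
          ('I-LAW', 'klm')]
SEP = "|||"  # assumed never to appear inside any value

# Step 1: prefix the value of every "B" item with the separator.
marked = [f"{SEP}{v}" if k.startswith("B") else v for k, v in list_a]
# Step 2: join everything into a single string.
joined = "".join(marked)
print(joined)  # |||07/08/20|||Abcdefghjklm
# Step 3: split the string on the separator.
parts = joined.split(SEP)
print(parts)   # ['', '07/08/20', 'Abcdefghjklm']
# Step 4: drop the empty string that precedes the first separator.
list_b = parts[1:]
print(list_b)  # ['07/08/20', 'Abcdefghjklm']
```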
