PYTHON 2.7 - Modifying List of Lists and Re-Assembling Without Mutating

Question

I currently have a list of lists that looks like this:

My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'], ['This', 'too', 'is', 'a', 'sample', 'text'], ['finally', 'so', 'is', 'this', 'one']]

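What I need to do now is "tag" each of these words with one of 3 tags (arbitrary ones in this example, such as "EE", "FF", or "GG"), depending on which list the word is in, and then reassemble the words in the same order they came in. My final result needs to look like this: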

GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

My_List = [[('This', 'GG'), ('Is', 'FF'), ('A', 'FF'), ('Sample', 'EE'), ('Text', 'FF'), ('Sentence', 'GG')], [*same with this sentence*], [*and this one*]]

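I tried this by using a for loop to turn each item into a dict, but the dicts were then rearranged by their tags, which unfortunately can't happen given the nature of this work. The experiment needs everything to stay in the same order, because eventually I need to measure the proximity of tags relative to other tags, but only within the same sentence (list).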

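~~I thought about using NLTK for this (I have very little experience with it), but it looks far more complicated than what I need, and the tags aren't easy for a novice like me to customize.~~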

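I think this could be done by iterating over each of these items, using if statements to determine which tag each one should have, and then making a tuple out of the word and its associated tag so that it doesn't shift around within its list.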

I came up with this, but I can't figure out how to rebuild my list of lists and keep everything in order :(

for i in My_List: #For each list in the list of lists
    for h in i:   #For each item in each list
         if h in GG_List:  # Check for the tag
            MyDicts = {"GG":h for h in i}  #Make Dict from tag + word

Thank you very much for your help!

Answer 1

Putting the tags into a dictionary would work:

My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

zipped = zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE'))
tags = {item: tag for tag_list, tag in zipped for item in tag_list}
res = [[(word, tags[word]) for word in entry if word in tags] for entry in My_List]

Now:

>>> res
[[('This', 'GG'),
  ('Is', 'FF'),
  ('A', 'FF'),
  ('Sample', 'EE'),
  ('Text', 'FF'),
  ('Sentence', 'GG')],
 [('This', 'GG')],
 []]
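
Note that the `if word in tags` filter drops every word that has no tag, which is why the second and third sentences come out nearly empty. If each word has to stay in its original position (for example, to measure distances between tags later), one possible variant is to keep untagged words and pair them with a placeholder. The sketch below assumes `None` is an acceptable placeholder; the names `tag_lists`, `UNTAGGED`, and `res_all` are made up for illustration and are not part of the answer above.

```python
# Sketch: keep every word in place; unknown words get a placeholder tag.
My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]

# Hypothetical container holding the three tag lists from the question.
tag_lists = {'GG': ['This', 'Sentence'],
             'FF': ['Is', 'A', 'Text'],
             'EE': ['Sample']}

# Build the word -> tag lookup (lookups are case-sensitive).
tags = {word: tag for tag, words in tag_lists.items() for word in words}

UNTAGGED = None  # assumed placeholder for words without a tag

# dict.get returns the placeholder instead of raising KeyError,
# so no word is dropped and sentence order is preserved.
res_all = [[(word, tags.get(word, UNTAGGED)) for word in entry]
           for entry in My_List]
```

With this variant every inner list keeps the same length and order as the corresponding sentence, so positions in the result line up with positions in the input.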

Answer 2

Dictionaries work with key-value pairs: each key is assigned a value. To look something up in a dictionary, you index it by key, e.g.

>>> d = {1:'a', 2:'b', 3:'c'}
>>> d[1]
'a'

In the case above, we always look up the dictionary by its keys, i.e. the integers.

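If you want to assign a tag/label to each word, you will look up the word as the key and find the "value", i.e. the tag/label. So your dictionary has to look something like this (assuming the strings are words and the numbers are tags/labels):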

>>> d = {'a':1, 'b':1, 'c':3}
>>> d['a']
1
>>> sent = 'a b c a b'.split()
>>> sent
['a', 'b', 'c', 'a', 'b']
>>> [d[word] for word in sent]
[1, 1, 3, 1, 1]

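That way, when you iterate over the words with a list comprehension and look up the appropriate tag for each, the order of the tags follows the order of the words.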

So the problem arises when you index the initial dictionary the wrong way around, i.e. key -> labels, value -> words, e.g.:

>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> [d[word] for word in sent]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a'

In that case you have to invert your dictionary. Assuming all the elements in your lists of values are unique, you can do this:

>>> from collections import ChainMap
>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()]))
>>> d_inv
{'h': 2, 'c': 3, 'a': 1, 'x': 3, 'b': 2, 'd': 1}

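Note, however, that ChainMap is only available in Python 3.3 and later (one more reason to upgrade your Python ;P). For older versions, including Python 2.7, see How do I merge a list of dicts into a single dict? for a solution.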
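
On interpreters without ChainMap, such as the Python 2.7 named in the question, the same inversion can be sketched with a single nested dict comprehension. This is an alternative to the ChainMap approach rather than the answer's own method, and the uniqueness check with the name `all_values` is an illustrative addition.

```python
# Sketch: invert {label: [words]} into {word: label} without ChainMap.
d = {1: ['a', 'd'], 2: ['b', 'h'], 3: ['c', 'x']}

# Optional sanity check (illustrative): every word belongs to one label only.
all_values = [value for values in d.values() for value in values]
assert len(all_values) == len(set(all_values)), "a word appears under two labels"

# The outer loop walks the labels, the inner loop walks each label's words.
d_inv = {value: key for key, values in d.items() for value in values}
```

If a word did appear under two labels, the comprehension would silently keep whichever label it met last, which is why the check comes first.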

So, going back to the problem of assigning labels/tags to words, let's say we have these inputs:

>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> sent = 'a b c a b'.split()

First, we invert the dictionary (assuming there is a one-to-one mapping between each word and its tag/label):

>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()]))

Then, we apply the tags to the words with a list comprehension:

>>> [d_inv[word] for word in sent]
[1, 2, 3, 1, 2]

For multiple sentences:

>>> sentences = ['a b c'.split(), 'h a x'.split()]
>>> [[d_inv[word] for word in sent] for sent in sentences]
[[1, 2, 3], [2, 1, 3]]
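
The question asks for (word, tag) tuples rather than bare tags. A minimal sketch of that last step, reusing the same toy data, is to emit a tuple inside the inner comprehension; the name `tagged` is made up for illustration, and the inversion here uses a plain dict comprehension so that it also runs on Python 2.7.

```python
# Sketch: pair each word with its tag while preserving sentence order.
d = {1: ['a', 'd'], 2: ['b', 'h'], 3: ['c', 'x']}
d_inv = {value: key for key, values in d.items() for value in values}

sentences = ['a b c'.split(), 'h a x'.split()]

# Each inner list mirrors its sentence: same length, same word order.
tagged = [[(word, d_inv[word]) for word in sent] for sent in sentences]
# tagged == [[('a', 1), ('b', 2), ('c', 3)], [('h', 2), ('a', 1), ('x', 3)]]
```

Because `d_inv[word]` raises KeyError for a word with no tag, this version assumes every word in every sentence has been assigned a label.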
