How to chain, and then "unchain", a nested list?

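I have a nested list that I need to chain (flatten), then run metrics on, and then "unchain" back into its original nested format. Here is some example data to illustrate: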

from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
print("chained_list: \n", chained_list)
metrics_list = [str(chained_list[x]) +'_score' \
    for x in range(len(chained_list))]
print("metrics_list: \n", metrics_list) 
zipped_scores = list(zip(chained_list, metrics_list))
print("zipped_scores: \n", zipped_scores)

unchain_function = '????'

chained_list: 
 ['x', 'xx', 'xxx', 'yy', 'yyy', 'y', 'yyyy', 'zz', 'z']
metrics_list: 
 ['x_score', 'xx_score', 'xxx_score', 'yy_score', 'yyy_score', 'y_score', 'yyyy_score', 'zz_score', 'z_score']
zipped_scores: 
 [('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]

Is there a Python function or a pythonic way to write the "unchain_function" so that I get the desired output?

[
    [
        ('x', 'x_score'), 
        ('xx', 'xx_score'), 
        ('xxx', 'xxx_score')
    ],
    [
        ('yy', 'yy_score'), 
        ('yyy', 'yyy_score'), 
        ('y', 'y_score'),
        ('yyyy', 'yyyy_score')
    ],
    [
        ('zz', 'zz_score'), 
        ('z', 'z_score')
    ]
]

(Background: this is for running metrics on lists with lengths greater than 100,000.)

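I don't know how pythonic this is, but it should work. Long story short, we use a Wrapper class to turn immutable primitives (which can't be changed without being replaced) into mutable variables (so that we can hold several references to the same variable, each organized differently).

We create an identical nested list, except that every value is a Wrapper of the corresponding value in the original list. Then we apply the same chain transformation to flatten the list of wrappers. We copy the changes from the processed chained list onto the chained list of wrappers, then access those changes through the nested list of wrappers and unwrap them.

I think using an explicit, simple class called Wrapper is easier to understand, but you could do essentially the same thing by using a single-element list to hold each variable instead of a Wrapper instance.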

from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))

metrics_list = [str(chained_list[x]) +'_score' for x in range(len(chained_list))]
zipped_scores = list(zip(chained_list, metrics_list))

# create a simple Wrapper class, so we can essentially have a mutable primitive.
# We can put the Wrapper into two different lists, and modify its value without
# overwriting it.
class Wrapper:
    def __init__(self, value):
        self.value = value

# create a 'duplicate list' of the nested and chained lists, respectively, 
# such that each element of these lists is a Wrapper of the corresponding
# element in the above lists
nested_wrappers = [[Wrapper(elem) for elem in sublist] for sublist in nested_list]
chained_wrappers = list(chain(*nested_wrappers))

# now we have two references to the same MUTABLE Wrapper for each element of 
# the original lists - one nested, and one chained. If we change a property
# of the chained Wrapper, the change will reflect on the corresponding nested
# Wrapper. Copy the changes from the zipped scores onto the chained wrappers
for score, wrapper in zip(zipped_scores, chained_wrappers):
    wrapper.value = score

# then extract the values in the unchained list of the same wrappers, thus
# preserving both the changes and the original nested organization
unchained_list = [[wrapper.value for wrapper in sublist] for sublist in nested_wrappers]

This ends with unchained_list equal to the following:

[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')], [('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score')], [('zz', 'zz_score'), ('z', 'z_score')]]
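
The single-element-list variant mentioned above is not spelled out in the answer. A minimal sketch of how it might look follows, assuming each "cell" is a one-item list and using a placeholder `_score` suffix in place of the real metric. The names `nested_cells` and `chained_cells` are illustrative and not part of the original answer.

```python
from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# Each value lives in its own one-element list, which plays the role of Wrapper.
nested_cells = [[[elem] for elem in sublist] for sublist in nested_list]

# chain.from_iterable flattens exactly one level, so the cells stay intact
# and are shared between the nested view and the chained view.
chained_cells = list(chain.from_iterable(nested_cells))

# Placeholder for the real metric computation on the flat data (assumption).
for cell in chained_cells:
    cell[0] = (cell[0], cell[0] + '_score')

# Read the updated values back through the nested view of the same cells.
unchained_list = [[cell[0] for cell in sublist] for sublist in nested_cells]
print(unchained_list)
```

Because both views hold references to the same cell objects, mutating `cell[0]` through the flat view is visible through the nested view, which is the same idea as the Wrapper class.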

I think you just want to group the data by some criterion, namely the first letter of the first item in each tuple.

Given

Your flat, zipped data:

data = [
    ('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'),
    ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'),
    ('zz', 'zz_score'), ('z', 'z_score')
]

Code

import itertools

[list(g) for _, g in itertools.groupby(data, key=lambda x: x[0][0])]

Output

[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')],
 [('yy', 'yy_score'),
  ('yyy', 'yyy_score'),
  ('y', 'y_score'),
  ('yyyy', 'yyyy_score')],
 [('zz', 'zz_score'), ('z', 'z_score')]]

See also

  • A post on how this tool works
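
Grouping by first letter works for the sample data because every sublist happens to share one first letter that differs from its neighbours. `itertools.groupby` merges only adjacent items with equal keys, so real data may not regroup this way. As a hedged alternative, the sketch below regroups by the lengths of the original sublists using `itertools.islice`. The helper name `regroup_by_lengths` is hypothetical, and the `_score` suffix stands in for the real metric.

```python
from itertools import chain, islice

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
flat = list(chain.from_iterable(nested_list))
data = [(item, item + '_score') for item in flat]  # placeholder metric


def regroup_by_lengths(flat_items, lengths):
    """Split flat_items into consecutive chunks of the given lengths."""
    it = iter(flat_items)
    # islice consumes n items from the shared iterator on each pass.
    return [list(islice(it, n)) for n in lengths]


lengths = [len(sublist) for sublist in nested_list]
regrouped = regroup_by_lengths(data, lengths)
print(regrouped)
```

This version does not depend on the content of the items at all, only on the sizes recorded before flattening.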

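You have made the algorithm more complicated than it needs to be. You can do it with the simple steps shown below:

  • First, create an empty nested list of the required size

    formatted_list = [[] for _ in range(3)]

  • Then simply iterate over the list and format each item accordingly

    for K in range(0, 3):
        for i in nested_list[K]:
            # store (item, score) pairs so the result matches the desired output
            formatted_list[K].append((i, i + '_score'))

    print(formatted_list)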

This is a simple way to get the desired output.
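
The steps above hard-code the number of sublists as 3. A sketch that avoids this, assuming the metric can be computed one item at a time, is a nested list comprehension. The function `score` is a hypothetical stand-in for the real metric.

```python
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]


def score(item):
    """Hypothetical stand-in for the real per-item metric."""
    return item + '_score'


# The outer and inner loops mirror the nesting, so no size is hard-coded.
formatted_list = [[(item, score(item)) for item in sublist]
                  for sublist in nested_list]
print(formatted_list)
```

Note that this only applies when no flat, batched pass over the data is needed. If the metric has to run on the whole flattened list at once, the results still have to be regrouped afterwards.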

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
zipped_scores = [('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]


zipped_scores_iter = iter(zipped_scores)
unchained_list = [[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]

Explanation: with the following list comprehension we can make an exact copy of nested_list:

[[x for x in sublist] for sublist in nested_list]

That gives us the structure. All we have to do is swap each original x for its new value:

[[corresponding_value_for(x) for x in sublist] for sublist in nested_list]

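I think the accepted answer takes the same approach, but uses a more complicated way to get the corresponding value.

There is already a one-to-one correspondence between the input (nested_list) and the desired values (zipped_scores), given by their order. So we can replace x with the corresponding element of zipped_scores by pulling the next item from an iterator.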

[[next(zipped_scores_iter) for x in sublist] for sublist in nested_list]
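
The iterator idea can be wrapped in a small reusable function. The sketch below is one possible way to do that; the name `unchain` and the length checks are additions for illustration and are not part of the answer above. A bare `next()` inside a list comprehension would let `StopIteration` escape if the flat data were too short, so the sketch converts that case into a `ValueError`.

```python
def unchain(flat_values, template):
    """Regroup flat_values into the same nested shape as template."""
    it = iter(flat_values)
    try:
        # One next() call per element of template, in order.
        result = [[next(it) for _ in sublist] for sublist in template]
    except StopIteration:
        raise ValueError("flat_values has fewer items than template") from None
    # A leftover value means the flat data and the template disagree.
    sentinel = object()
    if next(it, sentinel) is not sentinel:
        raise ValueError("flat_values has more items than template")
    return result


nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
flat = [item for sublist in nested_list for item in sublist]
zipped_scores = [(item, item + '_score') for item in flat]  # placeholder metric

unchained_list = unchain(zipped_scores, nested_list)
print(unchained_list)
```

Since the function only walks the template once and pulls items lazily, it should remain practical for the list sizes mentioned in the question.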

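By the way, although flattening the list does not seem necessary to get the desired output in this case, I ran into a similar problem where flattening and then re-grouping was useful (sending one batch of inputs to an external process). Here is my approach.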