Tuple Unpacking with List Comprehension fails but works with for-loop

Question

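Summary

I am using a semi-complex regular expression to retrieve data from a website. The problem I have is that I need to do some post-processing on the matched data set.

I have the data massaged to roughly 95% of where I want it; however, I am getting a simple error message that I cannot make sense of, which is odd.

I can work around it, but that is not the point. I want to figure out whether this is a bug or whether I am overlooking something fundamental about tuple unpacking.

Background Information

One thing I have to overcome is that I get 4 matches per "true match". This means that each single item of my data is spread across 4 matches.

In simple graphic form (slightly oversimplified):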

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
   2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
   3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
   4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)

   5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
   6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
   7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
   8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)

   9: | ...
        ...
 615: | ...

I can get all of the data, but I want to condense it, like this...

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
   1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
   2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)

   3: | ...
        ...
 154: | ...

Code

Works

Note the variables abcd, e, f, and ghij, and how I have to unpack them in the for-loop at the bottom

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]

abcdefghij = zip(abcd, e, f, ghij)

for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)


Fails

Note that here I am trying to unpack the same tuples straight into the variables a, b, c, d, e, f, g, h, i, and j

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]

f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3]

abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)

for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)


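With this code, I get the following error message...

... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)

Expected

I expected both approaches to perform exactly the same logic, with exactly the same end result.

They don't! Why?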

Answer 1

Your list [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1] does not have exactly 4 elements, which means that trying to unpack it into just four variables fails.
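The difference between the two versions can be sketched with a small stand-in list (the name `rows` and its contents are made up for illustration). A `for` target is unpacked once per element, while an assignment target is unpacked once against the whole list.

```python
# Hypothetical stand-in for the list produced by the comprehension.
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c2", "d2"),
    ("a3", "b3", "c3", "d3"),
]

# A for-loop target is matched against each element in turn,
# so every 4-item tuple unpacks cleanly into four names.
first_items = []
for a, b, c, d in rows:
    first_items.append(a)

# An assignment target is matched against the list as a whole,
# so the number of names must equal len(rows), which is 3 here.
try:
    a, b, c, d = rows
    error = None
except ValueError as exc:
    error = str(exc)

print(first_items)
print(error)
```

The loop succeeds for any number of rows; the assignment only succeeds when the list happens to contain exactly four tuples.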

Answer 2

@PaulPanzer That appears to work. I will have to verify that everything lines up correctly. But why do I need that?

Say q is an iterable for which your comprehension produces a list of 26 tuples, each with 4 items.

z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]


In [6]: len(z)
Out[6]: 26

In [7]: len(z[0])
Out[7]: 4

In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

When you try to unpack, you are trying to stuff 26 items into four names/variables.

In [8]: a,b,c,d = z
Traceback (most recent call last):

  File "<ipython-input-8-64277b78f273>", line 1, in <module>
    a,b,c,d = z

ValueError: too many values to unpack (expected 4)

zip(*list_of_4_item_tuples) transposes list_of_4_item_tuples into 4 tuples, each with 26 items

In [9]: a,b,c,d = zip(*z)    # z is the result of the list comprehension shown above

In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)

Setup used for testing

import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip (a,b,c,d,e,f)
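One thing to keep in mind with this setup: in Python 3, `zip` returns a one-shot iterator, so `q` can be consumed only once. A minimal sketch, assuming you want to run the comprehension more than once, is to materialize the rows with `list` first (the names `letters` and `q_list` are introduced here for illustration).

```python
import string

letters = string.ascii_lowercase
q = zip(letters, letters, letters, letters, letters, letters)

first_pass = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]
# q is exhausted now, so a second pass finds nothing to iterate over.
second_pass = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]

print(len(first_pass), len(second_pass))

# Materializing the rows makes them reusable.
q_list = list(zip(letters, letters, letters, letters, letters, letters))
z = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q_list)]
a, b, c, d = zip(*z)

print(len(z), len(a))
```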

Answer 3

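Solution

When a list comprehension creates a list of tuples and you want to unpack those tuples column by column, you need to use zip(*...)

Do the following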

x, y, z = zip(*list_comprehension)

# To be more clear
x, y, z = zip(*[(i, j, k) for (i, j, k) in tuple_list])

# For my code, this change must be made to this code
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
])

...

# And this code
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
])
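Put together, the corrected flow might look like the sketch below. The `matches` list here is a shortened, made-up stand-in with the same 10-column, 4-rows-per-item layout as in the question, not real scraped data.

```python
# Made-up sample: two items, each spread over 4 rows of 10 columns.
matches = [
    ("", "", "", "", "", "stable", "", "", "", ""),
    ("Studio 1", "1.0", "", "", "", "", "", "", "", ""),
    ("", "", "", "", "Jan 1", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-1", "1.0.0", "0", "100"),
    ("", "", "", "", "", "", "", "", "", ""),
    ("Studio 2", "2.0", "Beta", "1", "", "", "", "", "", ""),
    ("", "", "", "", "Feb 2", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-2", "2.0.0", "1", "200"),
]

f = [
    f if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_) in enumerate(matches)
    if index % 4 == 0
]
# zip(*...) turns the list of (a, b, c, d) rows into four columns.
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_) in enumerate(matches)
    if index % 4 == 1
])
e = [
    e
    for index, (_, _, _, _, e, *_) in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j) in enumerate(matches)
    if index % 4 == 3
])

# Every column now has one entry per item, so a flat zip lines them up.
records = list(zip(a, b, c, d, e, f, g, h, i, j))
for record in records:
    print(record)
```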

Why

Let's take a look at the following code.

matches = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d2", "e2"),
    ("a3", "b3", "c3", "d3", "e3"),
    ("a4", "b4", "c4", "d4", "e4"),
    ("a5", "b5", "c5", "d5", "e5")
]

# I want a tuple of a's, b's, and c's
abc = [
    (a, b, c)
    for (a, b, c, *_)  # Ignore elements `d` and `e`
    in matches
]

print("abc =", abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# NOTE: This is a list of tuples of ones, twos, threes, fours, and fives
#       Not a's, b's, and c's!!

# I want a list of e's
e = [
    e
    for (*_, e) 
    in matches
]

print("e =", e)
# e = ['e1', 'e2', 'e3', 'e4', 'e5']
# NOTE: This is a list of e's

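The point is that with abc I get a list of ones, twos, threes, fours, and fives, not a's, b's, and c's.

Digging Deeper

The reason for the error message ValueError: too many values to unpack is that the list holds more tuples than there are names on the left-hand side of the assignment. (With fewer tuples than names, Python raises ValueError: not enough values to unpack instead.)

Remember, you have a list of ones, twos, threes, fours, and fives (5 tuples of 3 elements each), not a's, b's, and c's (3 tuples of 5 elements each)

So this will always fail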

a, b, c = [
    (a, b, c)
    for (a, b, c, *_) 
    in matches
]

# ERROR
#    Traceback (most recent call last):
#      File "...*.py", line 11, in <module>
#        for (a, b, c, *_) in matches
#    ValueError: too many values to unpack (expected 3)

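You are trying to fit these values [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')] into 3 names. You cannot! The list comprehension produces 5 tuples, so you need 5 names on the left-hand side of the assignment.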
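Which message you see depends on whether the list holds more or fewer tuples than there are names. A small sketch, using a made-up helper `try_unpack_into_three` and made-up rows:

```python
def try_unpack_into_three(rows):
    """Return the ValueError message raised by `a, b, c = rows`, or None."""
    try:
        a, b, c = rows
    except ValueError as exc:
        return str(exc)
    return None


five_rows = [("a%d" % n, "b%d" % n, "c%d" % n) for n in range(1, 6)]

msg_many = try_unpack_into_three(five_rows)       # more tuples than names
msg_few = try_unpack_into_three(five_rows[:2])    # fewer tuples than names
msg_exact = try_unpack_into_three(five_rows[:3])  # counts match, no error

print(msg_many)
print(msg_few)
print(msg_exact)
```

The exact wording of the messages can vary slightly between Python versions, which is why the sketch prints them rather than hard-coding them.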

This, however, will succeed. The result is wrong, but no error is raised.

# This will assign 5 variables with the tuples (a, b, c) from the original tuples (a, b, c, d, e)
ones, twos, threes, fours, fives = [
    (a, b, c)
    for (a, b, c, *_) in matches
]

print("ones =", ones)
print("twos =", twos)
print("threes =", threes)
print("fours =", fours)
print("fives =", fives)

# Output
# ones = ('a1', 'b1', 'c1')
# twos = ('a2', 'b2', 'c2')
# threes = ('a3', 'b3', 'c3')
# fours = ('a4', 'b4', 'c4')
# fives = ('a5', 'b5', 'c5')

Remember, we want ('a1', 'a2', 'a3', 'a4', 'a5'), not ('a1', 'b1', 'c1')

If the list held 20 tuples, you would need ...sixes, sevens, ..., nineteens, twenties = [ ... ]
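By contrast, with the `zip(*...)` form from the Solution section, the number of names on the left depends on the number of columns, not the number of rows, so the same line keeps working however many tuples the list holds. A sketch with 20 made-up rows:

```python
# 20 tuples of 3 elements each: ('a1', 'b1', 'c1') ... ('a20', 'b20', 'c20')
rows = [("a%d" % n, "b%d" % n, "c%d" % n) for n in range(1, 21)]

# Still three names, one per column, regardless of the row count.
a, b, c = zip(*rows)

print(len(rows), len(a), len(b), len(c))
print(a[:3], a[-1])
```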

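First Attempt

Well, we want all of the first elements of each tuple together. The same goes for the second and third elements. So zip(...) seems like a good candidate. Let's look at the result.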

result = list(zip(abc))
print(result)

# list(zip(abc)) = [(('a1', 'b1', 'c1'),), (('a2', 'b2', 'c2'),), (('a3', 'b3', 'c3'),), (('a4', 'b4', 'c4'),), (('a5', 'b5', 'c5'),)]

# Let's look at what one element looks like
print(result[0])
# result[0] = (('a1', 'b1', 'c1'),)

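This is wrong!

As you can see, there are a couple of problems.

A strange tuple structure! Tuples inside tuples. This is what you get when you zip a single list of tuples.
The wrong elements in each tuple! We got a list of ones, not a's.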
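The nesting comes from how `zip` treats its arguments: each argument is one iterable, and each output tuple takes one item from every argument. With a single argument, every output tuple therefore has exactly one item. A minimal illustration with made-up values:

```python
pairs = [("a1", "b1"), ("a2", "b2")]

# One argument: each element of `pairs` is wrapped in a 1-tuple.
wrapped = list(zip(pairs))
print(wrapped)

# Two arguments: items are paired up position by position.
paired = list(zip(pairs[0], pairs[1]))
print(paired)
```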

Second Attempt

Hmm, zip does not work on a list of tuples (as is). We have to do something to the list of tuples first.

Let's look at this...

abc = [(a, b, c) for (a, b, c, *_) in matches]

print(abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# Again, we cannot zip these

print(*abc)
# *abc = ('a1', 'b1', 'c1') ('a2', 'b2', 'c2') ('a3', 'b3', 'c3') ('a4', 'b4', 'c4') ('a5', 'b5', 'c5')
# Wait, here we have a sequence of tuples. Not a list of tuples. Just tuple after tuple after tuple.

# What happens when we zip this "sequence" of tuples?
print(list(zip(*abc)))
# list(zip(*abc)) = [('a1', 'a2', 'a3', 'a4', 'a5'), ('b1', 'b2', 'b3', 'b4', 'b5'), ('c1', 'c2', 'c3', 'c4', 'c5')]

# Great, so let's try this
a, b, c = zip(*abc)

This is what we want!!
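The `*` simply spreads the list into separate positional arguments, so `zip(*abc)` is the same call as `zip(abc[0], abc[1], ...)`. One caveat worth knowing: if the list is empty, `zip()` yields nothing, so the unpacking itself raises `ValueError`. A sketch with shortened, made-up data:

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2")]

# Spreading the list is equivalent to passing each tuple by hand.
spread = list(zip(*abc))
explicit = list(zip(abc[0], abc[1]))
print(spread == explicit)

# Empty input: there are no columns to hand out to a, b, and c.
try:
    a, b, c = zip(*[])
    empty_error = None
except ValueError as exc:
    empty_error = str(exc)

print(empty_error)
```

If the comprehension can legitimately come back empty (for example, when the regular expression finds no matches), it may be worth checking for that before unpacking.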

Therefore

Because of this, we can do the following.

a, b, c = zip(*abc)

print("a =", a)
print("b =", b)
print("c =", c)

# Output
# a = ('a1', 'a2', 'a3', 'a4', 'a5')
# b = ('b1', 'b2', 'b3', 'b4', 'b5')
# c = ('c1', 'c2', 'c3', 'c4', 'c5')

This means we can do this...

a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
])
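As an alternative to building one list per column, the rows could also be merged group by group. This is a different technique from the one described above, shown only as a sketch: it assumes every item occupies exactly 4 consecutive rows and that each column is non-empty in at most one row of a group. The helper name `merge_group` and the sample data are made up.

```python
def merge_group(rows):
    """Collapse rows into one tuple, taking the first non-empty value per column."""
    return tuple(
        next((value for value in column if value), "")
        for column in zip(*rows)  # zip(*rows) walks the group column by column
    )


# Made-up sample: two items, each spread over 4 rows of 10 columns.
matches = [
    ("", "", "", "", "", "stable", "", "", "", ""),
    ("Studio 1", "1.0", "", "", "", "", "", "", "", ""),
    ("", "", "", "", "Jan 1", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-1", "1.0.0", "0", "100"),
    ("", "", "", "", "", "", "", "", "", ""),
    ("Studio 2", "2.0", "Beta", "1", "", "", "", "", "", ""),
    ("", "", "", "", "Feb 2", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-2", "2.0.0", "1", "200"),
]

merged = [
    merge_group(matches[start:start + 4])
    for start in range(0, len(matches), 4)
]

for row in merged:
    print(row)
```

Columns that are empty in every row of a group stay empty in the merged row, so any default (such as mapping a missing "stable" flag to "preview") would still have to be applied afterwards.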


Tags: python, tuples, python-3.x, iterable-unpacking
