Python regular expression for index:count

Question

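I have a list of values stored as a string of "index:count" items, and I want to extract the index and the count from the string, as in the code below: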

          string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
          values=[v for v in re.findall('.+?:.+?.', string)]
          for g in values:
              index=g[:g.index(":")]
              count=g[g.index(":")+1:]
              print(int(index)+" "+str(count))

But I get this error message:

ValueError: invalid literal for int() with base 10: '2 1550'

It looks like I wrote the regular expression wrong. Any idea how to fix this?

Answer 1

You are trying to concatenate a string and an integer.

Replace

print(int(index)+" "+str(count))

with

print(str(index)+" "+str(count))
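The failure can be reproduced in isolation. This is only a minimal sketch with one hard-coded pair (the values "358" and "6" are taken from the first item of the string in the question; the name line is made up for illustration):

```python
index, count = "358", "6"

try:
    # int + str is not defined in Python, so this line raises TypeError
    print(int(index) + " " + str(count))
except TypeError as exc:
    print("TypeError:", exc)

# Either convert everything to str before concatenating ...
line = str(index) + " " + str(count)
print(line)

# ... or pass the values to print() separately and let it add the space.
print(int(index), int(count))
```

Passing separate arguments to print() keeps the values as integers, which is handy if you want to do arithmetic on them later.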

You can also simplify the code.

For example:

import re

string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
# Match one or more digits on each side of the colon
values = re.findall(r'\d+:\d+', string)
for g in values:
    index, count = g.split(":")
    print(index, count)

Answer 2

You are already using a regular expression, so why not simply use capturing groups and build a dictionary from them?

import re

s="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

values = dict(re.findall(r'(\d+):(\d+) ?', s))  # use capturing groups

for g in values:
    print(g, values[g])

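Output (on Python 3.7 or later, where dictionaries preserve insertion order):

358 6
1260 2
1533 7
1548 292
1550 48
1561 3
1564 186

This gives you the key/value pairs conveniently in a dictionary (all as strings). Before Python 3.7 a dictionary does not guarantee insertion order, but for key/value lookups that should not matter.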

If you need to keep the order, just use the list returned by findall:

values = re.findall(r'(\d+):(\d+) ?', s)  # use capturing groups

This gives you a list of tuples containing the matches:

[('358', '6'), ('1260', '2'), ('1533', '7'), ('1548', '292'),
 ('1550', '48'), ('1561', '3'), ('1564', '186')]
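Both the dictionary and the list of tuples hold strings. If you need numbers, one possible follow-up is to convert while building the dictionary. This is just a sketch; the names pairs, counts and total are made up for illustration:

```python
import re

s = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"

# findall with two capturing groups returns a list of (index, count) tuples
pairs = re.findall(r'(\d+):(\d+)', s)

# Convert both parts to int while building the dictionary
counts = {int(index): int(count) for index, count in pairs}

print(counts[1548])          # look up a count by its integer index
total = sum(counts.values())
print(total)                 # sum of all counts
```

With integer keys you can look up counts[1548] directly instead of counts['1548'].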

Answer 3

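I don't think you need the lazy ? modifiers in your regex pattern. The lazy ? modifiers you put there actually produce more noise than correctly captured data.

Edit note: the pattern .+:.+ that I introduced in an earlier edit is wrong; it is a poor regex for this job and does not capture the desired pairs. Use the \d+:\d+ pattern instead. I am keeping it here because, with a workaround, it can still solve the OP's problem.

As long as your data is not malformed, contains no noise, and is neatly separated by whitespace, I think '.+:.+' is enough to find your index:count data. The best approach is probably \d+:\d+, because you know each item is one or more digits, then a :, then one or more digits.

regexr and regex101 are good sites for designing and visualizing regex patterns.

If you use the .+:.+ pattern, it returns the whole string as a single match, because the greedy pattern matches the entire string. You need to post-process the result, because re.findall returns a list, which in this example contains only one element.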

In [  ]: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
    ...: values=[v for v in re.findall('.+:.+', string)]
    ...: print(values)
['358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186']

Because it returns a list with only one element, you can use pop() to take that single str out and then split it on whitespace with the str method split().

In [  ]: print(values.pop().split())
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']

If you use the \d+:\d+ pattern, it directly returns a nicely separated list, because it finds each pair correctly. So you can print the values directly.

In [  ]: string="358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"
    ...: values=[v for v in re.findall(r'\d+:\d+', string)]
    ...: print(values)
['358:6', '1260:2', '1533:7', '1548:292', '1550:48', '1561:3', '1564:186']

Finally, you can use Python's built-in string formatting to print the results nicely. Disclaimer: I don't own the linked site; I just found it useful for beginners :)

In [  ]: for s in values:
    ...:     index, count = s.split(":")
    ...:     print("Index: {:>8} Count: {:>8}".format(index, count))
    ...:     
Index:      358 Count:        6
Index:     1260 Count:        2
Index:     1533 Count:        7
Index:     1548 Count:      292
Index:     1550 Count:       48
Index:     1561 Count:        3
Index:     1564 Count:      186
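Since the items are separated by whitespace, the same table can also be produced without any regex, and with an f-string instead of str.format(). This is a sketch under the assumption that the input is clean; format_pairs is a hypothetical helper name, not something from the question:

```python
string = "358:6 1260:2 1533:7 1548:292 1550:48 1561:3 1564:186"


def format_pairs(text):
    """Return one aligned line per whitespace-separated index:count item."""
    lines = []
    for item in text.split():           # split on whitespace
        index, count = item.split(":")  # split each item at the colon
        # :>8 right-aligns each value in a field of width 8
        lines.append(f"Index: {index:>8} Count: {count:>8}")
    return lines


for line in format_pairs(string):
    print(line)
```

The regex version is still the safer choice when the string may contain other text, because \d+:\d+ ignores anything that is not a digit pair.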
