Getting a substring from a line

Question

I usually write in C#.

How do I slice a string? I have this line:

Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: DbsinkConsumer.cc:262: (D) <video> 07920E: Got msg_idx=28 for evt_id=436752

I need to cut out the 28.

This is the code I am using:

  if (str(line).find("msg_idx=") > 0):  
    msg_id = line[line.index("Got"):line.index("For")]

It fails with this error:

msg_id = line[line.index("Got"):line.index("For")]
ValueError: substring not found

I would be glad to see an example.

Answer 1

You can use regular expressions:

>>> import re
>>> s= 'Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: DbsinkConsumer.cc:262: (D) <video> 07920E: Got msg_idx=28 for evt_id=436752'
>>> print int(re.search(r'msg_idx=(\d+)', s).group(1))
28

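...where re.search() looks for the pattern 'msg_idx='. The r prefix makes the pattern a raw string, so Python passes backslashes such as the one in \d through to the regex engine unchanged. The literal text is followed by a capture group ( ) that can be referred to later; the \d+ inside it matches one or more digit characters. group(1) then returns the text matched by the first capture group.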
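One thing the one-liner above does not cover is a line with no msg_idx field: re.search() returns None in that case, and calling .group(1) on it raises AttributeError. Below is a minimal sketch of a guarded version. The names MSG_IDX_RE and extract_msg_idx are made up for illustration, and print() with a single argument is used so the snippet runs on both Python 2.7 and Python 3.

```python
import re

# Hypothetical helper names; compile once and reuse for every log line.
# The r'' prefix keeps the backslash in \d intact for the regex engine.
MSG_IDX_RE = re.compile(r'msg_idx=(\d+)')


def extract_msg_idx(line):
    """Return msg_idx as an int, or None when the line has no msg_idx field."""
    match = MSG_IDX_RE.search(line)
    if match is None:
        return None
    # group(1) is the text captured by the first ( ) in the pattern.
    return int(match.group(1))


line = ('Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: '
        'DbsinkConsumer.cc:262: (D) <video> 07920E: '
        'Got msg_idx=28 for evt_id=436752')

print(extract_msg_idx(line))
print(extract_msg_idx('a line without the field'))
```

Returning None instead of raising is only one possible design; when a missing field should count as an error, raising ValueError from the helper would work just as well.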

Answer 2

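Using line.index(example_word) is not a good approach here, because your text may contain example_word more than once and index only returns the position of the first match. If you want to remove the number from the line, you can use re.sub with a positive look-behind, which is more robust: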

>>> import re
>>> s="Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: DbsinkConsumer.cc:262: (D) <video> 07920E: Got msg_idx=28 for evt_id=436752"
>>> re.sub(r'(?<=msg_idx=)\d+','',s)
'Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: DbsinkConsumer.cc:262: (D) <video> 07920E: Got msg_idx= for evt_id=436752'

If you want to get the 28 instead, you can use re.search:

>>> s="Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: DbsinkConsumer.cc:262: (D) <video> 07920E: Got msg_idx=28 for evt_id=436752"
>>> re.search(r'(?<=msg_idx=)\d+',s).group(0)
'28'
# or just use grouping:
>>> re.search(r'msg_idx=(\d+)',s).group(1)
'28'
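As a side note on the original error: the sample line contains a lowercase "for", and str.index is case-sensitive, so line.index("For") raises ValueError. If you would rather stay with plain string methods, here is a rough sketch using str.partition. The function name msg_idx_without_regex is hypothetical, and the sketch assumes the value is terminated by whitespace or the end of the line.

```python
def msg_idx_without_regex(line):
    """Return the text after 'msg_idx=' up to the next whitespace, or None."""
    # partition() returns (before, separator, after); the separator comes back
    # as '' when the marker is not present, so no exception is raised.
    _, sep, rest = line.partition('msg_idx=')
    if not sep:
        return None
    # split(None, 1) splits on any run of whitespace, at most once.
    parts = rest.split(None, 1)
    return parts[0] if parts else None


line = ('Line 58: Oct  6 16:58:03 INTEG_245 sia_server[6830]: '
        'DbsinkConsumer.cc:262: (D) <video> 07920E: '
        'Got msg_idx=28 for evt_id=436752')

print('For' in line)                 # case-sensitive check for the capitalised word
print(msg_idx_without_regex(line))
```

Unlike the regex versions, this does not check that the value consists of digits, so wrap the result in int() inside a try/except if that matters.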

python

python-2.7