How to write a regex that extracts the first few characters of a specific word, with or without an ending delimiter?

Question

I have the following strings and want to extract the leading characters of each name, up to the end of the word or up to "Response":

<ns2:GetJobStatus
<ns10:JobIDResponse
<ns2:JobStatusResponse
<ns3:GetJobId

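I need a regex that extracts GetJobStatus and GetJobID from all of the lines above. I want "Response" removed from the result, so that in the example above I would get 2. This is in Splunk, so I cannot use awk, sed, or any other Unix/Linux command.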

Here is what I have so far:

<ns\d+:(?P<ws_name>.+?)(?:Response)

With the pattern above, I can only extract the name where "Response" is present.
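To make the problem concrete, here is a minimal sketch that runs the attempt above against the four sample lines. It uses Python's re module purely as a stand-in for Splunk's regex engine, which is an assumption; the helper name extract is hypothetical.

```python
import re

# Sample lines from the question.
lines = [
    "<ns2:GetJobStatus",
    "<ns10:JobIDResponse",
    "<ns2:JobStatusResponse",
    "<ns3:GetJobId",
]

# The attempt from the question: "Response" is mandatory after the name.
attempt = re.compile(r"<ns\d+:(?P<ws_name>.+?)(?:Response)")

def extract(pattern, text):
    """Return the ws_name group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group("ws_name") if match else None

results = [extract(attempt, line) for line in lines]
print(results)  # [None, 'JobID', 'JobStatus', None]
```

The two lines without "Response" produce no match at all, which is the behaviour described above.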

Answer 1

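You are off to a good start. After the ws_name group, what needs to follow is either the word Response or a word boundary. So all you need to do is add |\b to your non-capturing group: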

<ns\d+:(?P<ws_name>.+?)(?:Response|\b)

Here is a demo.

References:

Alternation in Regular Expressions.
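The pattern from Answer 1 can be checked against the sample lines with a short sketch. Python's re module is used here only as an approximation of Splunk's regex engine (an assumption), and the variable names are illustrative.

```python
import re

# Sample lines from the question.
lines = [
    "<ns2:GetJobStatus",
    "<ns10:JobIDResponse",
    "<ns2:JobStatusResponse",
    "<ns3:GetJobId",
]

# Answer 1: the lazy group stops at "Response" or, failing that,
# at the first word boundary after the name.
pattern = re.compile(r"<ns\d+:(?P<ws_name>.+?)(?:Response|\b)")

names = [pattern.search(line).group("ws_name") for line in lines]
print(names)  # ['GetJobStatus', 'JobID', 'JobStatus', 'GetJobId']
```

Because the group is lazy, it grows one character at a time and stops at the first position where either alternative succeeds, so "Response" is excluded when present and the whole name is kept otherwise. In Splunk, a pattern like this would typically be supplied to the rex search command; that step is not exercised here.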

Answer 2

With a lookbehind and a lookahead, you should be able to get the result you want with this pattern:

(?<=:)(\w+?)(?=Response|\b|$)

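You are interested in the capture group (\w+?), which matches the text after the ":" character and before the word "Response". The "\b|$" alternatives cover a word boundary or the end of the line.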
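A similar sketch for the lookaround pattern, again assuming Python's re module behaves like Splunk's engine for these constructs. The extra input string with a timestamp is a made-up example, not data from the question.

```python
import re

# Sample lines from the question.
lines = [
    "<ns2:GetJobStatus",
    "<ns10:JobIDResponse",
    "<ns2:JobStatusResponse",
    "<ns3:GetJobId",
]

# Answer 2: the name must follow ":" and be followed by "Response",
# a word boundary, or the end of the line.
pattern = re.compile(r"(?<=:)(\w+?)(?=Response|\b|$)")

names = [pattern.search(line).group(1) for line in lines]
print(names)  # ['GetJobStatus', 'JobID', 'JobStatus', 'GetJobId']

# Caveat: the pattern is not tied to the "<ns" prefix, so the first ":"
# in the text wins. Hypothetical input with a timestamp in front:
stray = pattern.search("10:30 <ns2:GetJobStatus").group(1)
print(stray)  # 30
```

On the sample lines both answers give the same names. If the events contain other colons before the tag, the version anchored on <ns\d+: is the safer choice.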
