How to debug a (PCRE) regex passed to grep?
I'm trying to debug a regex passed to grep that seems to fail only on my system.
This is the full command, which should return the latest Terraform release version:
wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": "v\K.*?(?=")'
This seems to work for others, but not for me.
Adding a * quantifier after "tag_name": to match additional whitespace makes it work for me:
wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": *"v\K.*?(?=")'
This is the wget response when it is not piped to grep:
{
"url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583",
"assets_url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583/assets",
"upload_url": "https://uploads.github.com/repos/hashicorp/terraform/releases/20814583/assets{?name,label}",
"html_url": "https://github.com/hashicorp/terraform/releases/tag/v0.12.12",
"id": 20814583,
"node_id": "MDc6UmVsZWFzZTIwODE0NTgz",
"tag_name": "v0.12.12",
"target_commitish": "master",
"name": "",
"draft": false,
"author": {
"login": "apparentlymart",
"id": 20180,
"node_id": "MDQ6VXNlcjIwMTgw",
"avatar_url": "https://avatars1.githubusercontent.com/u/20180?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/apparentlymart",
"html_url": "https://github.com/apparentlymart",
"followers_url": "https://api.github.com/users/apparentlymart/followers",
"following_url": "https://api.github.com/users/apparentlymart/following{/other_user}",
"gists_url": "https://api.github.com/users/apparentlymart/gists{/gist_id}",
"starred_url": "https://api.github.com/users/apparentlymart/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/apparentlymart/subscriptions",
"organizations_url": "https://api.github.com/users/apparentlymart/orgs",
"repos_url": "https://api.github.com/users/apparentlymart/repos",
"events_url": "https://api.github.com/users/apparentlymart/events{/privacy}",
"received_events_url": "https://api.github.com/users/apparentlymart/received_events",
"type": "User",
"site_admin": false
},
"prerelease": false,
"created_at": "2019-10-18T18:39:16Z",
"published_at": "2019-10-18T18:45:33Z",
"assets": [],
"tarball_url": "https://api.github.com/repos/hashicorp/terraform/tarball/v0.12.12",
"zipball_url": "https://api.github.com/repos/hashicorp/terraform/zipball/v0.12.12",
"body": "BUG FIXES:\r\n\r\n* backend/remote: Don't do local validation of whether variables are set prior to submitting, because only the remote system knows the full set of configured stored variables and environment variables that might contribute. This avoids erroneous error messages about unset required variables for remote runs when those variables will be set by stored variables in the remote workspace. ([#23122](https://github.com/hashicorp/terraform/issues/23122))"
}
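Using https://regex101.com I can see that both "tag_name": "v\K.*?(?=") and "tag_name": *"v\K.*?(?=") match the version number correctly.
So something must be off on my system. I'm curious why the original regex doesn't work for me and how (if possible) to debug a situation like this.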
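Chances are that your regex engine doesn't understand \K. There are many dialects of regular expressions.
Sticking to constructs that most engines support (the command here uses extended regular expressions rather than PCRE-only features) usually gives good results everywhere.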
$ curl -s "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Eo '"tag_name": "v(.*)"'
"tag_name": "v0.12.12"
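Whether \K is really the culprit can be checked directly, without involving the network. The following is a minimal sketch; the function name supports_k is made up for illustration, and it assumes a grep that at least accepts the -P and -o options when PCRE support is compiled in.

```shell
# Hypothetical helper: report whether this grep accepts -P and honours \K.
supports_k() {
  # With \K working, only the "b" after the discarded "a" is printed.
  if [ "$(printf 'ab\n' | grep -Po 'a\Kb' 2>/dev/null)" = "b" ]; then
    echo yes
  else
    echo no
  fi
}

supports_k
```

If this prints yes, the engine handles \K and the problem most likely lies in the input rather than in the regex dialect.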
Now, if you only want the version number, you need to extract it in a second step (because look-around constructs such as ?! may not always work).
$ curl -s "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Eo '"tag_name": "v(.*)"' | grep -Eo '([0-9]+\.?)+'
0.12.12
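As an alternative to chaining two grep calls, the same extraction can be sketched in a single pass with sed and a capture group. This is not part of the original answer; extract_version is a hypothetical name, and the sample lines stand in for the API response so the snippet runs offline.

```shell
# Hypothetical helper: print the version from a "tag_name" field read on stdin.
# Uses only POSIX basic regular expressions, so no PCRE support is needed.
extract_version() {
  # " *" allows zero or more spaces after the colon; \1 is the captured version.
  sed -n 's/.*"tag_name": *"v\([^"]*\)".*/\1/p'
}

# Pretty-printed and compact samples both yield the version number.
printf '%s\n' '  "tag_name": "v0.12.12",' | extract_version
printf '%s\n' '{"id":20814583,"tag_name":"v0.12.12","draft":false}' | extract_version
```

In real use the function would sit at the end of the curl or wget pipeline in place of the two grep calls.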
I've been able to narrow it down to the following. If I run the wget command without piping it to grep and without formatting the JSON response:
wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest"
then I get JSON without any whitespace (I'll post only part of the response):
"html_url":"https://github.com/hashicorp/terraform/releases/tag/v0.12.12","id":20814583,"node_id":"MDc6UmVsZWFzZTIwODE0NTgz","tag_name":"v0.12.12","target_commitish":"master","name":"","draft":false
So naturally the original regex "tag_name": "v\K.*?(?=") fails, because there is no space after the :.
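This is clearly not related to the regex passed to grep or to grep itself. I don't see a point in digging into the response itself here, so the original question can be considered solved (although if anyone knows what might cause this, please post a comment).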
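For anyone landing here with the same symptom, the narrowing-down can be reproduced offline. This is a rough sketch under a few assumptions: try_re is a made-up helper, the two variables are hand-written stand-ins for the pretty-printed and compact responses, and grep is assumed to support -P.

```shell
# Samples standing in for the two response shapes seen in this thread.
pretty='  "tag_name": "v0.12.12",'
compact='"id":20814583,"tag_name":"v0.12.12","draft":false'

# Hypothetical helper: report whether a PCRE pattern matches a sample line.
try_re() {  # usage: try_re PATTERN SAMPLE
  if printf '%s\n' "$2" | grep -qP -- "$1"; then
    echo match
  else
    echo 'no match'
  fi
}

# Step 1: show the exact text around the key, so a missing space is visible.
printf '%s\n' "$compact" | grep -o '"tag_name"[^,]*'

# Step 2: run the strict and a whitespace-tolerant pattern against both samples.
strict='"tag_name": "v\K.*?(?=")'
tolerant='"tag_name":\s*"v\K[^"]+'
try_re "$strict" "$pretty"
try_re "$strict" "$compact"
try_re "$tolerant" "$pretty"
try_re "$tolerant" "$compact"
```

The strict pattern only matches the pretty-printed sample, while the tolerant one matches both, which points at the input format rather than at grep.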