A regular expression to exclude a specific string '[[' via sed
I need to use sed to extract the string between "[[" and "]]" from the file response.txt:
x-content-type-options: nosniff
x-server-response-time: 63
x-dropbox-request-id: 84e52618f83eda15cb6d96eb4f601f45
pragma: no-cache
cache-control: no-cache
x-dropbox-http-protocol: None
x-frame-options: SAMEORIGIN
{"has_more": false, "cursor": "AAEynx2q5KMgkcOwL2dKZ4MCYxNTtsdA950A5kYOdjWFln_RYuAokMnJCOb85B7idOHjycS8LJye3BhWfezTkkoprVxhgMNni_Bg04A-JO9fLmqIGO3CYInBQPmNUXL57S32ECWwA-CYu1CiLi5ujTDz", "entries": [["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]], "reset": true}
I want to use sed to extract that string, so that the result looks like this:
[["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]]
I ran this command in the terminal:
$ sed -n 's/.*"entries": *\(\[\[.*\]\]\)/\1/p' /tmp/response.txt
and got this result:
[["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]], "reset": true}
Then I ran this command:
$ sed -n 's/.*"entries": *\(\[\[(?!\]\].)*\]\]\)/\1/p' /tmp/response.txt
and it returned nothing.
It looks like I wrote the regular expression wrong. What can I do? Thanks!
Avoid parsing JSON with regular expressions. Use a proper parser.
If you have jq installed:
awk -v RS="" "END {print}" response.txt | jq -c '.["entries"]'
[["/test",{"revision":45545,"root":"dropbox","size":"0 bytes","modified":"Fri, 22 May 2015 05:53:27 +0000","rev":"b1e9026cf6f4","thumb_exists":false,"path":"/TEST","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0}],["/TEST/test-file-01",{"revision":45549,"root":"dropbox","size":"0 bytes","modified":"Fri, 22 May 2015 06:15:33 +0000","rev":"b1ed026cf6f4","thumb_exists":false,"path":"/test/test-file-01","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0}]]
Or with ruby:
ruby -rjson -e '
data = (File.readlines(ARGV.shift))[-1]
json = JSON.parse(data)
puts JSON.generate(json["entries"])
' response.txt
[["/test",{"rev":"b1e9026cf6f4","thumb_exists":false,"path":"/TEST","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0,"modified":"Fri, 22 May 2015 05:53:27 +0000","size":"0 bytes","root":"dropbox","revision":45545}],["/TEST/test-file-01",{"rev":"b1ed026cf6f4","thumb_exists":false,"path":"/test/test-file-01","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0,"modified":"Fri, 22 May 2015 06:15:33 +0000","size":"0 bytes","root":"dropbox","revision":45549}]]
Or any language of your choice that provides a JSON parser.
This might work for you (GNU sed):
sed '/\n/!{s/\[\[/\n&/g;s/\]\]/&\n/g};/^\[\[/P;D' file
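If the pattern space does not contain a \n, insert a \n before every [[ and append a \n after every ]]. If the pattern space starts with [[, print up to the following \n (or the end of the pattern space). Delete up to and including the next \n (or to the end of the pattern space) and repeat until the pattern space is empty.
N.B. This only prints the strings between newlines that start and end with the required strings ([[ or ]]).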
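To make the P and D loop described above easier to follow, here is a minimal sketch run on a made-up one-line input. The sample text and the function name extract_groups are assumptions for illustration, not part of response.txt, and the script relies on GNU sed because \n in the replacement is a GNU extension.

```shell
# Hypothetical sample line with two bracketed groups (not from response.txt).
sample='head [[a, b]] middle [[c]] tail'

# Same script as in the answer: put a newline before every [[ and after
# every ]], then let P print only the chunks that start with [[ while D
# discards one chunk per pass (requires GNU sed).
extract_groups() {
  sed '/\n/!{s/\[\[/\n&/g;s/\]\]/&\n/g};/^\[\[/P;D'
}

printf '%s\n' "$sample" | extract_groups
# prints:
# [[a, b]]
# [[c]]
```

Each bracketed group ends up on its own output line, and the text outside the groups is dropped.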
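sed recognizes Posix regular expressions, which do not include lookaround assertions such as (?!
Fortunately, it is easy to write a regular expression for this simple case (as usual, it is not so easy to read):
sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\)/\1/p' /tmp/response.txt
However, it was not the greedy match that caused the problem in your initial attempt. The problem is that you did not discard the content of the line after the match. What you wanted was:
sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\).*/\1/p' /tmp/response.txt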
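The effect of the trailing .* can be checked on a shortened line. This is only a sketch: the sample line is a made-up stand-in for the JSON body of response.txt, extract_entries is a hypothetical helper name, and \? assumes GNU sed.

```shell
# Shortened, made-up stand-in for the JSON body line in response.txt.
line='{"has_more": false, "entries": [["/test", {"is_dir": true}], ["/TEST/f", {"is_dir": true}]], "reset": true}'

# Without a trailing .* the text after the match is not part of the
# substitution, so it is left in place and printed.
printf '%s\n' "$line" | sed -n 's/.*"entries": *\(\[\[.*\]\]\)/\1/p'
# prints: [["/test", {"is_dir": true}], ["/TEST/f", {"is_dir": true}]], "reset": true}

# With a trailing .* the rest of the line is matched and therefore replaced.
extract_entries() {
  sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\).*/\1/p'
}
printf '%s\n' "$line" | extract_entries
# prints: [["/test", {"is_dir": true}], ["/TEST/f", {"is_dir": true}]]
```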
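The fact that sed uses "basic" Posix regular expressions (BRE) means that you end up with a lot of backslashes. I tried to remove at least some of them, using the fact that ] is not special in a regular expression unless it closes a character class. On the whole, though, I think you would be better served by grep, which has a Posix standard option to use "extended" (normal) regular expressions (ERE), as well as an option to print only the matched string: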
grep -oE '"entries": \[\[(]?[^]])*]]' /tmp/response.txt | cut -d ' ' -f2-
(The cut at the end removes the "entries": prefix.)
Explanation of the regular expression
The regular expression (in ERE form) consists of:
\[\[      match [[
(
  ]?      possibly a single ]
  [^]]    anything but a ]
)*        repeated as many times as necessary
]]        match ]]
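The repeated group will match either a ] followed by anything but a ], or anything other than a ]. In effect, it is (almost) the negation of ]].
(It is not quite the negation, because it will not match a single ] at the end of the string, but that does not matter here: we insist that the group be followed by the closing ]], so the case of reaching the end of the string cannot arise.)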
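The difference between this group and a greedy .* only shows up when a line contains more than one ]]. A small sketch, assuming a made-up sample line and a hypothetical helper named first_groups, and relying on the -o option (available in GNU and BSD grep):

```shell
# Hypothetical line containing two separate [[...]] groups.
sample='x [[a], [b]] y [[c]] z'

# Greedy .* runs on to the last ]] on the line.
printf '%s\n' "$sample" | grep -oE '\[\[.*]]'
# prints: [[a], [b]] y [[c]]

# The (]?[^]])* group cannot step over a ]], so every group is matched
# on its own, while a lone ] inside a group is still accepted.
first_groups() {
  grep -oE '\[\[(]?[^]])*]]'
}
printf '%s\n' "$sample" | first_groups
# prints:
# [[a], [b]]
# [[c]]
```

In response.txt there is only one ]] on the line, which is why both forms give the same result there.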
Try:
sed -n 's/.*"entries": *\(\[\[.*\]\]\).*/\1/p'
instead (note the .* at the end of the pattern).