Shell Script to Parse Text into Two Separate Strings

Question

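My goal is to use a shell script to parse text from wit.ai, but I can't seem to get it right because the string (named data) can vary a lot. I have been trying to use sed, without success. The response from the server looks like this (but keep in mind that it can vary in size):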

data=
    {"status":"ok"}{"_text":"testing","msg_id":"56a26ccf-f324-455f-ba9b-db21c8c7ed50","outcomes":[{"_text":"testing","confidence":0.289,"entities":{},"intent":"weather"}]}

I would like to parse it into two strings named text and intent.

The desired result is two strings, as follows:

text=      "testing"
intent=     "weather"

My current code is:

data='{"status":"ok"}{"_text":"testing","msg_id":"56a26ccf-f324-455f-ba9b-db21c8c7ed50","outcomes":[{"_text":"testing","confidence":0.289,"entities":{},"intent":"weather"}]}'
text=$(echo "$data" | cut -d"," -f1)    # keeps everything up to "testing" but leaves a quote at the end
text=${text::-1}                        # removes that trailing quote
echo "$data"
echo "$text"

The current result is: {"status":"ok"}{"_text":"testing

I'm close: I just need to remove {"status":"ok"}{"_text":" so that I'm left with testing, but I can't figure out that last part.

Answer 1

Well, it's not very elegant, but this seems to work:

data='{"status":"ok"}{"_text":"testing","msg_id":"56a26ccf-f324-455f-ba9b-db21c8c7ed50","outcomes":[{"_text":"testing","confidence":0.289,"entities":{},"intent":"weather"}]}'
text=$(echo "$data" | cut -d"," -f1)    # keeps everything up to "testing" but leaves a quote at the end
text=${text::-1}                        # removes that trailing quote
text=$(echo "$text" | cut -d"_" -f2)    # removes the beginning but still leaves text":"testing
text=$(echo "$text" | cut -d":" -f2)    # removes text": but still leaves a " at the beginning
text=${text:1}                          # removes the leading quote
echo "$data"
echo "$text"
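
The same trimming could also be sketched with Bash parameter expansion alone, which avoids the repeated echo and cut calls. This is only an illustration: it assumes that data holds the complete response shown in the question and that the values contain no escaped quotes.

```shell
#!/usr/bin/env bash
data='{"status":"ok"}{"_text":"testing","msg_id":"56a26ccf-f324-455f-ba9b-db21c8c7ed50","outcomes":[{"_text":"testing","confidence":0.289,"entities":{},"intent":"weather"}]}'

# Remove the shortest prefix ending in "_text":" so the value comes first.
text=${data#*'"_text":"'}
# Remove everything from the first remaining double quote to the end.
text=${text%%'"'*}

# Same two steps for the intent value.
intent=${data#*'"intent":"'}
intent=${intent%%'"'*}

echo "text=$text"
echo "intent=$intent"
```

Because # removes the shortest matching prefix, the first "_text" occurrence is the one that is used.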

Answer 2

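The proper way to handle JSON is with a parser. There are many options, for example:

jq, "grep, sed & awk for JSON"
JSON.sh, a parser written in Bash (officially recommended on www.json.org)
json_pp, a pretty-printer written in Perl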

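The problem with all of these and your data is that they complain about it being malformed; if they worked, you could query your data directly, as shown in the tutorials for the tools listed above.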

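Since you can't, we are back to fiddling with the text directly. We can extract the data of interest with grep -o, which returns only the part of each line that matched: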

$ grep -o -e '"_text":"[^"]*"' -e '"intent":"[^"]*"'<<< "$data"
"_text":"testing"
"_text":"testing"
"intent":"weather"

The regex bit "[^"]*" means "a quote, then zero or more non-quotes, then another quote": a way to match everything between two quotes non-greedily.
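
To see why the negated character class matters, here is a small sketch that compares it with a greedy .* on a shortened sample string. The sample variable and its contents are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened sample with two quoted values on one line.
sample='"_text":"testing","msg_id":"56a2"'

# .* is greedy, so the match runs on to the last quote on the line.
greedy=$(grep -o '"_text":".*"' <<< "$sample")

# [^"]* cannot cross a quote, so the match stops at the value's closing quote.
bounded=$(grep -o '"_text":"[^"]*"' <<< "$sample")

echo "greedy:  $greedy"
echo "bounded: $bounded"
```

Here the greedy pattern returns the whole sample line, while the bounded one returns only "_text":"testing".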

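For further processing, we can remove the duplicate line with uniq, then use sed to strip the quotes and the underscore from the key name, and finally replace the colon with an equals sign and a tab: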

$ grep -o -e '"_text":"[^"]*"' -e '"intent":"[^"]*"' <<< "$data" |
uniq | sed -r 's/"_?(.*)":(.*)/\1=\t\2/'
text=   "testing"
intent= "weather"
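
To end up with two actual shell variables, as the question asks, one possible sketch reuses the grep -o patterns and picks the value out of each match with cut. It assumes that data holds the complete response from the question and that only the first match of each key is wanted.

```shell
#!/usr/bin/env bash
data='{"status":"ok"}{"_text":"testing","msg_id":"56a26ccf-f324-455f-ba9b-db21c8c7ed50","outcomes":[{"_text":"testing","confidence":0.289,"entities":{},"intent":"weather"}]}'

# grep -o prints each match on its own line; head -n 1 keeps the first one.
# Splitting "_text":"testing" on double quotes puts the value in field 4.
text=$(grep -o '"_text":"[^"]*"' <<< "$data" | head -n 1 | cut -d'"' -f4)
intent=$(grep -o '"intent":"[^"]*"' <<< "$data" | head -n 1 | cut -d'"' -f4)

echo "text=$text"
echo "intent=$intent"
```

Like the rest of this text-based approach, it would break on values that contain escaped quotes.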
