Extract data from invalid JSON using bash, sed, grep or awk?
I'm trying to parse the following invalid JSON in bash:
x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
using the following script:
for each in $(echo $x | sed 's/{componentId: /\n/g' ); do
echo "Each: $each"
echo [[ $each == 0Rb* ]]
if [[ $each == 0Rb* ]]; then
component=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print }'
reference=echo $each | awk -v FS="(componentName: |,|referenceName: |,)" '{print }'
echo "component: $component"
echo "reference: $component"
fi
done
But it doesn't work, and I don't understand why. When I run this line in the console:
echo $x | sed 's/{componentId: /\n/g'
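I can see that the invalid JSON is split into lines correctly, but when I feed it to the for loop, the loop variable only receives much smaller chunks, for example: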
Each: 00N5E000005vm9e,
I'm confused.
What I want to do is extract data from the invalid JSON whenever componentId does not start with 00N. Is there a way to do this?
I also tried jq -n $x, but it fails with:
jq: error: syntax error, unexpected IDENT, expecting '}' (Unix shell quoting issues?) at <top-level>, line 1:
Thanks to the comments, I think I've figured it out:
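# Split on "{componentId: " so each object starts a new line (\n in the replacement needs GNU sed)
echo "$x" | sed 's/{componentId: /\n/g' | while IFS= read -r each; do
  #echo "Each: $each"
  #echo [[ $each == 0Rb* ]]
  if [[ $each == 0Rb* ]]; then
    # With this FS, field 3 is the componentName value and field 6 the referenceName value
    component=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $3}')
    reference=$(echo "$each" | awk -v FS="(componentName: |,|referenceName: |,)" '{print $6}')
    echo "component: $component"
    echo "reference: $reference"
  fi
done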
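For comparison, the same split-and-filter logic can be sketched in Python with a regular expression. This is a minimal sketch, not the asker's method: it assumes every object lists componentId, componentName and referenceName in the order shown, that no value contains a comma, and that the wanted fields are componentName and referenceName (as the awk field separator suggests). The sample string x is a shortened two-object excerpt of the input, and components_not_00n is a hypothetical helper name.

```python
import re

# Shortened two-object excerpt of the input string from the question.
x = (
    "{componentId: 00N5E000005vm9e, componentName: Field, "
    "referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, "
    "referenceUrl: null, message: Component is in use by another "
    "component in your organization., reasonCode: 10}, "
    "{componentId: 0Rb5E000000BGVi, componentName: Versions, "
    "referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, "
    "referenceUrl: null, message: Component is in use by another "
    "component in your organization., reasonCode: 10}"
)

# One match per object: capture componentId, componentName and referenceName.
# The lazy .*? stops at the first referenceName inside the same object.
PATTERN = re.compile(
    r"\{componentId: ([^,]*), componentName: ([^,]*),.*?referenceName: ([^,]*),"
)


def components_not_00n(text):
    """Return (componentName, referenceName) pairs whose componentId does not start with 00N."""
    return [
        (name, ref)
        for cid, name, ref in PATTERN.findall(text)
        if not cid.startswith("00N")
    ]


for component, reference in components_not_00n(x):
    print(f"component: {component}")
    print(f"reference: {reference}")
```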
Treating the data as JSON
Use sed to turn it back into valid JSON, for example:
# Remove redundant space (assuming the text is in the `x` variable)
<<<"$x" sed 's/: /:/g; s/, /,/g' |
# Quote all "words"
sed -E 's/[^"{}:,]+/"&"/g' |
# Separate objects
sed 's/},{/}\n{/g' |
# Parse json
jq .
Output:
{
"componentId": "00N5E000005vm9e",
"componentName": "Field",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "00N5E000005vm9e",
"componentName": "Field",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "00N5E000005vm9e",
"componentName": "Field",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVi",
"componentName": "Versions",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVj",
"componentName": "Approves",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVe",
"componentName": "activityThreads",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVf",
"componentName": "Attachments",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
{
"componentId": "0Rb5E000000BGVh",
"componentName": "Details",
"referenceId": "0M05E0000002XbV",
"referenceName": "RecordPageName1",
"referenceUrl": "null",
"message": "Component is in use by another component in your organization.",
"reasonCode": "10"
}
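The three sed steps translate almost one-to-one into Python's re and json modules. The sketch below is an illustration under the same assumptions as the sed pipeline (like it, it breaks if a value itself contains a colon or a comma); to_json_objects is a hypothetical name, and instead of splitting the objects onto separate lines it wraps the whole string in [...] so json.loads sees a single array.

```python
import json
import re

# Shortened two-object excerpt of the input string from the question.
x = (
    "{componentId: 00N5E000005vm9e, componentName: Field, "
    "referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, "
    "referenceUrl: null, message: Component is in use by another "
    "component in your organization., reasonCode: 10}, "
    "{componentId: 0Rb5E000000BGVi, componentName: Versions, "
    "referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, "
    "referenceUrl: null, message: Component is in use by another "
    "component in your organization., reasonCode: 10}"
)


def to_json_objects(text):
    """Repair the unquoted pseudo-JSON and return a list of dicts."""
    # Step 1: drop the space after ':' and ',' (sed 's/: /:/g; s/, /,/g').
    compact = text.replace(": ", ":").replace(", ", ",")
    # Step 2: quote every run of characters other than " { } : ,
    # (sed -E 's/[^"{}:,]+/"&"/g'). Like the sed version, this turns
    # null and 10 into the strings "null" and "10".
    quoted = re.sub(r'[^"{}:,]+', r'"\g<0>"', compact)
    # Step 3: the objects are now separated by "},{", so wrapping the
    # result in [...] yields one valid JSON array.
    return json.loads("[" + quoted + "]")


objects = to_json_objects(x)
for obj in objects:
    # Tab-separated, similar to jq's @tsv output.
    print(obj["componentId"], obj["referenceId"], sep="\t")
```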
To iterate over componentId and referenceId, you can use jq's @tsv format string, for example:
... | jq -r '[ .componentId, .referenceId ] | @tsv'
Output:
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
0Rb5E000000BGVi 0M05E0000002XbV
0Rb5E000000BGVj 0M05E0000002XbV
0Rb5E000000BGVe 0M05E0000002XbV
0Rb5E000000BGVf 0M05E0000002XbV
0Rb5E000000BGVh 0M05E0000002XbV
Treating the data as YAML
As @léa mentioned, you can use yq to parse this string as a YAML array. Here is my take on this approach, using version 4.13.2 of Mike Farah's yq:
<<<"[$x]" yq e '.[] | .componentId + " " + .referenceId' -
Output:
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
00N5E000005vm9e 0M05E0000002XbV
0Rb5E000000BGVi 0M05E0000002XbV
0Rb5E000000BGVj 0M05E0000002XbV
0Rb5E000000BGVe 0M05E0000002XbV
0Rb5E000000BGVf 0M05E0000002XbV
0Rb5E000000BGVh 0M05E0000002XbV
Parsing the variables in a bash loop
You can pipe the output of either solution above into a while read loop, for example:
... | while read -r componentId referenceId; do
: Do your processing here with $componentId and $referenceId
done
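If the loop body grows beyond a few commands, the same two-column stream (for example, from the jq @tsv command above) can also be consumed in Python. A minimal sketch; read_pairs is a hypothetical helper, it assumes tab-separated input, and it applies the question's filter of skipping componentId values that start with 00N.

```python
def read_pairs(lines):
    """Yield (componentId, referenceId) pairs from tab-separated lines,
    skipping components whose ID starts with 00N."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        component_id, reference_id = line.split("\t")
        if component_id.startswith("00N"):
            continue
        yield component_id, reference_id


# In a real script the lines would come from sys.stdin; a literal
# sample is used here so the example runs on its own.
sample = [
    "00N5E000005vm9e\t0M05E0000002XbV\n",
    "0Rb5E000000BGVi\t0M05E0000002XbV\n",
]
for component_id, reference_id in read_pairs(sample):
    print(component_id, reference_id)
```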
This input string is a list of YAML objects that is only missing its enclosing array container, so parse it with a YAML parser.
With Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import yaml
import json
# Your input invalid JSON but valid YAML elements part of an array
x = "{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
# Compose yamlstring from x by adding the missing data array container
yamlstring = "data: [" + x + "]"
# Load data from the yamlstring
data = yaml.load(yamlstring, yaml.SafeLoader)
# Output data as JSON
json.dump(data, sys.stdout, indent=2)
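After yaml.load, data["data"] is a plain list of dicts, so the question's filter becomes a list comprehension. Note that the YAML loader resolves null to None and 10 to the integer 10, unlike the sed-based repair, which keeps them as strings. The sketch below hard-codes a two-item excerpt of that structure so it runs without PyYAML; select_components is a hypothetical name.

```python
# Excerpt of what yaml.load(...) returns for the input (two of the eight items).
data = {
    "data": [
        {"componentId": "00N5E000005vm9e", "componentName": "Field",
         "referenceId": "0M05E0000002XbV", "referenceName": "RecordPageName1",
         "referenceUrl": None,
         "message": "Component is in use by another component in your organization.",
         "reasonCode": 10},
        {"componentId": "0Rb5E000000BGVi", "componentName": "Versions",
         "referenceId": "0M05E0000002XbV", "referenceName": "RecordPageName1",
         "referenceUrl": None,
         "message": "Component is in use by another component in your organization.",
         "reasonCode": 10},
    ]
}


def select_components(parsed):
    """Return (componentId, referenceId) for items whose componentId does not start with 00N."""
    return [
        (item["componentId"], item["referenceId"])
        for item in parsed["data"]
        # str() guards against an ID that YAML happened to resolve as a number.
        if not str(item["componentId"]).startswith("00N")
    ]


for component_id, reference_id in select_components(data):
    print(component_id, reference_id)
```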
Or, from the shell, use yq as the parser:
#!/usr/bin/env sh
x="{componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 00N5E000005vm9e, componentName: Field, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVi, componentName: Versions, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVj, componentName: Approves, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVe, componentName: activityThreads, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVf, componentName: Attachments, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}, {componentId: 0Rb5E000000BGVh, componentName: Details, referenceId: 0M05E0000002XbV, referenceName: RecordPageName1, referenceUrl: null, message: Component is in use by another component in your organization., reasonCode: 10}"
yamlstring="data: [$x]"
printf %s "$yamlstring" | yq -I 4 -o json e '.' -