Use Excel Power Query to extract JSON data
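I have a spreadsheet with 5 columns. One of the columns contains a JSON array (the array also has nested arrays). Is it possible to use the Power Query Editor to parse the JSON array in each row, so that each row has the four original columns plus a new column for each value in the JSON array and its sub-arrays? The data in each JSON array may be unique, but the structure is always the same, although the number of elements and sub-arrays varies. Below is an example of the data I want to transform and how I would like it transformed.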
Data to transform
response_id | response_json |
---|---|
a476f978-430c-47fa-a2a4-7be0d863a69e | [{"sublist":false,"aggregate_category":"null_options","question":"some question."},{"sublist":[{"option_1":"some val","option_2":"another value","option_3":"more values"}],"aggregate_category":"has_options","question":"another question"},{"sublist":[],"aggregate_category":"empty_options","question":"another question"}] |
Output
ID | aggregate_category | question | option_1 | option_2 | option_3 |
---|---|---|---|---|---|
a476f978-430c-47fa-a2a4-7be0d863a69e | null_options | some question. | null | null | null |
a476f978-430c-47fa-a2a4-7be0d863a69e | has_options | some question. | some val | another value | more values |
a476f978-430c-47fa-a2a4-7be0d863a69e | empty_options | some question. | null | null | null |
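Edit
What happens, however, is that I can transform the initial JSON array and expand the rows, but the embedded JSON array (named "sublist") cannot be transformed any further because its value is null or false.
I think one problem is that, although every row has a JSON string, the field named sublist in the JSON array can be any of:
sublist:[]
sublist:[{"option_1":1, "option_2":"aaa", "option_3":""}]
sublist:false
I think the problem is that the field cannot be parsed when its value is false. So maybe I have to write a script that substitutes dummy data when the value is false? I tried this, but it failed:
=if [response_json.sublist] = "false" then
"no data"
else Json.Document([response_json.sublist])
I think this may be because the script is trying to parse something that does not exist?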
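One likely reason the attempt above fails: once the column has been parsed, `Json.Document` has already turned the JSON literal `false` into a logical value, so comparing it with the text `"false"` is never true, and the `else` branch then hands an already parsed value to `Json.Document`. The following is a minimal sketch, meant to be pasted into a blank query, that checks what each of the three shapes parses to. The step names are illustrative and not taken from the original workbook.

```powerquery
let
    // Parse one small record for each shape the sublist field can take
    asFalse = Json.Document("{""sublist"":false}")[sublist],
    asEmpty = Json.Document("{""sublist"":[]}")[sublist],
    asFilled = Json.Document("{""sublist"":[{""option_1"":1}]}")[sublist],
    Checks = [
        // JSON false arrives as a logical value, not as text
        FalseIsLogical = asFalse is logical,
        // Both the empty and the populated array arrive as lists
        EmptyIsList = asEmpty is list,
        FilledIsList = asFilled is list,
        // A logical value compared with the text "false" is simply not equal
        FalseEqualsText = (asFalse = "false")
    ]
in
    Checks
```

If the first three fields come back as true and the last one as false, the condition needs to test the logical value `false` (or the value's type) rather than the text `"false"`.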
Edit - adding Advanced Editor code
let
Source = Excel.Workbook(File.Contents("D:\Downloads\ExampleDataSet.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"response_id", type text}, {"response_json", type text}}),
#"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),
#"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
fixSublist = Table.TransformColumns(response_json,{"sublist", each if _ = false then {} else _ }),
#"Expanded response_json1" = Table.ExpandRecordColumn(fixSublist, "response_json", {"sublist", "aggregate_category", "question"}, {"response_json.sublist", "response_json.aggregate_category", "response_json.question"})
in
#"Expanded response_json1"
Edit: another solution
In addition to Ron's great help, I found that adding a custom column with this formula also works:
=if [response_json.sublist] <> false then
Table.ToRecords(Table.FromRecords([response_json.sublist]))
else
Table.ToRecords(
Table.FromRecords({
[option_1 = "null", option_2 = "null", option_3 = "null"]
})
)
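- Add a custom column:
Formula: =Json.Document([JSON Field])
- Expand the resulting list column to new rows, then expand the resulting record column (and the sublist)
- Arrange and rename your columns as needed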
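Edit: in response to your M code posting
- Your references are wrong, and the "fixSublist" line is in the wrong place
- Step through the Applied Steps and read the comments in the code to better understand what each step does.
- Because I took the initial table from the open workbook rather than from a file, the first few lines of my M code are different.
Data
M Code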
let
//Next two lines different due to my source being a table in an open workbook and not a file
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type1" = Table.TransformColumnTypes(Source,{{"response_id", type text}, {"response_json", type text}}),
#"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),
//expand the subtables/records/lists
#"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
#"Expanded response_json2" = Table.ExpandRecordColumn(#"Expanded response_json", "response_json",
{"sublist", "aggregate_category", "question"}, {"sublist", "aggregate_category", "question"}),
//Now that the sublists are exposed we can fix the entries that are not Lists
fixSublist = Table.TransformColumns(#"Expanded response_json2",{"sublist", each if _ = false then {} else _ }),
//Now expand them
#"Expanded sublist" = Table.ExpandListColumn(fixSublist, "sublist"),
#"Expanded sublist1" = Table.ExpandRecordColumn(#"Expanded sublist", "sublist", {"option_1", "option_2", "option_3"}, {"option_1", "option_2", "option_3"})
in
#"Expanded sublist1"
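A slightly more defensive variant of the `fixSublist` step tests the value's type instead of comparing it with `false`, so that `null` or any other non-list value is also replaced with an empty list. This is a sketch only: the table below is hypothetical and has the `sublist` column already exposed, and the `missing_options` row is invented to show the `null` case.

```powerquery
let
    // Hypothetical table with the sublist column already expanded out of the record
    Source = #table(
        {"aggregate_category", "sublist"},
        {
            {"null_options", false},
            {"missing_options", null},
            {"empty_options", {}},
            {"has_options", {[option_1 = "some val", option_2 = "another value", option_3 = "more values"]}}
        }
    ),
    // Keep real lists; replace anything else (false, null, text, ...) with an empty list
    FixSublist = Table.TransformColumns(Source, {"sublist", each if _ is list then _ else {}}),
    // Expand the lists to rows, then the records to columns
    ExpandedList = Table.ExpandListColumn(FixSublist, "sublist"),
    ExpandedRecord = Table.ExpandRecordColumn(
        ExpandedList, "sublist", {"option_1", "option_2", "option_3"}
    )
in
    ExpandedRecord
```

The type test is a design choice rather than a requirement: `each if _ = false then {} else _` is enough for the sample data, while `_ is list` also covers inputs where the field is `null` or some other scalar.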
Results