Use Excel Power Query to extract JSON data

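I have a spreadsheet with 5 columns. One of the columns contains a JSON array (which itself contains nested arrays). Is it possible to use the Power Query Editor to parse the JSON array in each row, so that each row has the four original columns plus a new column for each value in the JSON array and its sub-arrays? The data in each JSON array may be unique, but the structure is always the same, although the number of elements and sub-arrays varies. Below is an example of the data I want to transform and how I would like it transformed.

Data to transform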

response_id response_json
a476f978-430c-47fa-a2a4-7be0d863a69e [{"sublist":false,"aggregate_category":"null_options","question":"some question."},{"sublist":[{"option_1":"some val","option_2":"another value","option_3":"more values"}],"aggregate_category":"has_options","question":"another question"},{"sublist":[],"aggregate_category":"empty_options","question":"another question"}]

Output

ID aggregate_category question option_1 option_2 option_3
a476f978-430c-47fa-a2a4-7be0d863a69e null_options some question. null null null
a476f978-430c-47fa-a2a4-7be0d863a69e has_options another question some val another value more values
a476f978-430c-47fa-a2a4-7be0d863a69e empty_options another question null null null

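Edit: What actually happens is that I can parse the initial JSON array and expand it to rows. However, the embedded JSON array (named "sublist") cannot be transformed any further because its value is null or false.

I think one issue is that, although every row has a JSON string, the field named sublist inside the JSON array can be any of the following: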

sublist:[]
sublist:[{"option_1":1, "option_2":"aaa", "option_3":""}]
sublist:false

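I think the problem is that the value cannot be expanded when it is false. So perhaps I have to write a script that substitutes dummy data when the value is false? I tried this, but it failed:

=if [response_json.sublist] = "false" then "no data" else Json.Document([response_json.sublist])

I think this might be because the script is trying to parse something that doesn't exist?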

Edit - adding the Advanced Editor code

let
    Source = Excel.Workbook(File.Contents("D:\Downloads\ExampleDataSet.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"response_id", type text}, {"response_json", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),
    #"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
    fixSublist = Table.TransformColumns(response_json,{"sublist", each if  _ = false then {} else _ }),
    #"Expanded response_json1" = Table.ExpandRecordColumn(fixSublist, "response_json", {"sublist", "aggregate_category", "question"}, {"response_json.sublist", "response_json.aggregate_category", "response_json.question"})
in
    #"Expanded response_json1"

Edit: another solution

In addition to Ron's great help, I found that adding a custom column with this formula also works:

=if [response_json.sublist] <> false then
Table.ToRecords(Table.FromRecords([response_json.sublist]))
else
Table.ToRecords(
    Table.FromRecords({
        [option_1 = "null", option_2 = "null", option_3 = "null"]    
    })
)
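
  • Add a custom column:

Formula: =Json.Document([JSON Field])
  • Expand the resulting list column to new rows; expand the resulting record column (and the sublist)

  • Arrange and rename your columns as needed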
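
The steps above can be sketched in M roughly as follows. This is only an illustration: the one-row inline table and the step names (AddedParsed, ExpandedList, ExpandedRecord) and the new column name "Parsed" are made up for the example and stand in for your real worksheet data.

```
let
    // Hypothetical one-row sample standing in for the real worksheet data
    Source = #table(
        type table [response_id = text, response_json = text],
        {{"a476", "[{""sublist"":false,""aggregate_category"":""null_options"",""question"":""q1""}]"}}
    ),
    // Custom column holding the parsed JSON (a list of records)
    AddedParsed = Table.AddColumn(Source, "Parsed", each Json.Document([response_json])),
    // One row per element of the top-level array
    ExpandedList = Table.ExpandListColumn(AddedParsed, "Parsed"),
    // One column per field of each record
    ExpandedRecord = Table.ExpandRecordColumn(ExpandedList, "Parsed",
        {"sublist", "aggregate_category", "question"})
in
    ExpandedRecord
```

At this point the sublist column still holds a mix of lists and false values, so it needs one more fix before it can be expanded.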

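Edit: in reply to your M code post

  • Your step references are wrong (fixSublist refers to response_json, which is a column name, not a previous step), and the "fixSublist" line is in the wrong place: it has to come after the record column has been expanded.
  • Step through the Applied Steps and read the comments in the code to better understand what each step does.
  • Because I get the initial table from the open workbook rather than from a file, the first few lines of my M code are different.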

Data

M Code

let

//Next two lines different due to my source being a table in an open workbook and not a file
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"response_id", type text}, {"response_json", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),

//expand the subtables/records/lists
    #"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
    #"Expanded response_json2" = Table.ExpandRecordColumn(#"Expanded response_json", "response_json", 
        {"sublist", "aggregate_category", "question"}, {"sublist", "aggregate_category", "question"}),

//Now that the sublists are exposed we can fix the entries that are not Lists
    fixSublist = Table.TransformColumns(#"Expanded response_json2",{"sublist", each if  _ = false then {} else _ }),

//Now expand them
    #"Expanded sublist" = Table.ExpandListColumn(fixSublist, "sublist"),
    #"Expanded sublist1" = Table.ExpandRecordColumn(#"Expanded sublist", "sublist", {"option_1", "option_2", "option_3"}, {"option_1", "option_2", "option_3"})
in
    #"Expanded sublist1"
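
If sublist can also be null or some other non-list value, a slightly more defensive variant of the fixSublist step might test the type instead of comparing against false. The sketch below is an assumption about how that could look, not part of the query above; the inline sample table and the step names are hypothetical.

```
let
    // Hypothetical sample covering the three shapes that "sublist" can take
    Source = #table(
        {"question", "sublist"},
        {
            {"q1", false},
            {"q2", {[option_1 = "some val", option_2 = "another value", option_3 = "more values"]}},
            {"q3", {}}
        }
    ),
    // Keep real lists; replace anything else (false, null, ...) with an empty list
    FixedSublist = Table.TransformColumns(Source, {"sublist", each if _ is list then _ else {}}),
    // Expand the lists to rows, then the records to columns
    ExpandedList = Table.ExpandListColumn(FixedSublist, "sublist"),
    ExpandedRecord = Table.ExpandRecordColumn(ExpandedList, "sublist",
        {"option_1", "option_2", "option_3"})
in
    ExpandedRecord
```

The idea is the same as before: once every value in the column is a list, Table.ExpandListColumn and Table.ExpandRecordColumn can be applied to the whole column without errors.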

Results