How to Condense & Nest a (CSV) Payload in DataWeave 2.0?
I have a CSV payload of TV programs and episodes that I want to transform (nest and condense) into JSON under the following conditions:
- Merge consecutive Program lines (that are not followed by Episode lines) into a single program, keeping the start date of the first instance and the sum of the durations.
- Episode lines that follow a Program line are nested under that program.
Input
Channel|Name|Start|Duration|Type
ACME|Broke Girls|2018-02-01T00:00:00|600|Program
ACME|Broke Girls|2018-02-01T00:10:00|3000|Program
ACME|S03_8|2018-02-01T00:13:05|120|Episode
ACME|S03_9|2018-02-01T00:29:10|120|Episode
ACME|S04_1|2018-02-01T00:44:12|120|Episode
ACME|Lost In Translation|2018-02-01T02:01:00|1800|Program
ACME|Lost In Translation|2018-02-01T02:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:30:00|1800|Program
ACME|Photon|2018-02-01T05:00:00|1800|Program
ACME|Photon|2018-02-01T05:30:00|1800|Program
ACME|Miles & Smiles|2018-02-01T06:00:00|3600|Program
ACME|S015_1|2018-02-01T06:13:53|120|Episode
ACME|S015_2|2018-02-01T06:29:22|120|Episode
ACME|S015_3|2018-02-01T06:46:28|120|Episode
ACME|Ice Age|2018-02-01T07:00:00|300|Program
ACME|Ice Age|2018-02-01T07:05:00|600|Program
ACME|Ice Age|2018-02-01T07:15:00|2700|Program
ACME|S01_4|2018-02-01T07:17:17|120|Episode
ACME|S01_5|2018-02-01T07:32:11|120|Episode
ACME|S01_6|2018-02-01T07:47:20|120|Episode
ACME|My Girl Friday|2018-02-01T08:00:00|3600|Program
ACME|S05_7|2018-02-01T08:17:28|120|Episode
ACME|S05_8|2018-02-01T08:31:59|120|Episode
ACME|S05_9|2018-02-01T08:44:42|120|Episode
ACME|Pirate Bay|2018-02-01T09:00:00|3600|Program
ACME|S01_1|2018-02-01T09:33:12|120|Episode
ACME|S01_2|2018-02-01T09:46:19|120|Episode
ACME|Broke Girls|2018-02-01T10:00:00|1200|Program
ACME|S05_3|2018-02-01T10:13:05|120|Episode
ACME|S05_4|2018-02-01T10:29:10|120|Episode
Output
{
"programs": [
{
"StartTime": "2018-02-01T00:00:00",
"Duration": 3600,
"Name": "Broke Girls",
"episode": [
{
"name": "S03_8",
"startDateTime": "2018-02-01T00:13:05",
"duration": 120
},
{
"name": "S03_9",
"startDateTime": "2018-02-01T00:29:10",
"duration": 120
},
{
"name": "S04_1",
"startDateTime": "2018-02-01T00:44:12",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T06:00:00",
"Duration": 3600,
"Name": "Miles & Smiles",
"episode": [
{
"name": "S015_1",
"startDateTime": "2018-02-01T06:13:53",
"duration": 120
},
{
"name": "S015_2",
"startDateTime": "2018-02-01T06:29:22",
"duration": 120
},
{
"name": "S015_3",
"startDateTime": "2018-02-01T06:46:28",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T07:00:00",
"Duration": 3600,
"Name": "Ice Age",
"episode": [
{
"name": "S01_4",
"startDateTime": "2018-02-01T07:17:17",
"duration": 120
},
{
"name": "S01_5",
"startDateTime": "2018-02-01T07:32:11",
"duration": 120
},
{
"name": "S01_6",
"startDateTime": "2018-02-01T07:47:20",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T08:00:00",
"Duration": 3600,
"Name": "My Girl Friday",
"episode": [
{
"name": "S05_7",
"startDateTime": "2018-02-01T08:17:28",
"duration": 120
},
{
"name": "S05_8",
"startDateTime": "2018-02-01T08:31:59",
"duration": 120
},
{
"name": "S05_9",
"startDateTime": "2018-02-01T08:44:42",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T09:00:00",
"Duration": 3600,
"Name": "Pirate Bay",
"episode": [
{
"name": "S01_1",
"startDateTime": "2018-02-01T09:33:12",
"duration": 120
},
{
"name": "S01_2",
"startDateTime": "2018-02-01T09:46:19",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T10:00:00",
"Duration": 1200,
"Name": "Broke Girls",
"episode": [
{
"name": "S05_3",
"startDateTime": "2018-02-01T10:13:05",
"duration": 120
},
{
"name": "S05_4",
"startDateTime": "2018-02-01T10:29:10",
"duration": 120
}
]
}
]
}
Give this a try; explanatory comments are inline:
%dw 2.0
output application/dw
var data = readUrl("classpath://data.csv","application/csv",{separator:"|"})
var firstProgram = data[0].Name
---
// Identify the programs by adding a field
(data reduce (e, acc = {l: firstProgram, c: 0, d: []}) -> do {
    var next = acc.l != e.Name and e.Type == "Program"
    var counter = if (next) acc.c + 1 else acc.c
    ---
    {
        l: if (next) e.Name else acc.l,
        c: counter,
        d: acc.d + {(e), pc: counter}
    }
}).d
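// Illustration only (not part of the script): after the reduce above, d holds the
// original rows plus a pc counter that increments whenever a new Program name starts
// (CSV values are read as strings), e.g.:
// [
//   {Channel: "ACME", Name: "Broke Girls", Start: "2018-02-01T00:00:00", Duration: "600", Type: "Program", pc: 0},
//   {Channel: "ACME", Name: "S03_8", Start: "2018-02-01T00:13:05", Duration: "120", Type: "Episode", pc: 0},
//   {Channel: "ACME", Name: "Lost In Translation", Start: "2018-02-01T02:01:00", Duration: "1800", Type: "Program", pc: 1},
//   ...
// ]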
// group by the identifier of individual programs
groupBy $.pc
// Keep just the program groups, throwing away the group identifiers
pluck $
// Throw away the programs with no episodes
filter ($.*Type contains "Episode")
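// ($.*Type collects the Type value of every row in a group, so a group is kept
// only when at least one of its rows is an Episode)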
// Iterate over the programs
map do {
    // sum the program duration
    var d = $ dw::core::Arrays::sumBy (e) -> if (e.Type == "Program") e.Duration else 0
    // Get the episodes and do a little cleanup
    var es = $ map $-"pc" filter ($.Type == "Episode")
    ---
    // Form the desired structure
    {
        ($[0] - "pc" - "Duration"),
        Duration: d,
        Episode: es
    }
}
Note 1: I stored the content in a file and read it with readUrl; you will need to adjust this to wherever your data actually comes from.
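For example, a minimal sketch of the header if the CSV arrives as the Mule message payload instead of a classpath file. The input directive with a separator reader property is my assumption here; in a real flow the MIME type and reader properties can also be set on the source or on the component that produces the payload:

%dw 2.0
input payload application/csv separator="|"  // reader property assumed; may also be configured on the flow source
output application/dw
var data = payload
var firstProgram = data[0].Name
---
// ... same body as above, working on data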
Note 2: You may want to rethink your inputs and, if possible, organize them better.
Note 3: Studio reports errors (at least Studio 7.5.1 does). They are false positives; the code runs.
Note 4: Since the input is not trivial, there are quite a few steps. The code can potentially be optimized, but I have already spent enough time on it; I will leave the optimization to you, or someone else in the community may be able to help.
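The script above keeps the original CSV field names (Name, Start, Episode). If you need the exact keys of your expected output, a minimal sketch of a final mapping step could look like this; condensed is a hypothetical variable holding the result of the expression above, and the output directive would be application/json:

{
    programs: condensed map (p) -> {
        StartTime: p.Start,
        Duration: p.Duration,
        Name: p.Name,
        episode: p.Episode map (ep) -> {
            name: ep.Name,
            startDateTime: ep.Start,
            duration: ep.Duration as Number  // CSV values are strings, so coerce to match the numeric output
        }
    }
}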