Split multi-valued cells in more than one column into rows (OpenRefine)
I have been cleaning up a table in OpenRefine. It currently looks like this:
REF | Handle | Size | Price |
---|---|---|---|
2002, 2003 | t-shirt1 | M, L | 23 |
3001, 3002, 3003 | t-shirt2 | S, M, L | 24 |
I need to split the multi-valued cells in REF and Size so that I get:
REF | Handle | Size | Price |
---|---|---|---|
2002 | t-shirt1 | M | 23 |
2003 | t-shirt1 | L | 23 |
3001 | t-shirt2 | S | 24 |
3002 | t-shirt2 | M | 24 |
3003 | t-shirt2 | L | 24 |
Is it possible to do this in OpenRefine? The "Split multi-valued cells..." command only handles one column.
Thanks,
Annarita
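Yes, it is possible:
- Split the first column using ", " (comma followed by a space) as the separator.
- Move column 2 to the first position.
- Display your project as records (not rows).
- Split column 3 using ", " as the separator.
- Fill down columns 4 and 2.
- Reorder the columns.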
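
The steps above can be sketched in Python to show what each intermediate table looks like. This is only a rough model of record mode, assuming that a record starts at every row whose key-column cell is non-blank and that split values fill the following rows of the same record. The helper names (`records`, `split_column`, `fill_down`) are made up for illustration and are not OpenRefine functions. Instead of moving the Handle column to the front, the sketch simply passes it as the key.

```python
def records(table, key):
    """Group rows into records: a new record starts when `key` is non-blank."""
    recs = []
    for row in table:
        if row[key] is not None or not recs:
            recs.append([row])
        else:
            recs[-1].append(row)
    return recs


def split_column(table, column, key, sep=", "):
    """Split `column` inside each record, one value per row (simplified model)."""
    out = []
    for rec in records(table, key):
        rec = [dict(r) for r in rec]  # copy so the input table is untouched
        values = [v for r in rec if r[column] is not None
                  for v in r[column].split(sep)]
        # Add blank rows when the record is shorter than the value list.
        while len(rec) < len(values):
            rec.append(dict.fromkeys(rec[0], None))
        for i, r in enumerate(rec):
            r[column] = values[i] if i < len(values) else None
        out.extend(rec)
    return out


def fill_down(table, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    out, last = [], None
    for row in table:
        row = dict(row)
        if row[column] is None:
            row[column] = last
        else:
            last = row[column]
        out.append(row)
    return out


table = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

step1 = split_column(table, "REF", key="REF")      # new rows have blank Handle/Size/Price
step2 = split_column(step1, "Size", key="Handle")  # Handle now defines the records
final = fill_down(fill_down(step2, "Price"), "Handle")

for r in final:
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

After the first split, the extra rows are blank in every column except REF, which is why Handle has to act as the record key before Size is split, and why Price and Handle need a fill down at the end.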
Here is my recipe, as OpenRefine operation-history JSON (it can be pasted in via Undo / Redo > Apply...):
```json
[
  {
    "op": "core/row-removal",
    "description": "Remove rows",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "row.starred",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Starred Rows",
          "omitBlank": false,
          "columnName": "",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": true,
                "l": "true"
              }
            }
          ]
        }
      ],
      "mode": "row-based"
    }
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 1",
    "columnName": "Column 1",
    "keyColumnName": "Column 1",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/column-move",
    "description": "Move column Column 2 to position 0",
    "columnName": "Column 2",
    "index": 0
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 3",
    "columnName": "Column 3",
    "keyColumnName": "Column 2",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 4",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 4"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 2",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 2"
  },
  {
    "op": "core/column-reorder",
    "description": "Reorder columns",
    "columnNames": [
      "Column 1",
      "Column 2",
      "Column 3",
      "Column 4"
    ]
  }
]
```
Hervé
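
As a cross-check outside OpenRefine, the net effect of the recipe can be written in a few lines of Python: the i-th REF is paired with the i-th Size of the same row, and Handle and Price are copied down. This is a minimal sketch, not part of the recipe. The function name `split_paired_columns` is hypothetical, and the equal-length check is an added assumption (the recipe itself does not verify that REF and Size hold the same number of values).

```python
def split_paired_columns(rows, columns, sep=", "):
    """Expand rows whose `columns` hold separator-joined lists of equal length.

    The i-th value of each listed column lands on the same output row;
    all other columns are copied unchanged.
    """
    out = []
    for row in rows:
        parts = {col: row[col].split(sep) for col in columns}
        lengths = {len(values) for values in parts.values()}
        if len(lengths) != 1:
            # Assumption: mismatched lists are treated as an error here.
            raise ValueError(f"unequal number of values in row: {row}")
        for i in range(lengths.pop()):
            new_row = dict(row)
            for col in columns:
                new_row[col] = parts[col][i]
            out.append(new_row)
    return out


table = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

result = split_paired_columns(table, ["REF", "Size"])
for r in result:
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

A row such as `2002, 2003` with a single size `M` would raise an error in this sketch, which is a useful reminder to check the data for such rows before applying the recipe.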
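Just found a nice free OpenRefine plugin that provides an "unpaired pivot": VIB-Bits plugin
3.2.1 Unpaired pivot... Unpaired pivot is the transformation of data organised in rows into a representation of that data in separate columns. A simple example is the transformation of
Category | Value |
---|---|
a | 1 |
a | 2 |
b | 3 |
c | 2 |
into
Value a | Value b | Value c |
---|---|---|
1 | 3 | 2 |
2 | | |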
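
To make the quoted example concrete, here is a small Python sketch of the same transformation. It is not the plugin's implementation, only an illustration that reproduces the example table. The function name `unpaired_pivot` and the `"Value "` column prefix are assumptions taken from the example.

```python
from itertools import zip_longest


def unpaired_pivot(pairs, prefix="Value "):
    """Turn (category, value) pairs into one column per category.

    Values keep their original order within each category; shorter
    columns are padded with None.
    """
    groups = {}
    for category, value in pairs:
        groups.setdefault(category, []).append(value)
    header = [prefix + str(category) for category in groups]
    rows = [list(r) for r in zip_longest(*groups.values(), fillvalue=None)]
    return header, rows


pairs = [("a", 1), ("a", 2), ("b", 3), ("c", 2)]
header, rows = unpaired_pivot(pairs)
print(header)
for row in rows:
    print(row)
```

The columns are "unpaired" in the sense that values on the same output row are not related to each other; they only share a position within their own category.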