Split multi-valued cells in more than one column into rows (OpenRefine)

I have been cleaning up a table in OpenRefine. It currently looks like this:

REF                 Handle      Size        Price
2002, 2003          t-shirt1    M, L        23
3001, 3002, 3003    t-shirt2    S, M, L     24

I need to split the multi-valued cells in REF and Size so that I get:

REF                 Handle      Size        Price
2002                t-shirt1    M           23
2003                t-shirt1    L           23  
3001                t-shirt2    S           24  
3002                t-shirt2    M           24
3003                t-shirt2    L           24

Is it possible to do this in OpenRefine? The "Split multi-valued cells..." command only handles one column. Thanks, Annarita

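Yes, it is possible:

  • Split the first column using ", " as the separator.
  • Move column 2 to the first position, so that it becomes the key column for records.
  • Display your project as records (not rows).
  • Split column 3 using ", " as the separator.
  • Fill down columns 4 and 2.
  • Reorder the columns.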

Here is my recipe, exported as JSON from the operation history:

[
  {
    "op": "core/row-removal",
    "description": "Remove rows",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "row.starred",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Starred Rows",
          "omitBlank": false,
          "columnName": "",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": true,
                "l": "true"
              }
            }
          ]
        }
      ],
      "mode": "row-based"
    }
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 1",
    "columnName": "Column 1",
    "keyColumnName": "Column 1",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/column-move",
    "description": "Move column Column 2 to position 0",
    "columnName": "Column 2",
    "index": 0
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 3",
    "columnName": "Column 3",
    "keyColumnName": "Column 2",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 4",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 4"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 2",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 2"
  },
  {
    "op": "core/column-reorder",
    "description": "Reorder columns",
    "columnNames": [
      "Column 1",
      "Column 2",
      "Column 3",
      "Column 4"
    ]
  }
]

Hervé
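
As a cross-check on the expected result, the same transformation can be sketched outside OpenRefine in plain Python. This is only an illustration, not part of the recipe above: the function name `split_paired_cells` and the dictionary layout are assumptions made for the example, and it assumes the i-th value in REF belongs with the i-th value in Size.

```python
from itertools import zip_longest


def split_paired_cells(rows, columns, separator=", "):
    """Expand each row so the i-th value of every listed column shares one output row.

    Hypothetical helper for illustration; it is not an OpenRefine API.
    """
    result = []
    for row in rows:
        # Split each multi-valued cell into its list of values.
        parts = [row[col].split(separator) for col in columns]
        # Pair values by position; a shorter list is padded with empty strings.
        for values in zip_longest(*parts, fillvalue=""):
            new_row = dict(row)  # copies the single-valued columns (the "fill down")
            new_row.update(zip(columns, values))
            result.append(new_row)
    return result


table = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

for r in split_paired_cells(table, ["REF", "Size"]):
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

With the sample table from the question this yields five rows, one per REF/Size pair, with Handle and Price repeated on each.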

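I just found a nice free OpenRefine plugin that provides an "unpaired pivot": the VIB-Bits plugin

From their documentation:

3.2.1 Unpaired pivot... Unpaired pivot is the transformation of data that is organized in rows into a representation of that data in separate columns. A simple example is the transformation of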

Category Value
a 1
a 2
b 3
c 2

into

Value a   Value b   Value c
1         3         2
2
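
To make the quoted example concrete, here is a minimal Python sketch of the same reshaping. It only reproduces the documented example and does not reflect how the plugin is implemented; the function name `unpaired_pivot` and its parameters are assumptions made for illustration.

```python
from itertools import zip_longest


def unpaired_pivot(rows, key="Category", value="Value"):
    """Collect the values of each category into its own column.

    Hypothetical helper for illustration; not the VIB-Bits plugin API.
    """
    columns = {}
    for row in rows:
        # One output column per distinct category, named "<value> <category>".
        columns.setdefault(f"{value} {row[key]}", []).append(row[value])
    headers = list(columns)
    # Columns can differ in length, so shorter ones are padded with "".
    body = [list(r) for r in zip_longest(*columns.values(), fillvalue="")]
    return headers, body


data = [
    {"Category": "a", "Value": 1},
    {"Category": "a", "Value": 2},
    {"Category": "b", "Value": 3},
    {"Category": "c", "Value": 2},
]

headers, body = unpaired_pivot(data)
print(headers)
for line in body:
    print(line)
```

The values within a column are "unpaired" in the sense that a row of the output does not describe a single original record; the cells in one row only share a position in their respective columns.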