Update a mixed and nested object in Snowflake

Question

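I have a Snowflake table with a single variant column (raw).

Every row in this table is complex (dictionaries and arrays) and nested (multiple levels of hierarchy).

What I want to do is update a specific item inside one of the arrays.

It is easier to understand with an example, so consider this as one row in the table: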

{
  "id": "1234"
  "x_id": [
    {
      "y_id": "790437306684007491",
      "y_state": "some_state"
    }
  ],
  "comments": {
    "1": [
      {
        "comment_id": "bb288743-3b73-4423-b76b-f26b8c37f7d4",
        "comment_timestamp": "2021-02-10 14:53:25.667564",
        "comment_text": "Hey"
      },
      {
        "comment_id": "7378f332-93c4-4522-9f73-3b6a8a9425ce",
        "comment_text": "You",
        "comment_timestamp": "2021-02-10 14:54:21.337046"
      }
    ],
    "2": [
      {
        "comment_id": "9dd0cbb0-df80-4b0f-b399-9ee153161462",
        "comment_text": "Hello",
        "comment_timestamp": "2021-02-09 09:26:17.987386"
      },
      {
        "comment_id": "1a3bf1e8-82b5-4a9c-a959-a1da806ce7e3",
        "comment_text": "World",
        "comment_timestamp": "2021-02-09 09:28:32.144175"
      }
    ]
  }
}

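What I want is to update the comment text of a specific comment.

I know I can update the whole JSON programmatically and rewrite the entire object with PARSE_JSON, but that approach is not good enough: other updates may be changing other comments at the same time, so the updates would overwrite each other.

So first, I tried the naive approach (I knew it would not work, but I had to try):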

update table1 set raw['comments']['1'][0]["comment_text"] = 'please work'

Unsurprisingly, I got the following error:

SQL compilation error: syntax error line 2 at position 7 unexpected '['.

Next, I tried OBJECT_INSERT, which should allow updating an object, but it fails because of the nested key ('1'):

UPDATE table1 SET raw = OBJECT_INSERT(raw:comments:1, "comment_test", 'please work')

with the error

SQL compilation error: syntax error line 1 at position 99 unexpected '1'.

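(I also tried several permutations of this approach, with raw:comments:"1", raw:comments:1[0], raw['comments']['1'] and a few others.)

I also tried restructuring the object so that, instead of keeping the comments in a dictionary, they are flattened into a single array, for example: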

{
  "id": "1234"
  "x_id": [
    {
      "y_id": "790437306684007491",
      "y_state": "some_state"
    }
  ],
  "comments": [
    {
      "comment_id": "bb288743-3b73-4423-b76b-f26b8c37f7d4",
      "comment_timestamp": "2021-02-10 14:53:25.667564",
      "comment_text": "Hey"
      "comment_key": "1"
    },
    {
      "comment_id": "7378f332-93c4-4522-9f73-3b6a8a9425ce",
      "comment_text": "You",
      "comment_timestamp": "2021-02-10 14:54:21.337046"
      "comment_key": "1"
    }
    {
      "comment_id": "9dd0cbb0-df80-4b0f-b399-9ee153161462",
      "comment_text": "Hello",
      "comment_timestamp": "2021-02-09 09:26:17.987386",
      "comment_key": "2"
    },
    {
      "comment_id": "1a3bf1e8-82b5-4a9c-a959-a1da806ce7e3",
      "comment_text": "World",
      "comment_timestamp": "2021-02-09 09:28:32.144175",
      "comment_key": "2"
    }
  ]
}

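But that did not get me any closer to a solution. I looked for some ARRAY_REPLACE function that replaces an item in an array, but it looks like no such function exists among the semi-structured data functions.

I also considered using a JavaScript UDF, but I did not find any source showing a UDF that can actually update a row (as far as I could see, they are all used to read data, not to update it).

Is there any way to achieve what I want?

Thanks a lot!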

Answer 1

You can update complex JSON structures with a JavaScript UDF. Here is a sample. Note that both of your JSON samples have errors; I used the second one and fixed the missing commas.

-- Create a temp table with a single variant column. By convention, I use "v" as the name of
-- the column in a single-column table. You can change it to "raw" in your code.
create or replace temp table foo(v variant);

-- Create a UDF that updates the exact key you want to update.
-- Unfortunately, JavaScript treats the object path as a constant so you can't make this 
-- a string that you pass in dynamically. There are ways around this possibly, but 
-- library restrictions would require a raw JavaScript parser function. Just update the
-- path you need in the UDF.
create or replace function update_json("v" variant, "newValue" string)
returns variant
language javascript
as
$$
   v.comments[0].comment_text = newValue;
   return v;
$$;

-- Insert the corrected JSON into the variant field
insert into foo select parse_json('{
    "id": "1234",
    "x_id": [{
        "y_id": "790437306684007491",
        "y_state": "some_state"
    }],
    "comments": [{
            "comment_id": "bb288743-3b73-4423-b76b-f26b8c37f7d4",
            "comment_timestamp": "2021-02-10 14:53:25.667564",
            "comment_text": "Hey",
            "comment_key": "1"
        },
        {
            "comment_id": "7378f332-93c4-4522-9f73-3b6a8a9425ce",
            "comment_text": "You",
            "comment_timestamp": "2021-02-10 14:54:21.337046",
            "comment_key": "1"
        },
        {
            "comment_id": "9dd0cbb0-df80-4b0f-b399-9ee153161462",
            "comment_text": "Hello",
            "comment_timestamp": "2021-02-09 09:26:17.987386",
            "comment_key": "2"
        },
        {
            "comment_id": "1a3bf1e8-82b5-4a9c-a959-a1da806ce7e3",
            "comment_text": "World",
            "comment_timestamp": "2021-02-09 09:28:32.144175",
            "comment_key": "2"
        }
    ]
}');

-- Show how the change works without updating the row
select update_json(v, 'please work') from foo;

-- Now update the row using the output. Note that this is updating the 
-- whole variant field, not a portion of it.
update foo set v = update_json(v, 'please work');

-- Show the updated key
select v:comments[0].comment_text::string from foo;
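
On the fixed object path mentioned in the comments above: one possible workaround, shown here only as a sketch in plain JavaScript, is to pass the path as a list of keys and walk it one step at a time. The name setByPath and the array-of-keys convention are assumptions made for illustration and are not part of the answer above; how it would be wired into a UDF signature would still need to be checked in Snowflake.

```javascript
// Hypothetical helper: set a value at a path given as an array of keys/indexes.
// Example path for the nested layout: ["comments", "1", 0, "comment_text"].
function setByPath(obj, path, value) {
  if (path.length === 0) return obj; // nothing to set
  var target = obj;
  // Walk down to the parent of the final key.
  for (var i = 0; i < path.length - 1; i++) {
    if (target === null || typeof target !== "object") return obj; // path missing
    target = target[path[i]];
  }
  if (target === null || typeof target !== "object") return obj; // path missing
  target[path[path.length - 1]] = value;
  return obj;
}

// Usage on a cut-down version of the original (dictionary) layout.
var row = { comments: { "1": [{ comment_text: "Hey" }, { comment_text: "You" }] } };
setByPath(row, ["comments", "1", 0, "comment_text"], "please work");
```

If any step of the path is missing, the sketch returns the object unchanged rather than creating intermediate keys; that is a design choice, not a requirement.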

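Finally, if you want to modify a property but first have to find the right element by key, you can loop in the JavaScript. For example, if the comment you need is not the first one but the one with a specific UUID or comment_text, you can loop to find it and update the key you need in the same iteration of the loop.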
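
The loop described above might look like the following minimal sketch. It is written as plain JavaScript so it can be tried outside Snowflake; the function name updateCommentText and its parameters are hypothetical, and it assumes the flattened layout in which comments is an array. Inside a UDF, the loop would go in the body between the $$ markers.

```javascript
// Hypothetical helper: find a comment by its comment_id and replace its text.
// Assumes v.comments is an array of comment objects (the flattened layout).
function updateCommentText(v, commentId, newText) {
  for (var i = 0; i < v.comments.length; i++) {
    if (v.comments[i].comment_id === commentId) {
      v.comments[i].comment_text = newText;
      break; // comment_id is assumed to be unique, so stop at the first match
    }
  }
  return v; // returned unchanged if no comment matched
}

// Usage with two of the comments from the sample row.
var sample = {
  comments: [
    { comment_id: "bb288743-3b73-4423-b76b-f26b8c37f7d4", comment_text: "Hey" },
    { comment_id: "9dd0cbb0-df80-4b0f-b399-9ee153161462", comment_text: "Hello" }
  ]
};
updateCommentText(sample, "9dd0cbb0-df80-4b0f-b399-9ee153161462", "please work");
```

As with the UDF above, the result still has to be written back with an UPDATE of the whole variant column.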

Answer 2

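Thanks, it works!

I also more or less managed to get it working using built-in functions:

Assuming we know the position of the comment (in this example, position=3):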

UPDATE table1 SET 
raw = object_construct(
  'id', raw:id,
  'x_id', raw:x_id,
  'comments', array_cat(array_append(array_slice(raw:comments ,0 ,2), parse_json('{"id": "3", "comment_text": "please work"}')) , ARRAY_SLICE(raw:comments,3,array_size(raw:comments)))
)
WHERE raw['id'] = 'some_id'

But I am still thinking about which approach does the job better.

Anyway, thanks, this was a big help.
