How to insert JSON containing escape codes into a JSONB column in PostgreSQL using GORM

I'm trying to store JSON bytes in PostgreSQL, but I'm running into a problem.

\u0000 cannot be converted to text.

As shown below, the JSON contains escape sequences such as \u0000, and PostgreSQL seems to interpret them as Unicode characters rather than as part of a JSON string.

err := raws.SaveRawData(data, url)
// if there is "\u0000" in the bytes
if err != nil && err.Error() == "ERROR: unsupported Unicode escape sequence (SQLSTATE 22P05)" {
    // try to remove \u0000, but this does not work
    data = bytes.Trim(data, "\u0000")
    e := raws.SaveRawData(data, url) // save the data again
    if e != nil {
        return e // returns the same error
    }
    return nil
}

The data comes from an API. The response contains \u0000:

{
  "code": 0,
  "message": "0",
  "ttl": 1,
  "data": {
    "bvid": "BV1jb411C7m3",
    "aid": 42443484,
    "videos": 1,
    "tid": 172,
    "tname": "手机游戏",
    "copyright": 1,
    "pic": "http://i0.hdslb.com/bfs/archive/c76ee4798bf2ba0efc8449bcb3577d508321c6c5.jpg",
    "title": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
    "pubdate": 1549100438,
    "ctime": 1549100438,
    "desc": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
    "desc_v2": [
      {
        "raw_text": "bgm:逮虾户\n今天先水一期冰塔的,明天再水\u0000绿塔的,后天就可以下红莲啦,计划通嘿嘿嘿(º﹃º )",
        "type": 1,
        "biz_id": 0
      }
    ],
    "state": 0,
    "duration": 265,
    "rights": {
      "bp": 0,
      "elec": 0,
      "download": 1,
      "movie": 0,
      "pay": 0,
      "hd5": 0,
      "no_reprint": 1,
      "autoplay": 1,
      "ugc_pay": 0,
      "is_cooperation": 0,
      "ugc_pay_preview": 0,
      "no_background": 0,
      "clean_mode": 0,
      "is_stein_gate": 0
    },
    "owner": {
      "mid": 39699039,
      "name": "明眸-雅望",
      "face": "http://i0.hdslb.com/bfs/face/240f74f8706955119575ea6c6cb1d31892f93800.jpg"
    },
    "stat": {
      "aid": 42443484,
      "view": 1107,
      "danmaku": 7,
      "reply": 22,
      "favorite": 5,
      "coin": 4,
      "share": 0,
      "now_rank": 0,
      "his_rank": 0,
      "like": 10,
      "dislike": 0,
      "evaluation": "",
      "argue_msg": ""
    },
    "dynamic": "#崩坏3#",
    "cid": 74479750,
    "dimension": {
      "width": 1280,
      "height": 720,
      "rotate": 0
    },
    "no_cache": false,
    "pages": [
      {
        "cid": 74479750,
        "page": 1,
        "from": "vupload",
        "part": "冰塔:我连你的大招都敢硬抗,所以告诉我谁才是生物女王?!单s冰塔怒砍档案女王巴德尔,谁,才是生物一姐?(手动滑稽)",
        "duration": 265,
        "vid": "",
        "weblink": "",
        "dimension": {
          "width": 1280,
          "height": 720,
          "rotate": 0
        }
      }
    ],
    "subtitle": {
      "allow_submit": false,
      "list": []
    },
    "user_garb": {
      "url_image_ani_cut": ""
    }
  }
}

The struct being saved is:

type RawJSONData struct {
    ID        uint64         `gorm:"primarykey" json:"id"`
    CreatedAt time.Time      `json:"-"`
    DeletedAt gorm.DeletedAt `json:"-" gorm:"index"`
    Data      datatypes.JSON `json:"data"`
    URL       string         `gorm:"index" json:"url"`
}

datatypes.JSON comes from gorm.io/datatypes. It seems to be just a json.RawMessage, which is (derived from?) a []byte.

I use PostgreSQL's JSONB type to store this data.

Table:

create table raw_json_data
(
    id         bigserial not null constraint raw_json_data_pke primary key,
    created_at timestamp with time zone,
    deleted_at timestamp with time zone,
    data       jsonb,
    url        text
);

The Unicode escape sequence \u0000 is simply not supported in Postgres TEXT and JSONB columns:

The jsonb type also rejects \u0000 (because that cannot be represented in PostgreSQL's text type)
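This also explains why the bytes.Trim call in the question has no effect. In Go source, the literal "\u0000" is a single NUL byte, while the raw JSON contains the six characters backslash, u, 0, 0, 0, 0. The NUL character only appears once the JSON string is decoded, which is what jsonb has to do. A small sketch to illustrate (the helper names containsNulEscape and containsNulByte are made up for this example):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// containsNulEscape reports whether raw holds the six-character text \u0000.
// Naive on purpose: it would also match an escaped backslash followed by u0000.
func containsNulEscape(raw []byte) bool {
	return bytes.Contains(raw, []byte(`\u0000`)) // raw string: 6 bytes
}

// containsNulByte reports whether raw holds an actual 0x00 byte,
// which is what the interpreted Go literal "\u0000" denotes.
func containsNulByte(raw []byte) bool {
	return bytes.Contains(raw, []byte("\u0000")) // interpreted string: 1 byte
}

func main() {
	raw := []byte(`{"text":"a\u0000b"}`)

	fmt.Println(containsNulEscape(raw)) // true: the escape is present as text
	fmt.Println(containsNulByte(raw))   // false: nothing for bytes.Trim to remove

	// Decoding the JSON turns the escape into a real NUL character.
	var v struct {
		Text string `json:"text"`
	}
	if err := json.Unmarshal(raw, &v); err != nil {
		panic(err)
	}
	fmt.Println(len(v.Text), v.Text[1] == 0) // 3 true
}
```

Note also that bytes.Trim only strips from the two ends of the slice, so it would not touch anything in the middle of the document even if the cutset matched.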

You can change the column type to JSON:

create table Foo (test JSON);
insert into Foo (test) values ('{"text": "明天再水\u0000绿塔的"}');
-- works

The json data type stores an exact copy of the input text

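This has the advantage of keeping the data identical to what you received from the API, in case the escape sequence carries some meaning you need to preserve.

It also lets you query with Postgres JSON operators such as ->>, although converting a JSON field that contains \u0000 to text will still fail: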

select test->>'text' from Foo
-- ERROR:  unsupported Unicode escape sequence

A column of type BYTEA will also accept any sequence of bytes without manipulating the data. In Gorm, use the type:bytea tag:

type RawJSONData struct {
    // ... other fields
    Data      string `gorm:"type:bytea" json:"data"`
}

If none of the above is acceptable, then you have to sanitize the input string...
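A minimal sketch of what such sanitizing could look like, assuming the input is valid JSON and that silently dropping the NUL escapes is acceptable for your data. The helper stripNulEscapes is a hypothetical name, not part of GORM or gorm.io/datatypes. It walks the bytes escape by escape rather than using a plain replace, so that an escaped backslash followed by the text u0000 is left untouched:

```go
package main

import (
	"bytes"
	"fmt"
)

// nulEscape is the six-character JSON escape sequence, not a real NUL byte.
var nulEscape = []byte(`\u0000`)

// stripNulEscapes returns a copy of data with every \u0000 escape removed.
// It assumes data is valid JSON, where a backslash always starts an escape.
func stripNulEscapes(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for i := 0; i < len(data); i++ {
		// Ordinary byte (or a dangling trailing backslash): copy it.
		if data[i] != '\\' || i+1 >= len(data) {
			out = append(out, data[i])
			continue
		}
		// The escape we want to drop: skip all six bytes.
		if bytes.HasPrefix(data[i:], nulEscape) {
			i += len(nulEscape) - 1
			continue
		}
		// Any other escape: copy the backslash and the next byte together,
		// so that `\\u0000` is not misread as a NUL escape.
		out = append(out, data[i], data[i+1])
		i++
	}
	return out
}

func main() {
	in := []byte(`{"desc":"a\u0000b","path":"c\\u0000d"}`)
	fmt.Println(string(stripNulEscapes(in)))
	// prints {"desc":"ab","path":"c\\u0000d"}
}
```

In the question's code, the cleaned bytes would be passed to SaveRawData in place of the result of bytes.Trim. Keep in mind that the stored document then no longer matches the API response byte for byte.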