How to insert long text (>3K chars) in columns with unique constraint

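I have some problems inserting text that is too long into Postgres. With a simple table holding a text column, I can insert text as long as I want (I tested up to 40K characters). However, as soon as I add a unique constraint, I start getting a strange btree error; see the minimal working example (MWE) below.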

MWE:

#!/bin/bash

N=4096 # Aiming for a URL of length 4,096 characters
DB_NAME='foo'
# Generate a random alphanumeric URL of length N: "http://www." (11 chars)
# + N-15 random characters + ".com" (4 chars)
URL="http://www.$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w $(($N-15)) | head -n 1).com"

# Case 1: a table ('website') with a single column and no
# constraints
TABLE_NAME='website'
sudo -u postgres psql -c "drop database if exists $DB_NAME;"
sudo -u postgres psql -c "create database $DB_NAME;"
sudo -u postgres psql -d $DB_NAME -c "drop table if exists $TABLE_NAME;"
sudo -u postgres psql -d $DB_NAME -c "create table $TABLE_NAME (url text);"
sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"

# Case 2: a table ('website2') with a single column that must be
# unique
TABLE_NAME='website2'
sudo -u postgres psql -c "drop database if exists $DB_NAME;"
sudo -u postgres psql -c "create database $DB_NAME;"
sudo -u postgres psql -d $DB_NAME -c "drop table if exists $TABLE_NAME;"
sudo -u postgres psql -d $DB_NAME -c "create table $TABLE_NAME (url text unique);"
sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"

Output:

$ ./test.sh
DROP DATABASE
CREATE DATABASE
NOTICE:  table "website" does not exist, skipping
DROP TABLE
CREATE TABLE
INSERT 0 1
DROP DATABASE
CREATE DATABASE
NOTICE:  table "website2" does not exist, skipping
DROP TABLE
CREATE TABLE
ERROR:  index row size 4112 exceeds btree version 4 maximum 2704 for index "website2_url_key"
DETAIL:  Index row references tuple (0,1) in relation "website2".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
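
The HINT refers to the B-tree limit of roughly one third of a (by default 8 kB) page, and the sizes in the error are in bytes. As a quick check of the MWE's arithmetic, here is a minimal sketch that rebuilds the URL generator and confirms the result is exactly N = 4096 bytes: 11 for `http://www.`, N - 15 random characters, and 4 for `.com`. It swaps the original `fold | head -n 1` for `head -c`, assumes GNU coreutils, and the helper names `random_alnum` and `make_url` are hypothetical. The 16 bytes between 4096 and the reported 4112 are presumably per-tuple index overhead.

```shell
#!/bin/bash
# Sketch: rebuild the MWE's URL generator and check its length.
# random_alnum and make_url are hypothetical helper names.

# Print $1 random ASCII alphanumeric characters.
random_alnum() {
  # LC_ALL=C makes tr filter raw bytes, keeping only a-z, A-Z and 0-9.
  LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c "$1"
}

# Print a URL of exactly $1 characters.
make_url() {
  local n=$1
  # "http://www." is 11 characters and ".com" is 4, hence n - 15.
  printf 'http://www.%s.com' "$(random_alnum $((n - 15)))"
}

N=4096
URL=$(make_url "$N")
# The URL is pure ASCII, so its character count equals its byte count.
echo "URL length: ${#URL}"
```

With N=4096 this prints `URL length: 4096`, so the value itself is already well past the 2704-byte maximum before any index overhead is added.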

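Question: what should I do when my text is very long (>3K characters)?

  1. Is there any way to get rid of this btree error?
  2. Should I drop the unique constraint and only check for duplicates at the application level?
  3. Should I compress all the URLs?
  4. Is this simply impossible to do in PostgreSQL?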

I found the solution in "UNIQUE constraint on large VARCHARs - PostgreSQL".

Solution:

#!/bin/bash

N=4096 # Aiming for a URL of length 4,096 characters
DB_NAME='foo'
# Generate a random alphanumeric URL of length N: "http://www." (11 chars)
# + N-15 random characters + ".com" (4 chars)
URL="http://www.$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w $(($N-15)) | head -n 1).com"

# To get around the length limit on a text column with a unique
# constraint, we do the following:
#  1) Drop the UNIQUE constraint from the column itself
#  2) Add a unique index on md5(url) instead
TABLE_NAME='websitemd5'
sudo -u postgres psql -c "drop database if exists $DB_NAME;"
sudo -u postgres psql -c "create database $DB_NAME;"
sudo -u postgres psql -d $DB_NAME -c "drop table if exists $TABLE_NAME;"
sudo -u postgres psql -d $DB_NAME -c "create table $TABLE_NAME (url text);"
sudo -u postgres psql -d $DB_NAME -c "create unique index unique_url_index on $TABLE_NAME (md5(url));"
sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"
# Inserting the same URL a second time should be rejected by the index
sudo -u postgres psql -d $DB_NAME -c "insert into $TABLE_NAME (url) values ('$URL');"

Output:

DROP DATABASE
CREATE DATABASE
NOTICE:  table "websitemd5" does not exist, skipping
DROP TABLE
CREATE TABLE
CREATE INDEX
INSERT 0 1
ERROR:  duplicate key value violates unique constraint "unique_url_index"
DETAIL:  Key (md5(url))=(71bf6c554ab335360cd657d060f84c2d) already exists.
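
The DETAIL line shows why this works: the index key is the 32-hex-digit md5 digest, whose length does not depend on how long the URL is. The sketch below reproduces such keys in shell, assuming GNU coreutils `md5sum`; for ASCII input it should give the same hex digest as PostgreSQL's `md5(text)`. The helper `md5_hex` is hypothetical.

```shell
#!/bin/bash
# Sketch: the md5 key has a fixed length, however long the URL is.
# md5_hex is a hypothetical helper; assumes GNU coreutils md5sum.

# Print the lowercase hex md5 digest of $1 (printf '%s' adds no newline,
# so only the string itself is hashed).
md5_hex() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

short='http://www.example.com'
# 11 + 4081 + 4 = 4096 characters, the same length as the MWE's URL
long="http://www.$(printf 'a%.0s' {1..4081}).com"

for url in "$short" "$long"; do
  key=$(md5_hex "$url")
  echo "URL length ${#url} -> key length ${#key}: $key"
done
```

Both keys come out at 32 characters, far below the 2704-byte limit. The trade-off is that two different URLs with the same md5 digest would be treated as duplicates; for accidental collisions this is vanishingly unlikely, but md5 is not collision-resistant against deliberately crafted inputs.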

Even better, cast the md5 to type uuid:

CREATE UNIQUE INDEX unique_url_index ON $TABLE_NAME ((md5(url)::uuid));  -- parens required
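
What the cast produces can be sketched in shell: PostgreSQL accepts 32 hex digits without hyphens as uuid input and prints them back in the 8-4-4-4-12 grouping, so the uuid holds the same 128 bits as the md5 digest, stored in binary rather than as text. The helper `md5_as_uuid` is hypothetical and only mimics the textual result; it assumes GNU coreutils `md5sum`.

```shell
#!/bin/bash
# Sketch: mimic the text form of md5(value)::uuid in PostgreSQL.
# md5_as_uuid is a hypothetical helper; assumes GNU coreutils md5sum.

md5_as_uuid() {
  local h
  # 32 lowercase hex digits, same as PostgreSQL md5() for ASCII input
  h=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
  # Regroup as 8-4-4-4-12, the way PostgreSQL prints a uuid
  printf '%s-%s-%s-%s-%s\n' "${h:0:8}" "${h:8:4}" "${h:12:4}" "${h:16:4}" "${h:20:12}"
}

md5_as_uuid 'abc'
```

For `abc` this prints `90015098-3cd2-4fb0-d696-3f7d28e17f72`, which should match `SELECT md5('abc')::uuid;` in PostgreSQL. No information is added or lost by the cast; only the storage format of the index key changes.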

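Casting to uuid stores the digest as a 16-byte value instead of 32 characters of text, which makes the index smaller and faster. See: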