String replacement in PostgreSQL producing an array of additional strings
Suppose you have two substitution tables that must be kept as they are, and another table containing a set of names. How can I get all possible substitutions?
Substitution table
--------------------------------------
word        subs_list
MOUNTAIN    MOUNTAIN, MOUNT, MT, MTN
HOUSE       HAUS, HOUSE
VIEW        VU, VIEW

Synonyms table
-------------------------------------------------
word            syn_list
EDUCATION       SCHOOL, UNIVERSITY, COLLEGE, TRAINING
FOOD            STORE, FOOD, CAFE
STORE           FOOD, STORE, MARKET
REFRIGERATION   FOODLOCKER, FREEZE, FRIDGE

Names table
------------------------------------------------
MOUNT VU FOOD USA
MOUNTAIN VU STORE CA
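Note: I know it would be better to have a single substitution table, but both tables must be kept because they serve more purposes than the one described above, and they are already in use. Also, the substitution list in both tables is just a varchar holding a comma-separated string.
Given the above, the problem is to generate the possible names by substitution. For example, the name MOUNT VU FOOD USA should expand to MOUNTAIN VIEW FOOD USA and MOUNTAIN VIEW STORE USA, and likewise for the second name.
I have managed to get the substitutions, but in the wrong order and all applied together. Is there a way to get an array of the different names generated by the substitution as output? So far I have created this function for the substitution: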
create or replace function replace_companies_array(i_sentence IN VARCHAR) returns VARCHAR[] AS $p_replaced$
DECLARE
    p_replaced VARCHAR[];
    subs RECORD;
    flag boolean:= True;
    cur_s CURSOR(i_sentence VARCHAR)
        FOR SELECT w.input, coalesce(x.word, w.input) as word, count(*) OVER (PARTITION BY w.input) as counter
            FROM regexp_split_to_table(trim(i_sentence), '\s') as w(input)
            LEFT JOIN (
                select s.word, trim(s1.token) as token
                from subs01 s
                cross join unnest(string_to_array(s.subs_list, ',')) s1(token)
                union
                select sy.word, trim(s2.token) as token
                from syns01 sy
                cross join unnest(string_to_array(sy.syn_list, ',')) s2(token)
            ) as x on lower(trim(w.input)) = lower(x.token)
            order by counter;
BEGIN
    OPEN cur_s(i_sentence);
    LOOP
        --fetch row into the substitutions
        FETCH cur_s INTO subs;
        --Exit when no more rows to fetch
        EXIT WHEN NOT FOUND;
        SELECT REGEXP_REPLACE(i_sentence,'(^|[^a-z0-9])' || subs.input || '($|[^a-z0-9])','' || UPPER(subs.word) || '','g')
        INTO i_sentence;
    END LOOP;
    p_replaced:=array_append(p_replaced, i_sentence);
    RETURN p_replaced;
END;
$p_replaced$ LANGUAGE plpgsql;
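Thank you very much for your contributions.
I did not reach the final result, but I got close!
From the sentence MOUNT VU FOOD USA, I get {"MOUNTAIN VIEW MARKET USA","MOUNTAIN VIEW STORE USA","MOUNTAIN VIEW CAFE USA","MOUNTAIN VIEW FOOD USA"}
Here are all the scripts I used to recreate the synonym and substitution tables: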
DROP TABLE IF EXISTS subs01;
DROP TABLE IF EXISTS syns01;
CREATE TABLE subs01 (word VARCHAR(20), subs_list VARCHAR(200));
CREATE TABLE syns01 (word VARCHAR(20), syn_list VARCHAR(200));
INSERT INTO subs01 (word, subs_list) VALUES ('MOUNTAIN', 'MOUNTAIN, MOUNT, MT, MTN'),('HOUSE', 'HAUS, HOUSE'),('VIEW', 'VU, VIEW');
INSERT INTO syns01 (word, syn_list) VALUES ('EDUCATION', 'SCHOOL, UNIVERSITY, COLLEGE, TRAINING'),('FOOD', 'STORE, FOOD, CAFE'),('STORE', 'FOOD, STORE, MARKET'),('REFRIGERATION', 'FOODLOCKER, FREEZE, FRIDGE');
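Because both lists are stored as comma-separated varchar values, every function in this post first has to turn them into one row per token. The query below isolates that step so the intermediate result can be inspected on its own; it is only a sketch that assumes the setup script above has been run, and the ORDER BY is there purely to make the output stable.

```sql
-- One row per (word, token) pair from the substitution table.
-- string_to_array splits on the comma; trim removes the space after it.
SELECT su.word,
       trim(t.token) AS token
FROM subs01 su
CROSS JOIN unnest(string_to_array(su.subs_list, ',')) AS t(token)
ORDER BY 1, 2;
```

The same query works for syns01 by swapping in the table name and the syn_list column.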
I decided to split the work into two stages:
First, substitute the words:
CREATE OR REPLACE function substitute_words (i_sentence IN VARCHAR) returns VARCHAR AS $p_substituted$
DECLARE
    --p_substituted VARCHAR;
    subs_cursor CURSOR FOR select su.word, trim(s2.token) as token from subs01 su cross join unnest(string_to_array(su.subs_list, ',')) s2(token);
    subs_record record;
BEGIN
    OPEN subs_cursor;
    LOOP
        FETCH subs_cursor INTO subs_record;
        EXIT WHEN NOT FOUND;
        RAISE NOTICE 'INFO : TOKEN (%) ',subs_record.token ;
        IF i_sentence LIKE '%'|| subs_record.token || '%' THEN
            RAISE NOTICE '-- FOUND : TOKEN (%) ',subs_record.token ;
            SELECT replace (i_sentence, subs_record.token, subs_record.word) INTO i_sentence;
        END IF;
    END LOOP;
    CLOSE subs_cursor;
    RETURN i_sentence;
END
$p_substituted$ LANGUAGE plpgsql;
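One caveat with the function above: LIKE and replace() match substrings, so for the second sample name, MOUNTAIN VU STORE CA, the token MOUNT is found inside MOUNTAIN and the result would presumably come out as MOUNTAINAIN VIEW STORE CA. A possible variant that only matches whole words is sketched below. The name substitute_words_wb is hypothetical, and the sketch assumes the tokens contain only letters and digits, so they can be concatenated into a regular expression without escaping.

```sql
-- Hypothetical whole-word variant of substitute_words (not from the original post).
CREATE OR REPLACE FUNCTION substitute_words_wb(i_sentence text)
RETURNS text AS $wb$
DECLARE
    result text := i_sentence;
    r record;
BEGIN
    FOR r IN
        SELECT su.word, trim(t.token) AS token
        FROM subs01 su
        CROSS JOIN unnest(string_to_array(su.subs_list, ',')) AS t(token)
        WHERE trim(t.token) <> su.word   -- skip no-op pairs such as MOUNTAIN -> MOUNTAIN
        ORDER BY 1, 2                    -- deterministic processing order
    LOOP
        -- \y is a word boundary in PostgreSQL regular expressions,
        -- so MOUNT no longer matches inside MOUNTAIN.
        result := regexp_replace(result, '\y' || r.token || '\y', r.word, 'g');
    END LOOP;
    RETURN result;
END;
$wb$ LANGUAGE plpgsql;
```

With the sample data, SELECT substitute_words_wb('MOUNTAIN VU STORE CA'); should return MOUNTAIN VIEW STORE CA.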
Second, replace the known words with their synonyms:
CREATE OR REPLACE function synonymize_sentence (i_sentence IN VARCHAR) returns TABLE (sentence_result VARCHAR) AS $p_syn$
DECLARE
    syn_cursor CURSOR FOR select su.word, trim(s2.token) as token from syns01 su cross join unnest(string_to_array(su.syn_list, ',')) s2(token);
    syn_record record;
BEGIN
    CREATE TEMPORARY TABLE record_syn (result VARCHAR(200)) ON COMMIT DROP;
    INSERT INTO record_syn (result) SELECT i_sentence;
    OPEN syn_cursor;
    LOOP
        FETCH syn_cursor INTO syn_record;
        EXIT WHEN NOT FOUND;
        RAISE NOTICE 'INFO : WORD (%) ',syn_record.word ;
        INSERT INTO record_syn (result) SELECT replace (result, syn_record.word, syn_record.token) FROM record_syn where result LIKE '%'|| syn_record.word || '%';
    END LOOP;
    CLOSE syn_cursor;
    RETURN QUERY SELECT distinct result FROM record_syn;
END;
$p_syn$ LANGUAGE plpgsql;
Then, to generate the result array, I ran this statement:
SELECT ARRAY(SELECT synonymize_sentence (substitute_words ('MOUNT VU FOOD USA')));
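A possible refinement of the second stage: because record_syn is created with ON COMMIT DROP, a second call to synonymize_sentence inside the same transaction (for example, once per row of the names table) would presumably fail because the temporary table already exists. The sketch below keeps the intermediate sentences in an array variable instead and returns the array directly. The name synonymize_sentence_arr is hypothetical, tokens are assumed to contain only letters and digits, and, like the original, it makes a single pass over the synonym pairs, so the result depends on the processing order (fixed here with ORDER BY).

```sql
-- Hypothetical array-based variant of synonymize_sentence (not from the original post).
CREATE OR REPLACE FUNCTION synonymize_sentence_arr(i_sentence text)
RETURNS text[] AS $arr$
DECLARE
    results text[] := ARRAY[i_sentence];
    r record;
BEGIN
    FOR r IN
        SELECT sy.word, trim(t.token) AS token
        FROM syns01 sy
        CROSS JOIN unnest(string_to_array(sy.syn_list, ',')) AS t(token)
        WHERE trim(t.token) <> sy.word   -- skip no-op pairs such as FOOD -> FOOD
        ORDER BY 1, 2                    -- deterministic processing order
    LOOP
        -- Keep every sentence found so far and add the versions in which
        -- the whole word r.word is replaced by r.token; UNION removes duplicates.
        results := ARRAY(
            SELECT u FROM unnest(results) AS u
            UNION
            SELECT regexp_replace(u, '\y' || r.word || '\y', r.token, 'g')
            FROM unnest(results) AS u
            ORDER BY 1
        );
    END LOOP;
    RETURN results;
END;
$arr$ LANGUAGE plpgsql;
```

Combined with the first stage, SELECT synonymize_sentence_arr(substitute_words('MOUNT VU FOOD USA')); should return {"MOUNTAIN VIEW CAFE USA","MOUNTAIN VIEW FOOD USA","MOUNTAIN VIEW MARKET USA","MOUNTAIN VIEW STORE USA"} with the sample data.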