Extracting each line that follows a certain character from a text, and returning the result as a table, using Postgres
I have a text like this:
>Sequenz: Test 1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
>Sequenz 2 1234 Organism: Treponema
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
>Sequenz 3
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
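There is not necessarily a blank line between the text blocks, and the 'MTEITAAMVKELRESTGAGM' part can span a varying number of lines. The only thing that is certain is the > at the start of each header line.
I would like to get a table like this: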
HEADER
----------
Sequenz: Test 1
----------
Sequenz 2 1234 Organism: Treponema
----------
Sequenz 3
I tried:
SELECT regexp_matches(regexp_split_to_table( 'text from above', '\n>'),'([A-Z,a-z,0-9]+\s)');
which results in
HEADER
----------
Sequenz
----------
Sequenz
----------
Sequenz
and
SELECT regexp_split_to_table('text from above', '[\\n>+(.)\\n]+')
which results in
HEADER
----------
----------
Sequenz: Test 1
----------
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
----------
----------
Sequenz 2 1234 Organism: Treponema
----------
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
----------
----------
Sequenz 3
----------
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
Try this:
SELECT split_part(regexp_split_to_table(trim(leading '>' from '>Sequenz: Test 1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
>Sequenz 2 1234 Organism: Treponema
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
>Sequenz 3
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG'), E'>'),E'\n', 1) AS res
If you want to keep the first empty row, remove the trim() function.
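The query above splits on every > character, so a > inside a header would cut that header in two. As a possible alternative, here is a minimal sketch that splits the text into lines and keeps only the lines that start with >. The column aliases line and header are names chosen for illustration, and the sequence lines are shortened from the sample in the question.

```sql
-- Sketch: split into lines, keep the header lines, strip the leading '>'.
SELECT substr(line, 2) AS header              -- drop the first character ('>')
FROM regexp_split_to_table(
       E'>Sequenz: Test 1\nMTEITAAMVKELRESTGAGM\n>Sequenz 2 1234 Organism: Treponema\nMTEITAAMVKELRESTGAGM\n>Sequenz 3\nMTEITAAMVKELRESTGAGM',
       E'\r?\n'                               -- split on LF or CRLF line endings
     ) AS line
WHERE line LIKE '>%';                         -- only lines that begin with '>'
```

Because the filter looks only at the first character of each line, the number of sequence lines between two headers does not matter. To store the result instead of just displaying it, the same SELECT can be used as the body of a CREATE TABLE ... AS statement.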