MySQL: how to get the length of each word in a text column

Suppose I have a simple table in MySQL with a TEXT column named words that contains arbitrary text, for example 'The big brown dog jumped over the lazy fox'.

How can I find the length of each word in the text with a MySQL query?

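The result should be 3 3 5 3 6 4 3 4 3, or something similar. It can be sorted, and it could also include the words themselves. I couldn't work out how to do this myself: I can find examples that count the number of words, but I need the length of each word.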

Your example has no punctuation, so I'll assume you just want to split the value on spaces (so The fox. would give The 3 and fox. 4).
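To pin down that splitting rule, here is a small Python sketch of the expected output (an illustration only, not part of the MySQL solution). Python's str.split(' ') breaks on every single space, which is the same rule the queries below apply, so punctuation stays attached to its word and two consecutive spaces produce an empty word of length 0.

```python
def space_split_lengths(text):
    """Split on every single space (no punctuation handling) and pair each word with its length."""
    return [(word, len(word)) for word in text.split(' ')]


if __name__ == "__main__":
    print([n for _, n in space_split_lengths("The big brown dog jumped over the lazy fox")])
    # prints [3, 3, 5, 3, 6, 4, 3, 4, 3]
    print(space_split_lengths("The fox."))
    # prints [('The', 3), ('fox.', 4)]
```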

Given:

CREATE TABLE `foo` (
    `bar` text COLLATE utf8_unicode_ci DEFAULT NULL
);
insert into foo values ('Lorem ipsum dolor sit amet'),('consectetur adipiscing elit'),('sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'),('Ut enim ad minim veniam'),('quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat'),('Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur'),('Excepteur sint occaecat cupidatat non proident'),('sunt in culpa qui officia deserunt mollit anim id est laborum');

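In MySQL 8.0 and later you can use a recursive CTE to get all the words in each text value. Each recursion step peels off one word, so a value with more words than cte_max_recursion_depth (1000 by default) makes the query fail unless you raise that variable: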

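with recursive words (word,remaining,word_number,full_string) as (
    -- anchor: the first word of each row, plus everything after the first space
    select
        substring_index(bar,' ',1) word,
        if(instr(bar,' '),substring(bar,instr(bar,' ')+1),'') remaining,
        1 word_number,
        bar full_string
    from foo

    union all

    -- recursive step: take the next word off whatever text is left
    select
        substring_index(remaining,' ',1) word,
        if(instr(remaining,' '),substring(remaining,instr(remaining,' ')+1),'') remaining,
        word_number+1 word_number,
        full_string
    from words
    where char_length(remaining) > 0
)
-- char_length counts characters; length() counts bytes and overcounts multibyte utf8 text
select char_length(word) word_length, word, word_number, full_string
from words
order by full_string, word_number;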
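If you want to try the recursive approach without a MySQL 8.0 server, the sketch below runs the same anchor-plus-recursive-step idea in SQLite through Python's built-in sqlite3 module. It is a port, not the MySQL query itself: SQLite has no SUBSTRING_INDEX, IF or RIGHT, so INSTR, SUBSTR and CASE stand in for them, and the table's rowid is carried along as a hypothetical text_id column only so the output can be ordered per input row.

```python
import sqlite3

# SQLite port of the recursive CTE (not MySQL syntax): INSTR/SUBSTR/CASE replace
# SUBSTRING_INDEX/IF, and rowid is carried as text_id (a hypothetical helper column).
QUERY = """
with recursive words(text_id, word, remaining, word_number) as (
    select rowid,
           case when instr(bar, ' ') > 0 then substr(bar, 1, instr(bar, ' ') - 1) else bar end,
           case when instr(bar, ' ') > 0 then substr(bar, instr(bar, ' ') + 1) else '' end,
           1
    from foo
    where bar is not null
    union all
    select text_id,
           case when instr(remaining, ' ') > 0
                then substr(remaining, 1, instr(remaining, ' ') - 1) else remaining end,
           case when instr(remaining, ' ') > 0
                then substr(remaining, instr(remaining, ' ') + 1) else '' end,
           word_number + 1
    from words
    where length(remaining) > 0
)
select text_id, word_number, word, length(word) as word_length
from words
order by text_id, word_number
"""


def word_lengths(texts):
    """Load texts into an in-memory table foo(bar); return (text_id, word_number, word, word_length) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table foo (bar text)")
        conn.executemany("insert into foo values (?)", [(t,) for t in texts])
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in word_lengths(["The big brown dog jumped over the lazy fox"]):
        print(row)
```

The first row printed is (1, 1, 'The', 3); SQLite's length() already counts characters for text values, so no char_length equivalent is needed there.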

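For earlier database versions that do not support recursive common table expressions (such as MySQL 5.7), you have to assume some maximum number of words and build a subquery that returns one row for each number from 1 to that maximum. For example, for a maximum of 256 you could write: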

-- i, j, k and l are base-4 digits, so n runs from 1 to 4*4*4*4 = 256
select i*64+j*16+k*4+l+1 n
from (select 0 l union all select 1 union all select 2 union all select 3) l
cross join (select 0 k union all select 1 union all select 2 union all select 3) k
cross join (select 0 j union all select 1 union all select 2 union all select 3) j
cross join (select 0 i union all select 1 union all select 2 union all select 3) i;
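The digit tables work because i, j, k and l act as the four base-4 digits of n-1, with weights 64, 16, 4 and 1. A quick Python check, using only the standard library, confirms that the cross join yields each value from 1 to 256 exactly once. By the same logic, a fifth digit table with weight 256 would extend the range to 1..1024.

```python
from itertools import product


def generated_numbers():
    """Python mirror of: select i*64+j*16+k*4+l+1 over four cross-joined {0,1,2,3} tables."""
    digits = range(4)  # each (select 0 union all select 1 union all select 2 union all select 3)
    # i, j, k, l are the base-4 digits of n-1, so every value 1..256 appears exactly once
    return sorted(i * 64 + j * 16 + k * 4 + l + 1
                  for i, j, k, l in product(digits, repeat=4))


if __name__ == "__main__":
    nums = generated_numbers()
    print(len(nums), nums[0], nums[-1])  # prints 256 1 256
```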

Or, if you prefer, you can just use one long list instead: (select 1 n union all select 2 union all select 3 ... union all select 256)

Then join it to your table and extract each word. Words past the maximum (here the 256th) are silently left out:

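select
    -- the inner substring_index keeps the first word_number words,
    -- the outer one keeps the last of those, i.e. word number word_number;
    -- char_length counts characters, while length() would count bytes
    char_length(substring_index(substring_index(bar,' ',word_number),' ',-1)) word_length,
    substring_index(substring_index(bar,' ',word_number),' ',-1) word,
    word_number,
    bar full_string
from (
    select i*64+j*16+k*4+l+1 word_number
    from (select 0 l union all select 1 union all select 2 union all select 3) l
    cross join (select 0 k union all select 1 union all select 2 union all select 3) k
    cross join (select 0 j union all select 1 union all select 2 union all select 3) j
    cross join (select 0 i union all select 1 union all select 2 union all select 3) i
) n
-- a row has (number of spaces + 1) words, so keep only that many word numbers
join foo on word_number <= char_length(bar)-char_length(replace(bar,' ',''))+1
order by full_string, word_number;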
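The nested SUBSTRING_INDEX call is the least obvious part of this query, so here is a Python model of it. This is a sketch under stated assumptions: substring_index below is a hand-written imitation of MySQL's SUBSTRING_INDEX(str, delim, count) for a non-empty delimiter such as ' ', and word_count and words_via_numbers_table are hypothetical helper names that mirror the join condition and the numbers-table loop.

```python
def substring_index(s, delim, count):
    """Python model of MySQL SUBSTRING_INDEX(str, delim, count).

    count > 0: text left of the count-th delimiter, counting from the left;
    count < 0: text right of the |count|-th delimiter, counting from the right;
    with fewer delimiters than |count|, the whole string is returned.
    Assumes a non-empty delimiter (str.split rejects '').
    """
    if count == 0:
        return ''
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def word_count(bar):
    # Mirrors: char_length(bar) - char_length(replace(bar, ' ', '')) + 1
    return len(bar) - len(bar.replace(' ', '')) + 1


def words_via_numbers_table(bar, max_words=256):
    """Return (word_length, word, word_number) rows, like the joined query does for one value of bar."""
    rows = []
    for word_number in range(1, max_words + 1):  # the derived numbers table
        if word_number <= word_count(bar):  # the join condition
            word = substring_index(substring_index(bar, ' ', word_number), ' ', -1)
            rows.append((len(word), word, word_number))
    return rows


if __name__ == "__main__":
    for row in words_via_numbers_table("The big brown dog jumped over the lazy fox"):
        print(row)
```

For word_number = 3, the inner call returns 'The big brown' and the outer call keeps 'brown'. Passing a smaller max_words shows the truncation described above: words_via_numbers_table("a b c", max_words=2) only yields 'a' and 'b'.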