How to query text to find the longest prefix strings in SQL?

I am using Spark SQL. Suppose this is a snapshot of my large table:

ups store
ups store austin
ups store chicago
ups store bern
walmart
target

How can I find the longest prefixes of the data above in SQL? That is:

 ups store
 walmart
 target

I already have a Java program that does this, but the file is large, so my question is whether this can reasonably be done in SQL.

What about the more complex scenario below? (I can live without this, but it would be nice to have if possible.)

ups store austin
ups store chicago
ups store bern
walmart
target

That is, return [ups store, walmart, target].

Assuming your column name is "mycolumn", your large table is "mytable", and a single space is your field separator:

In PostgreSQL, you can do something as simple as this:

select
   mycolumn
from
   mytable
order by
   length(split_part(mycolumn, ' ', 1)) desc
limit
   1
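
To make concrete what the ordering expression in this query computes, here is a small Python sketch. It is only an illustration: the helper names `split_part` (mimicking the PostgreSQL function of the same name) and `first_row_by_first_field_length` are hypothetical, and the sample rows are the ones from the question.

```python
def split_part(text, delimiter, n):
    """Mimic PostgreSQL split_part: return the n-th (1-based) field, or '' if absent."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def first_row_by_first_field_length(rows):
    """Order rows by the length of their first space-delimited field, longest first,
    and return the first row (the equivalent of ORDER BY ... DESC LIMIT 1)."""
    ordered = sorted(rows, key=lambda r: len(split_part(r, " ", 1)), reverse=True)
    return ordered[0]


SAMPLE = [
    "ups store",
    "ups store austin",
    "ups store chicago",
    "ups store bern",
    "walmart",
    "target",
]
```

Calling `first_row_by_first_field_length(SAMPLE)` picks the single row whose first field is longest, so the query orders by the first word only and returns one row.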

If you run this query frequently, I would probably try an ordered functional index on the table, like this:

create index prefix_index on mytable (length(split_part(mycolumn, ' ', 1)) desc)

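Assuming you are free to create another table that simply contains a list of ascending integers from zero up to the size of the longest possible string, the following should do the job using only ANSI SQL: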

SELECT
  id,
  SUBSTRING(name, 1, CASE WHEN number = 0 THEN LENGTH(name) ELSE number END) AS prefix
FROM
 -- Join all places to all possible substring lengths.
 (SELECT *
  FROM places p
  CROSS JOIN lengths l) subq
-- If number is zero then no prefix match was found elsewhere
-- (from the question it looked like you wanted to include these)
WHERE (subq.number = 0 OR
       -- Look for prefix match elsewhere
       EXISTS (SELECT * FROM places p
               WHERE SUBSTRING(p.name FROM 1 FOR subq.number)
                     = SUBSTRING(subq.name FROM 1 FOR subq.number)
                 AND p.id <> subq.id))
  -- Include as a prefix match if the whole string is being used
  AND (subq.number = LENGTH(name)
       -- Don't include trailing spaces in a prefix
       OR (SUBSTRING(subq.name, subq.number, 1) <> ' '
           -- Only include the longest prefix match 
           AND NOT EXISTS (SELECT * FROM places p 
                           WHERE SUBSTRING(p.name FROM 1 FOR subq.number + 1)
                                 = SUBSTRING(subq.name FROM 1 FOR subq.number + 1)
                             AND p.id <> subq.id)))
ORDER BY id;

Live demo: http://rextester.com/XPNRP24390
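
For anyone who wants to try the query locally, here is a rough sketch of the setup it assumes: a `places(id, name)` table and a `lengths(number)` helper table holding 0 up to the longest string length. The sketch uses Python's built-in `sqlite3` module purely as a stand-in engine, which is an assumption and not what the demo uses. Because SQLite does not accept the `SUBSTRING(... FROM ... FOR ...)` form, the query is ported to `substr(...)`; the function name `longest_prefixes` is hypothetical.

```python
import sqlite3

NAMES = ["ups store", "ups store austin", "ups store chicago",
         "ups store bern", "walmart", "target"]

# Port of the query above, with substr() in place of SUBSTRING ... FROM ... FOR.
QUERY = """
SELECT id,
       substr(name, 1, CASE WHEN number = 0 THEN length(name) ELSE number END) AS prefix
FROM (SELECT * FROM places p CROSS JOIN lengths l) subq
WHERE (subq.number = 0
       OR EXISTS (SELECT * FROM places p
                  WHERE substr(p.name, 1, subq.number)
                        = substr(subq.name, 1, subq.number)
                    AND p.id <> subq.id))
  AND (subq.number = length(name)
       OR (substr(subq.name, subq.number, 1) <> ' '
           AND NOT EXISTS (SELECT * FROM places p
                           WHERE substr(p.name, 1, subq.number + 1)
                                 = substr(subq.name, 1, subq.number + 1)
                             AND p.id <> subq.id)))
ORDER BY id
"""


def longest_prefixes(names):
    """Load names into an in-memory database and run the prefix query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE lengths (number INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO places (name) VALUES (?)", [(n,) for n in names])
    # Helper table: every integer from 0 to the longest name length, inclusive.
    longest = max(len(n) for n in names)
    conn.executemany("INSERT INTO lengths (number) VALUES (?)",
                     [(i,) for i in range(longest + 1)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

With the sample data from the question, `longest_prefixes(NAMES)` should return one row each for "ups store", "walmart" and "target" (ids 1, 5 and 6). Note that the cross join produces one candidate row per name per length, so on a large table the helper table should be kept as short as the data allows.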

The second aspect is that what if we have (ups store austin, ups store chicago). can we use SQL to extract the 'ups store' off of it.

This should just be a case of using SUBSTRING in a similar way to the above, e.g.:

SELECT SUBSTRING(name,
                 LENGTH('ups store ') + 1,
                 LENGTH(name) - LENGTH('ups store '))
FROM places
WHERE SUBSTRING(name,
                1,
                LENGTH('ups store ')) = 'ups store ';
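
A quick way to check this query is to run it against the sample rows. The sketch below again uses Python's `sqlite3` module as a stand-in engine (an assumption), with `substr` in place of `SUBSTRING` and the prefix passed as a bound parameter; the function name `strip_prefix` is hypothetical.

```python
import sqlite3


def strip_prefix(names, prefix="ups store "):
    """Return the remainder of every name that starts with the given prefix."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO places (name) VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        """
        SELECT substr(name, length(:p) + 1, length(name) - length(:p))
        FROM places
        WHERE substr(name, 1, length(:p)) = :p
        ORDER BY id
        """,
        {"p": prefix},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For example, `strip_prefix(["ups store", "ups store austin", "ups store chicago", "walmart"])` keeps only the rows that begin with "ups store " (including the trailing space) and returns what follows the prefix, so the bare "ups store" row and "walmart" are filtered out.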