Get each <tag> in String - stackexchange database

Question

Pseudocode for my question:

SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'

Question in detail

I'm trying to get the tags that were used in 2011 from the apple.stackexchange data. (this query)

As you can see, the tags from tag changes are stored as plain text in the Text field.

<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>

How can I create a unique list of the tags, so I can look them up in the Tags table, instead of this hardcoded version:

SELECT * FROM Tags
  WHERE TagName = 'osx' 
     OR TagName = 'keyboard' 
     OR TagName = 'security'

Here is an interactive example.

Stackexchange uses T-SQL; my local copy is running under PostgreSQL, using Postgres app version 9.4.5.0.

Answer 1

I've simplified the data to only the relevant column and named it tags to present an example.

Sample data

create table posthistory(tags text);
insert into posthistory values
  ('<lion><backup><time-machine>'),
  ('<spotlight><alfred><photo-booth>'),
  ('<lion><pdf><preview>'),
  ('<pdf>'),
  ('<asd>');

Query to get a unique list of tags

SELECT DISTINCT
  unnest(
    regexp_split_to_array(
      trim('><' from tags), '><'
    )
  )
FROM
  posthistory

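First, we remove all leading and trailing occurrences of the > and < characters from each row. Then the regexp_split_to_array() function puts the values into an array, and unnest() expands the array into a set of rows. Finally, DISTINCT eliminates duplicate values.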
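To make the individual steps easier to follow, here is a minimal sketch that runs each function on a single literal value instead of the posthistory table. The literals are taken from the sample data above; the column aliases (trimmed, tag_arr, tag) are chosen only for illustration.

```sql
-- Step 1: strip the leading '<' and the trailing '>' from the string.
SELECT trim('><' from '<lion><backup><time-machine>') AS trimmed;
-- trimmed: lion><backup><time-machine

-- Step 2: split the remainder on the '><' separator into an array.
SELECT regexp_split_to_array('lion><backup><time-machine', '><') AS tag_arr;
-- tag_arr: {lion,backup,time-machine}

-- Step 3: expand the array into one row per element.
SELECT unnest(ARRAY['lion', 'backup', 'time-machine']) AS tag;
-- three rows: lion, backup, time-machine
```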

See the SQLFiddle for a preview of how it works.

Answer 2

Assuming this table definition:

CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);

It depends on what you want exactly.

To convert the string to an array, trim the leading and trailing '<>', then treat '><' as the separator:

SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM   posthistory;

To get a list of unique tags for the whole table (I guess you want this):

SELECT DISTINCT tag
FROM   posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;

The implicit LATERAL join requires Postgres 9.3 or later.
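For readers who find the comma-separated form opaque, the same query can be sketched with the join spelled out. The aliases p and t are introduced here only for illustration; the result should be the same list of distinct tags.

```sql
-- Explicit form of the implicit LATERAL join:
-- unnest() is evaluated once per row of posthistory and may reference p.tags.
SELECT DISTINCT t.tag
FROM   posthistory p
CROSS  JOIN LATERAL unnest(string_to_array(trim(p.tags, '><'), '><')) AS t(tag);
```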

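This should be substantially faster than using regular expressions. If you want to try regular expressions anyway, use regexp_split_to_table() instead of regexp_split_to_array() followed by unnest(), as suggested in another answer: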

SELECT DISTINCT tag
FROM   posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;

This also uses an implicit LATERAL join.
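To tie this back to the lookup asked for in the question, the extracted tags could be used as a subquery against the Tags table. This is only a sketch: it assumes the local Postgres copy has a table tags with a column tagname mirroring Tags.TagName from the question, and the aliases t, p and x are illustrative.

```sql
-- Assumption: tags(tagname, ...) exists in the local copy and mirrors Tags.TagName.
-- IN already ignores duplicates, so DISTINCT is not needed in the subquery.
SELECT t.*
FROM   tags t
WHERE  t.tagname IN (
   SELECT x.tag
   FROM   posthistory p
        , unnest(string_to_array(trim(p.tags, '><'), '><')) AS x(tag)
   );
```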

To search for particular tags:

SELECT *
FROM   posthistory
WHERE  tags LIKE '%<security>%'
AND    tags LIKE '%<osx>%';

SQL Fiddle.
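As a possible alternative to the LIKE patterns, the same search can be sketched on top of the array conversion shown earlier, using the array "contains" operator @>. It compares whole tag names rather than substrings, so characters such as _ or % inside a tag name are not treated as wildcards. Whether it performs better or worse than LIKE on a given data set is not something this sketch claims.

```sql
-- Rows whose tag list contains both 'security' and 'osx', in any order.
SELECT *
FROM   posthistory
WHERE  string_to_array(trim(tags, '><'), '><') @> ARRAY['security', 'osx'];
```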

Applied to your search in T-SQL in our data explorer:

SELECT TOP 100
       PostId, UserId, Text AS Tags FROM PostHistory
WHERE  year(CreationDate) = 2011
AND    PostHistoryTypeId IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');

(T-SQL syntax uses the non-standard + instead of ||.)
https://data.stackexchange.com/apple/query/edit/417055


sql

postgresql

dataexplorer

set-returning-functions
