SQL Server 2008 PATINDEX recursion

Question

I want to find the last instance of an expression, then keep looking for a better match, and finally select the best match.

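The cell I am looking at is a log that is repeatedly appended to: each entry is a note followed by a line with the username and a timestamp.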

Example cell content:

Starting the investigation.
JWAYNE entered the notes above on 08/12/1976 12:01

Taking over the case. Not a lot of progress recently.
CEASTWOOD entered the notes above on 03/14/2001 09:04

No wonder this case is not progressing, the whole town is covering up some shenanigans!
CEASTWOOD entered the notes above on 03/21/2001 05:23

Star command was right, this investigation has been tossed around like a hot potato for a long time!
BLIGHTYEAR entered the notes above on 08/29/2659 08:01

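I'm no expert on database normalization rules, but it is annoying that the entries are crammed into a single cell. My job is to isolate the notes and check them for specific words, and that is hard because the cell is repeated on every row until the investigation ends, which puts notes from future phases into the Notes column of past events. On top of that, the timestamps inside the notes do not line up with the row timestamps, so a PATINDEX on the timestamp is unreliable even with a margin of a few minutes, as shown below: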

CaseID, Username,   Notes,             Phase, Timestamp
E18902, JWAYNE,     Starting....08:01, E1,    03/14/2001 09:13
E18902, CEASTWOOD,  Starting....08:01, E2,    03/14/2001 09:13
E18902, CEASTWOOD,  Starting....08:01, E3,    03/21/2001 05:34
E18902, BLIGHTYEAR, Starting....08:01, E4,    08/29/2659 07:58

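Right now I reverse the whole string, use PATINDEX to find the username, and then take a substring to select only the notes for that investigation phase. The problem is that when the same user entered notes for several phases, my simple "look for the first match starting at the end of the string and moving to the top" approach picks the wrong entry. My first thought was to search for the username and then check again whether an entry further up is a better match (comparing the timestamp in the note with the column timestamp), but I'm not sure how to code that...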

Do I have to do some complicated string splitting, or is there a simpler solution?

Answer 1

Here's what I would suggest. It's a script, but you can turn it into a user-defined table-valued function if you like.
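For reference, a wrapped version might look roughly like the sketch below. It only condenses the steps of the script that follows; the function name dbo.SplitNoteEntries, its parameter, and the dbo.Cases table in the usage comment are made up for illustration and are not part of the original answer.

```sql
-- Hypothetical wrapper around the script in this answer.
-- The name dbo.SplitNoteEntries and its signature are assumptions.
CREATE FUNCTION dbo.SplitNoteEntries (@sourceText nvarchar(max))
RETURNS @entries TABLE
(
    EntryData      nvarchar(max),
    EntryTimestamp nvarchar(max)
)
AS
BEGIN
    DECLARE @workText nvarchar(max), @xml xml;

    -- Line breaks become '|', then the '|' runs become XML tags.
    SET @workText = REPLACE(REPLACE(@sourceText, CHAR(10), '|'), CHAR(13), '|');
    SET @workText = REPLACE(@workText, '||||', '</line></entry><entry><line>');
    SET @workText = REPLACE(@workText, '|', '</line><line>');
    SET @workText = '<entry><line>' + @workText + '</line></entry>';
    SET @workText = REPLACE(@workText, '<line></line>', '');
    SET @xml = CONVERT(xml, @workText);

    -- Last line of each entry is the timestamp line; the rest is the note.
    SET @xml = @xml.query('
        for $entry in /entry
        return <entry>
                 <data>{ for $line in $entry/line[position() < last()]
                         return string($line) }</data>
                 <timestamp>{ data($entry/line[last()]) }</timestamp>
               </entry>');

    INSERT INTO @entries (EntryData, EntryTimestamp)
    SELECT R.lines.value('data[1]', 'nvarchar(max)'),
           R.lines.value('timestamp[1]', 'nvarchar(max)')
    FROM   @xml.nodes('/entry') AS R(lines);

    RETURN;
END;
GO

-- Possible usage, assuming a table dbo.Cases with a Notes column:
-- SELECT c.CaseID, e.EntryData, e.EntryTimestamp
-- FROM   dbo.Cases AS c
-- CROSS APPLY dbo.SplitNoteEntries(c.Notes) AS e;
```

A multi-statement function is used here because the script relies on variables; whether that performs acceptably on a large table is something you would need to measure.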

I'll use the sample data from your question.

 declare @sourceText nvarchar(max)
    ,    @workText   nvarchar(max)
    ,    @xml        xml

 set @sourceText = <your example text in your question>
 set @workText = @sourceText

 -- We're going to replace all the carriage returns and line feeds with 
 -- a character unlikely to appear in your text.  (If "|" does appear
 -- in your notes, use some other character.)

 set    @workText = REPLACE(@workText, char(10), '|')
 set    @workText = REPLACE(@workText, char(13), '|')

 -- Now, we're going to turn your text into XML.  Our first target is 
 -- the string of four "|" characters that the blank lines between entries
 -- will be turned into.  (If you've got 3, or 6, or blanks in between, 
 -- adjust accordingly.)

set @workText = REPLACE(@workText, '||||', '</line></entry><entry><line>')

-- Now we replace every other "|".  
set @workText = REPLACE(@workText, '|', '</line><line>')

-- Now we construct the rest of the XML and convert the variable to an 
-- actual XML variable.
set @workText = '<entry><line>' + @workText + '</line></entry>'
set @workText = REPLACE(@workText, '<line></line>','') -- Get rid of any empty nodes.

set @xml = CONVERT(xml, @workText)
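One caveat: CONVERT(xml, ...) raises an error if the notes themselves contain characters such as "&" or "<". The sample data does not, so this is only a precaution. A minimal sketch of escaping them, which would have to run right after set @workText = @sourceText and before any tags are inserted (the sample string below is invented):

```sql
-- Sketch only: escape XML special characters before building the tags.
-- The sample text is made up; in the script this would be @sourceText.
DECLARE @workText nvarchar(max);
SET @workText = N'Fish & chips <urgent>';

-- Escape '&' first so the entities added next are not double-escaped.
SET @workText = REPLACE(@workText, '&', '&amp;');
SET @workText = REPLACE(@workText, '<', '&lt;');
SET @workText = REPLACE(@workText, '>', '&gt;');

SELECT @workText AS EscapedText;
```

The value() method used later decodes these entities again, so the extracted notes come back with the original characters.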

We should now have an XML fragment that looks like this. (Insert select @xml at this point in the script to see it.)

<entry>
  <line>Starting the investigation.</line>
  <line>JWAYNE entered the notes above on 08/12/1976 12:01</line>
</entry>
<entry>
  <line>Taking over the case. Not a lot of progress recently.</line>
  <line>CEASTWOOD entered the notes above on 03/14/2001 09:04</line>
</entry>
<entry>
  <line>No wonder this case is not progressing, the whole town is covering up some shenanigans!</line>
  <line>CEASTWOOD entered the notes above on 03/21/2001 05:23</line>
</entry>
<entry>
  <line>Star command was right, this investigation has been tossed around like a hot potato for a long time!</line>
  <line>BLIGHTYEAR entered the notes above on 08/29/2659 08:01</line>
</entry>

We can now transform that XML into a shape we like better:

  set @xml = @xml.query(
  'for $entry in /entry
    return <entry><data>
    {
    for $line in $entry/line[position() < last()] 
    return string($line)
    }
    </data>
    <timestamp>{ data($entry/line[last()]) }</timestamp>     
 </entry>
 ')

That gives us XML that looks like this (only one entry shown, for brevity):

<entry>
    <data>Starting the investigation.</data>
    <timestamp>JWAYNE entered the notes above on 08/12/1976 12:01</timestamp>
</entry>

You can turn it back into tabular data with the following query:

select  EntryData = R.lines.value('data[1]', 'nvarchar(max)')
    ,   EntryTimestamp = R.lines.value('timestamp[1]', 'nvarchar(MAX)')
from    @xml.nodes('/entry') as R(lines)

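...and get one row per entry, with EntryData holding the note text and EntryTimestamp holding the "entered the notes above on" line.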

From there, you can do whatever you need to do.
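As one possible next step for the matching problem in the question, the sketch below splits each timestamp line into a username and a datetime, then picks, for every row, the entry by the same user that is closest in time. This "closest in time" rule is an assumption of mine, not something the answer prescribes. The names @entries, @rows, and parsed are invented, and the code assumes every timestamp line ends in mm/dd/yyyy hh:mm and contains the literal phrase " entered the notes above on ".

```sql
-- Entries as returned by the shredding query (sample data from the question).
DECLARE @entries TABLE (EntryData nvarchar(max), EntryTimestamp nvarchar(max));
INSERT INTO @entries VALUES
 (N'Starting the investigation.',
  N'JWAYNE entered the notes above on 08/12/1976 12:01'),
 (N'Taking over the case. Not a lot of progress recently.',
  N'CEASTWOOD entered the notes above on 03/14/2001 09:04'),
 (N'No wonder this case is not progressing, the whole town is covering up some shenanigans!',
  N'CEASTWOOD entered the notes above on 03/21/2001 05:23'),
 (N'Star command was right, this investigation has been tossed around like a hot potato for a long time!',
  N'BLIGHTYEAR entered the notes above on 08/29/2659 08:01');

-- Rows to match (sample table from the question, Notes column omitted).
DECLARE @rows TABLE (CaseID varchar(10), Username varchar(50),
                     Phase varchar(5), RowTime datetime);
INSERT INTO @rows VALUES
 ('E18902', 'JWAYNE',     'E1', '20010314 09:13'),
 ('E18902', 'CEASTWOOD',  'E2', '20010314 09:13'),
 ('E18902', 'CEASTWOOD',  'E3', '20010321 05:34'),
 ('E18902', 'BLIGHTYEAR', 'E4', '26590829 07:58');

;WITH parsed AS
(
    SELECT EntryData,
           -- Everything before the fixed phrase is the username;
           -- NULLIF avoids an error if the phrase is missing.
           EntryUser = LEFT(EntryTimestamp,
                            NULLIF(CHARINDEX(' entered the notes above on ',
                                             EntryTimestamp), 0) - 1),
           -- The last 16 characters are assumed to be 'mm/dd/yyyy hh:mm'.
           EntryTime = CONVERT(datetime, RIGHT(RTRIM(EntryTimestamp), 16), 101)
    FROM   @entries
)
SELECT r.CaseID, r.Phase, r.Username, r.RowTime,
       best.EntryData, best.EntryTime
FROM   @rows AS r
OUTER APPLY
(
    -- Closest entry by the same user; DATEDIFF in minutes can overflow
    -- for gaps of several thousand years, which the sample data avoids.
    SELECT TOP (1) p.EntryData, p.EntryTime
    FROM   parsed AS p
    WHERE  p.EntryUser = r.Username
    ORDER BY ABS(DATEDIFF(minute, p.EntryTime, r.RowTime))
) AS best;
```

OUTER APPLY keeps rows that have no entry by that user, with NULL in the entry columns. If you would rather reject weak matches, you could add a limit on the minute difference inside the APPLY.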
