T-SQL 2008 - Get Multiple Values from string between brackets
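I have a table with formulas, and I need to be able to extract all of the values between the square brackets "[" and "]". The values I am looking for are guaranteed to be between brackets.
Part of an example string is shown below: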
if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)
I also want to replace "ST" with nothing.
The result should be:
35401900
35401903
35401900
35401903
The column I am searching is named "DerivedEval".
I tried the following, but it only returns the first result:
SELECT RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(DerivedEval,CHARINDEX('[',DerivedEval)+1,CHARINDEX(']',DerivedEval)-CHARINDEX('[',DerivedEval)-1), 'ST', ''), 'INV','')))
How can I extend this to return all of the results?
Based on Hogan's reply, I decided to write a couple of functions to accomplish this.
CREATE FUNCTION [dbo].[GetDerivedDataPointsFromFormula]
(
@DerivedDataPointId INT
,@strFormula VARCHAR(MAX)
)
RETURNS @RtnValue table
(
id int identity(1,1)
,DerivedDataPointId INT
,DataPointId INT
,Formula VARCHAR(MAX)
)
AS
BEGIN
INSERT INTO @RtnValue(DerivedDataPointId, DataPointId, Formula)
SELECT @DerivedDataPointId
, RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(data, 0, CHARINDEX(']', data, 0)),'ST',''), 'INV', ''))) AS DataPointIdInvolved
, @strFormula
FROM (
SELECT DATA
FROM dbo.split(@strFormula, '[')
) AS data
WHERE LEN(RTRIM(LTRIM(REPLACE(REPLACE(SUBSTRING(data, 0, CHARINDEX(']', data, 0)),'ST',''), 'INV', '')))) > 0
RETURN
END
The split function is defined as:
CREATE FUNCTION [dbo].[Split]
(
@RowData varchar(MAX),
@SplitOn nvarchar(5)
)
RETURNS @RtnValue table
(
Id int identity(1,1),
Data nvarchar(1000)
)
AS
BEGIN
Declare @Cnt int
Set @Cnt = 1
While (Charindex(@SplitOn,@RowData)>0)
Begin
Insert Into @RtnValue (data)
Select
Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))
Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
Set @Cnt = @Cnt + 1
End
Insert Into @RtnValue (data)
Select Data = ltrim(rtrim(@RowData))
Return
END
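As a quick sanity check, and assuming both functions above have already been created in dbo, the formula function can be called directly on the sample string from the question. The ID 1 passed as @DerivedDataPointId is only a placeholder for this test.

```sql
-- Assumes dbo.Split and dbo.GetDerivedDataPointsFromFormula (defined above) already exist.
DECLARE @formula VARCHAR(MAX);
SET @formula = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

-- 1 is a placeholder DerivedDataPointId used only for this check.
SELECT f.DerivedDataPointId, f.DataPointId
FROM dbo.GetDerivedDataPointsFromFormula(1, @formula) AS f;
```

For that string this should return four rows, one per bracketed value (35401900 and 35401903, twice each). The row order is not guaranteed because neither function uses ORDER BY.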
I can then get what I need by using CROSS APPLY:
Select DISTINCT f.DerivedDataPointId
, f.DataPointId
,DerivedEval
from DerivedDataPoint d (readuncommitted)
Cross Apply dbo.GetDerivedDataPointsFromFormula(d.DerivedDataPointId, d.DerivedEval) f
Maybe this will help someone else looking for a similar approach.
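For a one-off check without creating any helper functions, the same idea can be sketched with a plain loop that uses the third (start position) argument of CHARINDEX to walk from one bracket pair to the next. This is only an illustration and not part of the solution above: the variable and table names are made up for the example, and it assumes every bracketed value is numeric once 'ST' and 'INV' are stripped.

```sql
DECLARE @formula VARCHAR(MAX);
SET @formula = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

-- Hypothetical result table; the identity column records the order of appearance.
DECLARE @result TABLE (id INT IDENTITY(1,1), DataPointId INT);
DECLARE @open INT, @close INT;

SET @open = CHARINDEX('[', @formula);
WHILE @open > 0
BEGIN
    -- Find the closing bracket that follows the current opening bracket.
    SET @close = CHARINDEX(']', @formula, @open + 1);
    IF @close = 0 BREAK;  -- unmatched '[': nothing more to extract

    -- Take the text strictly between the brackets, then strip the prefixes.
    INSERT INTO @result (DataPointId)
    SELECT LTRIM(RTRIM(REPLACE(REPLACE(
               SUBSTRING(@formula, @open + 1, @close - @open - 1),
               'ST', ''), 'INV', '')));

    -- Continue searching after the closing bracket just consumed.
    SET @open = CHARINDEX('[', @formula, @close + 1);
END;

SELECT id, DataPointId FROM @result ORDER BY id;
```

A loop like this runs once per bracket pair for a single string, so for a whole table the function plus CROSS APPLY approach is the more natural fit.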
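Based on the sample data you posted, this is a super simple problem if the numbers you want to extract are always eight digits long and always appear at the end of the bracketed text (immediately before the closing "]").
If that is the case, all you need is a copy of NGrams8K, and you can solve this in 3 lines of code: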
-- your sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';
-- purely set-based solution using NGrams8K
SELECT ng.position, result = SUBSTRING(ng.token,1,8)
FROM samd.NGrams8k(@string,9) AS ng
WHERE CHARINDEX(']',ng.token,8) = 9;
Returns:
position result
---------- --------
26 35401900
60 35401903
78 35401900
95 35401903
I know you don't need to know the position of the numbers in the string, but I included it anyway to show how easy it is to get if you ever need it.
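NGrams8K itself is not included in this thread. As a rough sketch of what the query above relies on, the version below swaps in an inline tally (numbers) CTE in place of NGrams8K. The CTE names e1, e4 and tally are made up for this example, and this is not the actual NGrams8K implementation.

```sql
-- your sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 35401900]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3, 1, 0)';

-- e1 = 10 rows, e4 = 10,000 rows (enough for any varchar(8000) input),
-- tally = one row per candidate starting position
WITH e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
     e4(n) AS (SELECT 1 FROM e1 AS a CROSS JOIN e1 AS b CROSS JOIN e1 AS c CROSS JOIN e1 AS d),
     tally(position) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e4)
SELECT t.position, result = SUBSTRING(@string, t.position, 8)
FROM tally AS t
WHERE t.position <= DATALENGTH(@string) - 8               -- a full 9-character window fits
  AND SUBSTRING(@string, t.position + 7, 1) <> ']'        -- 8th character is not the bracket
  AND SUBSTRING(@string, t.position + 8, 1) = ']'         -- 9th character is the bracket
ORDER BY t.position;
```

Like the NGrams8K query, this only checks where the closing bracket sits; it does not verify that the eight characters before it are digits.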
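Updated January 22, 2019 (US), based on the question in the comments below
To handle cases where the numbers you are extracting are not always the same length, you can use my patExtract8K function (which uses NGrams8K):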
CREATE FUNCTION samd.patExtract8K
(
@string VARCHAR(8000),
@pattern VARCHAR(50)
)
/*****************************************************************************************
[Description]:
This can be considered a T-SQL inline table valued function (iTVF) equivalent of
Microsoft's mdq.RegexExtract, except:
1. It includes each matching substring's position in the string
2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
instead of nvarchar(4000) for the pattern
3. The mask parameter is not required and therefore does not exist.
4. You have to specify what text we're searching for as an exclusion; e.g. for numeric
characters you should search for '[^0-9]' instead of '[0-9]'.
5. There is no parameter for naming a "capture group". Using the variable below, both
the following queries will return the same result:
DECLARE @string nvarchar(4000) = N'123 Main Street';
SELECT item FROM samd.patExtract8K(@string, '[^0-9]');
SELECT clr.RegexExtract(@string, N'(?<number>(\d+))(?<street>(.*))', N'number', 1);
Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the
rows where [matched]=0. The key benefit is that it performs substantially better
because you are only returning the number of rows required instead of returning twice as
many rows then filtering out half of them.
The following two sets of queries return the same result:
DECLARE @string varchar(100) = 'xx123xx555xx999';
BEGIN
-- QUERY #1
-- patExtract8K
SELECT ps.itemNumber, ps.item
FROM samd.patExtract8K(@string, '[^0-9]') ps;
-- patternSplitCM
SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item
FROM dbo.patternSplitCM(@string, '[^0-9]') ps
WHERE [matched] = 0;
-- QUERY #2
SELECT ps.itemNumber, ps.item
FROM samd.patExtract8K(@string, '[0-9]') ps;
SELECT itemNumber = row_number() over (order by itemNumber), item
FROM dbo.patternSplitCM(@string, '[0-9]')
WHERE [matched] = 0;
END;
[Compatibility]:
SQL Server 2008+
[Syntax]:
--===== Autonomous
SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
FROM samd.patExtract8K(@string,@pattern) pe;
--===== Against a table using APPLY
SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
FROM samd.SomeTable t
CROSS APPLY samd.patExtract8K(t.someString, @pattern) pe;
[Parameters]:
@string = varchar(8000); the input string
@pattern = varchar(50); pattern to search for
[Returns]:
itemNumber = bigint; the instance or ordinal position of the matched substring
itemIndex = bigint; the location of the matched substring inside the input string
itemLength = int; the length of the matched substring
item = varchar(8000); the returned text
[Developer Notes]:
1. Requires NGrams8k
2. patExtract8K does not return any rows on NULL or empty strings. Consider using
OUTER APPLY or append the function with the code below to force the function to return
a row on empty or NULL inputs:
UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;
3. patExtract8K is not case sensitive; use a case sensitive collation for
case-sensitive comparisons
4. patExtract8K is deterministic. For more about deterministic functions see:
https://msdn.microsoft.com/en-us/library/ms178091.aspx
5. patExtract8K performs substantially better with a parallel execution plan, often
2-3 times faster. For queries that leverage patextract8K that are not getting a
parallel execution plan you should consider performance testing using Traceflag 8649
in Development environments and Adam Machanic's make_parallel in production.
[Examples]:
--===== (1) Basic: extract all groups of numbers:
WITH temp(id, txt) as
(
SELECT * FROM (values
(1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
(2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
(3, '&**OOOOO=+ + + // ==?76543// and today !!222222\\tester{}))22222444 done'))t(x,xx)
)
SELECT
[temp.id] = t.id,
pe.itemNumber,
pe.itemIndex,
pe.itemLength,
pe.item
FROM temp AS t
CROSS APPLY samd.patExtract8K(t.txt, '[^0-9]') AS pe;
-----------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20170801 - Initial Development - Alan Burstein
Rev 01 - 20180619 - Complete re-write - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
itemIndex = f.position,
itemLength = itemLen.l,
item = SUBSTRING(f.token, 1, itemLen.l)
FROM
(
SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
FROM samd.NGrams8k(@string, 1) AS ng
WHERE PATINDEX(@pattern, ng.token) < --<< this token does NOT match the pattern
ABS(SIGN(ng.position-1)-1) + --<< are you the first row? OR
PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
) AS f(position, token)
CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);
With patExtract8K you can easily specify a range of lengths. For example, say your values can be 7-9 digits long. You could do this:
-- your sample data
DECLARE @string VARCHAR(8000) = 'if ((DateTime.Parse("[ST 123456789]") < DateTime.Parse("[ST 35401903]")) and [35401900]=0 and [35401903]=3 and [ST 1234567]=x, 1, 0)';
-- Lower and upper bounds for the length of valid values
DECLARE @low INT = 7, @high INT = 9
SELECT
itemIndex = s.itemIndex,
itemLength = s.itemLength-1,
item = SUBSTRING(s.item,0,s.itemLength)
FROM samd.patExtract8K(REPLACE(@string,']',CHAR(1)),'[^0-9'+CHAR(1)+']') AS s
WHERE s.itemLength BETWEEN @low AND @high+1;
--AND SUBSTRING(s.item,0,s.itemLength) NOT LIKE '[^0-9]' <<< If required
Returns
itemIndex itemLength item
----------- ----------- ------------
26 9 123456789
61 8 35401903
79 8 35401900
96 8 35401903
116 7 1234567
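A couple of notes:
I updated the sample data to include values that are 7-9 digits long.
To use this function you must either change the schema in the code to dbo (instead of samd) or create a schema named samd.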
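If you go the second route, a minimal way to create the schema only when it is missing might look like this (the EXEC wrapper is needed because CREATE SCHEMA has to be the first statement in its batch):

```sql
-- Create the samd schema only if it does not exist yet.
-- CREATE SCHEMA must start its own batch, so it is run through EXEC.
IF SCHEMA_ID('samd') IS NULL
    EXEC('CREATE SCHEMA samd');
```

This requires permission to create schemas in the current database.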