根据长度将列值拆分为单独的列
Split column value into separate columns based on length
我在一列中有多个以逗号分隔的值,最大为 20000 个字符,我想将该列拆分为一列,但它基于字符值 2000(比如进入一个新列需要 2000字符,如果长度大于 2000,则它将像这样位于第二列中。
当它的逗号分隔值进入第一个新列时,它应该有意义,就像它应该基于 ,
一样,最多 2000 个字符。
我只完成了从行级值到列级的操作,但它应该是 2000 个字符并且基于逗号
你能帮我解决这个问题吗?
siddesh,虽然这个问题缺少所有内容我想指出一些事情并帮助你(因为你是一个没有经验的 SO 用户):
首先,我设置了一个最小的可重现示例。下次就交给你了。
我将从声明的 table 开始,插入一些行。
我们在 SO 上可以将其复制'n'粘贴到我们的环境中,这样可以轻松回答。
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO @tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--这是您实际需要的:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);
结果应该存储在物理table中,其中fkID
指向原始行的ID,fragmentPosition
存储顺序。 fkID
和 fragmentPosition
应该是组合的唯一键。
如果你真的想做,你在你的问题中提出的建议(不推荐!)你可以尝试一下:
DECLARE @maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.framgentLength
,CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>@maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
,CASE WHEN A.newSumLength>@maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
结果
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
简而言之:
- 第一个 cte 进行拆分(如上)
- 递归 cte 将遍历字符串并施展魔法。
- 最后的
SELECT
使用了 TOP 1 WITH TIES
和 ORDER BY ROW_NUMBER() OVER(...)
的 hack。这将 return 最高 中间结果。
提示:不要这样做...
更新
纯属娱乐:
你可以用这个
替换最后的SELECT
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
这将 return 正是您所要求的。
但是-如前所述-这违反了最佳实践...
的所有规则
我在一列中有多个以逗号分隔的值,最大为 20000 个字符,我想将该列拆分为一列,但它基于字符值 2000(比如进入一个新列需要 2000字符,如果长度大于 2000,则它将像这样位于第二列中。
当它的逗号分隔值进入第一个新列时,它应该有意义,就像它应该基于 ,
一样,最多 2000 个字符。
我只完成了从行级值到列级的操作,但它应该是 2000 个字符并且基于逗号
你能帮我解决这个问题吗?
siddesh,虽然这个问题缺少所有内容我想指出一些事情并帮助你(因为你是一个没有经验的 SO 用户):
首先,我设置了一个最小的可重现示例。下次就交给你了。
我将从声明的 table 开始,插入一些行。
我们在 SO 上可以将其复制'n'粘贴到我们的环境中,这样可以轻松回答。
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVString VARCHAR(MAX));
INSERT INTO @tbl VALUES('1 this is long text, 2 some second fragment, 3 third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ')
,('1 this is other long text, 2 some second fragment to show that this works with tabular data, 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf ');
--这是您实际需要的:
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength);
结果应该存储在物理table中,其中fkID
指向原始行的ID,fragmentPosition
存储顺序。 fkID
和 fragmentPosition
应该是组合的唯一键。
如果你真的想做,你在你的问题中提出的建议(不推荐!)你可以尝试一下:
DECLARE @maxPerColumn INT=75; --You can set the portion's max size, in your case 2000.
WITH cte AS
(
SELECT fkID = t.ID
,B.fragmentPosition
,B.fragmentContent
,C.framgentLength
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVString,N',','","'),'"]')) A
CROSS APPLY(VALUES(A.[key],TRIM(A.[value]))) B(fragmentPosition,fragmentContent)
CROSS APPLY(VALUES(LEN(B.fragmentContent))) C(framgentLength)
)
,recCTE AS
(
SELECT *
,countPerColumn = 1
,columnCounter = 1
,sumLength = LEN(fragmentContent)
,growingString = CAST(fragmentContent AS NVARCHAR(MAX))
FROM cte WHERE fragmentPosition=0
UNION ALL
SELECT r.fkID
,cte.fragmentPosition
,cte.fragmentContent
,cte.framgentLength
,CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE r.countPerColumn + 1 END
,r.columnCounter + CASE WHEN A.newSumLength>@maxPerColumn THEN 1 ELSE 0 END
,CASE WHEN A.newSumLength>@maxPerColumn THEN LEN(cte.fragmentContent) ELSE newSumLength END
,CASE WHEN A.newSumLength>@maxPerColumn THEN cte.fragmentContent ELSE CONCAT(r.growingString,N', ',cte.fragmentContent) END
FROM cte
INNER JOIN recCTE r ON r.fkID=cte.fkID AND r.fragmentPosition+1=cte.fragmentPosition
CROSS APPLY(VALUES(r.sumLength+LEN(cte.fragmentContent))) A(newSumLength)
)
SELECT TOP 1 WITH TIES
fkID
,growingString
,LEN(growingString)
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC );
结果
fkID pos Content
1 2 1 this is long text, 2 some second fragment, 3 third fragment
1 4 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k, 5 halksjfh asdkf
2 0 1 this is other long text
2 1 2 some second fragment to show that this works with tabular data
2 3 3 again a third fragment, 4 adfjksdahfljsadhfjhadlfhasdjks alsdjfsadhf k
2 4 5 halksjfh asdkf
简而言之:
- 第一个 cte 进行拆分(如上)
- 递归 cte 将遍历字符串并施展魔法。
- 最后的
SELECT
使用了TOP 1 WITH TIES
和ORDER BY ROW_NUMBER() OVER(...)
的 hack。这将 return 最高 中间结果。
提示:不要这样做...
更新
纯属娱乐:
你可以用这个
替换最后的SELECT
,getPortions AS
(
SELECT TOP 1 WITH TIES
fkID
,fragmentPosition
,growingString
,LEN(growingString) portionLength
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY fkID,columnCounter ORDER BY countPerColumn DESC )
)
SELECT p.*
FROM
(
SELECT fkID
,CONCAT(N'col',ROW_NUMBER() OVER(PARTITION BY fkID ORDER BY fragmentPosition)) AS ColumnName
,growingString
FROM getPortions
) t
PIVOT(MAX(growingString) FOR ColumnName IN(col1,col2,col3,col4,col5)) p;
这将 return 正是您所要求的。
但是-如前所述-这违反了最佳实践...
的所有规则