SQL Server: generate data using a regex pattern

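I want to generate data in SQL Server from a given regex pattern. Is that possible? Say I have patterns like the ones below and want to generate data like this: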

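The idea behind this is SQL Static Data Masking (a feature that has since been removed). Our client wants to mask production data in a test database. We no longer have the Static Data Masking feature in SQL Server, but we do have patterns for masking the columns, so my thought is that we could run update queries based on those patterns.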

SELECT "(\d){7}" AS RandomNumber, "(\W){5}" AS RandomString FROM tbl


  |  RandomNumber | RandomString |
  |  7894562      | AHJIL        |
  |  9632587      | ZLOKP        |
  |  4561238      | UJIOK        |

Besides these regular patterns, I also have some custom patterns such as Test_Product_(\d){1,4}, and the results should look like this:



Other Patterns                Samples
(\l){30}                      ahukoklijfahukokponmahukoahuko
(\d){7}                       7895623
(\W){5}                       ABCDE
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

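I don't think you need regex for this. Why not just use a "scrub script" and take advantage of the newid() function to generate a bunch of random data? It looks like you will need to write such a script anyway, with or without regex, and this approach has the advantage of being very simple.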


create table tbl (PersonalId int, Name varchar(max))

insert into tbl select 300300, 'Michael'
insert into tbl select 554455, 'Tim'
insert into tbl select 228899, 'John'

select * from tbl


update tbl set PersonalId = cast(rand(checksum(newid())) * 1000000 as int)
update tbl set Name = left(convert(varchar(255), newid()), 6)

select * from tbl
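
If the scrubbed values need to follow the shape of the patterns in the question, the same newid() trick can be adapted. The following is only a sketch under a few assumptions: it reuses the tbl table from above, it reads (\d){7} as "a number with exactly 7 digits and no leading zero" (the column is an int, so leading zeros cannot be stored), and it reads (\W){5} as "5 upper-case letters", as the samples in the question suggest.

```sql
-- Sketch: shape the random values like the patterns (\d){7} and (\W){5}.
-- The modulo is taken before abs() so that abs() never receives the
-- smallest int value, which would raise an arithmetic overflow.
update tbl
set PersonalId = 1000000 + abs(checksum(newid()) % 9000000),  -- 1000000..9999999
    Name = char(65 + abs(checksum(newid()) % 26))             -- 5 random letters A-Z
         + char(65 + abs(checksum(newid()) % 26))
         + char(65 + abs(checksum(newid()) % 26))
         + char(65 + abs(checksum(newid()) % 26))
         + char(65 + abs(checksum(newid()) % 26));

select * from tbl
```

Each newid() call produces its own value, so every letter is drawn separately. For longer or variable-length strings this style becomes unwieldy, and a set-based generator is probably the better fit.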

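Well, I can give you a solution that is not based on regex but on a set of parameters, and it covers the complete set of your requirements.
I based this solution on a user-defined function I wrote for generating random strings (You can read my blog post about it here). I have just changed it so that it can generate the masks you want, based on the following conditions: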

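  • The mask has an optional prefix.
  • The mask has an optional suffix.
  • The mask has a random string of variable length.
  • The random string can contain lower-case letters, upper-case letters, digits, or any combination of these.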


(\d){7}                       7895623
(\W){5}                       ABCDE
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

Since I'm using a user-defined function, I can't use the NewId() built-in function inside it, so we first need to create a view that generates the guid for us:

CREATE VIEW GuidGenerator
AS
    SELECT Newid() As NewGuid;

In the function, we will use this view to generate a NewID() that serves as the basis of all randomness.
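
To see the building block in isolation, here is a small sketch that reads the GuidGenerator view directly and turns a guid into a random lower-case letter, upper-case letter and digit. The column aliases are made up for illustration, and the modulo by 1000000 is just one way to get a non-negative number; it is not taken from the function itself.

```sql
-- Sketch: derive random characters from the guid returned by the view.
-- 97 = 'a', 65 = 'A', 48 = '0' in ASCII.
SELECT  CHAR(97 + r.Rnd % 26) AS LowerChar,  -- random letter a-z
        CHAR(65 + r.Rnd % 26) AS UpperChar,  -- random letter A-Z
        CHAR(48 + r.Rnd % 10) AS DigitChar   -- random digit 0-9
FROM (
    -- modulo before ABS() keeps ABS() away from the smallest int value
    SELECT ABS(CHECKSUM(NewGuid) % 1000000) AS Rnd
    FROM GuidGenerator
) AS r;
```

Running it several times should give different characters on each run, because the view produces a new guid every time it is read.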


CREATE FUNCTION dbo.MaskGenerator
(
    -- use null or an empty string for no prefix
    @Prefix nvarchar(4000), 
    -- use null or an empty string for no suffix
    @Suffix nvarchar(4000), 
    -- the minimum length of the random part
    @MinLength int, 
    -- the maximum length of the random part
    @MaxLength int, 
    -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @Count int, 
    -- 1, 2 and 4 stand for lower-case, upper-case and digits. 
    -- a bitwise combination of these values can be used to generate all possible combinations:
    -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
    @CharType tinyint 
)
RETURNS TABLE
AS
RETURN

-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000

SELECT TOP (@Count)
        n As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
                CASE @CharType 
                    WHEN 1 THEN Lower
                    WHEN 2 THEN Upper
                    WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
                    WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
                    WHEN 7 THEN 
                        CASE Rnd % 3
                            WHEN 0 THEN Lower
                            WHEN 1 THEN Upper
                            ELSE Digit
                        END
                END
        FROM Tally As t0  
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As Upper,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) As Chars
        WHERE  t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 n As Length
    FROM Tally As t2
    CROSS JOIN GuidGenerator 
    WHERE t2.n >= @MinLength
    AND t2.n <= @MaxLength
    -- Always true, but ties the subquery to t1 so it is re-evaluated for each row.
    -- (Comparing with t1.n instead of -t1.n would drop the row whose number
    -- equals the only allowed length, e.g. row 7 when both lengths are 7.)
    AND t2.n <> -t1.n
    ORDER BY NewGuid
) As Lengths;


(\l){30} - ahukoklijfahukokponmahukoahuko

SELECT RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1); 


1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp

(\d){7} - 7895623

SELECT RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4); 


1, 8744412
2, 2275313

(\W){5} - ABCDE

SELECT RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2); 



Test_Product_(\d){1,4} - Test_Product_007

SELECT RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4); 


1, Test_Product_933
2, Test_Product_7

0\.(\d){2} - 0.59

SELECT RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4); 


1, 0.68
2, 0.70

https://www\.(\l){10}\.com - https://www.anything.com

SELECT RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1); 


1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com       

Here is how you can use it to mask the contents of a table:

DECLARE @Count int = 10; 

SELECT  CAST(IntVal.RandomString As Int) As IntColumn, 
        UpVal.RandomString as UpperCaseValue, 
        LowVal.RandomString as LowerCaseValue, 
        MixVal.RandomString as MixedValue,
        WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
    ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
    ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
    ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
    ON IntVal.Number = WithPrefix.Number


IntColumn   UpperCaseValue  LowerCaseValue  MixedValue              PrefixedValue
674         CCNVSDI         esjyyesesv      O2FAC7bfwg2Be5a91Q0     Test 4935
30732       UJKSL           jktisddbnq      7o8B91Sg1qrIZSvG3AcL    Test 0
4669472     HDLJNBWPJ       qgtfkjdyku      xUoLAZ4pAnpn            Test 8
26347       DNAKERR         vlehbnampb      NBv08yJdKb75ybhaFqED    Test 91
6084965     LJPMZMEU        ccigzyfwnf      MPxQ2t8jjmv0IT45yVcR    Test 4
6619851     FEHKGHTUW       wswuefehsp      40n7Ttg7H5YtVPF         Test 848
781         LRWKVDUV        bywoxqizju      UxIp2O4Jb82Ts           Test 6268
52237       XXNPBL          beqxrgstdo      Uf9j7tCB4W2             Test 43
876150      ZDRABW          fvvinypvqa      uo8zfRx07s6d0EP         Test 7

Note that this is a fast process: in the tests I ran, generating 1000 rows with 5 columns took less than half a second on average.
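
Since the question asks for update queries, here is a sketch of how the generated masks could be written back to an existing table. It assumes the tbl table (PersonalId, Name) from the earlier answer and the dbo.MaskGenerator function above; the CTE names Target and Masks and the variable @Rows are made up for illustration. Both sides are numbered with ROW_NUMBER() so the join does not depend on the function returning gap-free numbers.

```sql
-- Sketch: overwrite tbl.Name with 5 random upper-case letters per row.
DECLARE @Rows int = (SELECT COUNT(*) FROM tbl);

WITH Target AS
(
    -- number the rows to be masked
    SELECT Name, ROW_NUMBER() OVER (ORDER BY PersonalId) AS rn
    FROM tbl
),
Masks AS
(
    -- one generated mask per row, renumbered 1..n
    SELECT RandomString, ROW_NUMBER() OVER (ORDER BY Number) AS rn
    FROM dbo.MaskGenerator(null, null, 5, 5, @Rows, 2)
)
UPDATE t
SET Name = m.RandomString
FROM Target AS t
JOIN Masks AS m
    ON m.rn = t.rn;

SELECT * FROM tbl;
```

The pairing between rows and masks is arbitrary, which is fine here because the masks are random anyway. Any row left without a matching mask would keep its original value, so it is worth comparing the number of affected rows with @Rows after the update.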