What are the gotchas when converting a T-SQL statement into a JavaScript RegExp
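I have logged a large number of T-SQL statements from a server I manage. I am trying to boil them down to one instance of each.
Here is one of them: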
SELECT TBLLANGUAGE.NAME AS NAME1, TBLLANGUAGE_1.NAME AS NAME2,
TBLLANGUAGELANGUAGE.LNGFKCHILD, TBLLANGUAGELANGUAGE.LNGFKPARENT,
TBLLANGUAGELANGUAGE.STYLE, TBLLANGUAGELANGUAGE.EXTENT,
TBLLANGUAGELANGUAGE.NATURE, TBLSOURCE.TXTTITLE, TBLSOURCE_1.TXTTITLE AS
SURTITLE FROM ((((TBLLANGUAGE LEFT JOIN TBLLANGUAGELANGUAGE ON
TBLLANGUAGE.ID = TBLLANGUAGELANGUAGE.LNGFKPARENT) LEFT JOIN TBLLANGUAGE
AS TBLLANGUAGE_1 ON TBLLANGUAGELANGUAGE.LNGFKCHILD = TBLLANGUAGE_1.ID)
LEFT JOIN TBLLANGLANGSOURCE ON TBLLANGUAGELANGUAGE.IDLANGLINK =
TBLLANGLANGSOURCE.LNGFKLANGLINK) LEFT JOIN TBLSOURCE ON
TBLLANGLANGSOURCE.LNGFKSOURCE = TBLSOURCE.IDSOURCE) LEFT JOIN TBLSOURCE
AS TBLSOURCE_1 ON TBLSOURCE.LNGPARTOF = TBLSOURCE_1.IDSOURCE WHERE
(((TBLLANGUAGELANGUAGE.LNGFKPARENT) = 8687)) OR
(((TBLLANGUAGELANGUAGE.LNGFKCHILD) = 8687)) ORDER BY
IIF(TBLLANGUAGELANGUAGE.LNGFKPARENT = 8687,'B','A'), TBLLANGUAGE.NAME,
TBLLANGUAGE_1.NAME;
I want to convert that into a JavaScript RegExp, substituting \d+ for runs of digits and '.*' for whatever sits between apostrophes.
So far, using Deno, I have got this far:
function getPattern(text: string): string {
  // Replace regex metacharacters with their \xnn escapes
  text = text.replace(/\(/g, "\\x28")
    .replace(/\)/g, "\\x29")
    .replace(/\$/g, "\\x24")
    .replace(/\^/g, "\\x5e")
    .replace(/\./g, "\\x2e")
    .replace(/\*/g, "\\x2a")
    .replace(/\[/g, "\\x5b")
    .replace(/\]/g, "\\x5d")
    .replace(/\?/g, "\\x3f");
  // Generalise the number that follows a comparison operator
  ["\\<\\s\\>", "\\<", "\\<=", "=", "\\>=", "\\>"].forEach((op) => {
    const numberPattern = new RegExp(`\\s${op}\\s(\\d+)`, "g");
    text.match(numberPattern)?.forEach((e) => {
      text = text.replace(e, ` ${op} \\d+`);
    });
  });
  // Generalise quoted text
  //const textPattern = /'[^']*'\s/g;
  const textPattern = /\s*'.*'\s*/g;
  text.match(textPattern)?.forEach((e) => {
    //const eLength = e.length;
    text = text.replace(e, "\\s*'.*'\\s*");
  });
  return text; //.replace(/\</g, "\\x3c")
  //.replace(/\>/g, "\\x3e");
}
This renders the statement above as
SELECT TBLLANGUAGE\x2eNAME AS NAME1, TBLLANGUAGE_1\x2eNAME AS NAME2,
TBLLANGUAGELANGUAGE\x2eLNGFKCHILD, TBLLANGUAGELANGUAGE\x2eLNGFKPARENT,
TBLLANGUAGELANGUAGE\x2eSTYLE, TBLLANGUAGELANGUAGE\x2eEXTENT,
TBLLANGUAGELANGUAGE\x2eNATURE, TBLSOURCE\x2eTXTTITLE,
TBLSOURCE_1\x2eTXTTITLE AS SURTITLE FROM \x28\x28\x28\x28TBLLANGUAGE
LEFT JOIN TBLLANGUAGELANGUAGE ON TBLLANGUAGE\x2eID =
TBLLANGUAGELANGUAGE\x2eLNGFKPARENT\x29 LEFT JOIN TBLLANGUAGE AS
TBLLANGUAGE_1 ON TBLLANGUAGELANGUAGE\x2eLNGFKCHILD =
TBLLANGUAGE_1\x2eID\x29 LEFT JOIN TBLLANGLANGSOURCE ON
TBLLANGUAGELANGUAGE\x2eIDLANGLINK =
TBLLANGLANGSOURCE\x2eLNGFKLANGLINK\x29 LEFT JOIN TBLSOURCE ON
TBLLANGLANGSOURCE\x2eLNGFKSOURCE = TBLSOURCE\x2eIDSOURCE\x29 LEFT JOIN
TBLSOURCE AS TBLSOURCE_1 ON TBLSOURCE\x2eLNGPARTOF =
TBLSOURCE_1\x2eIDSOURCE WHERE
\x28\x28\x28TBLLANGUAGELANGUAGE\x2eLNGFKPARENT\x29 = \d+\x29\x29 OR
\x28\x28\x28TBLLANGUAGELANGUAGE\x2eLNGFKCHILD\x29 = \d+\x29\x29 ORDER
BY IIF\x28TBLLANGUAGELANGUAGE\x2eLNGFKPARENT = \d+,\s*'.*'\s*\x29,
TBLLANGUAGE\x2eNAME, TBLLANGUAGE_1\x2eNAME;
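One gotcha is visible in the output above: the two literals 'B' and 'A' have collapsed into a single '.*', because the greedy '.*' runs from the first apostrophe to the last one. The sketch below contrasts that with a per-literal pattern. The fragment string is a shortened, made-up sample, and the per-literal pattern assumes string literals escape an apostrophe by doubling it, as T-SQL does.

```typescript
// Shortened, made-up fragment with two string literals.
const fragment: string = "IIF(X = 1,'B','A')";

// Greedy: '.*' spans from the first apostrophe to the last one.
const greedy: string = fragment.replace(/'.*'/g, "'.*'");
// greedy === "IIF(X = 1,'.*')"

// Per literal: stop at the closing apostrophe, but step over a
// doubled apostrophe ('') inside the literal.
const literal: RegExp = /'(?:[^']|'')*'/g;
const perLiteral: string = fragment.replace(literal, "'.*'");
// perLiteral === "IIF(X = 1,'.*','.*')"

// A literal containing an escaped apostrophe is still one literal.
const escaped: string = "N = 'O''Brien'".replace(literal, "'.*'");
// escaped === "N = '.*'"
```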
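I'm converting the various components to their \xnn form because, the way I read the documentation, new RegExp() apparently isn't smart enough to see an embedded ( and not assume I'm defining a group in the regex. That is, it doesn't seem to be enough to just say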
const pattern = new RegExp("SELECT TBLLANGUAGE.NAME (etcetera)","gi");
Am I misreading the docs, and is there a better way? And no, I don't want to write a T-SQL parser unless there is a really, really good reason.
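The reading of the documentation can be checked directly. A minimal sketch, using a shortened sample string that is only an assumption for illustration: an unescaped ( in the string passed to new RegExp() does open a group, while a backslash escape (doubled inside a string literal) or the \xnn form both make it literal.

```typescript
const sample: string = "SELECT TBLLANGUAGE.NAME (etcetera)";

// Unescaped: "(" opens a capture group, so the pattern expects
// "NAME etcetera" with no parentheses in the text.
const loose: RegExp = new RegExp("NAME (etcetera)");

// Backslash escape: the backslash is doubled because it sits in a string literal.
const escaped: RegExp = new RegExp("NAME \\(etcetera\\)");

// \xnn escape: equivalent, only harder to read.
const hex: RegExp = new RegExp("NAME \\x28etcetera\\x29");

const looseMatches: boolean = loose.test(sample);     // false
const escapedMatches: boolean = escaped.test(sample); // true
const hexMatches: boolean = hex.test(sample);         // true
```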
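LATER
I have basically solved my problem by using a different approach to pattern matching. See Extracting example SQL statements from a log up on DEV.
I don't fully understand what you are trying to achieve, but if it is: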
convert this SQL statement into a valid regex which can find other SQL like it
then do this:
var sql = `SELECT TBLLANGUAGE.NAME AS NAME1, TBLLANGUAGE_1.NAME AS NAME2,
TBLLANGUAGELANGUAGE.LNGFKCHILD, TBLLANGUAGELANGUAGE.LNGFKPARENT,
TBLLANGUAGELANGUAGE.STYLE, TBLLANGUAGELANGUAGE.EXTENT,
TBLLANGUAGELANGUAGE.NATURE, TBLSOURCE.TXTTITLE, TBLSOURCE_1.TXTTITLE AS
SURTITLE FROM ((((TBLLANGUAGE LEFT JOIN TBLLANGUAGELANGUAGE ON
TBLLANGUAGE.ID = TBLLANGUAGELANGUAGE.LNGFKPARENT) LEFT JOIN TBLLANGUAGE
AS TBLLANGUAGE_1 ON TBLLANGUAGELANGUAGE.LNGFKCHILD = TBLLANGUAGE_1.ID)
LEFT JOIN TBLLANGLANGSOURCE ON TBLLANGUAGELANGUAGE.IDLANGLINK =
TBLLANGLANGSOURCE.LNGFKLANGLINK) LEFT JOIN TBLSOURCE ON
TBLLANGLANGSOURCE.LNGFKSOURCE = TBLSOURCE.IDSOURCE) LEFT JOIN TBLSOURCE
AS TBLSOURCE_1 ON TBLSOURCE.LNGPARTOF = TBLSOURCE_1.IDSOURCE WHERE
(((TBLLANGUAGELANGUAGE.LNGFKPARENT) = 8687)) OR
(((TBLLANGUAGELANGUAGE.LNGFKCHILD) = 8687)) ORDER BY
IIF(TBLLANGUAGELANGUAGE.LNGFKPARENT = 8687,'B','A'), TBLLANGUAGE.NAME,
TBLLANGUAGE_1.NAME;`;
// First replace: account for JS regex special chars and escape with backslash to make them literal
// Second replace: get everything between single quotes and make it .+?
// Third replace: get all digit sequences and make them \d+
// Fourth replace: get all whitespace sequences and make them \s+
var sql_regex = sql.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' )
                   .replace( /('.+?')/g, '\'.+?\'' )
                   .replace( /\d+/g, '\\d+' )
                   .replace( /\s+/g, '\\s+' );
console.log( sql_regex );
// Test if our regex matches the string it was built from
console.log( new RegExp( sql_regex, 'g' ).test( sql ) );
Value of sql_regex:
SELECT\s+TBLLANGUAGE\.NAME\s+AS\s+NAME\d+,\s+TBLLANGUAGE_\d+\.NAME
\s+AS\s+NAME\d+,\s+TBLLANGUAGELANGUAGE\.LNGFKCHILD,
\s+TBLLANGUAGELANGUAGE\.LNGFKPARENT,\s+TBLLANGUAGELANGUAGE\.STYLE,
\s+TBLLANGUAGELANGUAGE\.EXTENT,\s+TBLLANGUAGELANGUAGE\.NATURE,
\s+TBLSOURCE\.TXTTITLE,\s+TBLSOURCE_\d+\.TXTTITLE\s+AS\s+SURTITLE
\s+FROM\s+\(\(\(\(TBLLANGUAGE\s+LEFT\s+JOIN\s+TBLLANGUAGELANGUAGE\s+ON
\s+TBLLANGUAGE\.ID\s+=\s+TBLLANGUAGELANGUAGE\.LNGFKPARENT\)\s+LEFT
\s+JOIN\s+TBLLANGUAGE\s+AS\s+TBLLANGUAGE_\d+\s+ON
\s+TBLLANGUAGELANGUAGE\.LNGFKCHILD\s+=\s+TBLLANGUAGE_\d+\.ID\)\s+LEFT
\s+JOIN\s+TBLLANGLANGSOURCE\s+ON\s+TBLLANGUAGELANGUAGE\.IDLANGLINK\s+=
\s+TBLLANGLANGSOURCE\.LNGFKLANGLINK\)\s+LEFT\s+JOIN\s+TBLSOURCE\s+ON
\s+TBLLANGLANGSOURCE\.LNGFKSOURCE\s+=\s+TBLSOURCE\.IDSOURCE\)\s+LEFT
\s+JOIN\s+TBLSOURCE\s+AS\s+TBLSOURCE_\d+\s+ON\s+TBLSOURCE\.LNGPARTOF
\s+=\s+TBLSOURCE_\d+\.IDSOURCE\s+WHERE
\s+\(\(\(TBLLANGUAGELANGUAGE\.LNGFKPARENT\)\s+=\s+\d+\)\)\s+OR
\s+\(\(\(TBLLANGUAGELANGUAGE\.LNGFKCHILD\)\s+=\s+\d+\)\)\s+ORDER\s+BY
\s+IIF\(TBLLANGUAGELANGUAGE\.LNGFKPARENT\s+=\s+\d+,'.+?','.+?'\),
\s+TBLLANGUAGE\.NAME,\s+TBLLANGUAGE_\d+\.NAME;
Note: the line breaks are cosmetic, added only for readability.
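If the goal is to reduce a log to one example per statement shape, the replacements above could be wrapped in a helper and applied in a loop. This is only a sketch under assumptions: sqlToPattern and distinctStatements are hypothetical names, the log is assumed to be an array of statement strings, the pattern is anchored and case-insensitive by choice, and quoted text becomes '[^']*' rather than '.+?' so that empty strings also match.

```typescript
// Hypothetical helper: build an anchored, case-insensitive pattern from one statement.
function sqlToPattern(sql: string): RegExp {
  const source: string = sql.trim()
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // make regex metacharacters literal
    .replace(/'[^']*'/g, "'[^']*'")         // any quoted text
    .replace(/\d+/g, "\\d+")                // any run of digits
    .replace(/\s+/g, "\\s+");               // any run of whitespace
  return new RegExp("^" + source + "$", "i");
}

// Hypothetical helper: keep the first statement seen for each pattern.
function distinctStatements(log: string[]): string[] {
  const seen: RegExp[] = [];
  const examples: string[] = [];
  for (const statement of log) {
    const text: string = statement.trim();
    if (!seen.some((pattern: RegExp) => pattern.test(text))) {
      seen.push(sqlToPattern(text));
      examples.push(text);
    }
  }
  return examples;
}

const examples: string[] = distinctStatements([
  "SELECT * FROM T WHERE ID = 1",
  "select *  from t where id = 42",
  "SELECT * FROM T WHERE NAME = 'x'",
  "SELECT * FROM T WHERE NAME = ''",
]);
// examples.length === 2
```

One caveat with replacing every digit run: identifiers that contain digits, such as TBLLANGUAGE_1, are generalised too, so statements that differ only in a numbered table or alias name would be treated as the same shape.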