Regular expression: how to replace a part of a text while preserving its length

Question

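I have records in a database that contain serialized PHP strings, and I have to obfuscate any emails they contain. The simplest record looks like {s:20:"pika.chu@pokemon.com"}. It basically says: this is a string of length 20, namely pika.chu@pokemon.com. The field can be kilobytes long, with many emails (or none), and sometimes it is empty.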

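I was hoping I could use a SQL regular expression function to obfuscate the user part of the emails while preserving the length of the string, so as not to break the PHP serialization. The example email above should become {s:20:"xxxxxxxx@pokemon.com"}, where the number of x characters matches the length of pika.chu.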

Any ideas?

Here is a more complete example of the serialized PHP content that can be found:

a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john@something.com";s:7:"authors";a:2:{i:0;s:21:"william@something.com";i:1;s:19:"debbie@software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}

Answer 1

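I tried to do this with native functions, but it didn't work, because functions such as REGEXP_REPLACE don't let you manipulate the match, for example to get its length.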
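To make the limitation concrete, here is a minimal JavaScript sketch (plain JavaScript, run outside BigQuery). The one-entry `record` string and the simplified local-part pattern are assumptions made for illustration. It contrasts a fixed replacement string, which is all a plain pattern-to-string substitution can offer, with a callback that builds the mask from the length of each match.

```javascript
// Hypothetical sample: one serialized PHP string with a declared length of 18.
const record = 's:18:"john@something.com";';

// Simplified pattern for the local part of an email (an assumption, not a full validator).
const localPart = /[a-zA-Z0-9._+-]+@/g;

// Fixed replacement: the mask has the same size for every match.
const fixed = record.replace(localPart, 'xxx@');

// Callback replacement: the mask is as long as the text it replaces.
// m includes the trailing "@", hence m.length - 1 asterisks.
const masked = record.replace(localPart, (m) => '*'.repeat(m.length - 1) + '@');

console.log(fixed);  // payload is now 17 characters, but the prefix still says 18
console.log(masked); // payload is still 18 characters
```

The first output would no longer unserialize in PHP, because the declared length and the actual payload disagree. The second keeps them in sync.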

Instead, I created a UDF to do this:

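CREATE TEMP FUNCTION hideEmail(str STRING)
  RETURNS STRING
  LANGUAGE js AS """
  // Mask the local part (everything before each @) with the same number of
  // asterisks, so the declared s:N length of the serialized string stays valid.
  // txt is the whole match including the trailing @, hence length - 1.
  return str.replace(/[a-zA-Z0-9._+-]+@/g, function (txt) {
    return '*'.repeat(txt.length - 1) + '@';
  });
  """;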


  select hideEmail('a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john@something.com";s:7:"authors";a:2:{i:0;s:21:"william@something.com";i:1;s:19:"debbie@software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}')

Result:

a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"****@something.com";s:7:"authors";a:2:{i:0;s:21:"*******@something.com";i:1;s:19:"******@software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
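Since the whole point is to keep the serialization intact, it may be worth checking the masked output before writing it back. The sketch below is plain JavaScript meant to be run outside BigQuery. `findLengthMismatches` is a hypothetical helper, not part of any library, and it assumes that string payloads never contain a double quote, which holds for the sample data but not for serialized PHP in general. PHP's `s:N` prefix counts bytes, so the helper compares against the UTF-8 byte length.

```javascript
// Standalone copy of the masking logic so it can be tested locally.
function hideEmail(str) {
  return str.replace(/[a-zA-Z0-9._+-]+@/g, (m) => '*'.repeat(m.length - 1) + '@');
}

// Hypothetical helper: lists the s:N:"..." entries whose declared length N
// differs from the UTF-8 byte length of the payload.
// Assumption: payloads contain no double quotes.
function findLengthMismatches(serialized) {
  const encoder = new TextEncoder();
  const mismatches = [];
  for (const [, declared, payload] of serialized.matchAll(/s:(\d+):"([^"]*)";/g)) {
    if (Number(declared) !== encoder.encode(payload).length) {
      mismatches.push({ declared: Number(declared), payload });
    }
  }
  return mismatches;
}

// Shortened sample taken from the authors array above.
const sample = 'a:2:{i:0;s:21:"william@something.com";i:1;s:19:"debbie@software.org";}';
const masked = hideEmail(sample);

console.log(masked);
console.log(findLengthMismatches(masked)); // an empty array means every length still matches
```

Because both the matched characters and the asterisks are single-byte ASCII, the byte length is preserved along with the character count.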

google-bigquery