Convert a JavaScript RegEx into JSON format

I'm currently developing a Safari extension that will use the new WebKit content-blocker feature available in Safari 9. The rules for such a blocker have to be written in JSON.

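The background script of my extension generates these JSON rules. My problem is that I can't format the regular expression that filters URLs so that it is compatible with JSON.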

Suppose I need to block every image whose URL contains "banana", "orange" or "apple". My regular expression looks like this:

var urlFilter = /banana|orange|apple/g;

Here is the blocker rule in JSON, still missing the url-filter part:

"action": {
    "type": "block"
},
"trigger": {
    "url-filter": <JSON regex here>,
    "resource-type": ["image"],
    "load-type": ["third-party"]
}

[Update]

How can I rewrite my regular expression so that it is JSON compatible/ready, given that alternation is not supported?

The Regular expression format

Triggers support filtering the URLs of each resource based on regular expression.

The following features are supported:

  • Matching any character with “.”.
  • Matching ranges with the range syntax [a-b].
  • Quantifying expressions with “?”, “+” and “*”.
  • Groups with parenthesis.

It is possible to use the beginning of line (“^”) and end of line (“$”) marker but they are restricted to be the first and last character of the expression. For example, a pattern like “^bar$” is perfectly valid, while “(foo)?^bar$” causes a syntax error.

[Update 2]

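Given the strict Content Security Policy that Safari enforces and the lack of support for alternation, I ended up turning my original regular expression into an array and generating the JSON rules dynamically in a loop.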

var regex = 'banana|orange|apple',
    // Content blockers reject alternation, so each alternative becomes its own rule.
    filters = regex.split('|'),
    json_rules = [];

var Blocker = {
        build: function() {

            filters.forEach( function(filter) {
                var rule = {
                    action: {
                        'type': 'block'
                    },
                    trigger: {
                        'url-filter': filter,
                        'resource-type': ['image'],
                        'load-type': ['third-party']
                    }
                };
                json_rules.push(rule);
            });

            Blocker.set(JSON.stringify(json_rules));
        },
        init: function() {
            Blocker.build();
        },
        set: function (rule) {
            // Hand the serialized rule list to Safari's content blocker.
            safari.extension.setContentBlocker(rule);
        }

};

According to the documentation you linked, filter values are treated as regular expressions (for example, they show "url-filter": "evil-tracker\\.js" and "url-filter": ".*").

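The documentation also says that url-filter is case-insensitive by default, so you don't need to worry about the i flag you might otherwise use. If you want case-sensitive matching, add "url-filter-is-case-sensitive": true.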
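Because the defaults are reversed (a JavaScript RegExp without `i` is case-sensitive, a content-blocker url-filter is not), converting a RegExp means setting the flag explicitly. The sketch below assumes a hypothetical helper name, `triggerFromRegExp`; it carries only the pattern text and case sensitivity across, and simply drops flags such as `g`, which only affect repeated matching in JavaScript.

```javascript
// Hypothetical helper: map a JavaScript RegExp onto the url-filter fields
// of a content-blocker trigger. Other flags (g, m, ...) are ignored.
function triggerFromRegExp(re) {
  // .source is the pattern text without the surrounding slashes or flags.
  const trigger = { 'url-filter': re.source };

  // Content blockers default to case-insensitive matching, whereas a RegExp
  // without the `i` flag is case-sensitive, so opt in explicitly.
  if (!re.ignoreCase) {
    trigger['url-filter-is-case-sensitive'] = true;
  }
  return trigger;
}

console.log(JSON.stringify(triggerFromRegExp(/banana/i)));
// {"url-filter":"banana"}
console.log(JSON.stringify(triggerFromRegExp(/banana/)));
// {"url-filter":"banana","url-filter-is-case-sensitive":true}
```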

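In this case, you just put the regular expression in quotes, making sure to escape anything that needs escaping in the string literal (for example, note how they escaped the \ in the "evil-tracker\\.js" string so that the regular expression is evil-tracker\.js).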
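If the rules are built in JavaScript, as in the question, you don't have to do that escaping by hand: a RegExp's `source` property gives the pattern text, and `JSON.stringify` doubles every backslash when it writes the string. A minimal sketch using the documentation's `evil-tracker\.js` pattern:

```javascript
// A regex literal; its .source is the pattern text: evil-tracker\.js
const pattern = /evil-tracker\.js/;

const rule = {
  action: { type: 'block' },
  trigger: { 'url-filter': pattern.source },
};

// JSON.stringify escapes the backslash, so the JSON text contains
// "evil-tracker\\.js", which a JSON parser reads back as evil-tracker\.js.
const json = JSON.stringify(rule);
console.log(json);
// {"action":{"type":"block"},"trigger":{"url-filter":"evil-tracker\\.js"}}

// Round trip: parsing the JSON yields the original pattern text again.
console.log(JSON.parse(json).trigger['url-filter'] === pattern.source); // true
```

Writing `"evil-tracker\.js"` directly in a JSON file would not work, because `\.` is not a valid JSON escape sequence.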

But: the problem with your expression is that alternation isn't supported. Again, from the documentation you linked:

The format is a strict subset of JavaScript regular expressions. Syntactically, everything supported by JavaScript is reserved but only a subset will be accepted by the parser. An unsupported expression results in a parse error.

The following features are supported:

  • Matching any character with “.”.
  • Matching ranges with the range syntax [a-b].
  • Quantifying expressions with “?”, “+” and “*”.
  • Groups with parenthesis.

It is possible to use the beginning of line (“^”) and end of line (“$”) marker but they are restricted to be the first and last character of the expression. For example, a pattern like “^bar$” is perfectly valid, while “(foo)?^bar$” causes a syntax error.

Note that they don't accept | (alternation).
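To catch these restrictions before Safari reports a parse error, you could run a quick pre-flight check over each pattern. The sketch below (the name `checkUrlFilter` is hypothetical) only covers the rules quoted above: no unescaped `|` outside a character class, and `^` / `$` only as the first / last character. It is not a full validator for everything Safari's parser rejects.

```javascript
// Hypothetical pre-flight check for the restrictions quoted from the docs.
// Returns a list of human-readable problems; an empty list means none found.
function checkUrlFilter(pattern) {
  const problems = [];
  let inClass = false; // true while inside [...]

  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      i++; // skip the escaped character, e.g. \| or \$
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      continue;
    }
    if (ch === '[') {
      inClass = true;
    } else if (ch === '|') {
      problems.push(`alternation "|" at index ${i}`);
    } else if (ch === '^' && i !== 0) {
      problems.push(`"^" at index ${i} is not the first character`);
    } else if (ch === '$' && i !== pattern.length - 1) {
      problems.push(`"$" at index ${i} is not the last character`);
    }
  }
  return problems;
}

console.log(checkUrlFilter('^bar$'));               // []
console.log(checkUrlFilter('(foo)?^bar$'));         // one problem: "^" at index 6
console.log(checkUrlFilter('banana|orange|apple')); // two alternation problems
```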

That tells me you need three rules: one for banana, one for orange, and one for apple.
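Generating those rules can be automated. The sketch below uses hypothetical helpers, `splitAlternatives` and `buildRules`, and reuses the trigger fields from the question. Unlike a plain `split('|')`, it only splits on top-level, unescaped `|`, so an escaped `\|` or a `|` inside `[...]` stays intact. Alternation nested inside a group, such as `(foo|bar)\.png`, is left as-is and would still be rejected by Safari; expanding it would require distributing the surrounding text over each branch, which this sketch does not attempt.

```javascript
// Hypothetical helper: split a pattern on "|" characters that are not
// escaped and not inside [...] or (...).
function splitAlternatives(pattern) {
  const parts = [];
  let current = '';
  let depth = 0;       // parenthesis nesting level
  let inClass = false; // true while inside [...]

  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      // Keep the escape and the escaped character together.
      current += ch + pattern.charAt(i + 1);
      i++;
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
    } else if (ch === '|' && depth === 0) {
      parts.push(current); // top-level alternative finished
      current = '';
      continue;
    }
    current += ch;
  }
  parts.push(current);
  return parts;
}

// One "block" rule per top-level alternative of the RegExp.
function buildRules(re) {
  return splitAlternatives(re.source).map((alternative) => ({
    action: { type: 'block' },
    trigger: {
      'url-filter': alternative,
      'resource-type': ['image'],
      'load-type': ['third-party'],
    },
  }));
}

const rules = buildRules(/banana|orange|apple/);
console.log(rules.map((r) => r.trigger['url-filter'])); // [ 'banana', 'orange', 'apple' ]
```

The resulting array can then be passed through `JSON.stringify` just like `json_rules` in the question.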