Regex to remove short sentences (callouts)

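I want to remove all short sentences (callouts) from a string. See the example below. I want to remove all of the highlighted text. Perhaps any text shorter than 25 characters that ends with a '.'.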

Maybe this helps. I did not account for the '.' character because I generated the sentences with JS.

// Build 15 random lowercase "sentences", each 15 to 30 characters long.
const sentences = (() => {
    const result = [];
    for (let i = 0; i < 15; i++) {
        // Random length in the range [15, 30].
        const len = Math.floor(Math.random() * (30 - 15 + 1) + 15);
        const chars = [];
        for (let j = 0; j < len; j++) {
            // Random letter from 'a' (97) to 'z' (122).
            chars.push(String.fromCharCode(Math.floor(Math.random() * (122 - 97 + 1) + 97)));
        }
        result.push(chars.join(''));
    }
    return result;
})();

// Keep only the sentences that are 25 characters or longer.
const longSentences = sentences.filter(s => s.length > 24);

console.log(sentences.length);
console.log(sentences);
console.log(longSentences);
console.log(longSentences.length);
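
The same length filter can be applied to real text by splitting on the period first. This is only a minimal sketch: it assumes every sentence ends with '.', measures length after trimming whitespace, and rejoins the survivors with a single space. The name dropShortSentences and the minLen parameter are made up for illustration.

```javascript
// Hypothetical helper: drop every sentence shorter than minLen characters.
function dropShortSentences(text, minLen = 25) {
    return text
        .split('.')                       // split on the sentence terminator
        .map(s => s.trim())               // ignore surrounding whitespace
        .filter(s => s.length >= minLen)  // keep only sentences that are long enough
        .map(s => s + '.')                // put the period back
        .join(' ');
}

const text = 'This is a test.  Keep this longer text that has over 25 characters.  Remove this small text.  ';
console.log(dropShortSentences(text));
// Keep this longer text that has over 25 characters.
```

Unlike a regex replace, this normalizes the whitespace between sentences, which may or may not be what you want.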

This regex targets sentences of at most 25 characters (not counting leading whitespace or the closing period).

/(?<=^|\.)\s*.{1,25}?\./gms

Test snippet:

const regex = /(?<=^|\.)\s*.{1,25}?\./gms;
const str = `This is a test.  Keep this longer text that has over 25 characters.  Remove this small text.  `;

const result = str.replace(regex, '');

console.log(result);
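
If the limit needs to be adjustable, the same pattern can be built with the RegExp constructor. The sketch below assumes an engine with look-behind support; removeShortSentences and maxLen are illustrative names, not part of the answer above.

```javascript
// Hypothetical helper: remove sentences of at most maxLen characters
// (not counting leading whitespace or the closing period).
function removeShortSentences(text, maxLen = 25) {
    // Pattern pieces:
    //   (?<=^|\.)      start of a line, or right after a period
    //   \s*            whitespace before the sentence
    //   .{1,maxLen}?   1 to maxLen characters, as few as possible
    //   \.             the closing period
    // Backslashes are doubled because this is a string, not a regex literal.
    const regex = new RegExp(`(?<=^|\\.)\\s*.{1,${maxLen}}?\\.`, 'gms');
    return text.replace(regex, '');
}

const sample = 'This is a test.  Keep this longer text that has over 25 characters.  Remove this small text.  ';
console.log(JSON.stringify(removeShortSentences(sample)));
// "  Keep this longer text that has over 25 characters.  "
```

The whitespace around the remaining sentence is left in place; call trim() on the result if that matters.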

Or without look-behind, for older browsers that do not support it.

/(^|\.)\s*.{1,25}?\./gms

Replace each match with the first capture group ('$1') so that the period of the preceding sentence is kept.

const regex = /(^|\.)\s*.{1,25}?\./gms;
const str = `This is a test.  Keep this longer text that has over 25 characters.  
Remove this small text.  `;

const result = str.replace(regex, '$1');

console.log(result);
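
One thing to watch for with the capture-group variant: the period that marks the start of a sentence is consumed by the previous match, so when two short sentences follow each other, the second can survive a single pass. A possible workaround, sketched here under that assumption, is to repeat the replacement until the text stops changing. The helper name removeShortSentencesCompat is hypothetical.

```javascript
const compatRegex = /(^|\.)\s*.{1,25}?\./gms;

// Hypothetical helper: reapply the replacement until nothing changes.
function removeShortSentencesCompat(text) {
    let previous;
    do {
        previous = text;
        text = text.replace(compatRegex, '$1');
    } while (text !== previous);
    return text;
}

const input = 'Hi. Ok. This sentence is clearly longer than twenty-five characters.';

// A single pass removes "Hi." but leaves "Ok." behind.
console.log(JSON.stringify(input.replace(compatRegex, '$1')));
// " Ok. This sentence is clearly longer than twenty-five characters."

// Repeating the pass removes it as well.
console.log(JSON.stringify(removeShortSentencesCompat(input)));
// " This sentence is clearly longer than twenty-five characters."
```

The look-behind version does not need the loop, because it never consumes the preceding period.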