Use just regexp to split a string into a 'tuple' of filename and extension?

Question

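I know there are easier ways to get file extensions with JavaScript, but partly to practice my regexp skills I wanted to try using a regular expression to split a filename into two strings, before and after the final dot (the . character).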

Here's what I have so far:

const myRegex = /^((?:[^.]+(?:\.)*)+?)(\w+)?$/
// match() returns [fullMatch, group1, group2], so the first element is skipped
const [, filename1, extension1] = 'foo.baz.bing.bong'.match(myRegex);
// filename1 = 'foo.baz.bing.'
// extension1 = 'bong'
const [, filename2, extension2] = 'one.two'.match(myRegex);
// filename2 = 'one.'
// extension2 = 'two'
const [, filename3, extension3] = 'noextension'.match(myRegex);
// filename3 = 'noextension'
// extension3 = undefined (the optional group does not participate)

I've tried to use a (positive) lookahead to say "only match a literal . if it's followed by a word that itself ends in a .", by changing (?:\.)* to (?:\.(?=\w+.))*:

/^((?:[^.]+(?:\.(?=(\w+\.))))*)(\w+)$/gm

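But I want to exclude that last dot from the first group using just the regexp, and preferably still have 'noextension' matched in the initial group. How can I do that with just regexp?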

Here is my regexp scratch file: https://regex101.com/r/RTPRNU/1

Answer 1

If you really want to use regex, I would suggest using two regexes:

// example with 'foo.baz.bing.bong'

const firstString = /^.+(?=\.\w+)./g // match 'foo.baz.bing.' 
const secondString = /\w+$/g   // match 'bong'
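
As a rough sketch of how these two patterns might be applied together: the helper name splitWithTwoRegexes is made up for illustration, and the g flag is dropped here because only the first match is needed. Note that the first pattern keeps the trailing dot, and that a name without any dot needs separate handling, since /\w+$/ alone would report the whole name as the extension.

```javascript
// The two patterns from the answer, without the g flag.
const firstString = /^.+(?=\.\w+)./; // everything up to and including the last dot that is followed by word characters
const secondString = /\w+$/;         // the trailing run of word characters

// Hypothetical helper, not part of the original answer.
function splitWithTwoRegexes(name) {
  const head = name.match(firstString);
  if (head === null) {
    // No ".word" anywhere in the name: treat the whole string as the filename.
    return [name, undefined];
  }
  const tail = name.match(secondString);
  return [head[0], tail ? tail[0] : undefined];
}

console.log(splitWithTwoRegexes('foo.baz.bing.bong')); // [ 'foo.baz.bing.', 'bong' ]
console.log(splitWithTwoRegexes('one.two'));           // [ 'one.', 'two' ]
console.log(splitWithTwoRegexes('noextension'));       // [ 'noextension', undefined ]
```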

Answer 2

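For the first capture group, you can start the match with 1 or more word characters, then optionally repeat a . followed by 1 or more word characters.

Then you can use an optional non-capturing group that matches a . and captures 1 or more word characters in group 2.

As the second, non-capturing group is optional, the first repetition should be non-greedy (*?); a greedy repetition would consume the extension as part of group 1.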

^(\w+(?:\.\w+)*?)(?:\.(\w+))?$

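The pattern matches:

^ Start of string
( Capture group 1
- \w+(?:\.\w+)*? Match 1+ word characters, then optionally repeat (as few times as possible) a . and 1+ word characters
) Close group 1
(?: Non-capturing group, to match as a whole
- \.(\w+) Match a . and capture 1+ word characters in group 2
)? Close the non-capturing group and make it optional
$ End of string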
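
To see why the first repetition has to be non-greedy, here is a small comparison. The greedy variant is not from the answer; it is the same pattern with *? changed to *, written only to show the difference.

```javascript
// Same pattern twice; the only difference is *? (lazy) versus * (greedy).
const lazy = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
const greedy = /^(\w+(?:\.\w+)*)(?:\.(\w+))?$/; // illustrative variant, not recommended

const name = "foo.baz.bing.bong";

const lazyMatch = name.match(lazy);
const greedyMatch = name.match(greedy);

// Lazy: group 1 stops as early as the optional extension group allows.
console.log(lazyMatch[1], lazyMatch[2]);     // foo.baz.bing bong

// Greedy: group 1 swallows the extension, so group 2 never participates.
console.log(greedyMatch[1], greedyMatch[2]); // foo.baz.bing.bong undefined
```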

Regex demo

const regex = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
[
  "foo.baz.bing.bong",
  "one.two",
  "noextension"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
    console.log(m[2]);
    console.log("----");
  }
});

Another option, posted by @Wiktor Stribiżew in the comments, is to use a non-greedy dot to match any character for the filename:

^(.*?)(?:\.(\w+))?$

Regex demo
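
A minimal sketch of the non-greedy dot variant in use. The helper name splitFilename and the extra sample names ('my-file v2.tar.gz', '.gitignore') are assumptions added for illustration. Unlike the \w+ based pattern, this one also accepts filenames containing hyphens or spaces; a name that starts with a dot ends up with an empty filename.

```javascript
const lazyDotRegex = /^(.*?)(?:\.(\w+))?$/;

// Hypothetical helper, not part of the original answer.
function splitFilename(name) {
  // match() returns [fullMatch, group1, group2]; fall back to [] if nothing matched
  // (for example when the string contains a line break, which . does not match).
  const [, filename, extension] = name.match(lazyDotRegex) ?? [];
  return [filename, extension];
}

console.log(splitFilename('foo.baz.bing.bong')); // [ 'foo.baz.bing', 'bong' ]
console.log(splitFilename('my-file v2.tar.gz')); // [ 'my-file v2.tar', 'gz' ]
console.log(splitFilename('noextension'));       // [ 'noextension', undefined ]
console.log(splitFilename('.gitignore'));        // [ '', 'gitignore' ]
```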

Answer 3

How about something more explicit and precise, without lookarounds ...

named groups variant ... /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/
without named groups ... /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

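Both of the variants just shown can be shortened to 2 capture groups instead of 3, which in my opinion makes the regex easier to use, at the cost of being less readable ...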

named groups variant ... /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/
without named groups ... /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/

const testData = [
  'foo.baz.bing.bong',
  'one.two',
  'noextension',
];
// https://regex101.com/r/RTPRNU/5
const regXTwoNamedFileNameCaptures = /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/;
// https://regex101.com/r/RTPRNU/4
const regXTwoFileNameCaptures = /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

// https://regex101.com/r/RTPRNU/3
const regXThreeNamedFileNameCaptures = /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/
// https://regex101.com/r/RTPRNU/3
const regXThreeFileNameCaptures = /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

console.log(
  'based on 2 named file name captures ...\n',
  testData, ' =>',
  testData.map(str =>
    regXTwoNamedFileNameCaptures.exec(str)?.groups ?? {}
  )
);
console.log(
  'based on 2 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      filename,
      extension,
    ] = str.match(regXTwoFileNameCaptures) ?? [];
  //] = regXTwoFileNameCaptures.exec(str) ?? [];

    return {
      filename,
      extension,
    }
  })
);

console.log(
  'based on 3 named file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const {
      filename = '',
      extension = '',
      noextension = '',
    } = regXThreeNamedFileNameCaptures.exec(str)?.groups ?? {};

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
console.log(
  'based on 3 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      noextension = '',
      filename = '',
      extension = '',
    ] = str.match(regXThreeFileNameCaptures) ?? [];
  //] = regXThreeFileNameCaptures.exec(str) ?? [];

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
