Match sentences and whitespace separately

Take the following text:

This is a sentence. This is a sentence...    This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence

I want to match it so that I end up with an array like this:

[
  "This is a sentence.",
  " ",
  "This is a sentence...",
  "    ",
  "This is a sentence!",
  " ",
  "This is a sentence?",
  " ",
  "This is a sentence.",
  "",
  "This is a sentence.",
  " ",
  "This is a sentence",
]

However, with my current regular expression:

str.match(/[^.!?]+[.!?]*(\s*)/g);

I get the following:

[
  "This is a sentence. ",
  "This is a sentence...    ", 
  "This is a sentence! ",
  "This is a sentence? ", 
  "This is a sentence.", 
  "This is a sentence. ", 
  "This is a sentence"
]

How can I achieve this with a JS RegExp?

Thanks in advance!

Just add [^\s] at the beginning and change (\s*) to |\s+.

The final regular expression looks like this:

str.match(/[^\s][^.!?]+[.!?]*|\s+/g)

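  • [^\s] requires each sentence match to start with a non-whitespace character, so leading whitespace is kept out of the sentence
  • |\s+ matches a run of whitespace as a separate item of its own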
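
As a quick check, here is that regex run against the sample text from the question. This is only a sketch; the variable names `text` and `parts` are chosen here for illustration. On this input it yields 12 items: the seven sentences plus the five whitespace runs. There is no "" entry between the two sentences that touch, because a match can never be empty.

```javascript
// Sample text from the question.
const text = "This is a sentence. This is a sentence...    This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence";

// Either a sentence that begins with a non-whitespace character,
// or a run of one or more whitespace characters.
const parts = text.match(/[^\s][^.!?]+[.!?]*|\s+/g);

console.log(parts);
console.log(parts.length); // 12
```

Note that [^\s][^.!?]+ needs at least two characters, so a one-character "sentence" would be handled differently; the sample text does not contain one.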

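Here is a solution that keeps the regular expression from the question and then does some array splitting afterwards to keep the whitespace in the array. Essentially, it splits each match at its trailing whitespace (whitespace followed by a positive lookahead for $, the end of the string) and then flattens the result again. Note that filter drops empty strings, so the "" between the two sentences that touch does not appear in the output; apart from that, it matches the array you asked for.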

const baseStr = "This is a sentence. This is a sentence...    This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence";

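const result = baseStr
  .match(/[^.!?]+[.!?]*(\s*)/g)                        // sentences, each with its trailing whitespace attached
  .map(str => str.split(/(\s*)(?=$)/).filter(_ => _))  // split off the trailing whitespace, drop empty strings
  .flat();                                             // merge the per-sentence arrays into one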

console.log(result);
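
If the empty-string separator from the desired output matters, one possible sketch is to capture each sentence and the whitespace after it in separate groups, using String.prototype.matchAll. The helper name `splitSentences` and the regex are assumptions made for this example and are not taken from the answers above. It assumes every sentence except possibly the last ends in ., ! or ?, as in the sample text.

```javascript
const text = "This is a sentence. This is a sentence...    This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence";

// Hypothetical helper: returns sentences interleaved with the whitespace
// between them, using "" where two sentences touch.
function splitSentences(input) {
  const parts = [];
  // Group 1: a sentence starting with a character that is neither
  //          whitespace nor a terminator, plus its terminators.
  // Group 2: the (possibly empty) whitespace that follows it.
  for (const m of input.matchAll(/([^.!?\s][^.!?]*[.!?]*)(\s*)/g)) {
    parts.push(m[1], m[2]);
  }
  // Drop the empty separator after the final sentence, if there is one.
  if (parts.length > 0 && parts[parts.length - 1] === "") {
    parts.pop();
  }
  return parts;
}

console.log(splitSentences(text));
```

On the sample text this gives 13 items, with "" at index 9. Leading whitespace or stray punctuation at the very start of the input would be skipped, and a final sentence without a terminator would keep any whitespace that follows it, so treat this as a starting point rather than a general sentence splitter.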