Javascript Regex to match expression except another expression

Question

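First, I'll describe what I'm trying to achieve. I want to copy and paste a list of football matches (as a regular user, not a developer, so plain text from the website, not HTML taken from inspecting the page), which means I have to parse the text. On the website it looks like this:

The pasted text looks like this: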

PERU\r\nLiga 2\r\nClasament Live\r\nFinal\r\nSanta Rosa\r\n\r\n0\r\n - \r\n3\r\n\r\nMolinos El Pirata\r\n(0 - 1)\r\n73 \r\nChavelines\r\n\r\n1\r\n - \r\n0\r\n\r\nDeportivo Coopsol\r\n(1 - 0)\r\n20:30\r\nComerciantes Unidos\r\n\r\n-\r\n\r\nJuan Aurich\r\n22:45\r\nSantos FC\r\n\r\n-\r\n\r\nHuaral\r\nPOLONIA\r\nEkstraklasa\r\nClasament Live\r\n90+1 \r\nPogon Szczecin\r\n\r\n2\r\n - \r\n0\r\n\r\nStal Mielec\r\n(1 - 0)\r\nlive\r\n20:30\r\nPlock\r\n\r\n-\r\n\r\nGornik Z.\r\nPORTUGALIA\r\nPrimeira Liga\r\nClasament\r\n21:15\r\nFarense\r\n\r\n-\r\n\r\nMaritimo

What I then need is to build something like this:

Final     Santa Rosa            0 - 3  Molinos El Pirata
75        Chavelines            1 - 0  Deportivo Coopsol
20:30     Comerciantes Unidos     -    Juan Aurich
22:45     Santos FC               -    Huaral
90+3      Pogon Szczecin        2 - 0  Stal Mielec
20:30     Plock                   -    Gornik Z.
21:15     Farense                 -    Maritimo

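So the plan is to extract each row into an array and then put the rows in a table. I start by cleaning out the unwanted text (country names, league names, half-time scores):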

gamesUnformatted = gamesUnformatted.replace(/\b[A-Z]{5,}\b/g, '['); // replace the country name, which is in capital letters, with [ (only names with more than 4 letters, to avoid removing LASK, TSKA... but it will not remove IRAN, ASIA - find better way)
gamesUnformatted = gamesUnformatted.replace(/Clasament Live/g, ']');
gamesUnformatted = gamesUnformatted.replace(/Clasament/g, ']'); // replace the words Clasament with ]
gamesUnformatted = gamesUnformatted.replace(/ *\[[^\]]*]/g, ''); // remove everything between [ and ], including the square brackets
gamesUnformatted = gamesUnformatted.replace(/\(\d{1,2} - \d{1,2}\)/g, ''); // remove half time score eg (0 - 0)

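Now I want to add the word newLine in front of every row, so that later I can split by "newLine" and get all the separate rows in an array. A row can start in three ways: the game has not started yet (20:30), the game has ended (Final), or the game is in progress (e.g. 70). For the first two I have the following: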

gamesUnformatted = gamesUnformatted.replace(/\d{2}:\d{2}/g, 'newLine$&'); // add the word newLine in front of the starting hours
gamesUnformatted = gamesUnformatted.replace(/Final/g, 'newLine$&'); // add the word newLine in front of the word Final (game has ended)

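The third one is trickier. The minute can be 0-90, then added time written as 90+something (e.g. 90+3), and then there can be two extra-time halves (e.g. 120, 120+..). This is where I need help. I need a regex that matches all of these scenarios but excludes everything else. More precisely, I need to match minutes (1-120 and 1-120+...), but not scores or hours (1-0, 20:30). I have been trying all sorts of things for half a day and can't list them all here, but I have tried ^ and ?: and ! and whatnot. I have to say I'm not good at regex, so most of what I tried was probably silly, but well, what I have now is: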

gamesUnformatted = gamesUnformatted.replace(/\d{1,3}[^(\d{2}:\d{2})]/g, 'newLine$&');

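This is only a first step: it matches any number of 1 to 3 digits, without taking "90+4" into account, and it tries to ignore hours but not scores. It does not work well, though, because it adds newLine in front of every number. So this: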

90+3      Pogon Szczecin        2 - 0 Stal Mielec
20:30     Plock                   - Gornik Z.

becomes this:

newLine90+newLine3      Pogon Szczecin        newLine2 - newLine0 Stal Mielec
newLinenewLine20:newLine30     Plock                   - Gornik Z.

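instead of this (the second row has two newLine markers because one was already added in front of the hour, so that one has to be ignored):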

newLine90+3      Pogon Szczecin        2 - 0 Stal Mielec
newLinenewLine20:30     Plock                   - Gornik Z.

Answer 1

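Before adding newLine, you can do one more replacement to make sure each score sits on a single line, e.g. 0-1.

Demo: https://regex101.com/r/Dur5lD/4

Pattern: match: (\d{1,2})\s*-\s*(\d{1,2}); replace: $1-$2

Explanation: because the text contains line breaks, I used \s to match runs of whitespace. The capture groups $1 and $2 are used to produce the desired output.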
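A minimal sketch of that replacement in JavaScript, assuming the text has already been through the cleanup steps from the question. The `sample` string and the variable names are made up for illustration.

```javascript
// Hypothetical sample: cleaned text in which a score is still spread over three lines.
const sample = 'Final\r\nSanta Rosa\r\n\r\n0\r\n - \r\n3\r\n\r\nMolinos El Pirata\r\n20:30\r\nPlock\r\n\r\n-\r\n\r\nGornik Z.';

// \s* also consumes \r and \n, so "0\r\n - \r\n3" collapses to "0-3".
// A lone "-" (game not started) has no digits around it and is left alone.
const joined = sample.replace(/(\d{1,2})\s*-\s*(\d{1,2})/g, '$1-$2');

console.log(JSON.stringify(joined));
```

The hour 20:30 is untouched because the pattern requires a hyphen between the two numbers.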

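Once this is done, adding newLine should be straightforward.

Demo: https://regex101.com/r/Dur5lD/5

Pattern: ^((?:Final)|(?:\d{2}:\d{2})|(?:\d{1,3}(?!\d)(?!-)))

Explanation:

Capture a group that can be one of Final, an hour, or a minute.
To match the minute, use a negative lookahead, (?!). It means that minute values such as 70 or 120 must not be followed by - or by another digit.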
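The attempt in the question, [^(\d{2}:\d{2})], is a negated character class: it consumes one character that is not in the listed set, rather than excluding a whole sub-pattern, which is why a lookahead is used here. The pattern above might be applied as follows. This is a sketch that assumes \r has been stripped and the scores are already joined; the m flag (so that ^ matches at every line start), the sample text, and the variable names are assumptions.

```javascript
// Hypothetical sample: cleaned text with scores already joined onto one line.
const joined = 'Final\nSanta Rosa\n\n0-3\n\nMolinos El Pirata\n73 \nChavelines\n\n1-0\n\nDeportivo Coopsol\n20:30\nPlock\n\n-\n\nGornik Z.\n90+1 \nPogon Szczecin\n\n2-0\n\nStal Mielec';

// g: replace every match; m: let ^ match at the start of each line.
const rowStart = /^((?:Final)|(?:\d{2}:\d{2})|(?:\d{1,3}(?!\d)(?!-)))/gm;
const marked = joined.replace(rowStart, 'newLine$1');

// Split on the marker, then break each chunk into its non-empty, trimmed parts.
const rows = marked
  .split('newLine')
  .filter((chunk) => chunk.trim() !== '')
  .map((chunk) => chunk.split('\n').map((part) => part.trim()).filter((part) => part !== ''));

console.log(rows);
```

A line such as 0-3 is skipped because the digit is followed by a hyphen, while 90+1 is marked because 90 is followed by +. One limitation of this sketch: a team name that begins with one to three digits followed by something other than a digit or a hyphen would also be marked as a row start.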

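Notes:

I assumed \r\n to be line breaks. If that is not the case, \s and ^ may need to be replaced with a literal \r\n.
It looks like your regex does not handle PERU, so I removed that line manually.
After replacing \n with \t and then newLine with \n, the result is https://regex101.com/r/Dur5lD/6.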
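The last note could look roughly like this in JavaScript. It is a sketch on a made-up `markedText` sample; collapsing runs of line breaks with \n+ (instead of replacing each \n individually) is a small variation chosen here to avoid empty columns.

```javascript
// Hypothetical sample: text that already carries the newLine markers.
const markedText = 'newLineFinal\nSanta Rosa\n\n0-3\n\nMolinos El Pirata\nnewLine73 \nChavelines\n\n1-0\n\nDeportivo Coopsol';

const tabbedLines = markedText
  .replace(/\n+/g, '\t')      // turn every run of line breaks into a single tab
  .replace(/newLine/g, '\n')  // turn every marker into a real line break
  .trim()                     // drop the leading line break left by the first marker
  .split('\n')
  .map((line) => line.trim()); // drop the trailing tab of each row

console.log(tabbedLines);
```

Each resulting string is one match with tab-separated columns, which can be pasted into a spreadsheet or split on \t.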

Answer 2

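Making sense of a plain-text copy is indeed a challenge. Here is a different approach that rebuilds the rows deterministically:

Remove the country lines
Remove the half-time scores
Remove 'live'
Join the scores that are split across several lines (real scores and empty scores)
Now we have groups of 4 lines that form one row, which means we can avoid complicated time parsing (Final, 00, 00+0, 00:00, ...)
Split every 4 lines, producing rows whose parts are separated by newlines
Iterate over the rows to extract time, team A, score, and team B from the parts, and create an object from them
Now you can iterate over the resulting array of objects to display your table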

const input = 'PERU\r\nLiga 2\r\nClasament Live\r\nFinal\r\nSanta Rosa\r\n\r\n0\r\n - \r\n3\r\n\r\nMolinos El Pirata\r\n(0 - 1)\r\n73 \r\nChavelines\r\n\r\n1\r\n - \r\n0\r\n\r\nDeportivo Coopsol\r\n(1 - 0)\r\n20:30\r\nComerciantes Unidos\r\n\r\n-\r\n\r\nJuan Aurich\r\n22:45\r\nSantos FC\r\n\r\n-\r\n\r\nHuaral\r\nPOLONIA\r\nEkstraklasa\r\nClasament Live\r\n90+1 \r\nPogon Szczecin\r\n\r\n2\r\n - \r\n0\r\n\r\nStal Mielec\r\n(1 - 0)\r\nlive\r\n20:30\r\nPlock\r\n\r\n-\r\n\r\nGornik Z.\r\nPORTUGALIA\r\nPrimeira Liga\r\nClasament\r\n21:15\r\nFarense\r\n\r\n-\r\n\r\nMaritimo';

let rows = input
  .replace(/\r/g, '') // remove '\r'
  .replace(/[A-Z]{4,}([^\n]*\n){3}/g, '') // remove the country line plus the league and standings lines after it
  .replace(/\n\(\d{1,2} - \d{1,2}\)\n/g, '\n') // remove half time scores 
  .replace(/\nlive\n/g, '\n') // remove 'live' lines
  .replace(/\n\n(\d{1,2})\n *- *\n(\d{1,2})\n\n/g, '\n$1 - $2\n') // join 'n - m' scores
  .replace(/\n\n *- *\n\n/g, '\n-\n') // join '-' empty scores
  .match(/(?:^.*$\n?){1,4}/mg); // split every 4 lines
console.log('rows: ' + JSON.stringify(rows, null, ' '));

let scores = rows.map((line) => {
    let parts = line.split(/\n/);
    let obj = {
      time: parts[0],
      teamA: parts[1],
      score: parts[2],
      teamB: parts[3]
    };
    return obj
  });
console.log('scores: ' + JSON.stringify(scores, null, ' '));

Output:


rows: [
 "Final\nSanta Rosa\n0 - 3\nMolinos El Pirata\n",
 "73 \nChavelines\n1 - 0\nDeportivo Coopsol\n",
 "20:30\nComerciantes Unidos\n-\nJuan Aurich\n",
 "22:45\nSantos FC\n-\nHuaral\n",
 "90+1 \nPogon Szczecin\n2 - 0\nStal Mielec\n",
 "20:30\nPlock\n-\nGornik Z.\n",
 "21:15\nFarense\n-\nMaritimo"
]
scores: [
 {
  "time": "Final",
  "teamA": "Santa Rosa",
  "score": "0 - 3",
  "teamB": "Molinos El Pirata"
 },
 {
  "time": "73 ",
  "teamA": "Chavelines",
  "score": "1 - 0",
  "teamB": "Deportivo Coopsol"
 },
 {
  "time": "20:30",
  "teamA": "Comerciantes Unidos",
  "score": "-",
  "teamB": "Juan Aurich"
 },
 {
  "time": "22:45",
  "teamA": "Santos FC",
  "score": "-",
  "teamB": "Huaral"
 },
 {
  "time": "90+1 ",
  "teamA": "Pogon Szczecin",
  "score": "2 - 0",
  "teamB": "Stal Mielec"
 },
 {
  "time": "20:30",
  "teamA": "Plock",
  "score": "-",
  "teamB": "Gornik Z."
 },
 {
  "time": "21:15",
  "teamA": "Farense",
  "score": "-",
  "teamB": "Maritimo"
 }
]
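To get from this array of objects to the aligned text table asked for in the question, one option is to pad each column with padEnd. This is a minimal sketch: the column widths (10, 22, 7), the `formatRow` helper, and the shortened `sampleScores` array are assumptions chosen for illustration.

```javascript
// Hypothetical subset of the scores array produced above.
const sampleScores = [
  { time: 'Final', teamA: 'Santa Rosa', score: '0 - 3', teamB: 'Molinos El Pirata' },
  { time: '73 ', teamA: 'Chavelines', score: '1 - 0', teamB: 'Deportivo Coopsol' },
  { time: '20:30', teamA: 'Plock', score: '-', teamB: 'Gornik Z.' }
];

// Pad each column to a fixed width; trim first because some parts carry a trailing space.
function formatRow(row) {
  return row.time.trim().padEnd(10)
    + row.teamA.trim().padEnd(22)
    + row.score.trim().padEnd(7)
    + row.teamB.trim();
}

const tableLines = sampleScores.map(formatRow);
console.log(tableLines.join('\n'));
```

For an HTML table the same loop would emit one row element per object instead of a padded string.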
