Regex to match comma-separated words with final "and" clause

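I need a regex to match the words or phrases in an English list of things, in one of the following forms:

  1. "Some words"
    would match "Some words"
  2. "Some words and some other words"
    would match "Some words" and "some other words"
  3. "Some words, more words and some other words"
    would match "Some words", "more words" and "some other words"
  4. "Some words, more words, and some other words"
    would match "Some words", "more words" and "some other words"

In other words, the regex should let me identify each phrase in an English list of phrases, where all phrases except the last (if there are more than two) are separated by commas, and the final "and" may or may not be preceded by a comma.

Getting the comma-separated matches is easy: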

[^,]+

But I can't figure out how to handle the optional final "and" separator (the one with no preceding comma).

You could try this, though it only covers the specific words used in the examples:

(?:[Ss]ome|more)\s(?:[a-z]+\s)?words

Hope that helps!

One way is to split the string on "and" (optionally preceded by a comma) or on a comma:

using System;
using System.Text.RegularExpressions;

string[] inputs = {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inputs) {
    // Split on "and" (preceded by a comma or by whitespace), or on a plain comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}

Output:

Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words

Demo on ideone
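
The question asks for matches rather than a split. As a rough sketch that is not part of the answer above, the same separators can be folded into a pattern for Regex.Matches, so that each phrase lands in capture group 1. The names PhraseMatcher and Extract are hypothetical and chosen only for illustration, and the pattern assumes input shaped like the four examples in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhraseMatcher
{
    // Hypothetical helper pattern: a leading separator (or the start of the string),
    // then a lazily matched phrase that stops before the next separator or the end.
    private static readonly Regex Phrase = new Regex(
        @"(?:^\s*|,\s*and\s+|,\s*|\s+and\s+)(\S.*?)(?=\s*,|\s+and\s+|\s*$)");

    public static List<string> Extract(string text)
    {
        var phrases = new List<string>();
        foreach (Match m in Phrase.Matches(text))
        {
            // Group 1 holds the phrase without its leading separator.
            phrases.Add(m.Groups[1].Value);
        }
        return phrases;
    }

    public static void Main()
    {
        var phrases = Extract("Some words, more words, and some other words");
        // Prints: Some words | more words | some other words
        Console.WriteLine(string.Join(" | ", phrases));
    }
}
```

Like the split-based version, this treats every standalone "and" as a separator, so a phrase such as "salt and pepper" would come back as two items.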

You can use the following pattern with Regex.Split:

\s*(?:(?:,\s*)?\band\s+|,\s*)

See the regex demo.

Details:

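  • \s* - zero or more whitespace characters
  • (?:(?:,\s*)?\band\s+|,\s*) - either of two alternatives:
    • (?:,\s*)?\band\s+ - an optional comma followed by zero or more whitespace characters, then the word and (\b requires a word boundary before it), followed by one or more whitespace characters
    • | - or
    • ,\s* - a comma and zero or more whitespace characters.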
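
The breakdown above can also be written into the pattern itself. The sketch below is an illustration rather than part of the original answer: it uses RegexOptions.IgnorePatternWhitespace, which lets the pattern span several lines with # comments, and the class name VerbosePatternDemo and its Split wrapper are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePatternDemo
{
    // The same separator pattern in verbose form; unescaped whitespace inside
    // the pattern is ignored and '#' starts a comment that runs to end of line.
    private static readonly Regex Separator = new Regex(@"
        \s*                # zero or more whitespace characters
        (?:
            (?:,\s*)?      # an optional comma with trailing whitespace
            \band\s+       # the word 'and', then one or more whitespace characters
          |                # or
            ,\s*           # a comma with trailing whitespace
        )", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical wrapper so the pattern can be reused.
    public static string[] Split(string text)
    {
        return Separator.Split(text);
    }

    public static void Main()
    {
        foreach (var phrase in Split("Some words, more words, and some other words"))
        {
            Console.WriteLine(phrase);
        }
    }
}
```

For the four sample inputs this should behave the same as the one-line pattern; the verbose form only changes how the pattern is written, not what it matches.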

See the C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> { 
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words" 
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts) 
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}

Output:

'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']