Regex to match comma-separated words with final "and" clause
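I need a regex to match the words or phrases in an English list of things, in one of the following forms:
- "Some words" would match "Some words"
- "Some words and some other words" would match "Some words" and "some other words"
- "Some words, more words and some other words" would match "Some words", "more words" and "some other words"
- "Some words, more words, and some other words" would match "Some words", "more words" and "some other words"
In other words, the regex should let me pick out each phrase in a list of English phrases where all phrases except the last (when there are more than two) are separated by commas, and the final "and" may or may not be preceded by a comma.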
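Getting the comma-separated matches is easy:
[^,]+
But I can't figure out how to handle the optional final "and" separator (the one not preceded by a comma).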
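A minimal sketch of the problem, assuming .NET's Regex.Matches: with [^,]+ alone, the "and" clause stays glued to the phrase before it, and the piece after each comma keeps its leading space. The class and method names (CommaOnlyDemo, CommaMatches) are hypothetical and only used for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaOnlyDemo
{
    // Returns every run of non-comma characters, i.e. what [^,]+ matches.
    public static string[] CommaMatches(string text) =>
        Regex.Matches(text, "[^,]+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Prints 'Some words' and then ' more words and some other words':
        // the "and" separator is never treated as a boundary.
        foreach (var m in CommaMatches("Some words, more words and some other words"))
            Console.WriteLine("'" + m + "'");
    }
}
```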
You can try
[some|Some|more]+\s(?:[a-z]+)?\s?words
Hope this helps!
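To see what the pattern above actually matches, here is a minimal sketch that runs it unchanged through Regex.Matches. Note that [some|Some|more] is a character class: it matches any single character from the set s, o, m, e, |, S, r, not one of the words "some", "Some" or "more". It happens to cover the sample strings, and because the pattern ends in the literal word words, it only finds phrases ending in that word. SuggestedPatternDemo and Phrases are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SuggestedPatternDemo
{
    // The pattern from the answer, unchanged. [some|Some|more] is a character
    // class (one character out of s, o, m, e, |, S, r), repeated by the +.
    const string Pattern = @"[some|Some|more]+\s(?:[a-z]+)?\s?words";

    // Returns the text of every match in the input.
    public static string[] Phrases(string text) =>
        Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // Sample list from the question: finds three phrases.
        Console.WriteLine(string.Join(" | ", Phrases("Some words, more words and some other words")));
        // A list whose phrases do not end in "words": finds nothing.
        Console.WriteLine(Phrases("Red apples and green pears").Length);
    }
}
```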
One way is to split the string on "and" (optionally preceded by a comma) or on a comma:
using System;
using System.Text.RegularExpressions;

string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    // Split on "and" preceded by a comma or by whitespace, or on a plain comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}
Output:
Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words
You can use the following pattern with Regex.Split:
\s*(?:(?:,\s*)?\band\s+|,\s*)
See the regex demo.
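Details:
\s*
- zero or more whitespace characters
(?:(?:,\s*)?\band\s+|,\s*)
- one of two alternatives:
(?:,\s*)?\band\s+
- an optional sequence of a comma and zero or more whitespace characters, then the whole word "and", followed by one or more whitespace characters
|
- or
,\s*
- a comma and zero or more whitespace characters.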
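One way to check the breakdown above is to look at the separators themselves: a minimal sketch, assuming the same pattern, that lists the text each match consumes (the text Regex.Split throws away). The third input shows the role of \b: the "and" inside "Sand" is not treated as a separator. SeparatorDemo and Separators are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    const string Pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";

    // Returns the text consumed by each separator match.
    public static string[] Separators(string text) =>
        Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        var inputs = new[] {
            "Some words, more words and some other words",
            "Some words, more words, and some other words",
            "Sand and sea" // \b prevents a split inside "Sand"
        };
        foreach (var text in inputs)
        {
            // Quote each separator so the surrounding spaces are visible.
            var seps = Separators(text).Select(s => "'" + s + "'");
            Console.WriteLine(text + " => " + string.Join(", ", seps));
        }
    }
}
```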
See the C# demo:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var texts = new List<string> {
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words"
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts)
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}
Output:
'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']
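If the split is needed in several places, one option is to wrap it in a small helper. This is a sketch under assumptions: PhraseSplitter and SplitPhrases are hypothetical names, the Regex object is built once and reused, and trimming each piece is an addition that removes leading or trailing whitespace the pattern does not consume. The empty-piece filter matters for input such as a trailing comma, where Regex.Split returns an empty last element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseSplitter
{
    // Same separator pattern as in the answer, constructed once and reused.
    static readonly Regex Separator = new Regex(@"\s*(?:(?:,\s*)?\band\s+|,\s*)");

    // Hypothetical helper: split, trim each piece, drop empty pieces.
    public static List<string> SplitPhrases(string text) =>
        Separator.Split(text)
                 .Select(p => p.Trim())
                 .Where(p => p.Length > 0)
                 .ToList();

    public static void Main()
    {
        // Leading space and trailing ", " would otherwise leave stray whitespace
        // and an empty last element.
        var result = SplitPhrases(" Some words, more words, and some other words, ");
        Console.WriteLine(string.Join(" | ", result));
    }
}
```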