RegEx for capturing a repeating pattern
I have the following regular expression (taken from elsewhere):
([0-9]{1,2}h)[ ]*([0-9]{1,2}min):[ ]*(.*(?:\n(?![0-9]{1,2}h).*)*)
It takes the following string
1h 30min: Title
- Description Line 1
1h 30min: Title
- Description Line 1
- Description Line 2
- Description Line 3
and produces this result:
Match 1:
"1h 30min: Title
- Description Line 1"
Group 1: "1h"
Group 2: "30min"
Group 3: "Title
- Description Line 1"
Match 2:
"1h 30min: Title
- Description Line 1
- Description Line 2
- Description Line 3"
Group 1: "1h"
Group 2: "30min"
Group 3: "Title
- Description Line 1
- Description Line 2
- Description Line 3"
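To reproduce the result above, here is a minimal JavaScript sketch. The helper name `findEntries` is hypothetical; the pattern is the one shown above, with the `g` flag added so that `matchAll` can iterate over every match.

```javascript
// Original pattern from the question, plus the `g` flag required by matchAll.
const original = /([0-9]{1,2}h)[ ]*([0-9]{1,2}min):[ ]*(.*(?:\n(?![0-9]{1,2}h).*)*)/g;

// Sample input where every entry starts on its own line.
const lineBased = [
  "1h 30min: Title",
  "- Description Line 1",
  "1h 30min: Title",
  "- Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

// Hypothetical helper: returns [group 1, group 2, group 3] for each match.
function findEntries(text) {
  return [...text.matchAll(original)].map((m) => [m[1], m[2], m[3]]);
}

const entries = findEntries(lineBased);
console.log(entries.length); // 2
console.log(entries);
```

The negative lookahead `(?![0-9]{1,2}h)` is only tested right after a `\n`, which is why this pattern depends on each entry starting on a new line.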
I now have input where the 1h 30min part does not always appear on a new line. So say I have the following string:
1h 30min: Title
- Description Line 1 1h 30min: Title - Description Line 1
- Description Line 2
- Description Line 3
How can I modify the regex to get the following match result?
Match 1:
"1h 30min: Title
- Description Line 1"
Group 1: "1h"
Group 2: "30min"
Group 3: "Title
- Description Line 1"
Match 2:
"1h 30min: Title - Description Line 1
- Description Line 2
- Description Line 3"
Group 1: "1h"
Group 2: "30min"
Group 3: "Title - Description Line 1
- Description Line 2
- Description Line 3"
Removing the \n from the regex lets it match, but it then ends up matching everything after the first 1h 30min.
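The desired output is difficult to match, but not impossible.
I would handle only part of it with a regex, perhaps the time and title, and do the rest in a script if that is an option.
Here, we can start with an expression similar to: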
([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+(Title)([\d\D]*?\d|.+)|[\s\S]*
Or:
([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+([A-Za-z\s]+)([\d\D]*?\d|.+)|[\s\S]*
const regex = /([0-9]{1,2}h)\s+([0-9]{1,2}min):\s+(Title)([\d\D]*?\d|.+)|[\s\S]*/gm;
const str = `1h 30min: Title
- Description Line 1 1h 30min: Title - Description Line 1
- Description Line 2
- Description Line 3`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
RegEx Circuit
jex.im visualizes regular expressions.
I could not solve it with a minor modification.
So I will just offer my own solution:
([0-9]{1,2}h) *([0-9]{1,2}min):[\s\S]*?(?=[0-9]{1,2}h|$)
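As a rough illustration of how this pattern behaves on the sample string, the sketch below runs it in JavaScript. Two things here are additions rather than part of the pattern above: a third capturing group around the lazy `[\s\S]*?` so the description is available as a group, and a `trim()` call, since the lazy part also picks up the whitespace after the colon and before the next `1h`. The variable names are illustrative only.

```javascript
// Pattern from above, with an extra (assumed) third group around the lazy part.
const lazyPattern = /([0-9]{1,2}h) *([0-9]{1,2}min):([\s\S]*?)(?=[0-9]{1,2}h|$)/g;

// Sample input where the second entry starts in the middle of a line.
const inlineInput = [
  "1h 30min: Title",
  "- Description Line 1 1h 30min: Title - Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

// Collect the groups of every match; trim the whitespace the lazy part swallows.
const lazyMatches = [...inlineInput.matchAll(lazyPattern)].map((m) => ({
  hours: m[1],
  minutes: m[2],
  description: m[3].trim(),
}));

console.log(lazyMatches.length);
console.log(lazyMatches[0].description);
console.log(lazyMatches[1].description);
```

Note that the lookahead stops at any one or two digits followed by `h`, so a description that itself contains text such as `3h` would be cut short at that point.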
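You can make this work with only a small change, but the problem lies in the last part, the one that captures the description. The general form of a tempered greedy token is: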
(.(?!notAllowed))+
So, applying this pattern to your case, and adding named groups for clarity:
(?<hours>[0-9]{1,2}h)[ ]*(?<minutes>[0-9]{1,2}min):\s*(?<description>(?:.(?!\dh\s\d{1,2}min))+)
P.S. If you cannot turn on "dot matches newline" mode, you can use [\s\S] to emulate it.
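A minimal JavaScript sketch of the named-group pattern, assuming a runtime that supports named groups and the `s` (dotAll) flag, both of which arrived in ES2018. The variable names are illustrative only.

```javascript
// `s` lets `.` match newlines; `g` lets matchAll find every entry.
const tempered = /(?<hours>[0-9]{1,2}h)[ ]*(?<minutes>[0-9]{1,2}min):\s*(?<description>(?:.(?!\dh\s\d{1,2}min))+)/gs;

// Sample input where the second entry starts in the middle of a line.
const sample = [
  "1h 30min: Title",
  "- Description Line 1 1h 30min: Title - Description Line 1",
  "- Description Line 2",
  "- Description Line 3",
].join("\n");

// Copy the named groups of each match into a plain object.
const temperedMatches = [...sample.matchAll(tempered)].map((m) => ({ ...m.groups }));

for (const { hours, minutes, description } of temperedMatches) {
  console.log(`${hours} ${minutes}: ${JSON.stringify(description)}`);
}
```

On this input, each character is consumed only if it is not directly followed by the next `1h 30min` marker, so the space in front of that marker is left out of the description and no trimming is needed.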