Regex Consistent Grouping
I modified this rather messy regex from an earlier one:
https://regex101.com/r/Trdwks/1
(([0-9]{1,2}h)[ ]*([0-9]{1,2}min):\s*|([0-9]{1,2}h)():\s*|()([0-9]{1,2}min):\s*)((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)
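The idea is that it matches the following string and captures the hours, the minutes, and the description as separate groups.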
1h 30min: Title
- Description Line 1
3h: SECOND TITLE
- Description Line 1
- Description Line 2
- Description Line 3
1h 14min: Title
- another Great one 42min: Title - Great Movie
- Description Line 2
- Description Line 3
It should produce the following result:
Match 1:
"1h 30min: Title
- Description Line 1"
Group 1: "1h"
Group 2: "30min"
Group 3: "Title
- Description Line 1"
Match 2:
"3h: SECOND TITLE
- Description Line 1
- Description Line 2
- Description Line 3"
Group 1: "3h"
Group 2: ""
Group 3: "SECOND TITLE
- Description Line 1
- Description Line 2
- Description Line 3"
Match 3:
"1h 14min: Title
- another Great one"
Group 1: "1h"
Group 2: "14min"
Group 3: "Title
- another Great one"
Match 4:
"42min: Title - Great Movie
- Description Line 2
- Description Line 3"
Group 1: ""
Group 2: "42min"
Group 3: "Title - Great Movie
- Description Line 2
- Description Line 3"
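I am having trouble keeping the groups consistent, because the time may consist of hours only, minutes only, or both. With the regex above, the minutes can end up in group 3 or in group 7, depending on which alternative matched. Is there a way to pin down the groups in the initial alternation so that every scenario returns the same group numbers?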
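This solution only requires support for lookahead assertions. Both time parts are made optional, and the leading lookahead requires a digit before the next colon, so at least one of them must be present.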
(?s)(?=[^:]*\d[^:]*:)(([0-9]{1,2}h)?[ ]*([0-9]{1,2}min)?:\s*)((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)
https://regex101.com/r/gz4r9g/1
Expanded
(?s)
(?= [^:]* \d [^:]* : )
( # (1 start)
( [0-9]{1,2} h )? # (2)
[ ]*
( [0-9]{1,2} min )? # (3)
: \s*
) # (1 end)
( # (4 start)
(?:
.
(?!
( # (5 start)
\d h \s \d{1,2} min
| \d h
| \d{1,2} min
) # (5 end)
)
)+
) # (4 end)
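As a rough illustration, the lookahead-based pattern above can be tried with Python's built-in re module. This is only a sketch: the names PATTERN, SAMPLE, and parse_entries are made up for the example, and the sample text is the one from the question. Note that Python reports an optional group that did not take part in the match as None rather than "", so the sketch normalizes that itself.

```python
import re

# The lookahead-based pattern from above, copied unchanged.
# Group 2 = hours, group 3 = minutes, group 4 = description.
PATTERN = re.compile(
    r"(?s)(?=[^:]*\d[^:]*:)"
    r"(([0-9]{1,2}h)?[ ]*([0-9]{1,2}min)?:\s*)"
    r"((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)"
)

# Sample input from the question.
SAMPLE = (
    "1h 30min: Title\n"
    "- Description Line 1\n"
    "3h: SECOND TITLE\n"
    "- Description Line 1\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
    "1h 14min: Title\n"
    "- another Great one 42min: Title - Great Movie\n"
    "- Description Line 2\n"
    "- Description Line 3"
)


def parse_entries(text):
    """Return a list of (hours, minutes, description) tuples (hypothetical helper)."""
    entries = []
    for m in PATTERN.finditer(text):
        # A group that did not participate is None in Python; map it to "".
        hours = m.group(2) or ""
        minutes = m.group(3) or ""
        entries.append((hours, minutes, m.group(4)))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Because hours and minutes always sit in groups 2 and 3, the calling code never has to check which alternative matched.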
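This solution only requires support for branch reset. Inside (?| ... ), every alternative starts numbering its groups from the same index, so the hours are always group 1 and the minutes are always group 2.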
(?s)(?|([0-9]{1,2}h)[ ]*([0-9]{1,2}min)|([0-9]{1,2}h)()|()([0-9]{1,2}min)):\s*((?:.(?!(\dh\s\d{1,2}min|\dh|\d{1,2}min)))+)
https://regex101.com/r/pyACdi/1
Expanded
(?s)
(?|
( [0-9]{1,2} h ) # (1)
[ ]*
( [0-9]{1,2} min ) # (2)
| ( [0-9]{1,2} h ) # (1)
( ) # (2)
| ( ) # (1)
( [0-9]{1,2} min ) # (2)
)
: \s*
( # (3 start)
(?:
.
(?!
( # (4 start)
\d h \s \d{1,2} min
| \d h
| \d{1,2} min
) # (4 end)
)
)+
) # (3 end)
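Not every engine supports branch reset; Python's built-in re module, for one, does not accept (?| ... ). The sketch below is not the branch-reset pattern itself but a rough emulation of the same idea under that limitation: each alternative gets its own named groups (h1, m1, h2, m2, and desc are hypothetical names chosen for this example), and the caller merges them into one hours value and one minutes value.

```python
import re

# Same three alternatives as the branch-reset pattern, but with distinct
# named groups, because Python's re module has no (?| ... ) construct.
PATTERN = re.compile(
    r"(?s)(?:(?P<h1>[0-9]{1,2}h)[ ]*(?P<m1>[0-9]{1,2}min)"
    r"|(?P<h2>[0-9]{1,2}h)"
    r"|(?P<m2>[0-9]{1,2}min))"
    r":\s*"
    r"(?P<desc>(?:.(?!\dh\s\d{1,2}min|\dh|\d{1,2}min))+)"
)

# Sample input from the question.
SAMPLE = (
    "1h 30min: Title\n"
    "- Description Line 1\n"
    "3h: SECOND TITLE\n"
    "- Description Line 1\n"
    "- Description Line 2\n"
    "- Description Line 3\n"
    "1h 14min: Title\n"
    "- another Great one 42min: Title - Great Movie\n"
    "- Description Line 2\n"
    "- Description Line 3"
)


def parse_entries(text):
    """Return (hours, minutes, description) tuples (hypothetical helper)."""
    entries = []
    for m in PATTERN.finditer(text):
        # Coalesce the per-alternative groups by hand; this is the step
        # that branch reset would otherwise perform inside the engine.
        hours = m.group("h1") or m.group("h2") or ""
        minutes = m.group("m1") or m.group("m2") or ""
        entries.append((hours, minutes, m.group("desc")))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

In an engine that does support branch reset, the pattern above it can be used directly and this merging step is unnecessary.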