JavaScript regex for comma-separated text
I have this string:
remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820
I want to match and extract the comma-separated strings.
The result should be:
MATCH 1
'remote:City|Vestavia Hills,AL'
MATCH 2
'remote:Citystate|Vestavia Hills'
MATCH 3
'395b5231539390675a7abe0751fc4820'
MATCH 4
'remote:City|Vestavia Hills,AL'
MATCH 5
'remote:Citystate|Vestavia Hills'
MATCH 6
'395b5231539390675a7abe0751fc4820'
I have this regex:
(remote:[a-zA-Z]+\|[^\,]+|[a-f0-9]{32})
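But the cities that carry the state 'AL' (separated by a comma) are split incorrectly: the match stops at the comma before the state.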
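The problem can be reproduced with a short sketch. It runs the pattern from the question against the first half of the sample string; the global flag and the variable names are additions made here for illustration only.

```javascript
// First half of the sample string from the question.
const input = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";

// The pattern from the question, with the g flag added so that
// match() returns every hit instead of only the first one.
const original = /(remote:[a-zA-Z]+\|[^\,]+|[a-f0-9]{32})/g;

const found = input.match(original);
console.log(found);
// [ 'remote:City|Vestavia Hills',
//   'remote:Citystate|Vestavia Hills',
//   '395b5231539390675a7abe0751fc4820' ]
```

The first entry stops at the comma because [^\,]+ cannot cross it, so the ',AL' suffix is dropped.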
Possible solution:
I am thinking of doing something like remote:[a-zA-Z]+\|.* and ending the match either at the comma that follows it or at the md5 hash ([a-f0-9]{32},?).
One option is to use JavaScript's split:
var str = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
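// Split on the literal "remote"; the first piece is the empty string before it.
var aux = str.split("remote");
var res = [];
for (var i = 1; i < aux.length; i++) {
  // Put back the "remote" prefix that split() removed.
  res.push("remote" + aux[i]);
}
console.log(res);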
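Note that splitting on the literal "remote" leaves trailing commas on the pieces and keeps each hash attached to the entry before it. A possible variant, sketched here under the assumption that every new item starts either with "remote:" or with a 32-character lowercase hex token, is to split on a regex with a lookahead instead. The variable names are illustrative.

```javascript
const input = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";

// Split only at commas that are immediately followed by the start of a
// new item: the literal "remote:" or 32 lowercase hex characters.
// The lookahead (?=...) checks what follows without consuming it,
// so the comma before "AL" is left alone.
const parts = input.split(/,(?=remote:|[a-f0-9]{32})/);

console.log(parts.length); // 6
console.log(parts[0]);     // remote:City|Vestavia Hills,AL
```

Because the lookahead does not consume anything, only the separating commas are removed and each piece keeps its "remote:" prefix.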
You can tweak your regex into this lookahead-based regex:
/(?:^|,)(.+?(?=,(?:[a-f0-9]{32}|remote:)|$))/igm
As you expect, this gives 6 matches, each with the wanted text in capturing group #1.
(?:^|,)              # match line start or a comma
(                    # capturing group #1 start
  .+?                # match 1 or more of any character (lazy)
  (?=                # lookahead start
    ,                # match a comma followed by
    (?:              # non-capturing group start
      [a-f0-9]{32}   # match a hex digit 32 times
      |              # OR
      remote:        # match literal "remote:"
    )                # non-capturing group end
    |                # OR
    $                # line end
  )                  # lookahead end
)                    # capturing group #1 end
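Since the whole match includes the leading comma, the wanted text has to be read from capturing group #1. A minimal sketch of one way to do that, assuming an environment that supports String.prototype.matchAll (ES2020); the variable names are illustrative:

```javascript
const input = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";

// The lookahead-based regex from the answer above.
const re = /(?:^|,)(.+?(?=,(?:[a-f0-9]{32}|remote:)|$))/igm;

// matchAll() requires the g flag and yields one match object per hit;
// index 1 of each match object is capturing group #1.
const captured = Array.from(input.matchAll(re), (m) => m[1]);

console.log(captured.length); // 6
console.log(captured[0]);     // remote:City|Vestavia Hills,AL
```

Using str.match(re) with the g flag would return the full matches instead, which carry the leading comma on every item except the first.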
([a-f0-9]{32}|remote:[^|]+\|[^,]+(?:,[A-Z]{2})?),?
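This one is easier to understand: I gave the group a special optional suffix, so that the comma inside an item can only be followed by 2 uppercase letters.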
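A short sketch of how this pattern could be applied, again reading capturing group #1 so that the trailing comma consumed by ,? is left out. The g flag, the use of matchAll (ES2020) and the variable names are assumptions made for this example.

```javascript
const input = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";

// Either a 32-character hex hash, or a "remote:" item that may end with
// an optional ",XX" suffix of exactly two uppercase letters.
// The trailing ,? eats the separator so the next match starts cleanly.
const re = /([a-f0-9]{32}|remote:[^|]+\|[^,]+(?:,[A-Z]{2})?),?/g;

const items = Array.from(input.matchAll(re), (m) => m[1]);

console.log(items.length); // 6
console.log(items[1]);     // remote:Citystate|Vestavia Hills
```

This approach depends on the state always being two uppercase letters; a longer uppercase token after the comma would be cut after its first two letters.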
With a single regex, you can do the following:
var str = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820",
arr = str.match(/(r.+?|[\da-f]{32})(?=,?(remote|[\da-f]{32}|$))/g);
console.log(arr);