Capturing key value pairs from a url string with a regex pattern

Question

I am trying to use a regular expression to parse a string like the following:

/subject=hello±@text=something that may contain\@hello.com or a normal sla/sh±@date=blah/somethingelseI don't want to capture after the first/

into:

subject = hello
text =something that may contain\@hello.com or a normal sla/sh
date = blah

Ideally, I'd like to be able to split the string after the first "/" on something like "±@", and only on that combination in that order.

I've looked around and at the moment have the following:

([^/±@,= ]+)=([^±@,= ]+)

But this doesn't match only "±@"; it matches either @ or ± on its own. It also doesn't cope with the escaped @. (Instead, I get: text= something that may contain\).

Is there a better way?

Thanks

Answer 1

Try this:

(?:\/|(?<=±@))(.*?=.*?)(?:±@|$|\/(?!.*±@))

An important part is the negative lookahead after the trailing slash, /(?!.*±@), which means "match a slash, but only if ±@ doesn't appear in the input after it".
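To see the lookahead in isolation, here is a minimal sketch in Python (the question doesn't say which regex engine is in use, so Python's re module is an assumption, and the sample string is made up purely for illustration). The first slash is skipped because ±@ still appears after it; only the second slash matches.

```python
import re

# Hypothetical sample string, chosen only to illustrate the lookahead.
sample = "a/b±@c/d"

# Match a slash only when no "±@" occurs anywhere after it.
last_slash = re.search(r"/(?!.*±@)", sample)

# Index of the slash that satisfied the lookahead.
print(last_slash.start())
```

The slash at index 1 is rejected because "b±@c/d" follows it, so the search moves on to the slash at index 6.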

Given this input:

/subject=hello±@text=something that may contain\@hello.com or a normal sla/sh±@date=blah/somethingelseI don't want to capture after the first/

It produces these matches for group 1:

subject=hello
text=something that may contain\@hello.com or a normal sla/sh
date=blah
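
For anyone who wants to try it, here is a rough sketch of how the pattern could be applied in Python (again an assumption, since the question doesn't name a language). The names PAIR_PATTERN, parse_pairs and url are made up for this example. Each group 1 match is split on its first "=" to get the key and the value.

```python
import re

# The pattern from this answer, unchanged.
PAIR_PATTERN = re.compile(r"(?:\/|(?<=±@))(.*?=.*?)(?:±@|$|\/(?!.*±@))")

# The sample input from the question (raw string so the backslash is kept).
url = r"/subject=hello±@text=something that may contain\@hello.com or a normal sla/sh±@date=blah/somethingelseI don't want to capture after the first/"


def parse_pairs(text):
    """Return a dict of key/value pairs found in text (hypothetical helper)."""
    pairs = {}
    for match in PAIR_PATTERN.finditer(text):
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = match.group(1).split("=", 1)
        pairs[key] = value
    return pairs


result = parse_pairs(url)
for key, value in result.items():
    print(f"{key} = {value}")
```

Note that the lookbehind (?<=±@) has a fixed width, which Python's re module requires; other engines may have different restrictions.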
