Regex that removes everything except a specified string

Question

I'm working with data that looks like this:

{"score":0,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}
{"score":-1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}
{"score":1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}

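The only information I'm interested in is "score":# (the number can be positive or negative). Since I'm dealing with thousands of lines like the ones above, I'm trying to use a regular expression to extract only the score information I care about.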

I have looked through various posts, such as here, here and here, but none of them seems to solve my problem.

I have used them to try to write my own regex. So far I have tried things like:

(?!"score":(-)?[0-9])

^(?!"score":(-)?[0-9].*)

(.(?!"score":(-)?[0-9]))*

But each of these selects all of the information, including the part I'm interested in.

How can I modify these regular expressions to get the result I want, i.e.:

"score":0
"score":-1
"score":1

Answer 1

I've created an example here: https://regex101.com/r/yL7hA9/1

The regex is:

"score":(-)?[0-9]+

Feel free to modify it to fit your requirements.
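As a quick check outside regex101, the pattern above can be tried in Python (our choice of language; the answer itself only gives the regex). One detail worth knowing: because `(-)` is a capturing group, Python's `re.findall()` returns the group's text rather than the whole match, so this sketch uses `finditer()` with `group(0)`.

```python
import re

# Sample lines copied from the question.
lines = [
    '{"score":0,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
    '{"score":-1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
    '{"score":1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
]
data = "\n".join(lines)

# Answer 1's pattern. It contains a capturing group, (-), so findall()
# returns only that group's text ('' or '-') instead of the full match.
pattern = re.compile(r'"score":(-)?[0-9]+')
print(pattern.findall(data))  # ['', '-', '']

# finditer() plus group(0) yields the whole matched text instead.
scores = [m.group(0) for m in pattern.finditer(data)]
print(scores)  # ['"score":0', '"score":-1', '"score":1']
```

Writing the sign as `-?` (no group at all) would let `findall()` return the full matches directly.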

Answer 2

Your regexes don't work as expected:

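(?!"score":(-)?[0-9]) matches every empty position in the line that is not followed by "score":\d+
^(?!"score":(-)?[0-9].*) matches the empty string at the start of the line
(.(?!"score":(-)?[0-9]))* matches everything except the leading {.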
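To see this diagnosis for yourself, a small Python sketch (our choice of tool, not part of the answer) can list the span of every match of the three attempted patterns against the first sample line. A span whose start equals its end is a zero-width match that selects no text; the labels "attempt 1" to "attempt 3" are just hypothetical names for the patterns in the question.

```python
import re

# First sample line from the question.
line = '{"score":0,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}'

# The three patterns tried in the question (hypothetical labels).
attempts = {
    "attempt 1": r'(?!"score":(-)?[0-9])',
    "attempt 2": r'^(?!"score":(-)?[0-9].*)',
    "attempt 3": r'(.(?!"score":(-)?[0-9]))*',
}

results = {}
for name, pattern in attempts.items():
    # (start, end) of every match; start == end means a zero-width match.
    results[name] = [m.span() for m in re.finditer(pattern, line)]

for name, spans in results.items():
    start, end = max(spans, key=lambda s: s[1] - s[0])
    print(f"{name}: {len(spans)} matches, longest = {line[start:end]!r}")
```

Attempts 1 and 2 only ever produce empty matches, while attempt 3 selects the whole line after the `{`, because the lookahead only fails right before `"score":0`.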

You can use

.*("score":[-+]?\d*\.?\d+).*

See the demo.

Replace with $1. If you don't need float support, just use

.*("score":[-+]?\d+).*

See another demo.
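Outside regex101, the same match-the-line-and-replace step could be run with Python's `re.sub`. This is a sketch using the sample lines from the question; Python is our choice, not the answer's, and Python writes the group-1 backreference as `\1` in the replacement string.

```python
import re

# Sample lines copied from the question.
lines = [
    '{"score":0,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
    '{"score":-1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
    '{"score":1,"compare":0,"words":["book","planet","sun","science"],"words":[],"good":[],"bad":[]}',
]
data = "\n".join(lines)

# Answer 2's patterns: match the whole line, capture "score":<number> in group 1.
float_pattern = r'.*("score":[-+]?\d*\.?\d+).*'
int_pattern = r'.*("score":[-+]?\d+).*'

# '.' does not match '\n' by default, so each line is rewritten separately
# and replaced by the text of group 1.
float_result = re.sub(float_pattern, r'\1', data)
int_result = re.sub(int_pattern, r'\1', data)
print(float_result)
```

A line that contains no "score": value is not matched at all, so `re.sub` leaves it unchanged rather than deleting it.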

The main idea is to match each whole line and capture the substring we need ("score":<number>). Then we restore the captured text in the replacement string.

这里，

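.* - matches any number of any characters other than a newline
("score":[-+]?\d*\.?\d+) - capture group 1, which matches:
- "score": - the literal text "score":
- [-+]? - an optional literal + or - (you can keep just one of them, adjust as needed)
- \d*\.?\d+ - matches a float (without thousands separators), or
- \d+ - (in the integer-only version) matches a sequence of 1 or more digits.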
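The breakdown above can also be written into the pattern itself. The sketch below assumes Python's `re.VERBOSE` flag (our choice, not part of the answer), under which unescaped whitespace in the pattern is ignored and `#` starts a comment. The test strings are made up for illustration; the last one shows that a thousands separator cuts the number short.

```python
import re

# The float-capable pattern from the answer, one piece per line.
SCORE = re.compile(r'''
    .*            # any characters except a newline, as many as possible
    (             # start of capture group 1
      "score":    # the literal text "score":
      [-+]?       # an optional + or - sign
      \d*\.?\d+   # a number, optionally with a decimal part, no thousands separators
    )             # end of capture group 1
    .*            # the rest of the line
''', re.VERBOSE)

for text in ['{"score":-1,"compare":0}', '{"score":+2.5}', '{"score":.75}', '{"score":1,000}']:
    m = SCORE.fullmatch(text)
    print(text, '->', m.group(1) if m else None)
```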

Tags: regex, regex-negation, regex-lookarounds