How to apply a string method to regular expression matches in Python

Question

I have a Markdown file that is slightly broken: long links and images contain line breaks. I want to remove those line breaks.

Example:

From:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

To:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, what is the Python syntax for applying a string method such as string.trim() to what my regular expression captured?

For now, I am stuck with this:

import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[]()'.trim() ??
post['content'] = fix_newlines.sub(r'[]()', post['content'])
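For context, the second argument passed to sub above is a template string: it can refer to captured groups with \1 and \2, but it cannot call a method such as strip on them. A minimal sketch using the question's own pattern on a shortened sample from the example (the variable names sample, copied and emptied are illustrative only):

```python
import re

# The pattern from the question.
fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')

sample = ("[installation process for Ubuntu\nTrusty]"
          "(https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)")

# A template can refer to the groups with \1 and \2, but it only copies them
# back verbatim, so the line break inside the link text survives.
copied = fix_newlines.sub(r'[\1](\2)', sample)
print(copied == sample)   # True

# A template without group references is inserted literally for every match.
emptied = fix_newlines.sub(r'[]()', sample)
print(emptied)            # []()
```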

Edit: I updated the example to show my problem more clearly.

Thanks for your answers.

Answer 1

strip works like trim. Since you need to trim newlines, use strip('\n'):

fin.readline().strip('\n')
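Note that strip only works on the ends of a string. A minimal sketch, using a link from the question's example with a trailing newline appended (the variable names are illustrative), showing that the line break inside the link is not removed by strip('\n'):

```python
# A link from the question's example: one line break inside the link text,
# plus a trailing newline at the end of the string.
line = ("[installation process for Ubuntu\nTrusty]"
        "(https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)\n")

trimmed = line.strip('\n')

# strip('\n') removes newlines at the start and end of the string ...
print(trimmed.endswith(')'))   # True
# ... but the one between "Ubuntu" and "Trusty" is still there.
print('\n' in trimmed)         # True
```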

Answer 2

This also works:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>>

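The built-in string functions are usually enough, and they are easier to read than working out a regular expression. Here, strip removes the leading and trailing whitespace, split('\n') returns a list of the pieces between the line breaks, and join puts them back together into a single string.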
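The three steps can be traced one at a time on the same string. A minimal sketch (the intermediate variable names are illustrative), which also shows that joining the split pieces with "" gives the same result as deleting every "\n" with str.replace:

```python
s = """
   ![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
"""

stripped = s.strip()           # drop the leading "\n   " and the trailing "\n"
parts = stripped.split('\n')   # the three pieces between the line breaks
joined = "".join(parts)        # glue them back together with nothing in between

# Joining the split pieces with "" is the same as deleting every "\n".
print(joined == stripped.replace('\n', ''))   # True
print(len(parts))                             # 3
```

Applied to a whole Markdown document, either form would also merge ordinary lines and paragraphs, so in practice it has to be limited to the link and image matches.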

Answer 3

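OK, I finally found what I was looking for. The trick is to pass a function, instead of a replacement string, to sub: it is called once for each match, and its return value is used as the replacement. With the snippet below, I can capture strings with a regular expression and then process each one.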

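import re

def remove_newlines(match):
    # Called by sub() once per match; the returned string replaces the match.
    return "".join(match.group().strip().split('\n'))

# Matches [text](target); the "!" in front of an image stays outside the match.
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])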
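Run on the example from the question, this callback fixes the image, but in the link text it also deletes the line break between "Ubuntu" and "Trusty" without leaving a space. A minimal sketch of a hypothetical variant, unwrap_link, that treats the link text and the target separately. It assumes that a line break right after "-" was a hard wrap inside a word or URL (so it is dropped), while any other line break in the link text stands for a space:

```python
import re

# The callback and pattern from above, unchanged.
def remove_newlines(match):
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

# Hypothetical variant: fix the link text (group 1) and the target (group 2)
# separately. Assumption: "-\n" is a wrap inside a word/URL, other "\n" is a space.
def unwrap_link(match):
    text, target = match.group(1), match.group(2)
    text = text.replace('-\n', '-').replace('\n', ' ')
    target = target.replace('\n', '')
    return f'[{text}]({target})'

# Stand-in for post['content']: the "From:" example in the question.
content = """See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
"""

# The "To:" example in the question.
expected = """See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
"""

joined = links_pattern.sub(remove_newlines, content)
unwrapped = links_pattern.sub(unwrap_link, content)

print('UbuntuTrusty' in joined)   # True: the space between the words is lost
print(unwrapped == expected)      # True
```

The hyphen rule is only a heuristic that happens to fit this example; a link text that legitimately ends a line with "-" followed by a new word would be joined without a space.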

Thanks for your answers, and sorry if my question was not clear enough.

Tags: python, regex, markdown