How to apply a string method to a regular expression match in Python
I have a Markdown file that is slightly broken: links and images that are too long contain line breaks. I want to remove those line breaks.
Example:
From:
See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
To:
See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
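As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2
But what is the syntax in Python for applying a string method such as string.trim() to what I capture with a regular expression?
For now, I am stuck with this: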
fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[]()'.trim() ??
post['content'] = fix_newlines.sub(r'[]()', post['content'])
Edit: I updated the example to state my problem more explicitly.
Thanks for your answers
strip works like trim: it removes characters from both ends of a string. Since you need to trim newlines, use strip('\n'):
fin.readline().strip('\n')
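To make the behaviour of strip('\n') concrete, here is a minimal sketch. The sample string is hypothetical, not taken from the question. It suggests that strip only trims the ends of a string, so a newline in the middle of a link would be left alone.

```python
# Hypothetical sample: newlines at both ends and one in the middle.
line = "\nfirst line\nsecond line\n"

# strip('\n') removes newline characters from the start and end only.
trimmed = line.strip('\n')

print(repr(trimmed))
```

If that holds for your input, strip on its own is not enough for line breaks inside a link.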
This also works:
>>> s = """
... 
... """
>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
''
>>>
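Built-in string functions are usually enough, and easier to read than working out a regular expression. Here, strip removes leading and trailing whitespace, split('\n') returns a list of the pieces between newlines, and join puts them back together into a single string.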
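The strip, split, join chain described above can be sketched on a non-empty string. The sample text and the example.org URL are assumptions made up for illustration. The sketch also compares joining with an empty string against joining with a space, since the choice decides whether the words on either side of the line break run together.

```python
# Hypothetical sample: a Markdown link broken across two lines.
s = "\n[installation process for Ubuntu\nTrusty](https://example.org/page)\n"

# strip() drops leading/trailing whitespace, split('\n') cuts the rest
# at each newline, and join() glues the pieces back into one string.
pieces = s.strip().split('\n')
joined_without_space = "".join(pieces)
joined_with_space = " ".join(pieces)

print(pieces)
print(joined_without_space)
print(joined_with_space)
```

Joining with "" produces "UbuntuTrusty", while joining with " " keeps the two words apart, which is probably what you want for link text.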
OK, I finally found what I was looking for. With the snippet below, I can capture strings with a regular expression and then apply processing to each match.
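import re

def remove_newlines(match):
    # match.group() is the whole matched link; drop the newlines inside it
    return "".join(match.group().strip().split('\n'))
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
# sub() calls remove_newlines once per match and inserts its return value
post['content'] = links_pattern.sub(remove_newlines, post['content'])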
Thanks for your answers, and sorry if my question was not clear enough.
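One possible refinement, offered as a sketch and not as the asker's method: joining with an empty string turns "Ubuntu", newline, "Trusty" into "UbuntuTrusty", whereas the expected output in the question keeps a space. The variant below handles the two capture groups separately, replacing a line break in the link text with a single space and removing whitespace from the URL. The function name fix_link and the shortened sample content are hypothetical.

```python
import re

# Same pattern as in the answer: group 1 is the link text, group 2 the URL.
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

def fix_link(match):
    # Link text: turn each line break (plus surrounding spaces) into one space.
    text = re.sub(r'\s*\n\s*', ' ', match.group(1).strip())
    # URL: remove all whitespace, since a URL should not contain any.
    url = re.sub(r'\s+', '', match.group(2))
    return '[{}]({})'.format(text, url)

# Hypothetical input modeled on the example in the question.
content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "project offers a Vagrant installation too.\n"
)

# sub() accepts a function: it is called once per match with the match object.
fixed = links_pattern.sub(fix_link, content)
print(fixed)
```

Line breaks outside the links are left untouched, because the replacement function only ever sees the matched link.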