python: A module for wrapping text quoted via email conventions?

Question

I am looking for a Python module, or some existing Python code, that can wrap text which uses a ">" line prefix to indicate quoted text (see below for an example).

I know I can use the Python textwrap module to wrap paragraphs of text. However, that module does not know about this kind of quoting prefix.
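For a single paragraph whose prefix is already known, textwrap can reapply the prefix itself through its initial_indent and subsequent_indent parameters. What it does not do is detect the prefixes or split text with mixed quote levels into separate paragraphs, which is the real problem here. A minimal sketch, assuming the "> " prefix has already been stripped from the paragraph by hand:

```python
import textwrap

# One quoted paragraph from the example below, with its "> " prefix removed.
body = "The quick brown fox jumped over the lazy sleeping dog."

# textwrap knows nothing about quoting, so the prefix is passed in explicitly.
# The indents count toward width, so every output line (prefix included)
# is at most 52 characters long.
wrapped = textwrap.fill(body, width=52,
                        initial_indent="> ", subsequent_indent="> ")
print(wrapped)
# > The quick brown fox jumped over the lazy sleeping
# > dog.
```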

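I know how to write a routine to perform this wrapping, and I'm not asking for advice on how to write it. Rather, I'd like to know whether anyone knows of existing Python code or a Python module that can already do this kind of wrapping of email-style quoted text.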

I've been searching, but haven't found anything in Python.

I just don't want to "reinvent the wheel" if something like this has already been written.

Here is an example of the wrapping I want to perform. Suppose I have the following text from a hypothetical email:

Abc defg hijk lmnop.

Mary had a little lamb.
Her fleas were white as snow,

> Now is the time for all good men to come to the aid of their party.
>
> The quick
> brown fox jumped over the lazy sleeping dog.

>> When in the Course of human
>> events it
>> becomes necessary for one people to dissolve the political
>> bands
>> which have
>> connected them ...
      and everywhere that Mary went,
      her fleas were sure to go
      ... and to reproduce.
> What do you mean by this?
>> with another
>> and to assume among
>> the powers of the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text text text text text text text text text text text text text text text text text text.

Suppose I want to wrap at column 52. The resulting text should look like this:

Abc defg hijk lmnop.

Mary had a little lamb. Her fleas were white as
snow,

> Now is the time for all good men to come to the
> aid of their party.
>
> The quick brown fox jumped over the lazy sleeping
> dog.

>> When in the Course of human events it becomes
>> necessary for one people to dissolve the
>> political bands which have connected them ...
      and everywhere that Mary went, her fleas were
      sure to go ... and to reproduce.
> What do you mean by this?
>> with another and to assume among the powers of
>> the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text
>> text text text text text text text text text text
>> text text text text text text text.

Thanks for any pointers to existing Python code.

If nothing like this exists "out in the wild", I'll write it and post my code.

Many thanks.

Answer 1

I couldn't find any existing code that handles this kind of quoted text, so here is the code I wrote. It uses the re and textwrap modules.

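I split the text into "paragraphs" based on the leading quote or indentation characters: a new "paragraph" starts whenever that prefix changes or a blank line appears. I strip the quote or indent prefix from each line, wrap each "paragraph" with textwrap, and after wrapping add the prefix back to every line of the "paragraph".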

Someday I'll clean up the code and make it a bit more elegant, but at least it seems to work correctly.

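import re
import textwrap
# Quoted line: '>' markers (possibly separated by whitespace), then the text.
# (.*) rather than ([^>]*) so that a '>' inside the text does not stop the match.
QUOTE_PREFIX_RE = re.compile(r'^([\s>]*>)(.*)$')
# Indented line: leading whitespace, then optional text.
INDENT_PREFIX_RE = re.compile(r'^(\s+)(\S.*)?$')
WHITESPACE_RE = re.compile(r'\s')
def wrapemail(text, wrap=72):
    """Re-wrap email text to `wrap` columns, preserving quote/indent prefixes."""
    if not text:
        return ''
    prev_prefix = None
    paragraph   = []
    paragraphs  = []    # list of (prefix, [body line, ...]) tuples
    for line in text.rstrip().split('\n'):
        line = line.rstrip()
        m = QUOTE_PREFIX_RE.search(line)
        if m:
            # Normalize the markers, e.g. " > >" becomes ">>".
            prefix = WHITESPACE_RE.sub('', m.group(1))
            body   = m.group(2)
            # Keep the first separating whitespace character as part of the prefix.
            if body and body[0].isspace():
                prefix += body[0]
                body    = body[1:]
        else:
            m = INDENT_PREFIX_RE.search(line)
            if m:
                prefix = m.group(1)
                body   = m.group(2) or ''
            else:
                prefix = ''
                body   = line
        if not body:
            # A blank (or marker-only) line closes the current paragraph
            # and is emitted unchanged.
            if paragraph and prev_prefix is not None:
                paragraphs.append((prev_prefix, paragraph))
            paragraphs.append((prefix, ['']))
            prev_prefix = None
            paragraph   = []
            continue
        if prefix != prev_prefix:
            # A change of prefix starts a new paragraph.
            if paragraph and prev_prefix is not None:
                paragraphs.append((prev_prefix, paragraph))
            prev_prefix = prefix
            paragraph   = []
        paragraph.append(body)
    if paragraph and prev_prefix is not None:
        paragraphs.append((prev_prefix, paragraph))
    result = ''
    for prefix, lines in paragraphs:
        body    = '\n'.join(lines).rstrip()
        wraplen = wrap - len(prefix)
        if wraplen < 1:
            # The prefix leaves no room to wrap: copy each line through with
            # its prefix (joining first would drop the prefix on later lines).
            for line in lines:
                result += '{}{}\n'.format(prefix, line)
        elif body:
            # textwrap turns the joining newlines into spaces before wrapping.
            for line in textwrap.wrap(body, wraplen):
                result += '{}{}\n'.format(prefix, line.rstrip())
        else:
            result += '{}\n'.format(prefix)
    return result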

Feeding the text from my original message into this function with 'wrap' set to 52 does produce the output I specified above.
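A more compact variant of the same approach (split each line into prefix and body, group consecutive lines with the same prefix, wrap each group, reattach the prefix) is sketched below; it is not the answer's code. It swaps the manual paragraph bookkeeping for itertools.groupby and lets textwrap reapply the prefix through initial_indent/subsequent_indent. The names split_prefix and rewrap are hypothetical, and the sketch assumes at most one plain space separates the quote markers from the text.

```python
import itertools
import re
import textwrap

# Assumption: a quote prefix is a run of '>' markers (whitespace between them
# is dropped, so "> >" becomes ">>"), followed by one optional space.
QUOTE_RE = re.compile(r'^([\s>]*>) ?(.*)$')
# Any other line: its leading whitespace (possibly empty) is the prefix.
INDENT_RE = re.compile(r'^(\s*)(.*)$')


def split_prefix(line):
    """Return (prefix, body) for one line of email text."""
    line = line.rstrip()
    m = QUOTE_RE.match(line)
    if m:
        markers = re.sub(r'\s', '', m.group(1))
        body = m.group(2)
        # Re-add the separating space only when text follows the markers.
        return (markers + ' ' if body else markers), body
    m = INDENT_RE.match(line)
    return m.group(1), m.group(2)


def rewrap(text, width=72):
    """Re-wrap each run of consecutive same-prefix lines as one paragraph."""
    out = []
    pairs = [split_prefix(line) for line in text.splitlines()]
    # Group by (prefix, is_blank): blank bodies break paragraphs and are
    # copied through with their prefix (e.g. a bare ">").
    for (prefix, blank), group in itertools.groupby(
            pairs, key=lambda pair: (pair[0], not pair[1])):
        bodies = [body for _, body in group]
        if blank:
            out.extend(prefix for _ in bodies)
        else:
            # The indents count toward width, like wrap - len(prefix) above.
            out.extend(textwrap.wrap(' '.join(bodies), width=width,
                                     initial_indent=prefix,
                                     subsequent_indent=prefix))
    return '\n'.join(out)
```

For example, rewrap(sample, width=52) on the ">> When in the Course of human" through "> What do you mean by this?" portion of the question's sample yields the corresponding lines of the expected output. Like the function above, it merges every run of non-blank lines that share a prefix, so lists or signature lines with a common prefix would be joined as well.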

Feel free to improve it or steal it. :)

Tags: python, email, word-wrap, quote