count couplets from a list with \r\n line breaks

I'm trying to get the number of couplets from a set of lyrics. Say the lyrics are:

I saw a little hermit crab
His coloring was oh so drab

It’s hard to see the butterfly
Because he flies across the sky

etc etc...

Once upon a time
She made a little rhyme
Of course, of course

Before we say again
The pain the pain
A horse, a horse

Lightening, thunder, all around
Soon the rain falls on the ground

I tire of writing poems and rhyme

They are stored in the database as a string separated by u'\r\n' and split with string.splitlines(True), so the object holds them like this:

>>> lyrics[6].track_lyrics['lyrics']
[u'I saw a little hermit crab\r\n', u'His coloring was oh so drab\r\n', u'\r\n', u'It\u2019s hard to see the butterfly\r\n', u'Because he flies across the sky\r\n', u'\r\n',  u'\r\n', u'Before we say again\r\n', u'The pain the pain\r\n', u'A horse, a horse\r\n', u'\r\n', u'Lightening, thunder, all around\r\n', u'Soon the rain falls on the ground\r\n', u'\r\n', u'I tire of writing poems and rhyme\r\n']
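For reference, a minimal sketch (Python 3, with a hypothetical sample string named raw that is not from the question) of where this shape comes from: str.splitlines(keepends=True) keeps each line terminator, so every element ends in \r\n and each blank line becomes a bare '\r\n' element.

```python
# Hypothetical sample text with Windows-style "\r\n" line endings
raw = "I saw a little hermit crab\r\nHis coloring was oh so drab\r\n\r\nIt's hard to see the butterfly\r\n"

# keepends=True keeps each "\r\n"; the blank line becomes a bare "\r\n" element
lines = raw.splitlines(keepends=True)
print(lines)
# ['I saw a little hermit crab\r\n', 'His coloring was oh so drab\r\n', '\r\n', "It's hard to see the butterfly\r\n"]
```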

I can get close with this:

len([i for i in lyrics if i != "\r\n"]) / 2

but it also counts stanzas of one, three or more lines as couplets.

I've sort of got it: the idea is that if the line before is "\r\n" and the line two positions after is "\r\n", we have a couplet:

>>> for k,v in enumerate(lyric_list):
...     if lyric_list[k+2] == "\r\n" and lyric_list[k-1] == "\r\n":
...             print(v)
... 
It’s hard to see the butterfly

Hear the honking of the goose


Lightening, thunder, all around

But, of course:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range
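One way to keep the neighbour-check idea without the IndexError is to pad the list with one blank entry at each end, so k-1 and k+2 always exist. Padding also avoids a quieter bug: at k == 0, lyric_list[k-1] reads the last element through a negative index instead of raising. This is a minimal sketch, not the asker's code; the function names are hypothetical, and it assumes blank entries are whitespace-only strings such as '\r\n'.

```python
def is_blank(line):
    # A line counts as blank if it holds only whitespace, e.g. "\r\n"
    return not line.strip()


def count_couplets_by_neighbours(lines):
    # One blank sentinel at each end keeps every index below in range
    padded = ["\r\n"] + list(lines) + ["\r\n"]
    count = 0
    # k is a candidate first line of a couplet; k + 2 is at most len - 1
    for k in range(1, len(padded) - 2):
        if (is_blank(padded[k - 1])
                and not is_blank(padded[k])
                and not is_blank(padded[k + 1])
                and is_blank(padded[k + 2])):
            count += 1
    return count


sample = ["a\r\n", "b\r\n", "\r\n", "c\r\n", "d\r\n", "e\r\n"]
print(count_couplets_by_neighbours(sample))  # → 1 (the three-line stanza is skipped)
```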

I could wrap this in try/except IndexError. This version also handles the first line:

>>> if len(lyric_string) > 1:
...     for k, v in enumerate(lyric_string):
...             if k == 0 and lyric_string[k+2] == "\r\n":
...                     print(v)
...             elif lyric_string[k-1] == "\r\n" and lyric_string[k+2] == "\r\n":
...                     print(v)
... 
I saw a little hermit crab

It’s hard to see the butterfly

Hear the honking of the goose

His red sports car is just a dream

The children like the ocean shore

I made the cookies one by one

My cat, she likes to chase a mouse,

Lightening, thunder, all around

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
IndexError: list index out of range

I also thought about doing something like this, which is even uglier and doesn't work (it only gets the first and last lines):

>>> if len(lyric_string) > 1:
...     for k, v in enumerate(lyric_string):
...             if k == 0 and lyric_string[k+2] == "\r\n":
...                     print(v)
...             elif lyric_string[k-1] == "\r\n" and (k+2 > len(lyric_string) \
...                                                     or lyric_string[k+2] == "\r\b"):
...                     print(v)

But I bet there's a more elegant, or even Pythonic, way.
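One option that neither the question nor the answers below use is itertools.groupby, which yields consecutive runs of lines that share a key. Keying on "is this line blank?" turns the list directly into stanzas, so a couplet is simply a non-blank run of length 2. A minimal sketch, assuming the list looks like the splitlines(True) output above; count_couplets is a hypothetical name.

```python
from itertools import groupby


def count_couplets(lines):
    count = 0
    # Each group is a maximal run of lines that are all blank or all non-blank
    for blank, run in groupby(lines, key=lambda line: not line.strip()):
        # A couplet is a run of exactly two non-blank lines
        if not blank and len(list(run)) == 2:
            count += 1
    return count
```

Because runs of several blank lines collapse into one group, this also copes with the doubled u'\r\n' entries in the list above.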

A simpler approach: join the whole array with "" and count the occurrences of the double newline.

>>> s = """Once upon a time
... She made a little rhyme
... Of course, of course
...
... Before we say again
... The pain the pain
... A horse, a horse
...
... Lightening, thunder, all around
... Soon the rain falls on the ground
...
... I tire of writing poems and rhyme"""

Then do:

>>> s.strip().count("\n\n") + 1
4

To get s for the code above, you need an extra join. For example:

s = "".join(lyrics[6].track_lyrics['lyrics'])

I'm using \n on my system; you may need \r\n on yours (i.e. count "\r\n\r\n" instead of "\n\n").
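Note that the count("\n\n") + 1 trick counts every stanza, not only couplets: the 4 above includes the two three-line stanzas and the one-line stanza. A minimal sketch of the same idea adapted to the \r\n list from the question (count_stanzas is a hypothetical name): it normalises line endings first, and splits on runs of newlines so that doubled blank lines do not inflate the count.

```python
import re


def count_stanzas(lines):
    # Normalise Windows "\r\n" endings to "\n" so one pattern fits both systems
    text = "".join(lines).replace("\r\n", "\n").strip()
    if not text:
        return 0
    # A run of two or more newlines marks the gap between two stanzas.
    # This still counts stanzas of any length, not just couplets.
    return len(re.split(r"\n{2,}", text))
```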

I'm assuming a couplet is a stanza of exactly 2 lines.

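You can do this by splitting the text into blocks and then counting the lines in each block. In this example, I count the line breaks in each block (a couplet should have exactly 1).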

>>> text = """I saw a little hermit crab
... His coloring was oh so drab
... 
... It’s hard to see the butterfly
... Because he flies across the sky
... 
... etc etc...
... 
... Once upon a time
... She made a little rhyme
... Of course, of course
... 
... Before we say again
... The pain the pain
... A horse, a horse
... 
... Lightening, thunder, all around
... Soon the rain falls on the ground
... 
... I tire of writing poems and rhyme
... """.replace('\n', '\r\n')
>>> len([block for block in text.strip().split('\r\n\r\n') if block.count('\r\n') == 1])
3

This also assumes there are exactly two line breaks between blocks. To handle two or more, you can use:

>>> import re
>>> parts = re.split(r'(?:\r\n){2,}', text.strip())  # split on one or more blank lines
>>> len([block for block in parts if block.count('\r\n') == 1])
3
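To apply the block-splitting idea directly to the list from the question, a sketch (count_couplets_from_list is a hypothetical name) can join the list first and accept either \r\n or \n endings. Once the text is stripped, "exactly one internal line break" is the same test as "block.splitlines() has two lines".

```python
import re


def count_couplets_from_list(lines):
    # Join the splitlines(True)-style list back into one string
    text = "".join(lines).strip()
    # Blocks are separated by two or more line breaks, "\r\n" or "\n"
    blocks = re.split(r"(?:\r?\n){2,}", text)
    # A couplet block splits into exactly two lines
    return sum(1 for block in blocks if len(block.splitlines()) == 2)
```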