Get a substring from a list item in python after a word

Question

I am using BeautifulSoup to get book titles from a goodreads page.

Sample HTML:

<td class="field title"><a href="/book/show/12996.Othello" title="Othello">
  Othello
</a></td>

I want to get the text between the anchor tags. With the code below, I can get all children of each td with class="field title" as a list.

for txt in soup.findAll('td',{'class':"field title"}):
    child = txt.findAll('a')

which gives the output:

[<a href="/book/show/12996.Othello" title="Othello">
  Othello
</a>]
...

How do I get just the 'Othello' part? This regex doesn't work:

for ch in child:
    match = re.search(r"([.]*)title=\"<name>\"([.]*)",str(ch))
    print(match.group('name'))
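For reference, the regex above fails for two reasons: `<name>` is matched as literal text rather than defining a group (Python spells a named group `(?P<name>...)`), and `[.]*` matches only literal dots, not arbitrary characters. A minimal corrected sketch, assuming the tag string looks exactly like the output printed above; `ANCHOR`, `TITLE_RE` and `title_from_tag` are hypothetical names used only for illustration:

```python
import re

# Hypothetical input: the anchor tag as str(ch) rendered it in the output above.
ANCHOR = '<a href="/book/show/12996.Othello" title="Othello">\n  Othello\n</a>'

# (?P<name>...) defines a named group; [^"]* captures everything up to the closing quote.
TITLE_RE = re.compile(r'title="(?P<name>[^"]*)"')


def title_from_tag(tag_html):
    """Return the value of the title attribute, or None if there is none."""
    match = TITLE_RE.search(tag_html)
    return match.group("name") if match else None


print(title_from_tag(ANCHOR))
```

Regex matching on serialized HTML is brittle (attribute order, quoting style), which is why the BeautifulSoup-based approaches below are usually the safer choice.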

Answer 1

Just print the text of txt (thanks to @angurar for clarifying what the OP wants):

for txt in soup.findAll('td',{'class':"field title"}):
    # .string is the cell's only text node, including its surrounding whitespace
    print(txt.string)
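Because the link text in the sample HTML is wrapped in a newline and indentation, `txt.string` comes back as `"\n  Othello\n"`. A minimal self-contained sketch, assuming BeautifulSoup 4 with the built-in `"html.parser"` parser, that uses `get_text(strip=True)` to trim that whitespace; `HTML` and `book_titles` are hypothetical names for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the sample markup from the question, wrapped in a table row.
HTML = """<table><tr>
<td class="field title"><a href="/book/show/12996.Othello" title="Othello">
  Othello
</a></td>
</tr></table>"""


def book_titles(html):
    """Return the stripped text of every <td class="field title"> cell."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for td in soup.find_all("td", {"class": "field title"}):
        # strip=True removes the newline and indentation around the link text
        titles.append(td.get_text(strip=True))
    return titles


print(book_titles(HTML))
```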

Or, if you want the title attribute of the <a> tags:

for txt in soup.findAll('td',{'class':"field title"}):
    # one list of title attributes per matching cell
    print([a.get('title') for a in txt.findAll('a')])

For each cell, this prints a list of the title attributes of all its <a> tags.
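A minimal self-contained sketch of the title-attribute approach, assuming BeautifulSoup 4 with the built-in `"html.parser"` parser; `HTML` and `title_attributes` are hypothetical names for illustration. `Tag.get` returns `None` instead of raising `KeyError` when an <a> has no title attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the sample markup from the question, wrapped in a table row.
HTML = """<table><tr>
<td class="field title"><a href="/book/show/12996.Othello" title="Othello">
  Othello
</a></td>
</tr></table>"""


def title_attributes(html):
    """Return one list of <a> title attributes per <td class="field title"> cell."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for td in soup.find_all("td", {"class": "field title"}):
        # a.get("title") yields None for links without a title attribute
        result.append([a.get("title") for a in td.find_all("a")])
    return result


print(title_attributes(HTML))
```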
