Properly unescape HTML characters

I'm trying to fetch a movie synopsis, but I can't unescape some troublesome characters:

import requests
from lxml import html

res = requests.get('https://play.google.com/store/tv/show?id=lXH-sW6govE')
node=html.fromstring(res.content)
synopsis=node.xpath("//div[contains(@class, 'details-section') and contains(@class, 'description')]/meta")[0].attrib['content']

u'"Work Out New York" invites viewers to break a sweat with some of New York City\xe2\x80\x99s hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.'

How can I get the properly encoded synopsis from https://play.google.com/store/tv/show?id=lXH-sW6govE, i.e.:

"Work Out New York" invites viewers to break a sweat with some of New York City’s hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.

You can use HTMLParser to unescape it. Something like:

import HTMLParser  # Python 2 module; needed for the call below

print HTMLParser.HTMLParser().unescape(synopsis)
"Work Out New York" invites viewers to break a sweat with some of New York Cityâs hottest personal trainers. They may be friends, but these high-end fitness experts compete against each other to earn the business of wealthy patrons and celebrity clientele. With training techniques and fitness regimens constantly evolving, these trainers better shape up or risk losing their clients to their competitors. Romances, jealousies, and bitter rivalries provide the ultimate test of endurance for these fitness fanatics.

More details here: How do I unescape HTML entities in a string in Python 3.1?
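
Note that the output above still shows "Cityâs" rather than "City’s". The \xe2\x80\x99 in the original string looks like the UTF-8 bytes of the right single quotation mark decoded one byte per character, which entity unescaping alone would not undo. Here is a minimal Python 3 sketch under that assumption; repair_mojibake and clean_synopsis are hypothetical helper names, not part of any library, and the sample string is a shortened stand-in for the real synopsis:

```python
import html


def repair_mojibake(text):
    # Assumption: the text holds UTF-8 bytes that were decoded as Latin-1,
    # so every byte became its own code point (here U+00E2, U+0080, U+0099).
    # Encoding back to Latin-1 recovers the bytes; decoding as UTF-8 reads
    # them correctly.
    return text.encode("latin-1").decode("utf-8")


def clean_synopsis(raw):
    # Repair the encoding first, then resolve entities such as &quot;.
    # Doing it the other way round could introduce characters that
    # Latin-1 cannot encode.
    return html.unescape(repair_mojibake(raw))


sample = "New York City\xe2\x80\x99s &quot;hottest&quot; trainers"
print(clean_synopsis(sample))
```

If the string is not actually mis-decoded in this way, repair_mojibake may raise UnicodeEncodeError or UnicodeDecodeError, so it is worth checking the page's real encoding before relying on it.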