Convert to UTF-16

Question

I am scraping several websites and extracting product names. Some of the names come out with errors like these:

Malecon 12 Jahre 0,05 ltr.<br>Reserva Superior
Bols Watermelon Lik\u00f6r 0,7l
Hayman\u00b4s Sloe Gin
Ron Zacapa Edici\u00f3n Negra
Havana Club A\u00f1ejo Especial
Caol Ila 13 Jahre (G&amp;M Discovery)

How can I fix this? I am using xpath and re.search to get the names.

Every Python file starts with this line: # -*- coding: utf-8 -*-

Edit:

This is the source code that shows how I get this information.

if '"articleName":' in details:
    closer_to_product = details.split('"articleName":', 1)[1]
    closer_to_product_2 = closer_to_product.split('"imageTitle', 1)[0]
    if debug_product == 1:
        print('product before try:' + repr(closer_to_product_2))
    try:
        # capture the text between the first '"' and the next '",'
        found_product = re.search(r'"(.*?)",', closer_to_product_2).group(1)
    except AttributeError:
        found_product = ''
    if debug_product == 1:
        print('cleared product: ', '>>>' + repr(found_product) + '<<<')
    if not found_product:
        print(product_detail_page, found_product)
        items['products'] = 'default'
    else:
        items['products'] = found_product

Details

product_details = information.xpath('/*').extract()
product_details = [details.strip() for details in product_details]

Answer 1

Where is the problem (Python 3.8.3)?

import html

strings = [
  'Bols Watermelon Lik\u00f6r 0,7l',
  'Hayman\u00b4s Sloe Gin',
  'Ron Zacapa Edici\u00f3n Negra',
  'Havana Club A\u00f1ejo Especial',
  'Caol Ila 13 Jahre (G&amp;M Discovery)',
  'Old Pulteney \u00b7 12 Years \u00b7 40% vol',
  'Killepitsch Kr\u00e4uterlik\u00f6r 42% 0,7 L']
  
for s in strings:  # 's' avoids shadowing the built-in str
    print(html.unescape(s)
              .encode('raw_unicode_escape')
              .decode('unicode_escape'))

Bols Watermelon Likör 0,7l
Hayman´s Sloe Gin
Ron Zacapa Edición Negra
Havana Club Añejo Especial
Caol Ila 13 Jahre (G&M Discovery)
Old Pulteney · 12 Years · 40% vol
Killepitsch Kräuterlikör 42% 0,7 L
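Note that in the list above Python has already interpreted the \u00f6 escapes inside the string literals. The scraped text more likely contains a literal backslash followed by u00f6, which corresponds to a raw string. A minimal sketch of the same chain applied to that kind of input, wrapped in a hypothetical helper fix_name (the name is an assumption, not part of the question's code):

```python
import html


def fix_name(raw: str) -> str:
    """Decode HTML entities and literal \\uXXXX escapes in a scraped name.

    fix_name is a hypothetical helper, not part of the original code.
    """
    unescaped = html.unescape(raw)  # '&amp;' -> '&'
    # raw_unicode_escape leaves existing backslashes alone and writes any
    # character above U+00FF as \uXXXX, so the resulting bytes can be
    # decoded with unicode_escape, which then interprets the escapes.
    return unescaped.encode('raw_unicode_escape').decode('unicode_escape')


# Raw strings keep the backslash, as in the scraped page source.
print(fix_name(r'Bols Watermelon Lik\u00f6r 0,7l'))        # Bols Watermelon Likör 0,7l
print(fix_name('Caol Ila 13 Jahre (G&amp;M Discovery)'))   # Caol Ila 13 Jahre (G&M Discovery)
```

One caveat with this approach: any other backslash sequence in the input, such as a literal \n, is interpreted as well.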

Edit: use .encode('raw_unicode_escape').decode('unicode_escape') for doubled reverse solidi (backslashes); see Python Specific Encodings.
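As an alternative, the "articleName": key in the question suggests the names come from a JSON fragment. If that assumption holds, the json module can decode the \uXXXX escapes itself. A rough sketch, where decode_json_string is a hypothetical helper and the captured text is assumed to be the body of a valid JSON string (no unescaped double quotes inside):

```python
import html
import json


def decode_json_string(fragment: str) -> str:
    """Decode the body of a JSON string literal (hypothetical helper).

    Assumes `fragment` is the text between the quotes of a JSON string,
    as captured by the regex in the question.
    """
    # Re-wrap the fragment in quotes so json.loads sees a JSON string.
    text = json.loads('"' + fragment + '"')
    return html.unescape(text)  # also resolve entities such as &amp;


print(decode_json_string(r'Ron Zacapa Edici\u00f3n Negra'))  # Ron Zacapa Edición Negra
```

Unlike the unicode_escape round trip, this only interprets the escape sequences that JSON defines, and it raises json.JSONDecodeError if the fragment is not valid JSON string content.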

html

python

utf-8

python-unicode