" 表示双引号字符的形式是什么?

What form is &quot;, where it represents the double quote character?

I have a dataset of JSON records that look like this:

{"reviewerID": "A3SSQRWUP2A04Q", "asin": "B0000224UE", "reviewerName": "Wilson", "helpful": [0, 0], "reviewText": "I love this tool. Yes it is heavy. Yes you can wear it on your belt and no one will notice. Really. And yes, you get a great amount of tools. I already always carry a Swiss Army Knife on me, so I went with this model to get the serrated blade, and not the scissors.It is perfectly aligned, buttery smooth, nicer than my Leatherman Rebar, and yes, bigger and heavier. The tools come out on the outside, and lock with a satisfying "snick." The release is easier to use than the Leatherman Rebar. Which itself is a nice tool, I carry that in my daily work messenger bag.The pliers are strong and super easy. One nice touch -- the ruler, inches/centimeters, is far easier to read than the Leatherman. The result I think of the brightly polished steel.", "overall": 5.0, "summary": "Top of the line heavy multi tool", "unixReviewTime": 1396224000, "reviewTime": "03 31, 2014"}

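When I search Google for the symbol &quot;, it automatically converts it to the double quote (") character. What form is this, and how can I turn these into a readable format, for example for use with nltk?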

Ah, that's HTML encoding! &quot; is the HTML character entity for the double quote. See the table here: https://www.toptal.com/designers/htmlarrows/symbols/

In Python, you can use the html module from the standard library to convert these entities back to normal characters.

import html
normal_string = html.unescape(string_with_html_entities)
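
For records like the one in the question, a minimal sketch might look like the following. It assumes each record arrives as one JSON string per line; the sample line here is a shortened, hypothetical record (not the real one), and the variable names are only illustrative. The JSON is parsed first and the text is unescaped afterwards, because decoding &quot; into a literal " before parsing would leave unescaped quotes inside the JSON string.

```python
import html
import json

# Hypothetical, shortened record in the same shape as the dataset in the question.
line = (
    '{"asin": "B0000224UE", '
    '"reviewText": "lock with a satisfying &quot;snick.&quot; Strong &amp; smooth."}'
)

# Parse the JSON while the quotes are still HTML entities,
# so they cannot be mistaken for the end of the JSON string.
record = json.loads(line)

# Decode HTML entities (&quot;, &amp;, ...) in the text field only.
record["reviewText"] = html.unescape(record["reviewText"])

print(record["reviewText"])
```

After this step the reviewText field is plain text and can be handed to a tokenizer such as the ones in nltk.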