How to fetch special chars from Website with Google Apps Script

Question

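I am fetching a website, but all the special characters in the string returned by .getContentText() or .getContentText("UTF-8") come back encoded as HTML entities (for example &#8217; in place of ’). I am really running out of ideas and, to be honest, I don't quite understand at what point this encoding happens. Your help is greatly appreciated. I could work around it by "manually" replacing every occurrence, but that seems rather unclean.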

var response = UrlFetchApp.fetch("https://podtail.com/de/top-podcasts/de/");
var html = response.getContentText();

Answer 1

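Your example code indicates that you are retrieving the HTML source of a specific page. That HTML source itself uses character references such as &#8217; (for ’) and friends, so the data arrives in that format: the encoding is already in the page as served and is not added by UrlFetchApp or getContentText(). It is not clear why you need to decode those HTML entities.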

If you really need to fully decode the HTML in Google Apps Script, you will need a fairly complex parser. There are some shortcuts that you can try if your app has an HTML user interface of its own, but it would probably make more sense to use a library like the one by mathiasbynens.

If you only want to replace a few HTML entities with their unencoded equivalents, you may just want to use String.replace().
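A minimal sketch of the String.replace() approach, in case it helps. The helper name decodeBasicEntities and the NAMED_ENTITIES table are made up for this illustration and are not part of Apps Script or any library. It handles numeric references plus a small hand-picked set of named entities and leaves everything else untouched, so it is not a complete HTML entity decoder. It also assumes the V8 runtime, since it relies on String.fromCodePoint.

```javascript
// Hypothetical lookup table: only a handful of named entities are covered.
// Extend it with whatever the fetched page actually uses.
var NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
  lsquo: "\u2018",
  rsquo: "\u2019"
};

// Hypothetical helper: replaces decimal (&#8217;) and hex (&#x2019;) character
// references and the named entities listed above in a single pass.
function decodeBasicEntities(text) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return text.replace(pattern, function (match, body) {
    if (body.charAt(0) === "#") {
      var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave references beyond the Unicode range as they are.
      if (codePoint > 0x10FFFF) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    }
    // Unknown named entities are kept verbatim rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

With the code from the question, this would be applied as decodeBasicEntities(response.getContentText()). Because the replacement runs in one pass, an already escaped sequence such as &amp;#8217; is decoded only once, to &#8217;.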

encoding

special-characters

fetch

google-apps-script