Forward slash "/" in string converted to "&#47", is that platform independent behaviour?

Question

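I have a Python script that reads an HTML file into lines, then filters out the relevant lines before saving them back to an HTML file. I was having some problems until I worked out that a / in the page text was being converted to &#47 when saved as a string.

The source HTML that I'm parsing has the following line: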

<h3 style="text-align:left">SYDNEY/KINGSFORD SMITH (YSSY)</h3>

which, when passed through file.readlines(), comes out as:

<h3 style='text-align:left'>SYDNEY&#47BANKSTOWN (YSBK)</h3>

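This then trips up BeautifulSoup, which gets confused by the "&" character, and that throws off all subsequent tags.

What I'm interested in is whether this replacement value for "/" is platform independent.

Running a .replace on each string before saving it isn't hard, and it avoids the problem for now while I'm coding and testing on Windows, but what happens if I deploy my script on a Linux server?

This is what I have now, which works fine when run under Windows: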

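def getHTML(self, html_source):
    # Read the whole source file as a list of lines.
    with open(html_source, 'r') as file:
        source_lines = file.readlines()
    relevant = False
    relevant_lines = []
    for line in source_lines:
        # Stop collecting at the end of the current table.
        if "</table>" in line:
            relevant = False
        # Start collecting at the line that mentions this airport.
        if self.airport in line:
            relevant = True
        if relevant:
            # Swap the "&#47" entity for a space so the parser is not confused.
            line = line.replace("&#47", " ")
            relevant_lines.append(line)
    # Close the table that was cut off above.
    relevant_lines.append("</table>")
    # Assumes html_source ends in ".html" (the last 5 characters are dropped).
    filename = f"{html_source[:-5]}_{self.airport}.html"
    with open(filename, 'w') as file:
        file.writelines(relevant_lines)
    with open(filename, 'r') as file:
        relevant_html = file.read()
    return relevant_html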

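Can anyone tell me, without my having to install a virtual machine with Linux, whether this will work cross-platform? I tried to look for documentation on this, but all I could find was about explicitly escaping / when entering a string. I found nothing that documents how / or other invalid characters are handled when a source file is read into a string.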

Answer 1

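It should work everywhere; it is a standard. "&#47;" is the HTML numeric character reference for "/" (decimal code 47), so it does not depend on the operating system. See https://www.w3schools.com/charsets/ref_html_ascii.asp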
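
To check this without a Linux machine, here is a minimal sketch using only the standard library. The helper names roundtrip and decode_entities are made up for illustration, and the sample line is the one from the question. It rests on an assumption: the "&#47" text is already present in the source file rather than produced by file.readlines(), which hands back the characters it finds. Under that assumption, html.unescape may be a more general alternative to a hand-written .replace, since it decodes other character references as well.

```python
import html
import os
import tempfile


def roundtrip(text: str) -> str:
    """Write text to a temporary file and read it back with readlines()."""
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return "".join(f.readlines())
    finally:
        os.remove(path)


def decode_entities(line: str) -> str:
    """Turn character references such as "&#47" back into plain characters."""
    return html.unescape(line)


raw = "<h3 style='text-align:left'>SYDNEY&#47BANKSTOWN (YSBK)</h3>\n"

# Reading the file back returns exactly what was written.
print(roundtrip(raw) == raw)

# The numeric reference is decoded to "/", even without a trailing semicolon.
print(decode_entities(raw))
```

Passing an explicit encoding to open() is a deliberate choice here: the default text encoding can differ between Windows and Linux, which is one of the few places where file reading really is platform dependent.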

python

cross-platform

invalid-characters

character-replacement