Why are special characters like () "" : [] often removed from data before training a translation model?

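I often see people remove special characters such as () "" : [] from the data before training a translation model. Could you explain the benefits of doing this?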

Data clean-up or pre-processing is performed so that the algorithm can focus on the important, linguistically meaningful "words" rather than on "noise". See "Removing Special Characters":

Special characters, as you know, are non-alphanumeric characters. These characters are most often found in comments, references, currency numbers etc. These characters add no value to text-understanding and induce noise into algorithms.

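Once this noise gets into the model, it can produce output at inference time that contains these unexpected characters (or sequences of characters), and it can even degrade the translation as a whole. Brackets are a frequent case of this in Japanese translations.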
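To make the clean-up step concrete, here is a minimal sketch in Python. It assumes the data is a list of (source, target) sentence pairs held as plain strings. The character set and the helper names `strip_special_chars` and `clean_pair` are hypothetical, chosen only to match the characters named in the question. A real pipeline would likely tune the set per language pair.

```python
import re
from typing import Tuple

# Assumed character set: the brackets, quotes and colon from the question,
# plus curly quotes. Adjust this for your own corpus.
SPECIAL_CHARS = re.compile(r'[()\[\]":“”]')


def strip_special_chars(sentence: str) -> str:
    """Replace the listed characters with spaces, then collapse whitespace."""
    cleaned = SPECIAL_CHARS.sub(" ", sentence)
    return " ".join(cleaned.split())


def clean_pair(source: str, target: str) -> Tuple[str, str]:
    """Apply the same clean-up to both sides of a parallel sentence pair."""
    return strip_special_chars(source), strip_special_chars(target)


if __name__ == "__main__":
    print(strip_special_chars('He said: "see [1] (below)"'))
    # prints: He said see 1 below
```

Replacing with a space rather than deleting outright keeps neighbouring words from being glued together (for example `word(note)` becomes `word note`). Whether to drop only the brackets or also the text inside them is a separate design choice that this sketch does not make.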