Why are special characters like () "" : [] often removed from data before training a machine translation model?
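I often see people remove special characters such as () "" : [] from their data before training a machine translation model. Could you explain the benefits of doing this?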
Data clean-up, or pre-processing, is done so that the algorithm can focus on the important, linguistically meaningful "words" rather than on "noise". See "Removing Special Characters":
Special characters, as you know, are non-alphanumeric characters.
These characters are most often found in comments, references,
currency numbers etc. These characters add no value to
text-understanding and induce noise into algorithms.
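As an illustration, here is a minimal sketch of that clean-up step applied to a parallel corpus. It assumes the corpus is a list of (source, target) string pairs and removes only the characters named in the question; the set of characters, and the helper names `remove_special_chars` and `clean_pairs`, are hypothetical choices, not a standard API.

```python
import re
from typing import List, Tuple

# Hypothetical character set: only the characters named in the question, ( ) " : [ ]
# Which characters count as "special" is a per-corpus decision.
SPECIAL_CHARS = re.compile(r'[()":\[\]]')


def remove_special_chars(text: str) -> str:
    """Replace each special character with a space, then collapse runs of whitespace."""
    text = SPECIAL_CHARS.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Clean both sides of each (source, target) pair; drop pairs that end up empty."""
    cleaned = []
    for src, tgt in pairs:
        src, tgt = remove_special_chars(src), remove_special_chars(tgt)
        if src and tgt:
            cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    print(remove_special_chars('He said: "hello" (twice) [sic]'))
    # He said hello twice sic
```

Cleaning both sides with the same function keeps source and target consistent, so the model never sees a character on one side that was stripped from the other. Replacing characters with a space suits space-delimited languages; for Japanese or Chinese text you would more likely delete them outright.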
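Once this kind of noise gets into the model, the model will produce output at inference time that contains these unexpected characters (or character sequences), and it can even degrade the translation as a whole. Stray parentheses, for example, show up frequently in Japanese translations.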
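One way to see where such unexpected brackets come from is to compare the two sides of each training pair: if the target contains brackets the source does not (as when a Japanese translation adds a reading in parentheses), the model learns to emit brackets with nothing in the source to trigger them. The sketch below flags pairs whose bracket counts differ. This check is an illustrative technique rather than something prescribed above, and the bracket set (ASCII plus full-width forms common in Japanese) and function names are assumptions.

```python
from typing import List, Tuple

# Hypothetical bracket set: ASCII plus full-width forms common in Japanese text.
BRACKETS = set("()[]（）［］「」『』【】")


def bracket_count(text: str) -> int:
    """Number of bracket characters in a sentence."""
    return sum(1 for ch in text if ch in BRACKETS)


def split_by_bracket_agreement(
    pairs: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Separate pairs whose two sides contain the same number of brackets
    from pairs where the counts differ (e.g. brackets only in the target)."""
    kept, flagged = [], []
    for src, tgt in pairs:
        if bracket_count(src) == bracket_count(tgt):
            kept.append((src, tgt))
        else:
            flagged.append((src, tgt))
    return kept, flagged


if __name__ == "__main__":
    pairs = [
        ("I live in Tokyo.", "私は東京（とうきょう）に住んでいます。"),
        ("See the appendix (page 3).", "付録（3ページ）を参照してください。"),
    ]
    kept, flagged = split_by_bracket_agreement(pairs)
    print(len(kept), len(flagged))  # 1 1
```

Flagged pairs can then be dropped, or have their brackets stripped, before training; whether to keep brackets that appear consistently on both sides is a judgment call for the corpus at hand.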