SpamAssassin: is Bayesian learning working here?

I am trying to train a recently installed copy of SpamAssassin, and I get the impression that Bayesian learning is not working.

First: yes, spamd is running with the --allow-tell option.

Now, I received a spam message. I first ran it through SpamAssassin to see what score it gets:

[paulo@myserver ~]$ spamc -R < spam6.txt 
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Nombre - herbertrl1 E-mail: - mu18@atsushi1010.masumi76.pushmail.fun
   Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
   porn star carl paula blum porn double d hamster porn video oiled porn clitoris
   massage free young nubile porn [...] 

Content analysis details:   (2.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
              [Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sexjanet.com]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record

So I fed it to spamc with the -L option:

[paulo@myserver ~]$ spamc -L spam < spam6.txt
Message successfully un/learned

Then I analyzed it again with spamc... and got exactly the same score:

[paulo@myserver ~]$ spamc -R < spam6.txt 
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Nombre - herbertrl1 E-mail: - mu18@atsushi1010.masumi76.pushmail.fun
   Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
   porn star carl paula blum porn double d hamster porn video oiled porn clitoris
   massage free young nubile porn [...] 

Content analysis details:   (2.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
              [Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sexjanet.com]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record

Am I missing something?

SpamAssassin: how much learning does Bayes need?

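The default SpamAssassin configuration requires at least 200 spam messages and 200 ham (non-spam) messages to be learned before Bayes is used for scoring. You can run sa-learn --dump magic to check how many messages have been passed to Bayes learning so far.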

man Mail::SpamAssassin::Conf (SpamAssassin version 3.1)

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings

$ sa-learn --dump magic
[…]
0.000          0       2508          0  non-token data: nspam
0.000          0        508          0  non-token data: nham
[…]
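
To compare those counters against the thresholds without reading the dump by eye, a small wrapper along these lines may help. This is only a sketch: bayes_ready is a hypothetical helper name, MIN_SPAM and MIN_HAM are made-up environment variables that should mirror whatever bayes_min_spam_num and bayes_min_ham_num are set to (200 is assumed if they are unset), and it assumes the dump has the column layout shown above, with the count in the third field.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): reads the output of
# `sa-learn --dump magic` on stdin and reports whether enough spam and ham
# have been learned for Bayes to activate.
# MIN_SPAM / MIN_HAM are assumed names; set them to match
# bayes_min_spam_num / bayes_min_ham_num (both default to 200).
bayes_ready() {
    awk -v min_spam="${MIN_SPAM:-200}" -v min_ham="${MIN_HAM:-200}" '
        # The learned-message count is the third whitespace-separated field.
        /non-token data: nspam/ { nspam = $3 + 0 }
        /non-token data: nham/  { nham  = $3 + 0 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam >= min_spam + 0 && nham >= min_ham + 0) {
                print "Bayes active"
            } else {
                print "Bayes not active yet"
                exit 1
            }
        }'
}
```

It would be used as `sa-learn --dump magic | bayes_ready`. With a freshly installed database that has learned only a single spam message, as in the question, it would report that Bayes is not active yet, which is consistent with the score staying at 2.9 after the spamc -L call.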