Decoding only a portion of a text email file for bash processing

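I'm scanning the subdirectories of /home/vmail/ for incoming email text files and deleting any file whose contents match a given string. The optimized script (credit to another user) is: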

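my_new_del() {
    # "$1" is the string to search for. -Z and -0 pass file names NUL-separated
    # (GNU grep/xargs); -r skips rm entirely when nothing matched.
    find /home/vmail -type f -name '*.some.file.pattern*' -exec grep -i -l -s -Z -- "$1" {} + |
    xargs -0 -r rm -f --
}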

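It works like a charm, deleting the files that match the string I pass. However, I just realized that the content of some files is base64-encoded. Here is one such spam email; its content is spam, but the file looks like this: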

Return-Path: <Bartybeve@aznetwork.net>
X-Original-To: info@my_domain.com
Delivered-To: info@my_domain.com
Received: by some.qdmn.com (Postfix, from userid 5000)
        id D47C87F8CB; Thu, 11 Oct 2018 04:21:11 -0400 (EDT)
X-Original-To: info@my_domain.com
Delivered-To: info@my_domain.com
Received: from vlan131-44.aznetwork.net (unknown [185.129.1.44])
        by some.qdmn.com (Postfix) with ESMTP id 1F1077F8C9
        for info@my_domain.com Thu, 11 Oct 2018 04:21:05 -0400 (EDT)
Received: from unknown (60.233.87.144)
        by mmx09.tilkbans.com with ESMTP; Thu, 11 Oct 2018 00:16:37 -0700
Received: from unknown (124.156.103.124)
        by mailout.endmonthnow.com with ASMTP; Thu, 11 Oct 2018 00:10:28 -0700
Message-ID: <7B6B9A4E.9D85F307@aznetwork.net>
Date: Thu, 11 Oct 2018 00:10:28 -0700
Reply-To: "Anja" <Bartybeve@aznetwork.net>
From: "Anja" <Bartybeve@aznetwork.net>
User-Agent: Opera/7.02 (Windows ME; U)
MIME-Version: 1.0
To: "Anja" <info@my_domain.com>
Subject: I could not resist and pass by!
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: base64

PCFkb2N0eXBlIGh0bWw+DQo8aHRtbD4NCjxoZWFkPg0KPG1ldGEgY2hhcnNldD0idXRmLTgiPg0K
PC9oZWFkPg0KDQo8Ym9keT4NCjxwPjx0YWJsZSB3aWR0aD0iMTMlIiBib3JkZXI9IjAiPjx0Ym9k
eT48dHI+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PHRkPjwvdGQ+PC90cj48
L3Rib2R5PjwvdGFibGU+PC9wPg0KPHA+V2FudCBtZT8gd2FubmEgZnVjayBtZT8gT2hoaGguLi4u
IG9rLCBjb21lIHRvIG1lICkpIEhlcmUgbXkgZm90byBhbmQgYWRkcmVzcywgZmluZCBtZSA6KSA8
L3A+DQo8cD48dGFibGUgd2lkdGg9IjcyJSIgYm9yZGVyPSIwIj48dGJvZHk+PHRyPjx0ZD48L3Rk
PjwvdHI+PC90Ym9keT48L3RhYmxlPjwvcD4NCjxhICAgaHJlZj0iaHR0cDovL2xvdmVmb3J5b3Uu
c3UiIHRhcmdldD0iX2JsYW5rIiBzdHlsZT0iZm9udC13ZWlnaHQ6IG5vcm1hbDtsZXR0ZXItc3Bh
Y2luZzogbm9ybWFsO2xpbmUtaGVpZ2h0OiAxMDAlO3RleHQtZGVjb3JhdGlvbjogbm9uZTtjb2xv
cjogIzc3NzsiPmh0dHA6Ly9sb3ZlZm9yeW91LnN1PC9hPg0KPHA+PHRhYmxlIHdpZHRoPSIyNyUi
IGJvcmRlcj0iMCI+PHRib2R5Pjx0cj48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+PC90
ZD48L3RyPjwvdGJvZHk+PC90YWJsZT48L3A+DQo8YSBocmVmPSJodHRwOi8vbG92ZWZvcnlvdS5z
dSI+PGltZyBzcmM9Imh0dHBzOi8vNzgubWVkaWEudHVtYmxyLmNvbS83ZTU3ZjBlMDUzZWNlYjA2
MGQwZDMyMzQ3NmQxZWI3MS90dW1ibHJfb3kycmd4TkRFYzF3MmtqZGRvMV80MDAuZ2lmIiBhbHQ9
ImNsaWNrIGhlcmUgYW5kIHNlZSBteSBwaG90byIgYm9yZGVyPSIwIiA+PC9hPg0KPHA+PHRhYmxl
IHdpZHRoPSI3NiUiIGJvcmRlcj0iMCI+PHRib2R5Pjx0cj48dGQ+PC90ZD48dGQ+PC90ZD48dGQ+
PC90ZD48dGQ+PC90ZD48dGQ+PC90ZD48L3RyPjwvdGJvZHk+PC90YWJsZT48L3A+DQo8YSBocmVm
PSJodHRwOi8vbG92ZWZvcnlvdS5zdSI+dW5zdWJzY3JpYmU8L2E+DQo8cD48dWw+PC91bD48L3A+
DQo8L2JvZHk+DQo8L2h0bWw+DQo=

So when I use my aliased bash command to find files whose content matches a string, the email file above is not flagged.

I know I can decode a message with echo 'some-base64-encoded-text' | base64 --decode. A web decoding tool did confirm that the decoded text contains the spam content.
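As a quick check of that approach, the first line of the encoded body in the sample above can be decoded on its own, since each 76-character line is a whole number of 4-character base64 groups. This is a minimal sketch, assuming a base64 command that accepts --decode (as GNU coreutils does); the variable names are only for illustration.

```shell
# First line of the base64 body from the sample message above.
encoded='PCFkb2N0eXBlIGh0bWw+DQo8aHRtbD4NCjxoZWFkPg0KPG1ldGEgY2hhcnNldD0idXRmLTgiPg0K'

# Decode it, drop the carriage returns of the CRLF line endings,
# and keep only the first decoded line.
first_line=$(printf '%s\n' "$encoded" | base64 --decode | tr -d '\r' | head -n 1)

printf '%s\n' "$first_line"   # prints: <!doctype html>
```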

My idea is to first grep for Content-Transfer-Encoding: base64, then find the position of that string, decode the message from there, echo it out, grep the result for the pattern, and delete the file if it matches.

But is there a simple way to do this on the fly?
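For reference, the plan described in the question could be sketched in plain shell roughly as follows. This is only a sketch under stated assumptions: the messages are single-part, as in the sample (header block, blank line, then one body), base64 accepts --decode, and file names contain no newlines. The function names mail_header, mail_body, body_matches and list_matching_mail are hypothetical helpers, not existing commands.

```shell
# Print the header block: every line before the first blank line.
mail_header() {
    awk '{ sub(/\r$/, "") } /^$/ { exit } { print }' "$1"
}

# Print the body: every line after the first blank line, with CRs stripped.
mail_body() {
    awk '{ sub(/\r$/, "") } found { print; next } /^$/ { found = 1 }' "$1"
}

# body_matches FILE PATTERN
# Succeeds if the body of FILE (base64-decoded when the header says so)
# contains PATTERN, compared case-insensitively like grep -i.
body_matches() {
    if mail_header "$1" | grep -i '^Content-Transfer-Encoding:[[:space:]]*base64' >/dev/null; then
        mail_body "$1" | base64 --decode 2>/dev/null | grep -i -- "$2" >/dev/null
    else
        mail_body "$1" | grep -i -- "$2" >/dev/null
    fi
}

# list_matching_mail DIR PATTERN
# Dry run: print the files that would be deleted, without deleting anything.
list_matching_mail() {
    find "$1" -type f -print | while IFS= read -r f; do
        if body_matches "$f" "$2"; then
            printf 'delete: %s\n' "$f"
        fi
    done
}
```

Calling list_matching_mail /home/vmail 'some string' only prints candidates; replacing the printf with rm -f -- "$f" would be the destructive step, so it is worth checking the list first.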

Here's some Perl. It requires MIME::Base64 (cpan install MIME::Base64).

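#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use MIME::Base64;

# The search pattern arrives via the environment and is used as a Perl regex.
# An empty pattern would not behave like a literal search, so refuse it.
my $pattern = $ENV{pattern};
die "set 'pattern' in the environment\n"
    unless defined $pattern && length $pattern;

$/ = "";    # paragraph mode: read blank-line-separated chunks
for my $file (@ARGV) {
    open my $fh, "<", $file;
    my @paragraphs = <$fh>;
    close $fh;
    # The first paragraph is the header block; skip empty files.
    my $header = shift @paragraphs;
    next unless defined $header;
    my $content;
    if ($header =~ /^Content-Transfer-Encoding:\s*base64/mi) {
        # A base64 body contains no blank lines, so it is a single paragraph.
        $content = decode_base64($paragraphs[0] // "");
    }
    else {
        $content = join "\n\n", @paragraphs;
    }
    if ($content =~ /$pattern/i) {    # case-insensitive, like grep -i
        print "delete: $file\n";
        ## unlink $file;   # uncomment to really delete the file
    }
}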

Then, inside your function (where "$1" is the search string you pass), you can do:

find ... -exec env pattern="$1" perl email_scanner.pl {} +
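The command uses env because find -exec runs the program directly rather than through a shell, so a plain pattern=... prefix is not available; env sets the variable and then runs the script with the batch of file names that find supplies. The following sketch illustrates that mechanism only: a small sh -c command stands in for the Perl script, and demo_dir and the file names are made up for the demonstration.

```shell
# Create a scratch directory with two dummy "mail" files.
demo_dir=$(mktemp -d)
touch "$demo_dir/mail1" "$demo_dir/mail2"

# sh -c stands in for email_scanner.pl: it prints the environment variable
# it received and the number of file arguments find passed in one batch.
out=$(find "$demo_dir" -type f \
    -exec env pattern='my photo' sh -c 'printf "%s: %s\n" "$pattern" "$#"' sh {} +)

printf '%s\n' "$out"   # prints: my photo: 2
rm -rf "$demo_dir"
```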