Parsing a MIME file stored in a TXT file using Python
I am trying to parse a MIME file that is stored in a TXT file. Its format is as follows:
Content-Type: multipart/related; boundary="MIMEBoundary"
MIME-Version: 1.0
Content-Description: This Transmission File is created with Pegasus Test Suite
X-eFileRoutingCode: MEF
Content-Transfer-Encoding: Binary
Content-Location: manifest_xml

--MIMEBoundary
Content-Type: text/xml; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

<?xml version='1.0' encoding='UTF-8'?>
<SOAP:Envelope xmlns="http://www.efiles.id/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/" xmlns:efile="http://www.efiles.id/efile" xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ ../message/SOAP.xsd http://www.files./efile ../message/efileMessage.xsd"><SOAP:Header><IFATransmissionHeader><MessageId>012342018ABCDEFGHIJK</MessageId><TransmissionTs>2018-07-01T09:51:56-05:00</TransmissionTs><TransmitterDetail><ETIN>XXXXX</ETIN></TransmitterDetail></IFATransmissionHeader></SOAP:Header><SOAP:Body><TransmissionManifest><SubmissionDataList><Cnt>1</Cnt><SubmissionData><SubmissionId>0123452018OPQRSTUVWX</SubmissionId><ElectronicPostmarkTs>2018-07-01T09:51:56-05:00</ElectronicPostmarkTs></SubmissionData></SubmissionDataList></TransmissionManifest></SOAP:Body></SOAP:Envelope>
--MIMEBoundary
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64

UEsDBBQAAAAAAOh2lFQAAAAAAAAAAAAAAAALAAAAYXR0YWNobWVudC9QSwMEFAAAAAAA6HaUVAAA
AAAAAAAAAAAAAAQAAAB4bWwvUEsDBBQAAAAAAOh2lFT0i/87pT0AAKU9AAAVAAAAYXR0YWNobWVu
dC9zYW1wbGUucGRmJVBERi0yLjAKJbq63toKMSAwIG9iajw8L1R5cGUvQ2F0YWxvZy9QYWdlcyAy
IDAgUi9NZXRhZGF0YSAxMiAwIFI+PgplbmRvYmoKMiAwIG9iajw8L1R5cGUvUGFnZXMvS2lkc1sz
IDAgUl0vQ291bnQgMT4+CmVuZG9iagoxMiAwIG9iajw8L1R5cGUvTWV0YWRhdGEvU3VidHlwZS9Y
--MIMEBoundary--
I want to split the XML and the encoded attachment into separate strings. This is how I set up the decoder:
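    import email.parser
    from io import TextIOWrapper

    class NewlineSafeBytesParser(email.parser.BytesParser):
        def parse(self, fp, headersonly=False):
            # newline='' keeps the original line endings untouched
            fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
            try:
                return self.parser.parse(fp, headersonly)
            finally:
                # Detach so the underlying binary file is not closed with the wrapper
                fp.detach()

    # Use the subclass in place of BytesParser
    parser = NewlineSafeBytesParser()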
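But the decoded result comes out corrupted. How can I separate the Base64-encoded zip file from the rest of the TXT file so that it can be decoded on its own?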
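The following parses the file on Python 3 and prints only the part whose Content-Type is text/xml. The file is opened in binary mode, so email.message_from_binary_file is used (email.message_from_file expects a text-mode file object, not bytes):

    import email

    with open("list1.txt", "rb") as fp:
        msg = email.message_from_binary_file(fp)

    for part in msg.walk():
        # Print only the XML part; all other parts are skipped
        if part.get_content_type() == 'text/xml':
            print(part.get_payload())

Output: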
<?xml version='1.0' encoding='UTF-8'?>
<SOAP:Envelope xmlns="http://www.efiles.id/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/" xmlns:efile="http://www.efiles.id/efile" xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ ../message/SOAP.xsd http://www.files./efile ../message/efileMessage.xsd"><SOAP:Header><IFATransmissionHeader><MessageId>012342018ABCDEFGHIJK</MessageId><TransmissionTs>2018-07-01T09:51:56-05:00</TransmissionTs><TransmitterDetail><ETIN>XXXXX</ETIN></TransmitterDetail></IFATransmissionHeader></SOAP:Header><SOAP:Body><TransmissionManifest><SubmissionDataList><Cnt>1</Cnt><SubmissionData><SubmissionId>0123452018OPQRSTUVWX</SubmissionId><ElectronicPostmarkTs>2018-07-01T09:51:56-05:00</ElectronicPostmarkTs></SubmissionData></SubmissionDataList></TransmissionManifest></SOAP:Body></SOAP:Envelope>
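The same loop can also pull out the Base64 part. Below is a minimal sketch, assuming the attachment is the part with Content-Type application/octet-stream. The helper names split_parts and build_sample, and the tiny generated zip with its file name, are made up for illustration and stand in for the real transmission file. Calling get_payload(decode=True) undoes the Content-Transfer-Encoding, so the Base64 part comes back as raw bytes that can be handed to zipfile.

```python
import base64
import email
import io
import zipfile


def split_parts(raw: bytes):
    """Return (xml_text, attachment_bytes) from a multipart MIME document."""
    msg = email.message_from_bytes(raw)
    xml_text, attachment = None, None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/xml":
            # decode=True returns bytes; decode them with the declared charset
            charset = part.get_content_charset() or "utf-8"
            xml_text = part.get_payload(decode=True).decode(charset)
        elif ctype == "application/octet-stream":
            # For a base64 part, decode=True yields the raw (zip) bytes
            attachment = part.get_payload(decode=True)
    return xml_text, attachment


def build_sample() -> bytes:
    """Build a small stand-in for the transmission file (hypothetical data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xml/submission.xml", "<Return/>")
    # encodebytes wraps at 76 characters and ends with a newline
    b64 = base64.encodebytes(buf.getvalue()).decode("ascii")
    return (
        'Content-Type: multipart/related; boundary="MIMEBoundary"\n'
        "MIME-Version: 1.0\n"
        "\n"
        "--MIMEBoundary\n"
        'Content-Type: text/xml; charset="us-ascii"\n'
        "Content-Transfer-Encoding: 7bit\n"
        "\n"
        "<?xml version='1.0' encoding='UTF-8'?><Manifest/>\n"
        "--MIMEBoundary\n"
        "Content-Type: application/octet-stream\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        f"{b64}"
        "--MIMEBoundary--\n"
    ).encode("ascii")


xml_text, zip_bytes = split_parts(build_sample())

# The decoded attachment is an ordinary zip archive held in memory
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    names = zf.namelist()

print(xml_text)
print(names)
```

For a real file, read it in binary mode and pass the bytes to split_parts, for example split_parts(open("list1.txt", "rb").read()). Each part needs a blank line between its headers and its body, otherwise the parser treats the body as more header lines.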