Foreword
How do you parse email in Python? The message format is complex, mostly because of MIME. This article focuses on the implementation; the underlying principles are left for you to study on your own.
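To give a feel for why MIME is called complex, here is a minimal sketch that builds a small message and prints its nesting. It uses Python's standard email package rather than Flanker, and the names build_sample_message and outline, as well as the addresses, are made up for illustration.

```python
from email.message import EmailMessage


def build_sample_message():
    """Build a small message: plain text, an HTML alternative, one attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Sample"
    msg.set_content("Hello in plain text")
    msg.add_alternative("<p>Hello in <b>HTML</b></p>", subtype="html")
    msg.add_attachment(b"\x00\x01", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg


def outline(msg):
    """Return (depth, content type) for every part, outermost first."""
    rows = []

    def visit(part, depth):
        rows.append((depth, part.get_content_type()))
        if part.is_multipart():
            for child in part.iter_parts():
                visit(child, depth + 1)

    visit(msg, 0)
    return rows


sample = build_sample_message()
for depth, ctype in outline(sample):
    print("  " * depth + ctype)
```

The two text bodies sit inside a multipart/alternative section, which in turn sits next to the attachment inside multipart/mixed. Writing sample.as_bytes() to a file gives an .eml file that can be used to experiment with a parser.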
1. Installation
Mail parsing is done with Flanker, Mailgun's open-source library, which provides both email address parsing and MIME message parsing.
Enter the following command:
pip install flanker
2. Code implementation
1. Mail header
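from flanker import mime

# HTMLFilter is not defined in this article. It is assumed to be an
# html.parser.HTMLParser subclass that accumulates text nodes in .text.

def emlAnayalyse(path):
    # Read the raw message as bytes and let Flanker parse it
    with open(path, 'rb') as fhdl:
        raw_email = fhdl.read()
    eml = mime.from_string(raw_email)
    # Header fields
    subject = eml.subject
    eml_header_from = eml.headers.get('From')
    eml_header_to = eml.headers.get('To')
    eml_header_cc = eml.headers.get('Cc')
    eml_time = eml.headers.get('Date')
    # attachEml1 and contentEml are defined in the sections below
    attachEml1(eml)  # writes attachments to the current directory
    eml_body = contentEml(eml)
    # Strip HTML tags from the body and print the remaining text
    f = HTMLFilter()
    f.feed(eml_body)
    print(f.text)

def main():
    path = 'mail_name.eml'  # placeholder: path to the .eml file to parse
    emlAnayalyse(path)

if __name__ == "__main__":
    main()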
eml.headers holds the header fields, such as the sender (From), recipient (To), Cc, and date.
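The code above uses an HTMLFilter class that the article does not show. A minimal sketch, assuming it is meant to be a small subclass of the standard library's HTMLParser that keeps only the text nodes, could look like this:

```python
from html.parser import HTMLParser


class HTMLFilter(HTMLParser):
    """Collect the text nodes of an HTML document and drop the tags."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def handle_data(self, data):
        # Called by the parser for every run of text between tags
        self.text += data


f = HTMLFilter()
f.feed("<html><body><p>Hello, <b>world</b>!</p></body></html>")
print(f.text)
```

This is only one possible definition. It also keeps the contents of script and style elements, so a real mail body may need extra filtering.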
2. Email body
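# Email body
def contentEml(eml):
    # Single-part message: the body is the message itself
    if eml.content_type.is_singlepart():
        eml_body = eml.body
    else:
        eml_body = ''
        for part in eml.parts:
            # Nested multipart section: recurse into it
            if part.content_type.is_multipart():
                nested = contentEml(part)
                # Keep the text found so far if the nested section has none
                if nested:
                    eml_body = nested
            elif part.content_type.main == 'text':
                # Later text parts overwrite earlier ones, so the last one wins
                eml_body = part.body
    return eml_body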
The function calls itself recursively to walk nested multipart sections and extract the body of the email.
3. Mail attachments
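def attachEml1(eml):
    for part in eml.parts:
        # Skip multipart containers and parts without a file name (e.g. the body)
        if part.content_type.is_multipart() or not part.detected_file_name:
            continue
        data = part.body
        # Text attachments may come back as str; encode before a binary write
        if isinstance(data, str):
            data = data.encode('utf-8')
        with open(part.detected_file_name, 'wb') as annex:
            annex.write(data)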
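A part for which content_type.is_multipart() returns False is a leaf part, and leaf parts are written to disk under part.detected_file_name. Note that is_multipart() alone does not tell attachments apart from body text, so checking that a file name is present is a sensible extra guard.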
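Because the file name comes from the sender, it is worth cleaning before it is passed to open(). The helper below is a sketch under that assumption, using only the standard library. safe_attachment_path, its out_dir parameter, and the fallback name are made up for illustration and are not part of Flanker.

```python
import os


def safe_attachment_path(name, out_dir, fallback='attachment.bin'):
    """Return a path inside out_dir for a sender-supplied file name."""
    # Normalize Windows separators, then keep only the final path component
    base = os.path.basename((name or '').replace('\\', '/'))
    if base in ('', '.', '..'):
        base = fallback
    stem, ext = os.path.splitext(base)
    candidate = os.path.join(out_dir, base)
    counter = 1
    # Avoid overwriting an existing file by appending a counter
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, '%s_%d%s' % (stem, counter, ext))
        counter += 1
    return candidate


print(safe_attachment_path('../../etc/passwd', 'attachments'))
```

In the attachment loop, the result of safe_attachment_path(part.detected_file_name, out_dir) could replace the bare file name passed to open(), which keeps every saved file inside one directory.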
Summary
That covers the basics of parsing email with Flanker. Questions and comments are welcome!