Fetching Agricultural Bank of China Foreign Exchange Rates with a Python Crawler

This post is a set of notes documenting my learning process.

Reference:

https://blog.csdn.net/u012662731/article/details/78537432

Requirement: fetch the bank's real-time USD exchange rate every day and send it to a mailbox via QQ Mail.

Environment: development on macOS with Sublime Text 3 and Python 3; server running LAMP + Python 3 (CentOS 7).

First, take a look at the target site: http://app.abchina.com/static/app/ll/exchangerate/


Let's crawl the whole page and see what comes back:

requests

requests needs to be installed first; in a terminal, run:

pip install requests

Create a file named test.py and enter the following code:

import requests

if __name__ == '__main__':
	url = 'http://app.abchina.com/static/app/ll/exchangerate/'
	req = requests.get(url)
	print(req.text)

Running it, we find the output is garbled:


Solution: set the response encoding on req explicitly.

import requests

if __name__ == '__main__':
	url = 'http://app.abchina.com/static/app/ll/exchangerate/'
	req = requests.get(url)
	req.encoding = 'utf-8'
	print(req.text)
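
If you are not sure which charset a page uses, requests can also guess it from the response body itself; a minimal sketch, simply trusting the guessed encoding:

import requests

if __name__ == '__main__':
	url = 'http://app.abchina.com/static/app/ll/exchangerate/'
	req = requests.get(url)
	# apparent_encoding guesses the charset from the bytes of the body,
	# which helps when the server's Content-Type header is missing or wrong
	req.encoding = req.apparent_encoding
	print(req.encoding)
	print(req.text)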

Inspecting the page source shows that the data on the site is rendered by JavaScript, so the HTML we fetched does not contain the real-time rates.
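
A quick way to confirm this is to search the fetched HTML for a value you can see in the browser; if it is missing, the number must be injected by JavaScript after the page loads. A minimal sketch (the rate string below is only an example standing in for whatever value the browser currently shows):

import requests

if __name__ == '__main__':
	url = 'http://app.abchina.com/static/app/ll/exchangerate/'
	req = requests.get(url)
	req.encoding = 'utf-8'
	# '714.50' stands in for a rate currently visible in the browser;
	# if this prints False, the figures are rendered client-side
	print('714.50' in req.text)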

Solution: use Selenium.

selenium

First, install it:

pip install selenium

Import webdriver from selenium.

I use the Chrome browser, so the matching ChromeDriver also needs to be downloaded.

On a Mac, dragging the downloaded file into the bin directory is enough; these installation guides may help:

Windows:https://www.jianshu.com/p/dd848e40c7ad

Mac:https://blog.csdn.net/ywj_486/article/details/80940087 

Note: on a Mac, the /usr/local/bin directory may not be writable; see https://blog.csdn.net/a547720714/article/details/52678643
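
Once ChromeDriver is installed, a quick sanity check that Selenium can actually drive the browser (it should open a Chrome window, print the browser version, and close again):

from selenium import webdriver

if __name__ == '__main__':
	driver = webdriver.Chrome()  # raises an exception if chromedriver is not on the PATH
	caps = driver.capabilities
	print(caps.get('browserVersion') or caps.get('version'))  # key name depends on the Selenium/driver version
	driver.quit()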

import requests
import time
from selenium import webdriver

if __name__ == '__main__':
	driver = webdriver.Chrome()
	url = "http://app.abchina.com/static/app/ll/exchangerate/"
	driver.get(url)
	time.sleep(2) # pause 2 seconds so the page can finish rendering; this needs the time module imported above
	html = driver.page_source
	print(html)
	driver.close()
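
A fixed time.sleep(2) is fragile: on a slow connection the data may not have rendered yet, while on a fast one you wait longer than necessary. As an alternative sketch, Selenium's WebDriverWait can block until the rate list (the g-priceLst element used in the next step) is actually present, assuming that element only appears once the data has rendered:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

if __name__ == '__main__':
	driver = webdriver.Chrome()
	driver.get("http://app.abchina.com/static/app/ll/exchangerate/")
	# wait up to 10 seconds for the rate list instead of sleeping a fixed 2 seconds
	WebDriverWait(driver, 10).until(
		EC.presence_of_element_located((By.CLASS_NAME, "g-priceLst"))
	)
	html = driver.page_source
	print(html)
	driver.close()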

In the printed page source you can find the section that contains the exchange-rate information.

It is a dense wall of markup, so this is where BeautifulSoup comes in to parse the page source.

BeautifulSoup

pip install beautifulsoup4

Then use BeautifulSoup to parse the page source rendered by Selenium:

import requests
import time
from selenium import webdriver
from bs4 import BeautifulSoup

if __name__ == '__main__':
	driver = webdriver.Chrome()
	url = "http://app.abchina.com/static/app/ll/exchangerate/"
	driver.get(url)
	time.sleep(2) # pause 2 seconds so the page can finish rendering; this needs the time module imported above
	html = driver.page_source

	bf = BeautifulSoup(html,"html.parser")
	ul = bf.find_all('ul', class_='g-priceLst') # locate the rate lists by their class attribute
	li_ul = BeautifulSoup(str(ul), "html.parser") # re-parse the matched <ul> blocks so we can search inside them
	li = li_ul.find_all('li')

	for i in range(len(li)):
		print(li[i],'\n')
	
	driver.close()

Running this prints each li element:

From the output, the USD buying and selling prices turn out to be li[5] and li[6] respectively.
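
Indexing li[5] and li[6] directly works, but it breaks silently if the bank reorders or extends its currency list. Below is a more defensive sketch; it assumes, based on the layout observed above, that the li holding the currency name is immediately followed by the buy-price li and then the sell-price li:

from bs4 import BeautifulSoup

def find_usd_rates(html):
	bf = BeautifulSoup(html, "html.parser")
	for ul in bf.find_all('ul', class_='g-priceLst'):
		for li in ul.find_all('li'):
			# '美元' (USD) is the currency label shown on the page
			if '美元' in li.get_text():
				buy = li.find_next_sibling('li')
				sell = buy.find_next_sibling('li') if buy else None
				if buy and sell:
					return buy.get_text(strip=True), sell.get_text(strip=True)
	return None, None

With the html captured by Selenium above, find_usd_rates(html) returns the two price strings, or (None, None) if the page layout has changed.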

Next, I combine the scraping code with an email-sending function.

Sending the email uses the standard-library smtplib module; the scheduler in the final script also uses the third-party schedule library (install it with pip install schedule).

Reference: https://blog.csdn.net/mumuqingwei/article/details/82015459

import requests, sys
import time
import schedule
from bs4 import BeautifulSoup
from selenium import webdriver

import smtplib
from email.mime.text import MIMEText
from email.header import Header

from selenium.webdriver.chrome.options import Options



def getRate():
	chrome_options = Options()
	chrome_options.add_argument('--headless')     # run Chrome without a visible window (needed on the server)
	chrome_options.add_argument('--disable-gpu')  # avoid GPU-related problems in headless mode
	chrome_options.add_argument('--no-sandbox')   # required when Chrome runs as the root user
	driver = webdriver.Chrome(chrome_options=chrome_options)
	url = "http://app.abchina.com/static/app/ll/exchangerate/"
	driver.get(url)
	time.sleep(2)
	html = driver.page_source
	bf = BeautifulSoup(html,"html.parser")
	ul = bf.find_all('ul', class_ = 'g-priceLst')
	li_ul = BeautifulSoup(str(ul),"html.parser")
	li = li_ul.find_all('li')

	content = "Bank buying price: " + li[5].string + "\n" + "Bank selling price: " + li[6].string
	driver.close()
	sent_email(mail_body=content)
	

def sent_email(mail_body):
	print(mail_body)
	mail_host = "smtp.qq.com"  # the SMTP host must be the sending mailbox's server; it has nothing to do with the recipient
	mail_user = ""  # QQ Mail login name
	mail_pass = ""  # the authorization code issued when enabling the SMTP service (note: this is NOT your QQ password)

	sender = ''  # sender's QQ Mail address
	receivers = ['']  # recipient QQ Mail address(es)

	message = MIMEText(mail_body, 'plain', 'utf-8')
	message['From'] = Header("sender name goes here", 'utf-8')  # the sender name shown in the email
	message['To'] = Header("recipient name goes here", 'utf-8')  # the recipient name shown in the email

	subject = 'ABC Agricultural Bank of China exchange rate - USD'
	message['Subject'] = Header(subject, 'utf-8')  # set the subject and its encoding

	try:
		smtpobj = smtplib.SMTP_SSL(mail_host, 465)  # a local mail server would be localhost on the default port 25; QQ/Tencent uses port 465 (or 587)
		smtpobj.set_debuglevel(1)
		smtpobj.login(mail_user, mail_pass)  # log in to the QQ Mail server
		smtpobj.sendmail(sender, receivers, message.as_string())  # send the email
		print("Email sent successfully")
		smtpobj.quit()  # close the connection
	except smtplib.SMTPException as e:
		print("Error: failed to send email")
		print(e)


# this function sets up the timer
def run():
	schedule.every().day.at("08:00").do(getRate)  # run getRate() every day at 08:00; note it must be "08:00", "8:00" raises an error
	# schedule.every(5).minutes.do(getRate)  # run every 5 minutes
	# schedule.every().hour.do(getRate)  # run every hour
	# schedule.every().monday.do(getRate)  # run every Monday
	# schedule.every().wednesday.at("13:15").do(getRate)  # run every Wednesday at 13:15

	while True:
		schedule.run_pending()
		time.sleep(1)

if __name__ == "__main__":
	run()  # start the scheduler
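
Before leaving the scheduler running on the server, it is worth doing one manual test so that any ChromeDriver or SMTP problem shows up immediately; the simplest way is to call getRate() once instead of starting run():

if __name__ == "__main__":
	# getRate()  # uncomment for a one-off fetch-and-email test
	run()        # start the daily scheduler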

Upload the script to the server and run it with nohup (reference: https://blog.csdn.net/lzw17750614592/article/details/89092319):

nohup python -u test.py >nohup.out 2>&1 &

Addendum: the server needs the CentOS build of the Chrome browser; see https://blog.csdn.net/yushun17/article/details/84112730

After installing Chrome, install the matching version of ChromeDriver; see https://blog.csdn.net/zzzcl112/article/details/80470884

Reprinted from blog.csdn.net/qq_35229591/article/details/97640984