Python Web Scraping with requests, Part 5: Common Exception Handling

Garbled text in the response

Garbled text appears when the response body is decoded with the wrong character encoding. Tell requests to use the encoding it detects from the body itself:

response.encoding = response.apparent_encoding
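The fix works because `response.apparent_encoding` runs charset detection on the raw bytes instead of trusting the HTTP `Content-Type` header. The effect of a wrong codec is easy to reproduce without any network call:

```python
# UTF-8 bytes decoded with the wrong codec come out as mojibake.
raw = "爬虫".encode("utf-8")      # bytes as a server would send them
garbled = raw.decode("latin-1")   # wrong guess -> unreadable characters
fixed = raw.decode("utf-8")       # the correct codec restores the text
print(repr(garbled), fixed)
```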


Request header parameters

InvalidHeader: Invalid return character or leading space in header: User-Agent

import requests

headers = {
  'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36'
}
response = requests.get('http://www.shuquge.com/txt/8659/index.html',
                        headers=headers)
response.encoding = response.apparent_encoding
html = response.text
print(html)

The problem is hard to spot, but it is simply a stray space before 'Mozilla' in the User-Agent value. Delete the space and the request goes through.
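Mistakes like this can be caught before sending the request. A minimal sketch (`find_bad_headers` is not part of requests, just an illustrative helper) that flags header names or values carrying stray whitespace or control characters:

```python
def find_bad_headers(headers):
    """Return the names of headers whose name or value has leading/trailing
    whitespace or embedded newlines -- the usual causes of InvalidHeader."""
    bad = []
    for name, value in headers.items():
        if (name != name.strip() or value != value.strip()
                or "\r" in value or "\n" in value):
            bad.append(name)
    return bad
```

Running it on the headers above would report 'User-Agent' immediately, instead of leaving you to squint at the traceback.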

No data returned & wrong parameters

import requests

headers = {
  'Host': 'www.guazi.com',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36',
}
response = requests.get('https://www.guazi.com/cs/20e17311773b1706x.htm',
                        headers=headers)
response.encoding = response.apparent_encoding
print(response.text)

When the data you get back is not what you expected, some request parameter is almost certainly wrong. Check whether a parameter is missing or has an incorrect value.
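A common case of "wrong data" is the server returning an anti-bot or verification page instead of the real content. One pragmatic check is to scan the response text for such markers; the marker list below is purely illustrative and should be tuned per target site:

```python
def looks_blocked(html):
    """Heuristic: does the response look like an anti-bot / verification
    page rather than real content? Marker list is illustrative only."""
    markers = ("captcha", "verify", "验证码", "访问异常")
    text = html.lower()
    return any(marker in text for marker in markers)
```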

Connection actively refused by the target machine

import requests

proxy_response = requests.get('http://134.175.188.27:5010/get')
proxy = proxy_response.json()
print(proxy)

The error:

requests.exceptions.ConnectionError:
HTTPConnectionPool(host='134.175.188.27', port=5010):
Max retries exceeded with url: /get (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000023AB83AC828>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
  • The scraper was detected and blocked
  • The URL was typed incorrectly
  • The server has stopped providing the service
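To separate the last cause from the others, a standard-library probe can check whether anything is listening on the target port at all. This is a sketch; `port_open` is a hypothetical helper, not part of requests:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Check whether host:port accepts TCP connections.

    'Actively refused' (WinError 10061 / ECONNREFUSED) means the machine
    answered, but nothing is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```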

Connection timeout

import requests

proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
proxy = proxy_response.json()
print(proxy)

The error:

requests.exceptions.ConnectTimeout:
HTTPConnectionPool(host='134.175.188.27', port=5010):
Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000002045EF9B8D0>, 'Connection to 134.175.188.27 timed out. (connect timeout=0.0001)'))

Exception handling

import requests

try:
  proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
  proxy = proxy_response.json()
  print(proxy)
except requests.exceptions.RequestException:
  # Catch the requests base exception rather than using a bare except,
  # so programming errors are not silently swallowed
  pass
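Swallowing the exception is rarely enough; transient network failures are usually worth retrying. A sketch of a retry helper with exponential backoff (`fetch_with_retry` and the `fetch` callable are illustrative names; requests' own exceptions inherit from OSError, so catching OSError also covers ConnectionError and the timeout errors shown above):

```python
import time

def fetch_with_retry(fetch, retries=3, backoff=0.1):
    """Call fetch(); on a network-style error, wait and try again.

    Sleeps backoff, 2*backoff, 4*backoff, ... between attempts, and
    re-raises the last error once the attempts are used up.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            if attempt == retries - 1:
                raise                                # out of attempts
            time.sleep(backoff * 2 ** attempt)       # exponential backoff
```

Usage would be, for example, `fetch_with_retry(lambda: requests.get(url, timeout=5).json())`.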


Reposted from blog.csdn.net/m0_48405781/article/details/115249380