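Garbled text on the page
Garbled text appears because no encoding was specified when the response body was decoded. Setting the encoding to the one detected from the content fixes it: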
response.encoding = response.apparent_encoding
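To see why the wrong encoding produces garbled text, here is a minimal sketch that needs no network access. It assumes a page served as GBK bytes and decoded as ISO-8859-1, which is the fallback requests uses for text responses that declare no charset. The sample string and variable names are made up for illustration.

```python
# Illustrative only: the sample text and the GBK assumption are not from a real site.
text = "网页标题"                      # what the page author wrote
raw = text.encode("gbk")              # bytes as a GBK-encoded page would arrive

garbled = raw.decode("iso-8859-1")    # wrong charset: every byte becomes one Latin-1 character
fixed = raw.decode("gbk")             # right charset: the original text comes back

print(repr(garbled))
print(fixed)
```

Setting response.encoding before reading response.text plays the same role as choosing the right charset in the second decode call.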
Request header parameters
InvalidHeader: Invalid return character or leading space in header: User-Agent
import requests

headers = {
    # Note the leading space before "Mozilla": this is what triggers InvalidHeader
    'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36'
}
response = requests.get('http://www.shuquge.com/txt/8659/index.html',
                        headers=headers)
response.encoding = response.apparent_encoding
html = response.text
print(html)
The problem is hard to spot, but the cause is an extra space before 'Mozilla'. Delete the space and the request goes through.
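Since stray whitespace like this is easy to miss by eye, one option is to normalize the headers before sending them. The following is a minimal sketch. The helper name clean_headers is hypothetical, not part of requests.

```python
def clean_headers(headers):
    """Return a copy of headers with surrounding whitespace removed from names and values."""
    return {name.strip(): value.strip() for name, value in headers.items()}


# Sample header with the same kind of leading space as in the failing request
headers = {'User-Agent': ' Mozilla/5.0 (Windows NT 10.0; WOW64)'}
print(clean_headers(headers))
```

The cleaned dictionary can then be passed as the headers argument of requests.get.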
No data returned / wrong parameters
import requests

headers = {
    'Host': 'www.guazi.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36',
}
response = requests.get('https://www.guazi.com/cs/20e17311773b1706x.htm',
                        headers=headers)
response.encoding = response.apparent_encoding
print(response.text)
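If the returned data differs from what you expected, some request parameter is almost certainly the problem. Check whether a parameter is missing or has the wrong value.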
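One way to run that check is to compare the headers your script sends against a known-good set, for example one copied from the browser's developer tools. This is a sketch under assumptions: the helper diff_headers and all sample values below (including the Cookie placeholder) are hypothetical.

```python
def diff_headers(sent, expected):
    """Compare two header dicts, matching header names case-insensitively.

    Returns (missing, different): names present only in `expected`,
    and names whose values differ between the two dicts.
    """
    sent_lower = {name.lower(): value for name, value in sent.items()}
    missing, different = [], []
    for name, value in expected.items():
        key = name.lower()
        if key not in sent_lower:
            missing.append(name)
        elif sent_lower[key] != value:
            different.append(name)
    return missing, different


# Hypothetical values: `expected` stands in for headers copied from the browser.
expected = {'Host': 'www.guazi.com', 'User-Agent': 'Mozilla/5.0', 'Cookie': 'placeholder'}
sent = {'host': 'www.guazi.com', 'User-Agent': 'Mozi11a/5.0'}
print(diff_headers(sent, expected))  # → (['Cookie'], ['User-Agent'])
```

Here the Cookie header is missing entirely and the User-Agent value contains a typo, which are the two kinds of mistake described above.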
Target machine actively refused the connection
import requests

proxy_response = requests.get('http://134.175.188.27:5010/get')
proxy = proxy_response.json()
print(proxy)
Error:
requests.exceptions.ConnectionError:
HTTPConnectionPool(host='134.175.188.27', port=5010):
Max retries exceeded with url: /get (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000023AB83AC828>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
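Possible causes:
- The crawler was detected and blocked
- The URL was entered incorrectly
- The server has stopped providing the service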
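To narrow down which of these causes applies, it can help to test whether the port accepts TCP connections at all, independently of requests. The sketch below uses only the standard library. The helper name port_is_open and its default timeout are assumptions for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection (WinError 10061 on Windows),
        # timeouts, and unreachable hosts.
        return False
```

If port_is_open('134.175.188.27', 5010) returns False, the service is probably down or the address is wrong. If it returns True but requests still fails, being blocked becomes the more likely explanation.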
Connection timeout
import requests

# The timeout is deliberately tiny so that the request times out
proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
proxy = proxy_response.json()
print(proxy)
Error:
requests.exceptions.ConnectTimeout:
HTTPConnectionPool(host='134.175.188.27', port=5010):
Max retries exceeded with url: /get (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000002045EF9B8D0>, 'Connection to 134.175.188.27 timed out. (connect timeout=0.0001)'))
Exception handling
import requests

try:
    proxy_response = requests.get('http://134.175.188.27:5010/get', timeout=0.0001)
    proxy = proxy_response.json()
    print(proxy)
except Exception:
    # Swallow the error (refused connection, timeout, bad JSON) so the script keeps running
    pass
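Silently ignoring the error keeps the script alive, but a failed request is often worth retrying. The following is a minimal sketch of a retry wrapper, not something taken from the original code. The name call_with_retry and its parameters are hypothetical.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, sleeping `delay` seconds between failures.

    Returns func()'s result, or re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            print(f"attempt {attempt} failed: {exc!r}")
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)
```

With requests, func could be something like lambda: requests.get(url, timeout=5), and exceptions could be narrowed to requests.exceptions.RequestException so that unrelated bugs are not retried.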