[Python] Crawling HTTPS sites with the requests library in Python 3.5

Copyright notice: original post. https://blog.csdn.net/hangvane123/article/details/82937044

With the requests library, crawling a site served over HTTPS is straightforward:

import requests

url = 'https://www.baidu.com/'
# verify=False skips certificate verification, so sites with self-signed or
# otherwise invalid certificates can still be fetched.
r = requests.get(url, verify=False)
r.encoding = 'utf-8'
print(r.text)
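
Note that verify=False turns off certificate verification entirely, and requests (via urllib3) emits an InsecureRequestWarning on every such call. If that noise is unwanted, the warning can be silenced; a minimal sketch (the exact import path may differ across requests/urllib3 versions):

import urllib3
# Suppress the InsecureRequestWarning triggered by verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)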

However, when the target site only supports TLSv1 or TLSv1.1, the code above fails with an SSL handshake error.
In that case we can subclass HTTPAdapter to customize the parameters requests passes down to urllib3:

# -*- coding: utf-8 -*-
import os
import ssl

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager


class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        # Pin the SSL/TLS protocol version used by the connection pool.
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1,
                                       **pool_kwargs)


s = requests.Session()
s.mount('https://', MyAdapter())  # every https:// request on this session goes through MyAdapter

def downloadImage(netPath, localPath, imageName):
    # netPath = full URL of the image, localPath = local folder to save into, imageName = file name to save as
    # Make sure the target folder exists.
    if not os.path.isdir(localPath):
        os.makedirs(localPath)
    ok = 0
    while ok == 0:
        try:
            r = s.get(netPath, timeout=10)  # uses MyAdapter thanks to s.mount() above
            ok = 1
        except requests.exceptions.RequestException:
            print("connection timed out, retrying")
    if r.status_code == 200:
        # Write the response body to disk as a binary file.
        with open(os.path.join(localPath, imageName), 'wb') as fp:
            fp.write(r.content)
        return 1
    else:
        return 0
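
As a usage sketch (the URL and paths below are placeholders of my own, not from the original post):

# Hypothetical example: download one image through the TLSv1-capable session.
if downloadImage('https://example.com/img/1.jpg', r'D:\images', '1.jpg'):
    print('saved')
else:
    print('server returned a non-200 status code')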

By customizing HTTPAdapter like this, a session can crawl sites that only speak TLSv1 or TLSv1.1.
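
On newer releases of requests/urllib3 the ssl_version keyword is deprecated in favor of an ssl.SSLContext. The variant below is only a sketch of that approach, not part of the original post, and whether a TLSv1 context is still usable depends on your local OpenSSL build:

import ssl
import requests
from requests.adapters import HTTPAdapter

class ContextAdapter(HTTPAdapter):
    # Same idea as MyAdapter, but the protocol choice lives in an ssl.SSLContext.
    def __init__(self, ssl_context=None, **kwargs):
        self._ssl_context = ssl_context
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        if self._ssl_context is not None:
            kwargs['ssl_context'] = self._ssl_context
        return super().init_poolmanager(*args, **kwargs)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)  # assumption: the local OpenSSL still allows TLSv1
s2 = requests.Session()
s2.mount('https://', ContextAdapter(ssl_context=ctx))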
