[Web Scraping] 03: Simulating a Browser Visit & Using a User-Agent Pool


Category: Other · 2020-04-09 11:54:27

Using your own request headers (the User-Agent shown here has been altered and will not work as-is)

How do you find your own headers?

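Press F12 or open the browser's developer tools
Switch to the Network tab
Click any entry in the Name column
Find Request Headers in the pane on the right
Scroll down to User-Agent; that value is your own header
If you still cannot find it, search Baidu!!!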
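
To see why this header is worth setting, the short sketch below inspects the default headers urllib attaches when none are supplied. It makes no network request, and the variable names are illustrative only.

```python
import urllib.request

# build_opener() creates an OpenerDirector, the object urllib uses to send requests.
opener = urllib.request.build_opener()

# addheaders lists the default headers attached to requests made through this opener.
default_headers = dict(opener.addheaders)
print(default_headers)  # the 'User-agent' value starts with "Python-urllib/"
```

A server that filters on User-Agent can tell from this default value that the client is a script, which is why the examples here replace it with a browser string.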

import urllib.request

url = 'https://www.baidu.com/'

# Request headers that imitate a browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/80.0.3987.132 Safari/537.6'}
# Build a Request object that carries the headers
req = urllib.request.Request(url, headers=headers)

# Send the request and decode the response body
response = urllib.request.urlopen(req)
data = response.read().decode('utf-8')
print(data)
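
As a quick offline check that the header really was attached, the sketch below builds the same kind of Request and reads the header back. The User-Agent string here is a placeholder assumption, not a real browser value. One detail that is easy to trip over: urllib stores header names with only the first letter capitalized, so the lookup key is 'User-agent'.

```python
import urllib.request

url = 'https://www.baidu.com/'
# Placeholder value for illustration; replace it with the string copied from your browser.
headers = {'User-Agent': 'Mozilla/5.0 (example placeholder)'}
req = urllib.request.Request(url, headers=headers)

# Request normalizes names with str.capitalize(), so 'User-Agent' is stored as 'User-agent'.
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
print(req.get_header('User-Agent'))  # None: the lookup is case-sensitive on the stored key
```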

The User-Agent pool approach

Search Baidu for "user-agent"

Store the strings you find in a list and pick one at random for each request

import random
import urllib.request

url = 'https://www.baidu.com/'

# Each entry is a bare User-Agent value, without the "User-Agent:" header name
agent_list = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR '
    '2.0.50727; SE 2.X MetaSr 1.0)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)'
]

# Pick one User-Agent at random
agentStr = random.choice(agent_list)
req = urllib.request.Request(url)
# Add the User-Agent header to the Request object
req.add_header('User-Agent', agentStr)
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))
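
If the random choice is needed in more than one place, it can be wrapped in a small helper. This is a minimal sketch, not part of the original post: the names AGENT_POOL and build_request are hypothetical, and the pool is shortened for illustration. It only constructs the Request, so it runs without network access.

```python
import random
import urllib.request

# Hypothetical pool for illustration; fill it with real browser User-Agent strings.
AGENT_POOL = [
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
]


def build_request(url, pool=AGENT_POOL):
    """Return a Request carrying a User-Agent picked at random from pool."""
    req = urllib.request.Request(url)
    req.add_header('User-Agent', random.choice(pool))
    return req


req = build_request('https://www.baidu.com/')
print(req.get_header('User-agent'))  # one of the strings in AGENT_POOL
```

The returned object can be passed to urllib.request.urlopen(req) exactly as in the code above; calling build_request again for each page gives each request a fresh random User-Agent.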

Author: 暧昧忆故人


Reposted from blog.csdn.net/weixin_38114487/article/details/104707576
