Python urllib: simulating browser requests for a web crawler

import urllib.request
import random

url = "http://baike.baidu.com"


"""
方式1
# 模拟请求头
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
}


# 设置一个请求体
req = urllib.request.Request(url,headers=headers)

# 发起请求
response = urllib.request.urlopen(req)
data = response.read().decode('utf-8')
print(data)

"""

# Method 2
agentsList = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
]

# Pick a User-Agent string at random so repeated requests look less uniform
agentStr = random.choice(agentsList)

req = urllib.request.Request(url)
# Add the chosen User-Agent header to the Request object
req.add_header("User-Agent", agentStr)

response = urllib.request.urlopen(req)
data = response.read().decode('utf-8')
print(data)
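Building on Method 2, the sketch below rotates the User-Agent on every call and decodes the body using the charset the server declares, falling back to UTF-8. The fetch() helper is a hypothetical name introduced here for illustration, not from the original post.

def fetch(url, agents):
    # Each call picks a fresh User-Agent from the pool
    req = urllib.request.Request(url)
    req.add_header("User-Agent", random.choice(agents))
    with urllib.request.urlopen(req, timeout=10) as response:
        # Prefer the charset declared in the Content-Type header;
        # fall back to utf-8 if the server did not declare one
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)

print(fetch(url, agentsList)[:200])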

Reprinted from blog.csdn.net/qq_39198486/article/details/81502593