Python Web Scraping Notes - 代码天地

Other, 2019-02-23 13:41:17

import requests
from bs4 import BeautifulSoup as bs
import datetime
import json
import re
import multiprocessing as mp

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}

Case 1: the response is JSON, so parse it directly

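# Keep retrying until the request succeeds and the body parses as JSON.
while True:
    try:
        # timeout=10 is an added safeguard so a stalled connection cannot hang the loop
        r = requests.get("http://yunhq.sse.com.cn:32041/v1/sh1/list/exchange/equity?select=code%2Cprev_close&order=&begin=0&end=1500", headers=headers, timeout=10).text
        text = json.loads(r)
        cnt = text["list"]
        break
    # Catch only request, parsing, and missing-key errors so Ctrl+C still stops the loop.
    except (requests.RequestException, ValueError, KeyError):
        continue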
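Wrapping the request in a loop absorbs the occasional failed fetch: the request is simply sent again until it succeeds.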
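The loop above retries forever, so a server that is down for good would keep it spinning. A bounded variant is sketched below as one possible approach, not the author's method. The helper name `fetch_with_retry` and its parameters `max_attempts` and `delay` are hypothetical, and `fetch` stands for any zero-argument callable that returns the response body as text.

```python
import json
import time


def fetch_with_retry(fetch, max_attempts=3, delay=1.0):
    """Call fetch() until its result parses as JSON or attempts run out.

    fetch is any zero-argument callable returning the response body as text.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return json.loads(fetch())
        # OSError covers network failures (requests' exceptions derive from it);
        # ValueError covers a body that is not valid JSON.
        except (OSError, ValueError) as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay)  # short pause before the next attempt
    raise RuntimeError("all %d attempts failed" % max_attempts) from last_error
```

With the imports from the top of this post, it could be called as `fetch_with_retry(lambda: requests.get(url, headers=headers, timeout=10).text)`, where `url` is whatever endpoint is being scraped.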

Case 2: the response is HTML, so parse it with BeautifulSoup

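# url (the quote page) and gheaders (a header dict like headers above) are not defined in this post.
r = requests.get(url, headers=gheaders).content
content = bs(r, "html.parser", from_encoding='utf-8')
# Locate the quote table, then read the prices from its cells by position.
text = content.find("table", attrs={"class": "quote-info"})
tds = text.find_all("td")
p1 = str(tds[6].find("span", class_="stock-fall").text)
p2 = str(tds[2].find("span", class_="stock-rise").text)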
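The snippet above raises `AttributeError` whenever `find` returns `None`, for example if a cell carries a different class than expected. The sketch below shows one way to guard those lookups. The helper `cell_text` and the sample markup are hypothetical and invented for illustration; the real page layout is not shown in this post.

```python
from bs4 import BeautifulSoup


def cell_text(cell, css_class):
    """Return the stripped text of the span with css_class inside cell, or None."""
    if cell is None:
        return None
    span = cell.find("span", class_=css_class)
    return span.get_text(strip=True) if span is not None else None


# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = """
<table class="quote-info">
  <tr><td>Open</td><td><span class="stock-rise">10.50</span></td></tr>
  <tr><td>Low</td><td><span class="stock-fall">10.10</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"class": "quote-info"})
tds = table.find_all("td") if table is not None else []

# Index checks and None checks replace the unguarded chained calls.
rise = cell_text(tds[1], "stock-rise") if len(tds) > 1 else None
fall = cell_text(tds[3], "stock-fall") if len(tds) > 3 else None
missing = cell_text(tds[0], "stock-rise") if tds else None  # no span here, so None
```

Returning `None` instead of raising lets the caller decide whether to skip the record or retry the page.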

The requests can also be run in parallel with multiple processes:

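def get_close(code):
    # Fetch the quote for one security code and return a "code,close" CSV line.
    # The URL must be a str: in Python 3, b'...%s' % code fails when code is a str.
    r = requests.get('http://www.szse.cn/api/market/ssjjhq/getTimeData?marketId=1&code=%s' % code, headers=headers).text
    text = json.loads(r)
    px = str(text["data"]["close"])
    return code + "," + px + "\n"

# The guard is needed on platforms where multiprocessing starts workers by re-importing this module.
if __name__ == "__main__":
    syms = get_list()  # get_list() is not shown in this post; presumably it returns the codes to fetch
    res = []
    # Use about two thirds of the cores; // keeps the count an int, which Pool requires.
    nProcess = 2 * mp.cpu_count() // 3
    if nProcess > 1:
        pool = mp.Pool(nProcess)
        res = pool.map(get_close, syms)
        pool.close()
        pool.join()
    else:
        # map() is lazy in Python 3, so materialize the results
        res = list(map(get_close, syms))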
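Because each call to `get_close` spends most of its time waiting on the network, a thread pool is a possible alternative to separate processes. This swaps in a different technique from the one above and is only a sketch. `fetch_all` and `fake_get_close` are hypothetical names, and `fake_get_close` stands in for `get_close` so the example runs without network access.

```python
from multiprocessing.pool import ThreadPool


def fetch_all(fetch_one, codes, n_workers=4):
    """Apply fetch_one to every code using a pool of threads.

    Results come back in the same order as codes, as with Pool.map.
    """
    if n_workers <= 1 or len(codes) <= 1:
        return [fetch_one(code) for code in codes]
    with ThreadPool(n_workers) as pool:
        return pool.map(fetch_one, codes)


def fake_get_close(code):
    # Hypothetical stand-in for get_close(): no network access, fixed price.
    return code + "," + "1.00" + "\n"


lines = fetch_all(fake_get_close, ["000001", "000002", "000003"], n_workers=2)
```

`ThreadPool` exposes the same `map` interface as `mp.Pool`, so switching between the two needs few changes. Threads also avoid the need to pickle the worker function and its arguments.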

https://blog.csdn.net/qq_32784541/article/details/79655146

Reposted from blog.csdn.net/qq_24920947/article/details/84952758
