Python web scraping with beautifulsoup4, part 3

Category: Other | Posted: 2018-05-12 10:17:10

Preface

This post walks step by step through scraping the images from a website and saving them to your local machine.

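I. Target website

1. Open any landscape-photo site, for example: http://699pic.com/sousuo-218808-13-1.html

2. Inspect the page with Firebug and use the CSS locator in FirePath to find the target images.

3. The inspector shows that every picture is an img tag whose class attribute is lazy.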

II. Find all the tags with find_all

1. find_all(class_="lazy") returns every image tag object.

2. Pull the JPG URL (the data-original attribute) and the title out of each tag.


# coding:utf-8
from bs4 import BeautifulSoup
import requests

# Download the search-results page
r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# Find every tag whose class is "lazy"
images = soup.find_all(class_="lazy")
# print(images)  # find_all returns a list-like ResultSet

for i in images:
    jpg_rl = i["data-original"]  # image URL
    title = i["title"]           # image title
    print(title)
    print(jpg_rl)
    print("")

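To try the lookup without touching the network, here is a minimal sketch that runs find_all(class_="lazy") against a small inline HTML string. The markup, the example.com URLs, and the helper name extract_images are made up for illustration and are not taken from the target site. The sketch reads attributes with tag.get(), so a tag that lacks data-original or title is skipped instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the lazy-loaded thumbnails described above;
# the URLs and titles are invented for this example.
SAMPLE_HTML = """
<div>
  <img class="lazy" data-original="http://example.com/a.jpg" title="lake">
  <img class="lazy" data-original="http://example.com/b.jpg" title="mountain">
  <img class="lazy" title="no url here">
  <img class="logo" src="http://example.com/logo.png">
</div>
"""


def extract_images(html):
    """Return (title, url) pairs for class="lazy" tags that carry both attributes."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for tag in soup.find_all(class_="lazy"):
        url = tag.get("data-original")  # None when the attribute is missing
        title = tag.get("title")
        if url and title:
            pairs.append((title, url))
    return pairs


if __name__ == "__main__":
    for title, url in extract_images(SAMPLE_HTML):
        print(title, url)
```

The same function could be fed r.content from the real page; whether every tag on that page carries both attributes is something to check rather than assume.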

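III. Save the images

1. Create a subfolder named jpg inside the folder that holds the script.

2. Import the os module. os.getcwd() returns the current working directory, which is the script's folder when the script is launched from there.

3. Use open to open the destination file for writing, with the path os.getcwd()+"\\jpg\\"+title+'.jpg' (a file with the same name is overwritten). The "\\" separators are Windows-specific; os.path.join builds the same path portably.

4. requests.get fetches the image URL. The response's content attribute holds the raw bytes, which can be written straight to a local file opened in binary mode.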
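The saving steps above can be sketched with the standard library alone. The helper names safe_filename and save_bytes are hypothetical, and the character list is the set Windows rejects in file names (\ / : * ? " < > |); treat this as one possible approach under those assumptions rather than the author's method.

```python
import os
import re


def safe_filename(title):
    """Replace characters that Windows forbids in file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"


def save_bytes(data, folder, title):
    """Write data to <folder>/<title>.jpg, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    # os.path.join picks the right separator for the current platform
    path = os.path.join(folder, safe_filename(title) + ".jpg")
    with open(path, "wb") as f:  # "wb": the payload is raw bytes, not text
        f.write(data)
    return path
```

For example, save_bytes(image_bytes, os.path.join(os.getcwd(), "jpg"), title) would write into the jpg subfolder of the working directory. As in the original steps, a second image with the same title overwrites the first.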

IV. Reference code


# coding:utf-8
from bs4 import BeautifulSoup
import requests
import os

# Download the search-results page
r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# Find every tag whose class is "lazy"
images = soup.find_all(class_="lazy")
# print(images)  # find_all returns a list-like ResultSet

for i in images:
    jpg_rl = i["data-original"]  # image URL
    title = i["title"]           # image title, reused as the file name
    print(title)
    print(jpg_rl)
    print("")
    # The jpg subfolder must already exist; "wb" writes the raw bytes
    with open(os.getcwd()+"\\jpg\\"+title+'.jpg', "wb") as f:
        f.write(requests.get(jpg_rl).content)

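The reference code writes whatever the server returns, including error pages. A slightly more defensive download step might look like the sketch below. The function name download_image and its get parameter are hypothetical; get exists only so the function can be exercised with a stand-in instead of a live request, and the 10-second timeout is an arbitrary choice.

```python
def download_image(url, path, get=None, timeout=10):
    """Fetch url and write the response body to path; return the byte count."""
    if get is None:
        # Imported lazily so the function can be tested with a stand-in `get`
        import requests
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

Inside the loop this would replace the with open(...) lines, for example download_image(jpg_rl, os.path.join(os.getcwd(), "jpg", title + ".jpg")). Wrapping the call in try/except requests.RequestException would let the loop continue past a single failed image.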


Reposted from www.cnblogs.com/jason89/p/9027624.html
