[Beginner Python Crawler 01] Dynamically Scraping Wallpaper Images with Selenium in Python 3 - 代码天地

Category: Other, 2019-04-02 10:21:15, Views: 0

Copyright notice: This is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/qq_32958797/article/details/84863884

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
import os
import re
import time

class Spider:
    url = "https://mm.enterdesk.com/"
    directory = "images2"
    pages_pattern = r'/([0-9]+)\.html'

    # Load the listing page and collect image metadata
    def get_html(self):
        browser = webdriver.Chrome()
        try:
            browser.get(Spider.url)
            wait = WebDriverWait(browser, 10)
            wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".egeli_pic_m")))
            # Get the total page count (disabled)
            #page = browser.find_element(By.CSS_SELECTOR,".listpages ul li:last-child a")
            #pages = re.findall(Spider.pages_pattern,page.get_attribute("href"))
            #print(int(pages[0]))
            # Scroll down to trigger loading of more images (6 scrolls as a test)
            for i in range(6):
                browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                print("Scroll " + str(i + 1))
                time.sleep(3)
            # Collect the image elements
            container = browser.find_element(By.CSS_SELECTOR, ".egeli_pic_m")
            elements = container.find_elements(By.CSS_SELECTOR, ".egeli_pic_li dl dd img")
            #print(len(elements))
            images = []
            for element in elements:
                image = {'url': element.get_attribute("src"), 'title': element.get_attribute("title")}
                images.append(image)
            return images
        finally:
            # Always close the browser, even if the wait times out
            browser.quit()
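    # Download the images
    def get_image(self, images):
        i = 0   # number of images saved in this run
        os.makedirs(Spider.directory, exist_ok=True)
        for img in images:
            # Skip entries whose src attribute is missing or empty
            if not img['url']:
                continue
            # Replace characters that are not allowed in file names
            title = re.sub(r'[\\/:*?"<>|]', '_', img['title'] or 'untitled')
            dirc = os.path.join(Spider.directory, title + '.jpg')
            # Skip images that already exist
            if os.path.exists(dirc):
                continue
            res = requests.get(img['url'], timeout=10)
            # Skip failed downloads instead of saving an error page
            if res.status_code != 200:
                continue
            with open(dirc, 'wb') as f:
                f.write(res.content)
            i += 1
        print("Saved %s images in this run" % i)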
    # Entry point
    def go(self):
        start = time.time()
        images = self.get_html()
        self.get_image(images)
        end = time.time()
        print("Total time elapsed (seconds):", end - start)
if __name__ == "__main__":
    spider = Spider()
    spider.go()
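
The script above scrolls a fixed six times as a test. A possible alternative, sketched here under assumptions, is to keep scrolling until the page height stops growing. The helper name `scroll_until_stable` and its parameters `pause` and `max_rounds` are hypothetical and not part of the original post. The only driver call it relies on is `execute_script`, which the original code already uses.

```python
import time


def scroll_until_stable(driver, pause=3.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with an execute_script(script) method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
    return rounds
```

Inside `get_html`, the fixed `for i in range(6)` loop could be replaced by a call such as `scroll_until_stable(browser)`. The `max_rounds` cap guards against pages that never stop loading. A slow network may make the height look stable too early, so `pause` may need tuning.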


Reposted from blog.csdn.net/qq_32958797/article/details/84863884
