Python_selenium爬虫

1、webdriver下载地址:

https://registry.npmmirror.com/binary.html?path=chromedriver/

"""Scrape journal table-of-contents entries from jour.duxiu.com with Selenium.

Opens a fixed journal detail page, clicks through each year tab (1985-1989)
and each issue within the year, then appends every article title found in
the issue's listing to content.txt.

chromedriver download: https://registry.npmmirror.com/binary.html?path=chromedriver/
"""
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

driver = Chrome()
driver.maximize_window()
driver.get("https://jour.duxiu.com/magDetail.jsp?magid=320910034129&d=3305845747873E9BCB4DD9EDA6053C20")

# Open the output file once for the whole run. The original re-opened it in
# append mode for every issue and never closed any handle (resource leak).
with open('content.txt', 'a') as file:
    for year in range(1985, 1990):
        # Year tabs carry element ids of the form "y1985".
        driver.find_element(By.XPATH, f'//*[@id="y{year}"]/a').click()
        time.sleep(2)
        for qihao in range(0, 12):
            time.sleep(4)
            # Issue links carry ids of the form "qihao_19850":
            # the year concatenated with the 0-based issue index
            # (e.g. "qihao_20220" for 2022 issue 1).
            try:
                driver.find_element(
                    By.XPATH, f'//*[@id="qihao_{year}{qihao}"]/a'
                ).click()
            except NoSuchElementException:
                # A year may publish fewer than 12 issues; skip missing ones.
                continue
            # Human-readable header, e.g. "1985-1:" (issue numbers are 1-based).
            year_content = f"{year}-{qihao + 1}:"
            file.write(year_content + '\n')
            # The listing shows at most 15 entries: //*[@id="jourlist"]/ul/li[1..15].
            for i in range(1, 16):
                time.sleep(2)
                # find_elements returns [] when nothing matches (it does not
                # raise), so no try/except is needed here.
                titles = driver.find_elements(
                    By.XPATH, f'//*[@id="jourlist"]/ul/li[{i}]'
                )
                for title in titles:
                    print(year_content)
                    print(title.text)
                    file.write(title.text + '\n')

猜你喜欢

转载自blog.csdn.net/qq_42383069/article/details/124798619