Python crawler: scraping Ku6 short videos with requests + json


Other 2020-10-17 08:13:05

Scraping photos or videos comes down to one thing: downloading the binary files that the web page points to.
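
To make "downloading a binary file" concrete, here is a minimal sketch of writing response bytes to disk in chunks rather than all at once. The helper names `save_chunks` and `download`, the 64 KiB chunk size, and the 30 second timeout are assumptions made for illustration and are not part of the original script; `stream=True`, `iter_content` and `raise_for_status` are standard requests features.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes objects to `path` in binary mode.

    Returns the total number of bytes written.
    """
    total = 0
    with open(path, "wb") as f:  # "wb": raw bytes, no text encoding applied
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download(url, path, headers=None, chunk_size=64 * 1024):
    """Stream `url` to `path` without holding the whole file in memory."""
    # Imported here so that save_chunks can be tried without requests installed.
    import requests

    with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # fail loudly on 4xx/5xx responses
        return save_chunks(resp.iter_content(chunk_size=chunk_size), path)


if __name__ == "__main__":
    # Offline demonstration: in-memory chunks stand in for a real response.
    demo_path = os.path.join(tempfile.gettempdir(), "demo.bin")
    print(save_chunks([b"\x00\x01", b"", b"\x02"], demo_path))
```

Streaming matters mostly for large videos: reading `.content` loads the entire file into memory first, while the chunked version keeps memory use roughly constant.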

# Project: scrape videos from the whole Ku6 site
# A "video" folder must exist in the working directory
# requests
# json

import requests  # pip install requests
import json
import re

def change_title(title):
    """Replace characters that are illegal in file names."""
    pattern = re.compile(r'[\\/:*?"<>|]')  # \ / : * ? " < > |
    new_title = re.sub(pattern, "_", title)  # replace each with an underscore
    return new_title

for page in range(0, 10):
    print('++++++++++++++++ Fetching page {} +++++++++++++++++++++'.format(page + 1))
    # General crawler workflow
    # 1. Analyze the target page and determine the URL and the request headers
    base_url = 'https://www.ku6.com/video/feed?pageNo={}&pageSize=40&subjectId=76'.format(page)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    }

    # 2. Send the request: requests imitates a browser and returns the response data
    response = requests.get(base_url, headers=headers)
    data = response.text
    # print(data)

    # 3. Parse the data: the json module turns a JSON string into native Python objects
    # 3.1 Convert the data type
    json_data = json.loads(data)
    # 3.2 Extract the data
    data_list = json_data['data']
    # print(data_list)

    # Iterate over the list
    for data1 in data_list:
        video_name = data1['title'] + '.mp4'
        video_url = data1['playUrl']
        # print(video_name, video_url)

        new_title = change_title(video_name)

        # Send a second request, this time for the video itself
        print('Downloading:', video_name)
        video_data = requests.get(video_url, headers=headers).content  # raw video bytes

        # 4. Save the data into the target folder
        with open('video/' + new_title, 'wb') as f:  # forward slash works on Windows and POSIX
            f.write(video_data)
            print('Download complete.\n')
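
The parsing and file-name steps of the script can be tried offline, without touching the network. The sketch below assumes the response shape the script relies on (a top-level `data` list whose items carry `title` and `playUrl`). The function name `extract_videos`, the choice to skip incomplete entries, and the sample payload are illustrative assumptions, not part of the original code.

```python
import json
import re

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def extract_videos(json_text):
    """Return (file_name, url) pairs from one page of feed JSON.

    Assumes the shape {"data": [{"title": ..., "playUrl": ...}, ...]};
    entries missing either field are skipped instead of raising KeyError.
    """
    payload = json.loads(json_text)
    videos = []
    for item in payload.get("data") or []:
        title = item.get("title")
        url = item.get("playUrl")
        if not title or not url:
            continue
        videos.append((_ILLEGAL.sub("_", title) + ".mp4", url))
    return videos


if __name__ == "__main__":
    # Hypothetical sample payload, used only to exercise the function.
    sample = json.dumps({
        "data": [
            {"title": "a/b: demo?", "playUrl": "https://example.com/1.mp4"},
            {"title": "no url"},
        ]
    })
    print(extract_videos(sample))
```

Using `.get` instead of direct indexing means a single malformed entry, or a page with no `data`, does not abort the whole crawl.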


Reposted from blog.csdn.net/qq_43478096/article/details/104680951
