Scraping Ximalaya (Himalaya FM) with Python - 代码天地


Other · 2020-03-22 16:18:46

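import os
import re

import requests


class SpiderHimalaya(object):
    def __init__(self):
        self.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50"}
        # Track-info URL template with one '{}' for the trackId (left blank in the original post)
        self.audio_url = ''
        # Output folder: '三国志' (Records of the Three Kingdoms) is the album being downloaded
        self.save_dir = '三国志'

    def get_page_url(self):
        """Return the URL of every listing page (pages 1-12)."""
        # Listing-page URL template with one '{}' for the page number (left blank in the original post)
        pageUrl = ""
        pageUrlList = [pageUrl.format(i) for i in range(1, 13)]
        return pageUrlList

    def get_response(self, url):
        """Fetch url; return the response on HTTP 200, otherwise None."""
        resp = requests.get(url, headers=self.headers, timeout=10)
        if resp.status_code == 200:
            return resp
        print('Request failed:', resp.status_code, url)
        return None

    def get_item_id(self):
        """Collect a (track-info URL, title) pair for every episode on every page."""
        item_list = []
        for page_url in self.get_page_url():
            resp = self.get_response(page_url)
            if resp is None:
                continue
            for con in resp.json()['data']['tracks']:
                url = self.audio_url.format(con['trackId'])
                item_list.append((url, con['title']))
        return item_list

    def down_mp3(self, item):
        """Download the audio of one episode."""
        url, name = item  # each item is a (url, title) tuple
        resp = self.get_response(url)
        if resp is None:
            return
        # Drop characters that are illegal in file names, plus spaces as the original pattern did
        file_name = re.sub(r'[\\/:*?"<>|？ ]+', '', name)
        print(file_name)
        mp3_url = resp.json()['data']['src']
        if not mp3_url:
            # No audio URL returned (possibly a paid episode); skip it
            return
        mp3_resp = self.get_response(mp3_url)
        if mp3_resp is None:
            return
        with open(os.path.join(self.save_dir, file_name + '.mp3'), 'wb') as f:
            f.write(mp3_resp.content)

    def run(self):
        """Entry point: list every episode, then download each one."""
        os.makedirs(self.save_dir, exist_ok=True)
        for item in self.get_item_id():
            self.down_mp3(item)


if __name__ == '__main__':
    SpiderHimalaya().run()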
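
The file-name cleanup in `down_mp3` strips characters such as `/`, `:`, `*`, `"`, `<`, `>`, `|` and the full-width `？` from the episode title. Pulled out into its own function it is easier to test. The sketch below assumes you also want the ASCII `?`, trailing dots and all whitespace removed, a placeholder name when nothing is left, and a length cap; `sanitize_filename`, `max_len` and the `'untitled'` fallback are hypothetical, not from the original post.

```python
import re

# Characters Windows forbids in file names, plus the full-width question mark
# and whitespace (the original pattern also removed spaces).
_FORBIDDEN = re.compile(r'[\\/:*?"<>|？\s]+')


def sanitize_filename(name: str, max_len: int = 120) -> str:
    """Return a file-system-safe version of an episode title.

    Hypothetical helper: forbidden characters are removed outright,
    leading/trailing dots are stripped, an empty result becomes
    'untitled', and the name is cut to max_len characters.
    """
    cleaned = _FORBIDDEN.sub('', name).strip('.')
    if not cleaned:
        cleaned = 'untitled'
    return cleaned[:max_len]


if __name__ == '__main__':
    # Chinese characters are kept; ':', '/', '?' and spaces are dropped.
    print(sanitize_filename('第1集: 桃园结义 / 上?'))
```

In `down_mp3`, `file_name = sanitize_filename(name)` would replace the inline cleanup expression.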

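Collecting episodes from all twelve listing pages can also be written as a standalone function that takes the fetch step as a parameter, so it can be tested without network access. This is a sketch under stated assumptions: each page's JSON has the `{'data': {'tracks': [{'trackId': ..., 'title': ...}]}}` shape the code above reads, an empty `tracks` list means there are no more pages, and `audio_url_template` stands in for the track-info URL that the original post left blank. `collect_tracks` and its parameters are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def collect_tracks(
    fetch_json: Callable[[str], dict],
    page_urls: Iterable[str],
    audio_url_template: str,
) -> List[Tuple[str, str]]:
    """Gather (track-info URL, title) pairs from every listing page.

    fetch_json: any function that turns a page URL into parsed JSON.
    audio_url_template: placeholder template with one '{}' for the trackId.
    """
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for page_url in page_urls:
        data = fetch_json(page_url)
        tracks = (data.get('data') or {}).get('tracks') or []
        if not tracks:
            # Assumption: an empty page means we ran past the last page.
            break
        for track in tracks:
            track_id = track['trackId']
            if track_id in seen:
                # Skip episodes that appear on more than one page.
                continue
            seen.add(track_id)
            pairs.append((audio_url_template.format(track_id), track['title']))
    return pairs


if __name__ == '__main__':
    # Fake pages instead of real HTTP responses.
    fake_pages = {
        'page-1': {'data': {'tracks': [{'trackId': 1, 'title': 'A'}]}},
        'page-2': {'data': {'tracks': []}},
    }
    print(collect_tracks(fake_pages.__getitem__, ['page-1', 'page-2'],
                         'https://example.com/track/{}'))
```

With the class above, `fetch_json` could be something like `lambda url: requests.get(url, headers=self.headers, timeout=10).json()`, and `audio_url_template` would be the value of `self.audio_url`.
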
Author: go_flush

Reposted from blog.csdn.net/weixin_44224529/article/details/104836401