Scraping the Bilibili Video Leaderboard

As everyone knows, Bilibili is a "study app" (ha). Today we will scrape Bilibili's leaderboard. Without further ado, let's get started.

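#Analysis:
Figure 1 shows that each video's info sits inside an li tag, which I can reach with XPath. I want each video's cover, play count, overall score, and link. Everything except the cover is available here; I later found the cover at a different link, which I cover further down.
Figure 1:
[image]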
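
To make the "one li per video" idea concrete, here is a minimal sketch. It runs on a made-up, well-formed snippet with the standard library's ElementTree, so it needs no network; the class names `play` and `pts` and the sample values are assumptions for illustration, not the real page markup.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the leaderboard markup (not the real page).
SAMPLE = """
<ul class="rank-list">
  <li>
    <a href="//www.bilibili.com/video/BV1xx">First video</a>
    <span class="play">12.3万</span>
    <div class="pts">987654</div>
  </li>
  <li>
    <a href="//www.bilibili.com/video/BV2yy">Second video</a>
    <span class="play">8.8万</span>
    <div class="pts">123456</div>
  </li>
</ul>
"""


def parse_rank(markup):
    root = ET.fromstring(markup)
    rows = []
    # root is the <ul>; each direct <li> child holds one video's info
    for li in root.findall("li"):
        link = li.find("a")
        rows.append({
            "name": link.text,
            "href": "https:" + link.get("href"),
            "play": li.find("span[@class='play']").text,
            "score": li.find("div[@class='pts']").text,
        })
    return rows


rows = parse_rank(SAMPLE)
for row in rows:
    print(row)
```

The real page is not guaranteed to be well-formed XML, which is one reason the article's code uses lxml's HTML parser instead.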

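We click a video link to open its playback page, press F12, switch to the Network tab, and let the video play. Many XHR requests keep appearing (see Figure 2), and their URLs end in m4s.
Figure 2:
[image]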
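
When sifting through captured request URLs, the m4s ones can be picked out by looking at the path only, because the real links carry a long query string. A small sketch, using made-up example.com URLs:

```python
from urllib.parse import urlparse


def is_m4s(url):
    # Compare the path only, so the query string after "?" does not matter
    return urlparse(url).path.endswith(".m4s")


urls = [
    "http://example.com/upgcxcode/1-30080.m4s?expires=1&platform=pc",
    "http://example.com/x/player/info.json",
]
segments = [u for u in urls if is_m4s(u)]
print(segments)
```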

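We can infer that the video is assembled from many small m4s segments. I copied one of the links and opened it, as shown in Figure 3.
Figure 3:
[image]
Now we have to think about something else: even if we can fetch such a file, how do we obtain each of these links? I searched for quite a while without finding a way, so we should change approach: is there a complete video link stored somewhere? I eventually found it. It is hidden in the initial Elements panel; search there for window and you will find what Figure 4 shows.
Figure 4:
[image]
We can now open the page source to inspect it. At first glance it looked like JSON to me, and we can extract it with a regular expression. Let's analyze it: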


dic={
    
    "code":0,"message":"0","ttl":1,"data":{
    
    "from":"local","result":"suee","message":"","quality":80,"format":"flv","timelength":146787,"accept_format":"hdflv2,flv,flv720,flv480,mp4","accept_description":["高清 1080P+","高清 1080P","高清 720P","清晰 480P","流畅 360P"],"accept_quality":[112,80,64,32,16],"video_codecid":7,"seek_param":"start","seek_type":"offset","dash":{
    
    "duration":147,"minBufferTime":1.5,"min_buffer_time":1.5,"video":[{
    
    "id":80,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":1288827,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.640032","width":1920,"height":1080,"frameRate":"16000/544","frame_rate":"16000/544","sar
":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1005","indexRange":"1006-1385"},"segment_base":{
    
    "initialization":"0-1005","index_range":"1006-1385"},"codecid":7},{
    
    "id":80,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjhz-cmcc-v-24.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=40061&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjhz-cmcc-v-24.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=40061&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":777178,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":1920,"height":1080,"frameRate":"16000/544","frame_rate":"16000/544",
"sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1178","indexRange":"1179-1558"},"segment_base":{
    
    "initialization":"0-1178","index_range":"1179-1558"},"codecid":12},{
    
    "id":64,"baseUrl":"http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2163&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2163&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":937924,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.640028","width":1280,"height":720,"frameRate":"16000/544","frame_rate":"16000/544","sar":
"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1003","indexRange":"1004-1383"},"segment_base":{
    
    "initialization":"0-1003","index_range":"1004-1383"},"codecid":7},{
    
    "id":64,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":567464,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":1280,"height":720,"frameRate":"16000/544","frame_rate":"16000/544","
sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1179","indexRange":"1180-1559"},"segment_base":{
    
    "initialization":"0-1179","index_range":"1180-1559"},"codecid":12},{
    
    "id":32,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":557917,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.64001F","width":852,"height":480,"frameRate":"16000/544","frame_rate":"16000/544","sar":"
640:639","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1007","indexRange":"1008-1387"},"segment_base":{
    
    "initialization":"0-1007","index_range":"1008-1387"},"codecid":7},{
    
    "id":32,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":339786,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":852,"height":480,"frameRate":"16000/544","frame_rate":"16000/544","sar
":"640:639","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1182","indexRange":"1183-1562"},"segment_base":{
    
    "initialization":"0-1182","index_range":"1183-1562"},"codecid":12},{
    
    "id":16,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-06.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20116&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-18.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=11314&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-06.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20116&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-18.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=11314&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":217071,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":640,"height":360,"frameRate":"16000/544","frame_rate":"16000/544",
"sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1179","indexRange":"1180-1559"},"segment_base":{
    
    "initialization":"0-1179","index_range":"1180-1559"},"codecid":12},{
    
    "id":16,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4059&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4059&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":353246,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.64001E","width":640,"height":360,"frameRate":"16000/544","frame_rate":"16000/544","sar":"
1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{
    
    "Initialization":"0-1028","indexRange":"1029-1408"},"segment_base":{
    
    "initialization":"0-1028","index_range":"1029-1408"},"codecid":7}],"audio":[{
    
    "id":30280,"baseUrl":"http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2162&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2162&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20112&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4062&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20112&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4062&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":117388,"mimeType":"audio/mp4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"s
tart_with_sap":0,"SegmentBase":{
    
    "Initialization":"0-907","indexRange":"908-1299"},"segment_base":{
    
    "initialization":"0-907","index_range":"908-1299"},"codecid":0},{
    
    "id":30216,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4069&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4069&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":67328,"mimeType":"audio/mp4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"st
art_with_sap":0,"SegmentBase":{
    
    "Initialization":"0-932","indexRange":"933-1324"},"segment_base":{
    
    "initialization":"0-932","index_range":"933-1324"},"codecid":0},{
    
    "id":30232,"baseUrl":"http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2161&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2161&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20111&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20111&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":117388,"mimeType":"audio/mp4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"s
tart_with_sap":0,"SegmentBase":{
    
    "Initialization":"0-907","indexRange":"908-1299"},"segment_base":{
    
    "initialization":"0-907","index_range":"908-1299"},"codecid":0}]},"support_formats":[{
    
    "quality":112,"format":"hdflv2","new_description":"1080P 高码率","display_desc":"1080P","superscript":"高码率"},{
    
    "quality":80,"format":"flv","new_description":"1080P 高清","display_desc":"1080P","superscript":""},{
    
    "quality":64,"format":"flv720","new_description":"720P 高清","display_desc":"720P","superscript":""},{
    
    "quality":32,"format":"flv480","new_description":"480P 清晰","display_desc":"480P","superscript":""},{
    
    "quality":16,"format":"mp4","new_description":"360P 流畅","display_desc":"360P","superscript":""}]},"session":"b80375f9a61937c9ce93ee13909c1bca"}
for key,value in dic['data'].items():
    print(key,':',value)
print('===================================')
for key,value in dic['data']['dash'].items():
    print(key,':',value)
print('===================================')
for key,value in dic['data']['support_formats'][0].items():
    print(key,':',value)
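
The JSON above is embedded in the page source after `__playinfo__=`. As a rough sketch of pulling it out with a regular expression, the snippet below works on a tiny made-up page string; the pattern is a simplified variant that stops at the first `</script>`, and `extract_playinfo` is a name chosen here for illustration.

```python
import json
import re

# Tiny made-up page source; a real page embeds a far larger JSON object.
PAGE = (
    '<script>window.__playinfo__={"code":0,"data":{"quality":80,'
    '"dash":{"duration":147}}}</script><script>window.other={}</script>'
)


def extract_playinfo(text):
    match = re.search(r'__playinfo__=(.*?)</script>', text, re.S)
    if match is None:
        return None  # pattern not present, e.g. the page layout changed
    return json.loads(match.group(1))


info = extract_playinfo(PAGE)
print(info["data"]["dash"]["duration"])
```

Returning None instead of raising makes it easier to skip pages where the pattern is missing.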

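dic is the JSON data we obtained. Peeling it apart layer by layer, I found that the video and the audio are two separate files, so we can download both and then merge them. Here is the result of my analysis:
Figure 5:
[image]
accept_description lists the video quality levels, and accept_quality lists the id for each level. I don't have a premium membership, so the best quality I can get is 高清 1080P (HD 1080P). The video file is in the baseUrl of video, and the audio file is in the baseUrl of audio.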
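
In the sample data, a larger id means a higher quality level, and each level appears once per codec. One possible way to choose a stream is sketched below on trimmed, made-up data. The helper names, the preference for the `avc1` codec, and treating the highest bandwidth as the best audio are all assumptions, not something the site documents.

```python
# Trimmed, made-up data shaped like the dash section shown earlier.
dash = {
    "video": [
        {"id": 80, "codecs": "avc1.640032", "baseUrl": "http://example.com/v80-avc.m4s"},
        {"id": 80, "codecs": "hev1.1.6.L120.90", "baseUrl": "http://example.com/v80-hevc.m4s"},
        {"id": 64, "codecs": "avc1.640028", "baseUrl": "http://example.com/v64-avc.m4s"},
    ],
    "audio": [
        {"id": 30280, "bandwidth": 117388, "baseUrl": "http://example.com/a-high.m4s"},
        {"id": 30216, "bandwidth": 67328, "baseUrl": "http://example.com/a-low.m4s"},
    ],
}


def pick_video(videos, max_quality, codec_prefix="avc1"):
    # Keep entries at or below the wanted quality id
    candidates = [v for v in videos if v["id"] <= max_quality]
    # Highest quality id first; within the same id prefer the requested codec
    candidates.sort(
        key=lambda v: (v["id"], v["codecs"].startswith(codec_prefix)),
        reverse=True,
    )
    return candidates[0] if candidates else None


def pick_audio(audios):
    # Highest bandwidth is taken here as "best"; that is an assumption
    return max(audios, key=lambda a: a["bandwidth"])


video = pick_video(dash["video"], max_quality=80)
audio = pick_audio(dash["audio"])
print(video["baseUrl"], audio["baseUrl"])
```

The article's own code simply takes the first entry of each list, which is the highest available quality in the sample response.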
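On a whim, I also copied the string underlined in red in Figure 1 and searched for it in the Elements panel of the video page, and to my surprise it was there (see Figure 7). Opening that link showed the original cover, and trying the same thing on other video pages also gave their covers, so we can get the cover with a regular expression.
Figure 7:
[image]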
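
A minimal sketch of that cover lookup, run against a made-up page fragment. The meta tag shape follows the pattern used in this article's code; the URL and the `find_cover` name are placeholders.

```python
import re

# Made-up fragment of a video page; only the tag shape matters here.
PAGE = (
    '<head><meta data-vue-meta="true" itemprop="image" '
    'content="http://i2.hdslb.com/bfs/archive/example.jpg"></head>'
)


def find_cover(text):
    match = re.search(r'itemprop="image" content="(.*?)"', text)
    return match.group(1) if match else None


cover = find_cover(PAGE)
print(cover)
```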
That completes the analysis; now for the code.

Code:

1: Import the libraries


import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os

2: Create a session so that cookies are shared


# Create the session
print('Creating session')
session = requests.Session()
base_url = 'https://www.bilibili.com/'
base_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'cookie': 'YOUR_COOKIE',  # replace with your own cookie string
    'referer': 'https://www.google.com/',
}
session.get(url=base_url, headers=base_headers)
sleep(randint(3,5))

3: Scrape the video leaderboard (I find that adding referer to the headers matters a lot here; the referer is the URL of the page you came from)


# Scrape the leaderboard videos
print('Scraping leaderboard videos')
dic = {}
leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
leaderboard_headers = {
    'referer': leaderboard_url,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
}
response = session.get(url=leaderboard_url, headers=leaderboard_headers)
sleep(randint(3,5))
content = response.content
html = etree.HTML(content)
info_list = html.xpath('//ul[@class="rank-list"]/li')
for li in info_list:
    name = li.xpath('div[2]/div[2]/a/text()')[0]             # video title
    href = 'https:'+li.xpath('div[2]/div[2]/a/@href')[0]     # video link
    score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0]+' overall score'       # overall score
    play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()        # play count
    info = [href, score, play_volume]      # named info so the built-in list is not shadowed
    dic[name] = info
    # print(name,href,score,play_volume)
    # print(dic)

Here I use the video's name as the dictionary key, and put the video link, overall score, and play count into a list that becomes the dictionary value.
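
Since later steps keep appending to the same list, it helps to see the final layout in one place. The sketch below uses made-up values; only the positions match what the article's code stores.

```python
# Made-up values; only the layout matches the article's dic.
dic = {}
dic["Some video title"] = [
    "https://www.bilibili.com/video/BV1xx",  # index 0: video page link
    "987654",                                # index 1: overall score
    "12.3万",                                 # index 2: play count
]

# Later steps append more items to the same list
dic["Some video title"].append("http://i2.hdslb.com/bfs/archive/example.jpg")  # index 3: cover link
dic["Some video title"].append(
    ["http://example.com/v.m4s", "http://example.com/a.m4s"]  # last item: [video, audio] streams
)

for name, info in dic.items():
    page_url, score, plays, cover, streams = info
    print(name, page_url, score, plays, cover)
    print("video stream:", streams[0])
    print("audio stream:", streams[-1])
```

Positional lists are compact but easy to mix up; a dict or a small class with named fields would be an alternative design.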

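4: While scraping, the session sometimes stopped working, so I wrapped the request in try. If the session works, the except branch is never reached; if it fails, I fall back to requests.get. Don't forget to add the cookie in that case.

During this step I put the video link and the audio link into a list, and then append that list to the list created earlier.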


# Get the video and audio links
print('Fetching video info')
video_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer': leaderboard_url,
}
num = 0
for i in dic.keys():
    video_url = dic[i][0]
    # Get the cover link
    try:
        response = session.get(url=video_url, headers=video_headers)
    except requests.RequestException:
        # Fall back to a plain request that carries the cookie explicitly
        video_headers['cookie'] = 'YOUR_COOKIE'  # replace with your own cookie string
        response = requests.get(url=video_url, headers=video_headers)
    text = response.text
    img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
    dic[i].append(img_url)              # add the cover link to the list
    data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
    data = json.loads(data)
    # print(data)

    try:
        duration = data['data']['dash']['duration']
        minute = int(duration) // 60
        second = int(duration) % 60
        # Video link
        video_url = data['data']['dash']['video'][0]['baseUrl']
        # Audio link
        audio_url = data['data']['dash']['audio'][0]['baseUrl']
        links = [video_url, audio_url]
        dic[i].append(links)
        print(video_url)
        print(audio_url)
        print('Video length: {} min {} s'.format(minute, second))
    except KeyError:
        # A few videos use a different format with a single file, so there is nothing to merge; this is rare.
        duration = data['data']['timelength'] // 1000   # timelength is in milliseconds
        minute = int(duration) // 60
        second = int(duration) % 60
        video_url = data['data']['durl'][0]['url']
        links = [video_url]
        dic[i].append(links)
        print('Video length: {} min {} s'.format(minute, second))
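
The two response layouts can also be handled with an explicit key check instead of catching KeyError. The sketch below is one way to do that on made-up data; `extract_streams` and `format_duration` are names introduced here for illustration.

```python
def format_duration(seconds):
    minutes, secs = divmod(int(seconds), 60)
    return "{}m{}s".format(minutes, secs)


def extract_streams(playinfo):
    data = playinfo["data"]
    if "dash" in data:
        # Separate video and audio files that must be merged later
        dash = data["dash"]
        urls = [dash["video"][0]["baseUrl"], dash["audio"][0]["baseUrl"]]
        return urls, format_duration(dash["duration"])
    # Single-file layout: timelength is in milliseconds
    urls = [data["durl"][0]["url"]]
    return urls, format_duration(data["timelength"] // 1000)


# Made-up responses, one per layout
dash_info = {"data": {"timelength": 146787, "dash": {
    "duration": 147,
    "video": [{"baseUrl": "http://example.com/v.m4s"}],
    "audio": [{"baseUrl": "http://example.com/a.m4s"}],
}}}
durl_info = {"data": {"timelength": 146787, "durl": [{"url": "http://example.com/full.flv"}]}}

print(extract_streams(dash_info))
print(extract_streams(durl_info))
```

A returned list with one element signals the single-file case, so the caller can skip the merge step.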

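5: Download the video and audio

'origin': 'https://www.bilibili.com',
'referer': 'https://www.bilibili.com/',

The browser's requests for these files all carry these two headers; once I added them, the download succeeded.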


# Download the video and audio
print('Downloading')
headers = {
    'cookie': 'YOUR_COOKIE',  # replace with your own cookie string
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'origin': 'https://www.bilibili.com',
    'referer': 'https://www.bilibili.com/',
}

path = r'C:\Users\jyj34\Desktop\bilibili\{}'.format(num)
# mkdir() is a helper that is not shown in this section; the check below expects it to return 1 on success
created = mkdir(path)
if created == 1:
    # Raw strings keep the backslashes literal
    video_path = path + r'\_video.mp4'
    audio_path = path + r'\_audio.mp4'
    save_path = path + r'\{}.mp4'.format(num)
    info_path = path + r'\{}.text'.format(num)
    img_path = path + r'\{}.png'.format(num)
    num += 1
    print('{}: video download started'.format(i))

    with open(video_path, 'wb') as f:  # video part
        response = requests.get(dic[i][-1][0], headers=headers)
        print(response.status_code)
        f.write(response.content)
    print('{}: video download finished'.format(i))

    print('{}: audio download started'.format(i))
    with open(audio_path, 'wb') as f:  # audio part
        response = requests.get(dic[i][-1][-1], headers=headers)
        f.write(response.content)
    print('{}: audio download finished'.format(i))
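
The download step relies on a folder helper and reads each whole file into memory. As a hedged sketch, the block below shows a hypothetical stand-in for such a helper (the article's real `mkdir` is not shown here, so its return convention is assumed from the `== 1` check) and a chunked writer named `save_chunks`, demonstrated with in-memory bytes rather than a real request.

```python
import os
import tempfile


def mkdir(path):
    # Hypothetical stand-in: 1 if the folder was created, 0 if it already existed
    if os.path.exists(path):
        return 0
    os.makedirs(path)
    return 1


def save_chunks(chunks, file_path):
    # Write the body piece by piece instead of holding it all in memory
    written = 0
    with open(file_path, "wb") as f:
        for chunk in chunks:
            if chunk:
                f.write(chunk)
                written += len(chunk)
    return written


# Demo with in-memory bytes. With requests, the chunks could come from
# requests.get(url, headers=headers, stream=True).iter_content(chunk_size=1024 * 1024)
folder = os.path.join(tempfile.mkdtemp(), "0")
first = mkdir(folder)
second = mkdir(folder)
size = save_chunks([b"abc", b"", b"defg"], os.path.join(folder, "_video.mp4"))
print(first, second, size)
```

Chunked writing mainly matters for long videos, where `response.content` would hold the entire file in memory at once.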

6: Download the cover and save the info:


# Download the cover
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
# Example of a cover URL: http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg
response = requests.get(url=dic[i][3], headers=headers)
with open(img_path, 'wb') as f:
    f.write(response.content)
# Save the info
with open(info_path, 'w', encoding='utf-8') as f:   # explicit encoding so any title can be written
    info = i+'\n'+dic[i][1]+'\n'+dic[i][2]
    f.write(info)

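7: Merging the video and audio

To get the merge to work, I had to run the editor as administrator (I use PyCharm) and set the editor's encoding to 'gbk' instead of 'utf-8'.

cmd = r'ffmpeg -i "{}" -i "{}" -acodec copy -vcodec copy "{}"'.format(video_path, audio_path, save_path)
p = os.popen(cmd)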
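
Formatting paths into a single command string is fragile when a path contains spaces, and `os.popen` does not wait for ffmpeg to finish. One possible alternative, offered as a sketch rather than the author's method, is to pass the arguments as a list to `subprocess.run`. It assumes `ffmpeg` is available on the PATH; `build_merge_command` and `merge` are hypothetical names.

```python
import subprocess


def build_merge_command(video_path, audio_path, save_path):
    """Build the ffmpeg argument list that copies both streams without re-encoding."""
    return [
        'ffmpeg',
        '-i', video_path,
        '-i', audio_path,
        '-acodec', 'copy',
        '-vcodec', 'copy',
        save_path,
    ]


def merge(video_path, audio_path, save_path):
    """Run ffmpeg, wait for it to exit, and return True on success."""
    result = subprocess.run(build_merge_command(video_path, audio_path, save_path))
    return result.returncode == 0
```

Because each path is its own list element, no quoting is needed, and the return value tells the caller whether the merged file can be trusted.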

Full code:

import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os


def get_link_and_img():
    # Create a session
    print('Creating session')
    session = requests.Session()
    base_url = 'https://www.bilibili.com/'
    base_headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'cookie': 'YOUR_COOKIE',  # placeholder: paste your own cookie string here
        'referer': 'https://www.google.com/',
    }
    session.get(url=base_url, headers=base_headers)
    sleep(randint(3, 5))

    # Crawl the leaderboard
    print('Crawling the leaderboard')
    dic = {}
    leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
    leaderboard_headers = {
        'referer': leaderboard_url,
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
    }
    response = session.get(url=leaderboard_url, headers=leaderboard_headers)
    sleep(randint(3, 5))
    content = response.content
    html = etree.HTML(content)
    info_list = html.xpath('//ul[@class="rank-list"]/li')
    for li in info_list:
        name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
        href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
        score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + ' overall score'  # overall score
        play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
        dic[name] = [href, score, play_volume]
        # print(name,href,score,play_volume)
        # print(dic)

    # Crawl each video page. This block sits outside the leaderboard loop so
    # that every video page is requested once, after dic has been filled.
    print('Crawling videos')
    video_headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'referer': leaderboard_url,
    }
    num = 0
    for i in dic.keys():
        video_url = dic[i][0]
        # Fetch the video page; it holds both the cover link and the play info
        try:
            response = session.get(url=video_url, headers=video_headers)
        except requests.RequestException:
            video_headers['cookie'] = 'YOUR_COOKIE'  # placeholder: paste your own cookie string here
            response = requests.get(url=video_url, headers=video_headers)
        text = response.text
        img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
        dic[i].append(img_url)  # append the cover link to the list
        data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
        data = json.loads(data)
        # print(data)

        try:
            time = data['data']['dash']['duration']
            minute = int(time) // 60
            second = int(time) % 60
            video_url = data['data']['dash']['video'][0]['baseUrl']
            audio_url = data['data']['dash']['audio'][0]['baseUrl']
            urls = [video_url, audio_url]
            dic[i].append(urls)
            # print(video_url)
            # print(audio_url)
            # print('Duration: {} min {} s'.format(minute, second))
        except KeyError:
            # A few videos use a different format ("durl") in which video and
            # audio are not split, so nothing has to be merged. This is rare.
            time = data['data']['timelength'] // 1000  # timelength is in milliseconds
            minute = int(time) // 60
            second = int(time) % 60
            video_url = data['data']['durl'][0]['url']
            urls = [video_url]
            dic[i].append(urls)
            # print('Duration: {} min {} s'.format(minute, second))

        # Download the video and audio
        print('Downloading video and audio')
        headers = {
            'cookie': 'YOUR_COOKIE',  # placeholder: paste your own cookie string here
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
            'origin': 'https://www.bilibili.com',
            'referer': 'https://www.bilibili.com/',
        }

        path = r'C:\Users\jyj34\Desktop\bilibili\{}'.format(num)
        created = mkdir(path)  # 1 if the folder was just created, 0 if it already existed
        # print(created)
        # print(path)

        if created == 1:
            # os.path.join avoids invalid escape sequences such as '\_' in string literals
            video_path = os.path.join(path, '_video.mp4')
            audio_path = os.path.join(path, '_audio.mp4')
            save_path = os.path.join(path, '{}.mp4'.format(num))
            info_path = os.path.join(path, '{}.text'.format(num))
            img_path = os.path.join(path, '{}.png'.format(num))
            print('Start downloading video: {}'.format(i))

            with open(video_path, 'wb') as f:  # video stream
                response = requests.get(dic[i][-1][0], headers=headers)
                print(response.status_code)
                f.write(response.content)
            print('Finished downloading video: {}'.format(i))

            print('Start downloading audio: {}'.format(i))
            with open(audio_path, 'wb') as f:  # audio stream
                response = requests.get(dic[i][-1][-1], headers=headers)
                f.write(response.content)
            print('Finished downloading audio: {}'.format(i))

            # Download the cover image
            with open(img_path, 'wb') as f:
                headers = {
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
                }
                response = requests.get(url=dic[i][3], headers=headers)
                f.write(response.content)

            # Save the info (title, score, play count)
            with open(info_path, 'w', encoding='utf-8') as f:
                info = i + '\n' + dic[i][1] + '\n' + dic[i][2]
                f.write(info)

            # Merge audio and video
            composite(video_path, audio_path, save_path)
            sleep(randint(5, 8))

        else:
            print('{} has already been downloaded'.format(i))
        num = num + 1


def mkdir(path):
    # Create the folder if it does not exist yet.
    # Returns 1 if it was created, 0 if it already existed.
    folder = os.path.exists(path)
    if not folder:
        os.makedirs(path)
        return 1
    else:
        return 0


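def composite(video_path, audio_path, save_path):
    # Copy both streams into one file without re-encoding.
    # The paths are quoted so that folders containing spaces still work.
    cmd = r'ffmpeg -i "{}" -i "{}" -acodec copy -vcodec copy "{}"'.format(video_path, audio_path, save_path)
    p = os.popen(cmd)
    # print(p.read())
    p.close()  # wait for ffmpeg to finish before moving on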


get_link_and_img()

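The steps for downloading the video, audio, and cover, and for merging the video and audio, could each be moved into a function of their own, which would make the code tidier and easier to read.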

For reference, each dictionary entry has this layout: key: [href, score, play_volume, img_url, [video_url, audio_url]].
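
Positional indexes such as `dic[i][3]` are easy to get wrong. As a sketch only, the same record could be given named fields. `VideoEntry` and `entry_from_list` are hypothetical names, and the field order mirrors the order in which the full script fills each list, including the cover link appended at index 3.

```python
from typing import List, NamedTuple


class VideoEntry(NamedTuple):
    href: str               # link to the video page
    score: str              # overall score text
    play_volume: str        # play count text
    img_url: str            # cover image link
    media_urls: List[str]   # [video_url, audio_url], or [video_url] for single-file videos


def entry_from_list(values):
    """Convert one list taken from dic into a VideoEntry."""
    return VideoEntry(*values)
```

A `NamedTuple` still supports the old indexes, so `entry[3]` and `entry.img_url` refer to the same value and existing code keeps working while it is migrated.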

You may also notice the sleep calls in the code. Why? Because we play fair: pausing between requests keeps the load on the site low.
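
The pauses can also be kept in one place. A minimal sketch, assuming a random delay in the same 3 to 5 second range that the script uses; `throttled` is a hypothetical helper, and the `sleep_fn` parameter exists only so the delay can be swapped out when testing.

```python
import time
from random import uniform


def throttled(items, min_delay=3.0, max_delay=5.0, sleep_fn=time.sleep):
    """Yield items one by one, pausing a random delay between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep_fn(uniform(min_delay, max_delay))
        yield item
```

Looping with `for name in throttled(dic):` would then pause between videos without a separate sleep call inside the loop body.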
That's it for this crawler post. If anything is unclear, you can reach me through my WeChat official account; feel free to follow it.

Reposted from blog.csdn.net/weixin_45886778/article/details/109697435