Python: Huoshan Short Video (火山小视频), multithreaded batch collection of watermark-free videos: implementation and complete code

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/ucsheep/article/details/89419343

QQ group: 812930741 (for a runnable program)
The complete code is at the end of this article.

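Collecting all of a user's watermark-free videos depends on solving two problems:

  • How to get all of a user's video information (including playback URLs) from the user ID
  • How to obtain a user's ID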

1. Get all of a user's video information from the user ID

First request

curl \
 -H 'Host: api-a.huoshan.com' \
 -H 'Cookie: xxxxxxxxxxxxxx' \
 -H 'X-SS-REQ-TICKET: xxxxxxxxxxxx' \
 -H 'sdk-version: 1' \
 -H 'X-SS-TC: 0' \
 -H 'User-Agent: xxxxxxxxxxx' \
 -H 'X-Pods: ' \
 --compressed 'https://api-a.huoshan.com/hotsoon/user/109311764519/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726295'

The request URL is

https://api-a.huoshan.com/hotsoon/user/109311764519/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726295

Important parameters

Parameter   Example                     Meaning
url         user/109311764519/items/    The path contains the user ID, 109311764519
min_time    0                           0 on the first request
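
As a minimal sketch of how the first-request URL is put together, the snippet below builds it from a user ID with `urllib.parse.urlencode`. The helper name `build_first_page_url` is hypothetical, and only the four leading query parameters of the captured request are included. The device and version parameters (iid, device_id and so on) are left out here, and the server may well require them.

```python
from urllib.parse import urlencode

API_ROOT = "https://api-a.huoshan.com/hotsoon/user/"


def build_first_page_url(user_id):
    """Build the URL of the first page of a user's video list (hypothetical helper)."""
    params = {
        "min_time": 0,        # 0 marks the first request
        "offset": 0,
        "count": 20,          # page size used by the captured request
        "req_from": "enter_auto",
    }
    # The user ID is part of the path, not of the query string.
    return API_ROOT + str(user_id) + "/items/?" + urlencode(params)


print(build_first_page_url(109311764519))
```

Keeping the parameters in a dict makes it easy to see which values change between requests and which stay fixed.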

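The response to the first request is as follows

Field       Meaning
data        Video information, 20 videos
extra       Pagination information (holds has_more and max_time)
has_more    Whether there is a next page
total       Total count
max_time    Parameter for requesting the next page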
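
To make the field list concrete, here is a small sketch that splits one response body into the video list and the two paging values. The function name `parse_page` and the sample body are made up for illustration; the only assumption taken from the article is that `has_more` and `max_time` are read from `extra`, as its code does.

```python
import json


def parse_page(body):
    """Split one response body into (videos, has_more, max_time)."""
    payload = json.loads(body)
    videos = payload.get("data") or []
    extra = payload.get("extra") or {}
    # has_more and max_time are read from 'extra', as in the article's code.
    return videos, bool(extra.get("has_more")), extra.get("max_time")


# A made-up, heavily trimmed response used only to exercise the function.
sample = '{"data": [{"data": {"id": 1}}], "extra": {"has_more": true, "max_time": 1555157100000}}'
videos, has_more, max_time = parse_page(sample)
print(len(videos), has_more, max_time)
```

Using `.get` with fallbacks means a response without `data` or `extra` yields an empty page instead of raising `KeyError`.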

Now let's look at the second request

https://api-a.huoshan.com/hotsoon/user/109311764519/items/?
max_time=1555157100000&offset=20&count=20
&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726323

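We can see how the key parameters relate to the first request

Parameter   Example                     Meaning
url         user/109311764519/items/    Unchanged
max_time    1555157100000               No longer min_time; this value is the max_time returned by the previous request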

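As you may have noticed, as long as has_more is true we keep requesting the next page until nothing is left; that gives us all of the user's videos.
The Python code is as follows: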

    def get_user_videos(self, user_id):
        url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
            int(time.time()))
        response = requests.get(url, headers=http_headers, timeout=10)
        # Parse the body once: 'data' holds the videos, 'extra' the paging info.
        payload = response.json()
        data = payload['data']
        extra = payload['extra']
        self._join_download_queue(data, user_id)
        video_list = data
        while (extra['has_more']):
            try:
                url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?max_time=" + str(extra[
                                                                                                          'max_time']) + "&offset=20&count=20&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
                    int(time.time()))
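                response = requests.get(url, headers=http_headers, timeout=10)
                payload = response.json()
                data = payload['data']
                extra = payload['extra']
                self._join_download_queue(data, user_id)
                video_list = video_list + data
            except Exception:
                # Stop paging on failure: 'extra' was not updated, so
                # silently continuing would repeat the same request forever.
                break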
        return len(video_list)
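
The paging logic can also be looked at in isolation. The sketch below is not the article's implementation: it follows has_more and max_time through a caller-supplied `fetch_page` function (a hypothetical name) so that the loop can be run against made-up pages without touching the network, and it adds a page cap as a safety net.

```python
def collect_all(fetch_page, max_pages=1000):
    """Follow has_more/max_time until the last page; fetch_page is injected."""
    videos = []
    cursor = None                  # None stands for the first request (min_time=0)
    for _ in range(max_pages):     # hard cap so a bad response cannot loop forever
        page = fetch_page(cursor)
        videos.extend(page["data"])
        extra = page["extra"]
        if not extra.get("has_more"):
            break
        cursor = extra["max_time"]  # becomes max_time of the next request
    return videos


# Fake three-page backend standing in for the HTTP call (made-up data).
FAKE_PAGES = {
    None: {"data": ["a", "b"], "extra": {"has_more": True, "max_time": 200}},
    200: {"data": ["c", "d"], "extra": {"has_more": True, "max_time": 100}},
    100: {"data": ["e"], "extra": {"has_more": False, "max_time": 0}},
}

print(collect_all(FAKE_PAGES.get))
```

In real use, `fetch_page` would issue the request with `min_time=0` when the cursor is `None` and with `max_time=<cursor>` otherwise.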

2. The playback URL of each video

First, look at the JSON

{
	"data": {
		"allow_comment": true,
		"allow_dislike": true,
		"allow_share": true,
		"at_users": [],
		"author": {
			"allow_be_located": true,
			"allow_find_by_contacts": true,
			"allow_others_download_video": true,
			"allow_others_download_when_sharing_video": true,
			"allow_share_show_profile": true,
			"allow_show_in_gossip": true,
			"allow_show_my_action": true,
			"allow_strange_comment": true,
			"allow_unfollower_comment": true,
			"anchor_level": {
				"experience": 403,
				"highest_experience_this_level": 460,
				"level": 10,
				"lowest_experience_this_level": 311,
				"profile_dialog_bg": {
					"uri": "hotsoon-resource/anchor_level_1.1_3x.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png"]
				},
				"profile_dialog_bg_back": {
					"uri": "hotsoon-resource/anchor_level_1.2_3x.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png"]
				},
				"small_icon": {
					"uri": "hotsoon-resource/anchor_level_small_10_3x.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png"]
				},
				"stage_level": {
					"uri": "hotsoon-resource/anchor_level_1_3x.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png"]
				},
				"task_decrease_experience": 0,
				"task_end_time": 1548691140,
				"task_start_experience": 0,
				"task_start_time": 1546145769,
				"task_target_experience": 0
			},
			"avatar_jpg": {
				"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
				"url_list": ["http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg"]
			},
			"avatar_large": {
				"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
				"url_list": ["http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp", "http://p9-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp", "http://p9-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp"]
			},
			"avatar_medium": {
				"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
				"url_list": ["http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp", "http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp", "http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp"]
			},
			"avatar_thumb": {
				"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
				"url_list": ["http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp"]
			},
			"bg_img_url": "",
			"birthday": 0,
			"birthday_description": "90后",
			"birthday_valid": false,
			"block_status": 0,
			"city": "唐山",
			"comment_restrict": 1,
			"constellation": "",
			"disable_ichat": 0,
			"enable_ichat_img": 1,
			"encrypted_id": "MS4wLjABAAAARp_KVU2BZK4BOo5xHhk0u5R-6vT7sQSf0teStWy8yjk",
			"exp": 0,
			"fan_ticket_count": 27583,
			"fold_stranger_chat": false,
			"follow_status": 0,
			"gender": 1,
			"hotsoon_verified": false,
			"hotsoon_verified_reason": "",
			"ichat_restrict_type": 1,
			"id": 109311764519,
			"id_str": "109311764519",
			"income_share_percent": 0,
			"is_follower": false,
			"is_following": false,
			"level": 1,
			"need_profile_guide": false,
			"nickname": "唐山赵鹏",
			"pay_grade": {
				"diamond_icon": {
					"uri": "mosaic-legacy/12400003aba3dd42e213",
					"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213"]
				},
				"grade_banner": "28级可开启豪华入场",
				"grade_describe": "距升级还需消费35钻",
				"grade_icon_list": [{
					"icon": {
						"uri": "mosaic-legacy/3b65000678eac77af1d9",
						"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9"]
					},
					"icon_diamond": 100,
					"level": 6,
					"level_str": "Lv.6"
				}, {
					"icon": {
						"uri": "mosaic-legacy/3b65000678eac77af1d9",
						"url_list": ["http://p9-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9"]
					},
					"icon_diamond": 200,
					"level": 7,
					"level_str": "Lv.7"
				}, {
					"icon": {
						"uri": "mosaic-legacy/3b620006b1e388185513",
						"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513"]
					},
					"icon_diamond": 300,
					"level": 8,
					"level_str": "Lv.8"
				}],
				"icon": {
					"uri": "mosaic-legacy/30eb0000a101d40eea0c",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c"]
				},
				"im_icon": {
					"uri": "mosaic-legacy/2ea8000962099e965ff0",
					"url_list": ["http://p9-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0"]
				},
				"im_icon_with_level": {
					"uri": "mosaic-legacy/78a1007d7263887d923b",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b"]
				},
				"level": 7,
				"live_icon": {
					"uri": "mosaic-legacy/30ee0007ccef28b99639",
					"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639"]
				},
				"name": "树苗",
				"new_im_icon_with_level": {
					"uri": "mosaic-legacy/78a200737bef2df0fee9",
					"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9"]
				},
				"new_live_icon": {
					"uri": "mosaic-legacy/78a10056e336cb6eb911",
					"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911", "http://p9-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911", "http://p9-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911"]
				},
				"new_nav_live_icon": {
					"uri": "hotsoon-resource/new_nva_level_icon_7.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png"]
				},
				"next_diamond": 500,
				"next_icon": {
					"uri": "mosaic-legacy/12400003aae89daccd69",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69"]
				},
				"next_name": "树叶",
				"now_diamond": 265,
				"pay_diamond_bak": 0,
				"profile_dialog_bg": {
					"uri": "hotsoon-resource/user_level_1.1_3x.png",
					"url_list": ["http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png", "http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png", "http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png"]
				},
				"profile_dialog_bg_back": {
					"uri": "hotsoon-resource/user_level_1.2_3x.png",
					"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png"]
				},
				"screen_chat_type": 2,
				"this_grade_max_diamond": 299,
				"this_grade_min_diamond": 200,
				"total_diamond_count": 275,
				"upgrade_need_consume": 35
			},
			"pay_scores": 80,
			"push_comment_status": true,
			"push_digg": true,
			"push_follow": true,
			"push_friend_action": true,
			"push_ichat": true,
			"push_status": true,
			"push_video_post": true,
			"push_video_recommend": true,
			"short_id": 651258764,
			"signature": "",
			"type_a1": 1,
			"verified": false,
			"verified_mobile": true,
			"verified_reason": ""
		},
		"comment_delay": -1,
		"create_time": 1555714446,
		"description": "",
		"disable_watermark": false,
		"extra_scheme_url": "sslocal://webview?url=https%3A%2F%2Fhotsoon.snssdk.com%2Ffalcon%2Flive_inapp%2Fpage%2Fpush_hot%2Findex.html%23%2F%3Fitem_id%3D6681742665865907464\u0026hide_nav_bar=1\u0026hide_more=1\u0026disable_bounces=1",
		"follow_display": false,
		"follow_status_tag": "",
		"friend_action_list": null,
		"id": 6681742665865907464,
		"id_str": "6681742665865907464",
		"item_log_extra": "{\"item_type\":\"item\"}",
		"location": "",
		"media_type": 4,
		"prefetch_comment": false,
		"prefetch_profile": false,
		"share_description": "这个视频居然有 3044 次播放,快来围观\u003e\u003e",
		"share_enable": true,
		"share_strong_guide": 0,
		"share_title": "「唐山赵鹏」的这个视频好6,快来围观!",
		"share_url": "http://reflow.huoshan.com/share/item/6681742665865907464/?tag=0\u0026timestamp=1555726295\u0026watermark=2\u0026media_type=4\u0026",
		"song": {
			"album": "",
			"author": "汤潮",
			"cover_large": {
				"uri": "douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d",
				"url_list": ["http://sf1-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp", "http://sf3-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp", "http://sf6-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp"]
			},
			"cover_thumb": {
				"uri": "douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d",
				"url_list": ["http://sf1-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp", "http://sf3-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp", "http://sf6-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp"]
			},
			"duration": 228,
			"id": 6581309914948307719,
			"play_url": {
				"uri": "9fe4000380eebcd5f4c7",
				"url_list": ["http://p3-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7", "http://p1-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7", "http://p6-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7"]
			},
			"share_description": "玩视频上火山,快来围观!",
			"share_title": "玩视频上火山,快来围观!",
			"share_url": "https://reflow.huoshan.com/share/music/6581309914948307719/",
			"source_platform": 25,
			"status": 1,
			"title": "美了美了",
			"video_cnt": 60786
		},
		"stats": {
			"comment_count": 17,
			"digg_count": 77,
			"play_count": 3044,
			"share_count": 2
		},
		"status": 102,
		"tips": "",
		"tips_url": "https://hotsoon.snssdk.com/hotsoon/in_app/pyramid_selling/?source=money",
		"title": "",
		"user_bury": 0,
		"user_digg": 0,
		"video": {
			"allow_cache": true,
			"cover": {
				"avg_color": "#EBCEE1",
				"uri": "tplv-hs-large/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
				"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp"]
			},
			"cover_animated": null,
			"cover_medium": {
				"avg_color": "#7A6D53",
				"uri": "tplv-hs-medium/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
				"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp"]
			},
			"cover_thumb": {
				"avg_color": "#3D3D3D",
				"uri": "tplv-hs-live:100:100/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
				"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp"]
			},
			"download_url": ["https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026watermark=2\u0026long_video=0\u0026sf=3\u0026ts=1555726295", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=1\u0026app_id=1112\u0026vquality=normal\u0026watermark=2\u0026long_video=0\u0026sf=3\u0026ts=1555726295"],
			"duration": 22.385,
			"gif_uri": "1f29c0004ca82ca37b85b",
			"gif_url_list": ["http://p3-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image", "http://p1-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image", "http://p1-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image"],
			"h265_uri": "h265/v0300cde0000bit52uar6q7snu08rn50_720p",
			"h265_url": ["http://v3-hs.ixigua.com/72cde5258e0e1bce5c28d71169077b6d/5cba8dfd/video/m/2208c1736b80fb04ca4a1bee6a9fe4320551161cf75e00009ce3770cff79/?rc=M3E7eHZ0dXg4bDMzaGYzM0ApQHRAbzg5Njo8MzQzMzY0NDUzNDVvQGg2dSlAZjV1KWZzcHcxeW9mNTRAMm8tazQzXi9xXy0tYS0wc3MtbyNvIzIuNC0yMS0uMi4tLTE2LTojbyM6YS1xIzpgaF4rYmZiZjojLi5e", "http://v6-hs.ixigua.com/dfa36dfcb74ffe4358dfe9eed886a122/5cba8dfd/video/m/2208c1736b80fb04ca4a1bee6a9fe4320551161cf75e00009ce3770cff79/?rc=M3E7eHZ0dXg4bDMzaGYzM0ApQHRAbzg5Njo8MzQzMzY0NDUzNDVvQGg2dSlAZjV1KWZzcHcxeW9mNTRAMm8tazQzXi9xXy0tYS0wc3MtbyNvIzIuNC0yMS0uMi4tLTE2LTojbyM6YS1xIzpgaF4rYmZiZjojLi5e", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026quality=720p\u0026codec=h265\u0026sf=3\u0026origin=0\u0026ts=1555726295"],
			"height": 1024,
			"preload_size": 375000,
			"uri": "v0300cde0000bit52uar6q7snu08rn50",
			"url_list": ["https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026watermark=0\u0026long_video=0\u0026sf=3\u0026ts=1555726295", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=1\u0026app_id=1112\u0026vquality=normal\u0026watermark=0\u0026long_video=0\u0026sf=3\u0026ts=1555726295"],
			"video_id": "v0300cde0000bit52uar6q7snu08rn50",
			"watermark": true,
			"width": 576
		},
		"weibo_share_title": "#玩视频上火山#唐山赵鹏在火山上分享了视频,快来围观!传送门戳我\u003e\u003ehttp://reflow.huoshan.com/share/item/6681742665865907464/?tag=0\u0026timestamp=1555726295\u0026watermark=2\u0026media_type=4\u0026"
	},
	"rid": "2019042010113501001405907731655",
	"tags": [],
	"type": 3
}

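Clearly, item['data']['video']['url_list'] holds the playback URLs. Note that these URLs carry watermark=0, whereas the ones in download_url carry watermark=2.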
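
As a small sketch of that lookup, the function below returns the playback URLs of one entry and falls back to an empty list when a field is missing. The name `playback_urls` and the shortened item are made up for illustration; only the nesting data → video → url_list comes from the JSON above.

```python
def playback_urls(item):
    """Return the candidate playback URLs of one list entry, or [] if absent."""
    video = (item.get("data") or {}).get("video") or {}
    return list(video.get("url_list") or [])


# Trimmed-down item with the same nesting as the JSON above (URLs are made up).
item = {
    "data": {
        "video": {
            "video_id": "v0300example",
            "url_list": [
                "https://example.invalid/play?line=0&watermark=0",
                "https://example.invalid/play?line=1&watermark=0",
            ],
        }
    }
}

urls = playback_urls(item)
print(urls[0] if urls else "no playback URL")
```

Returning the whole list rather than only the first entry leaves room to try the second line if the first one fails.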

3. Getting the user ID

When a user's home page link is shared, a short link like the following is generated

http://reflow.huoshan.com/hotsoon/s/vAzc0get700/

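When the link is actually opened, however, it passes through a 301 redirect (when the trailing / is missing) and then a 302 redirect to the real page.
The Location header of the 302 response is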

http://reflow.huoshan.com/share/user/23790988726/?timestamp=1555745525&share_ht_uid=-1&did=53955772475&utm_medium=huoshan_android&tt_from=copy_link&iid=69961357053&app=live_stream&utm_source=copy_link&schema_url=sslocal%3A%2F%2Fprofile%3Fid%3D23790988726

It contains the user_id we want, 23790988726.
The following code gets the user_id from the short link:

response = requests.get(short_url, headers=http_headers, timeout=10)
user_id = str(response.url).split("/")[5]
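
The two lines above work because requests follows redirects by default, so `response.url` is the final `/share/user/<id>/` address. Indexing into `split("/")` breaks quietly if the URL ever has a different shape, so here is a slightly more defensive sketch based on `urllib.parse.urlparse`. The function name `user_id_from_url` is hypothetical, and the expected path layout is taken from the Location value shown above.

```python
from urllib.parse import urlparse


def user_id_from_url(final_url):
    """Return the segment after /share/user/ in the redirected URL, or None."""
    parts = [p for p in urlparse(final_url).path.split("/") if p]
    # Expected path shape: /share/user/<user_id>/
    if len(parts) >= 3 and parts[0] == "share" and parts[1] == "user":
        return parts[2]
    return None


print(user_id_from_url("http://reflow.huoshan.com/share/user/23790988726/?timestamp=1555745525"))
```

Returning `None` for an unexpected URL lets the caller skip that link instead of requesting videos for a bogus ID.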

4. Complete code

Putting it all together, the Python code for multithreaded batch collection of Huoshan short videos is as follows

from six.moves import queue as Queue
import requests
import time
import json
from threading import Thread
import os
import sys
import codecs

http_headers = { 'Accept': '*/*','Connection': 'keep-alive', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'}
THREADS = 10

def make_sure_path(path):
    # Create the directory if needed; another thread may create it first.
    try:
        if not os.path.exists(path):
            os.mkdir(path)
    except OSError:
        pass

def download(item, user_id):
    local_path = os.getcwd()
    mp4_path = os.path.join(local_path, "download")
    make_sure_path(mp4_path)
    user_path = os.path.join(mp4_path, str(user_id))
    make_sure_path(user_path)
    dl_url = item['data']['video']['url_list'][0]
    dl_vid = item['data']['video']['video_id']
    video_path = os.path.join(user_path, str(dl_vid) + '.mp4')
    if os.path.exists(video_path):
        return
    print("Downloading %s from %s.\n" % (dl_vid, dl_url))
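    try:
        # The whole file is read into memory, then written to disk.
        r = requests.get(dl_url, timeout=30)
        with open(video_path, "wb") as code:
            code.write(r.content)
    except Exception as e:
        # Report the failure instead of silently dropping the video.
        print("Failed to download %s: %s" % (dl_vid, e))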


class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
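        while True:
            item, user_id = self.queue.get()
            try:
                download(item, user_id)
            except Exception as e:
                print("Download task failed: %s" % e)
            finally:
                # Always mark the task done so queue.join() cannot hang.
                self.queue.task_done()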

class CrawlerScheduler(object):
    def __init__(self, user_url):
        self.short_url = user_url
        self.queue = Queue.Queue()
        self.scheduling()

    def scheduling(self):
        for x in range(THREADS):
            worker = DownloadWorker(self.queue)
            worker.daemon = True
            worker.start()
        self.download_user_videos()

    def download_user_videos(self):
        response = requests.get(self.short_url, headers=http_headers, timeout=10)
        user_id = str(response.url).split("/")[5]
        video_count = self.get_user_videos(user_id)
        self.queue.join()
        print("\nHuoshan user - %s, video count - %s\n\n" % (user_id, str(video_count)))
        print("\nDownload finished - %s\n\n" % user_id)

    def get_user_videos(self, user_id):
        url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
            int(time.time()))
        response = requests.get(url, headers=http_headers, timeout=10)
        # Parse the body once: 'data' holds the videos, 'extra' the paging info.
        payload = response.json()
        data = payload['data']
        extra = payload['extra']
        self._join_download_queue(data, user_id)
        video_list = data
        while (extra['has_more']):
            try:
                url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?max_time=" + str(extra[
                                                                                                          'max_time']) + "&offset=20&count=20&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
                    int(time.time()))
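                response = requests.get(url, headers=http_headers, timeout=10)
                payload = response.json()
                data = payload['data']
                extra = payload['extra']
                self._join_download_queue(data, user_id)
                video_list = video_list + data
            except Exception:
                # Stop paging on failure: 'extra' was not updated, so
                # silently continuing would repeat the same request forever.
                break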
        return len(video_list)

    def _join_download_queue(self, items, user_id):
        # 'items' instead of 'list' so the built-in list is not shadowed.
        for item in items:
            self.queue.put((item, user_id))

def parse_txt(fileName):
    with open(fileName, "rb") as f:
        txt = f.read().rstrip().lstrip()
        txt = codecs.decode(txt, 'utf-8')
        txt = txt.replace("\t", ",").replace(
            "\r", ",").replace("\n", ",").replace(" ", ",")
        txt = txt.split(",")
    numbers = list()
    for raw_site in txt:
        site = raw_site.lstrip().rstrip()
        if site:
            numbers.append(site)
    return numbers

if __name__ == "__main__":
    if os.path.exists("url.txt"):
        content = parse_txt("url.txt")
    else:
        print("url.txt not found")
        sys.exit(1)
    for user_url in content:
        CrawlerScheduler(user_url)
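
For comparison, the same fan-out can be written with the standard library's `concurrent.futures.ThreadPoolExecutor` instead of hand-written `Queue` and `Thread` workers. This is an alternative sketch, not the article's method: `download_all` and `fake_download` are hypothetical names, and the fake function stands in for the real `download` so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(items, download_one, workers=10):
    """Run download_one(item) on a thread pool; return (done, failed) counts."""
    done = failed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(download_one, item) for item in items]
        for future in as_completed(futures):
            try:
                future.result()   # re-raises any exception from the worker
                done += 1
            except Exception:
                failed += 1
    return done, failed


def fake_download(item):
    """Stand-in for the real download; fails on one made-up item."""
    if item == "bad":
        raise IOError("simulated network error")


print(download_all(["v1", "v2", "bad", "v3"], fake_download))
```

The executor waits for all submitted tasks when the `with` block exits, and a failing task is counted rather than killing a worker thread.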

Reposted from blog.csdn.net/ucsheep/article/details/89419343