Crawling videos with Python (using a handy tool: you-get)

What is you-get?

According to the official description, you-get lets you:
Download audio and video from popular websites, such as YouTube, Youku, Niconico, and more.
Watch online videos in your favorite media player, away from browsers and advertisements
Download pictures you like on web pages
Download any non-HTML content, such as binary files

My personal take:
Which school is best for learning to operate an excavator? Go to Lanxiang in Shandong, China!
Which tool is best for crawling videos? With Python, go for you-get!

Simply put, you-get is a command-line tool for downloading videos.

Installation

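There are many ways to install it (search Baidu for the others); only one method is listed below. The first command installs you-get and the second upgrades an existing installation.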

pip3 install you-get
pip3 install --upgrade you-get
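
To confirm that the installation worked, a small check along the following lines may help. It is only a sketch: the helper name `you_get_available` is made up for illustration, and it merely tests whether the `you_get` package can be located and whether a `you-get` executable is on the PATH.

```python
import importlib.util
import shutil


def you_get_available():
    """Return (module_found, executable_path) for a you-get installation.

    Hypothetical helper for illustration; it is not part of you-get.
    """
    # find_spec returns None when the top-level package cannot be located.
    module_found = importlib.util.find_spec("you_get") is not None
    # shutil.which returns None when no "you-get" command is on the PATH.
    executable_path = shutil.which("you-get")
    return module_found, executable_path


if __name__ == "__main__":
    found, path = you_get_available()
    print("you_get importable:", found)
    print("you-get executable:", path or "not found on PATH")
```

If both checks fail, the pip3 commands above were probably run for a different Python interpreter than the one running this script.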

Usage

you-get: version 0.4.1456, a tiny downloader that scrapes the web.
usage: you-get [OPTION]... URL...

A tiny downloader that scrapes the web

optional arguments:
  -V, --version         Print version and exit
  -h, --help            Print this help message and exit

Dry-run options:
  (no actual downloading)

  -i, --info            Print extracted information
  -u, --url             Print extracted information with URLs
  --json                Print extracted URLs in JSON format

Download options:
  -n, --no-merge        Do not merge video parts
  --no-caption          Do not download captions (subtitles, lyrics, danmaku,
                        ...)
  -f, --force           Force overwriting existing files
  --skip-existing-file-size-check
                        Skip existing file without checking file size
  -F STREAM_ID, --format STREAM_ID
                        Set video format to STREAM_ID
  -O FILE, --output-filename FILE
                        Set output filename
  -o DIR, --output-dir DIR
                        Set output directory
  -p PLAYER, --player PLAYER
                        Stream extracted URL to a PLAYER
  -c COOKIES_FILE, --cookies COOKIES_FILE
                        Load cookies.txt or cookies.sqlite
  -t SECONDS, --timeout SECONDS
                        Set socket timeout
  -d, --debug           Show traceback and other debug info
  -I FILE, --input-file FILE
                        Read non-playlist URLs from FILE
  -P PASSWORD, --password PASSWORD
                        Set video visit password to PASSWORD
  -l, --playlist        Prefer to download a playlist  (download multiple videos, e.g. the episodes of a TV series)
  -a, --auto-rename     Auto rename same name different files
  -k, --insecure        ignore ssl errors

Proxy options:
  -x HOST:PORT, --http-proxy HOST:PORT
                        Use an HTTP proxy for downloading
  -y HOST:PORT, --extractor-proxy HOST:PORT
                        Use an HTTP proxy for extracting only
  --no-proxy            Never use a proxy
  -s HOST:PORT, --socks-proxy HOST:PORT
                        Use an SOCKS5 proxy for downloading
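
The options listed above can also be combined from a script by assembling the command line and handing it to `subprocess`. The following is a minimal sketch, assuming the `you-get` executable is on the PATH; the helper names `build_command` and `run` are made up for illustration, and the URL is a placeholder.

```python
import subprocess


def build_command(url, output_dir=None, info_only=False, playlist=False):
    """Assemble an argument list for the you-get CLI (hypothetical helper)."""
    cmd = ["you-get"]
    if info_only:
        cmd.append("-i")           # dry run: print extracted information only
    if output_dir:
        cmd += ["-o", output_dir]  # set the output directory
    if playlist:
        cmd.append("-l")           # prefer to download a playlist
    cmd.append(url.strip())
    return cmd


def run(cmd):
    """Run the command and return its exit code; needs you-get on the PATH."""
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # example.com is a placeholder, not a real video page.
    print(build_command("https://example.com/video", output_dir="videos"))
```

Passing the arguments as a list rather than one shell string avoids quoting problems with URLs that contain characters such as `&` or `?`.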

Simple demo (no proxy or cookies involved; you can extend it yourself)

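import re   # used for validating the menu input below
import ssl
import sys  # used to set sys.argv before calling you_get.main()

# Disable HTTPS certificate verification globally (insecure; only do this if you accept the risk)
ssl._create_default_https_context = ssl._create_unverified_context
import you_get
from you_get.extractors import *  # makes the downloaders for the individual sites available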

if __name__ == '__main__':
    print("\033[37;41m Enter 0 to exit \033[0m")
    print('1. Show the version')
    print('2. Show the help message')
    print('3. Show information about the video on a page')
    print('4. Download a video')
    print('5. Show the video URL')
    print('6. Show the video information as JSON')
    print('7. Download multiple videos (e.g. a TV series)')
    user_choice = input('Enter your choice (a number): ')
    # Accept exactly one digit from 0 to 7
    if not re.fullmatch('[0-7]', user_choice):
        print("\033[37;41m Please enter a number from 0 to 7 \033[0m")
        sys.exit(1)
    if user_choice == '0':
        print("\033[37;41m Bye~ \033[0m")
        sys.exit(0)
    # Build the same argument list the you-get command line would receive
    if user_choice == '1':
        sys.argv = ['you_get', '-V']
    elif user_choice == '2':
        sys.argv = ['you_get', '-h']
    elif user_choice == '3':
        URL = input('Enter the URL: ')
        URL = URL.strip()
        sys.argv = ['you_get', '-i', URL]
    elif user_choice == '4':
        URL = input('Enter the URL: ')
        path = input('Enter the directory to save the video in: ')
        URL = URL.strip()
        sys.argv = ['you_get', '-o', path, URL]
    elif user_choice == '5':
        URL = input('Enter the URL: ')
        URL = URL.strip()
        sys.argv = ['you_get', '-u', URL]
    elif user_choice == '6':
        URL = input('Enter the URL: ')
        URL = URL.strip()
        sys.argv = ['you_get', '--json', URL]
    elif user_choice == '7':
        URL = input('Enter the URL: ')
        path = input('Enter the directory to save the videos in: ')
        URL = URL.strip()
        sys.argv = ['you_get', '-o', path, '-l', URL]
    # you_get.main() parses sys.argv just like the command-line entry point
    you_get.main()
    print("\033[37;41m Done!\033[0m")
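
The chain of menu checks in the demo could alternatively be written as a lookup table, which keeps the mapping from menu choice to you-get options in one place. This is only a sketch of that design: the names `MENU` and `build_argv` are made up for illustration, and the resulting list is meant to be assigned to `sys.argv` before calling `you_get.main()`, as the demo does.

```python
# Hypothetical table: menu choice -> (flags, needs a URL, needs an output directory)
MENU = {
    "1": (["-V"], False, False),
    "2": (["-h"], False, False),
    "3": (["-i"], True, False),
    "4": ([], True, True),
    "5": (["-u"], True, False),
    "6": (["--json"], True, False),
    "7": (["-l"], True, True),
}


def build_argv(choice, url=None, path=None):
    """Return the argument list for one menu choice (hypothetical helper)."""
    flags, needs_url, needs_path = MENU[choice]
    argv = ["you_get"]
    if needs_path:
        argv += ["-o", path]   # output directory comes first, as in the demo
    argv += flags
    if needs_url:
        argv.append(url.strip())
    return argv


if __name__ == "__main__":
    # example.com is a placeholder, not a real video page.
    print(build_argv("7", url=" https://example.com/series ", path="videos"))
```

Adding a new menu entry then only requires one more row in the table instead of another branch.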

Origin blog.csdn.net/Mr_Qian_Ives/article/details/107857774