[Python Crawler] Downloading network resources from known links with urllib.request

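Consider this scenario: one column of an Excel sheet holds many links (images, videos, audio), call it A, and another column holds the name for each link, call it B. How can we download all of the linked files automatically?
1. Loop over the Excel rows and store A and B in a two-dimensional list.
2. Depending on the link suffix (.jpg, .mp4, .mp3, etc.), download the content with urllib.request.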
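Step 2 depends on the link's suffix, so it helps to extract that suffix reliably. The following is a minimal sketch of one possible way to do it, assuming the extension appears in the path part of the URL. The helper name `guess_extension` and its `default` parameter are hypothetical and not part of the original script.

```python
import posixpath
from urllib.parse import urlparse


def guess_extension(url, default=".bin"):
    """Return the lowercase file extension found in the URL path.

    Hypothetical helper: falls back to `default` when the path has no extension.
    """
    path = urlparse(url).path                  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lower()  # e.g. ".mp4", ".jpeg", or "" if none
    return ext if ext else default


if __name__ == "__main__":
    print(guess_extension("http://example.com/media/lesson01.mp4"))
    print(guess_extension("http://example.com/pic.JPEG?size=large"))
```

Unlike slicing a fixed number of characters from the end of the link, this approach also copes with extensions of other lengths (such as .jpeg) and with links that carry a query string.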

The complete code is as follows:

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
#Author: cacho_37967865
#Blog: https://blog.csdn.net/sinat_37967865
#File: getFile.py
#Date: 2018-11-24
#Notes: Read the download information from an Excel file into a list, then loop over it to download each file (mp4, mp3, jpg, pdf, etc.)
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

import xlrd
import urllib.request
import os

def get_excel_cell(xlsFile,num,nrows):
    data = xlrd.open_workbook(xlsFile)
    table = data.sheets()[0]
    cellData = []

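    # Read the data from the specified columns
    for i in range(num, nrows):              # Row range: starts at i=num (inclusive), stops before i=nrows (exclusive)
        row = []
        className = table.cell_value(i, 3)   # Column 4: course name
        row.append(className)
        classUrl = table.cell_value(i, 4)    # Column 5: course download URL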
        row.append(classUrl)

        cellData.append(row)
    return cellData


def get_video(folder,url,fileName,fileType):
    path = os.path.join(folder, fileName + fileType)   # Build the target path instead of changing the working directory
    try:
        req = urllib.request.Request(url=url)
        req.add_header("User-Agent","Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36")
        with urllib.request.urlopen(req, timeout=40) as video:
            data = video.read()                # Read the whole response body as bytes
        with open(path, "wb") as file:         # Open the file only after the download succeeded, so a failure leaves no empty file
            file.write(data)                   # Write the bytes to disk
        print("saved", path)
    except Exception as e:
        print(str(e))                          # Report the error and continue with the next link


if __name__ == '__main__':
    # Raw strings keep the backslashes in Windows paths from being read as escape sequences
    videoInfo = get_excel_cell(r'F:\PythonProject\Pacong\docs\yuyus185.xls',182,183)
    for fileName, url in videoInfo:
        # Take the extension from the link (.jpg, .mp4, .mp3, etc.); unlike url[-4:], this also handles .jpeg and query strings
        fileType = os.path.splitext(url.split('?')[0])[1]
        print(fileName,fileType,url)
        get_video(r'F:\SoftwareTest',url,fileName,fileType)
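The `get_video` function above loads the whole response into memory with `read()`, which may be heavy for large video files. Below is a minimal sketch of a streamed variant that copies the response to disk in chunks. It is an alternative, not the original author's method; the function name `download_stream`, the `.part` temporary suffix, and the shortened User-Agent value are assumptions made for illustration.

```python
import os
import shutil
import urllib.request


def download_stream(url, dest_path, timeout=40, chunk_size=64 * 1024):
    """Stream `url` to `dest_path` in chunks (hypothetical helper).

    The data is written to a temporary ".part" file first, so an
    interrupted download does not leave a truncated file at dest_path.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    tmp_path = dest_path + ".part"
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp, \
                open(tmp_path, "wb") as out:
            shutil.copyfileobj(resp, out, chunk_size)  # copy chunk by chunk
        os.replace(tmp_path, dest_path)                # move into place on success
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)                        # clean up the partial file
        raise
    return dest_path
```

Under these assumptions, the call inside `get_video` could be swapped for something like `download_stream(url, os.path.join(folder, fileName + fileType))`, with the caller catching any exception it raises.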
