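Preface:

Anyone learning web scraping should practice on 王者荣耀 (Honor of Kings) at least once. CSDN has plenty of posts about scraping it, which makes it easy to learn: whenever I get stuck, I can read the experts' code. That is the fun of browsing CSDN.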

Link:

https://pvp.qq.com/web201605/wallpaper.shtml

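The wallpaper page is paginated. I looked for the hyperlink to the next page but could not find a way in. After reading how other CSDN authors scraped it, I decided to capture the network requests directly and get a working result first.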

The full request URL:

http://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=10&iOrder=0&iSortNumClose=1&jsoncallback=jQuery17106927574791770883_1525742053044&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1525742856493

The link is rather long, so just look at its parameters.

They are easy to understand: to get a different page, pass a different number as page. 0 is the first page.
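To make the role of page concrete, here is a minimal sketch that rebuilds the query string with the standard library instead of editing the long URL by hand. The helper name build_page_url is hypothetical; the parameter values are copied from the captured request, and jsoncallback is left out on the assumption that the server then answers with plain JSON.

```python
from urllib.parse import urlencode

BASE_URL = ('https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/'
            'query/workList_inc.cgi')


def build_page_url(page):
    """Return the wallpaper-list URL for a zero-based page number."""
    params = {
        'activityId': 2735,
        'sVerifyCode': 'ABCD',
        'sDataType': 'JSON',
        'iListNum': 20,
        'totalpage': 0,
        'page': page,  # the only value that changes between pages
        'iOrder': 0,
        'iSortNumClose': 1,
        'iAMSActivityId': 51991,
        '_everyRead': 'true',
        'iTypeId': 2,
        'iFlowId': 267733,
        'iActId': 2735,
        'iModuleId': 2735,
        '_': 1582113303429,
    }
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_page_url(0))  # URL of the first page
```

Keeping the parameters in a dictionary makes it obvious which value changes from page to page.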

Implementation:

'''
 Scraping practice: downloading 王者荣耀 wallpapers

 version: 01
 author: 金鞍少年
 date: 2020-02-19

'''
import requests
from urllib import request
import os

path = '../res/王者荣耀/'  # directory where the downloads are saved

class wallpapers():

    def __init__(self):
        self.count = 0  # page to start from (0 is the first page)
        self.headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
            'referer': 'https://pvp.qq.com/web201605/wallpaper.shtml'}

    def get_page(self):
        while self.count < 21:  # 21 pages in total at the moment, numbered 0 to 20
            url = ('https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?'
                   'activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0'
                   '&page=' + str(self.count) +
                   '&iOrder=0&iSortNumClose=1&iAMSActivityId=51991&_everyRead=true&iTypeId=2'
                   '&iFlowId=267733&iActId=2735&iModuleId=2735&_=1582113303429')

            self.count += 1
            response = requests.get(url, headers=self.headers)
            if response.content:
                response = response.json()
                page_data = response['List']  # the wallpaper entries of this page
                yield page_data

    def get_data(self, lists_data):
        hero_lists = []
        for data in lists_data:
            name = requests.utils.unquote(data["sProdName"])  # URL-decode the wallpaper name
            # Each entry is read through eight image fields, sProdImgNo_1 to sProdImgNo_8
            for index, jpg_url in enumerate(range(1, 9)):
                # URL-decode the image address, then replace '200' with '0' in it
                jpg_url = requests.utils.unquote(data["sProdImgNo_{}".format(jpg_url)]).replace('200', '0')
                hero_lists.append((index, name, jpg_url))
        yield hero_lists

    def write_data(self, hero_lists):
        try:
            for i in hero_lists:
                index = i[0]
                name = i[1].replace(":", "-")  # ':' is not allowed in Windows file names
                jpg_url = i[2]
                dir_path = path + name + '/'

                # Create the folder, including any missing parent folders
                os.makedirs(dir_path, exist_ok=True)
                img = requests.get(jpg_url)
                with open(dir_path + '%s.jpg' % (index + 1), 'wb') as f:
                    f.write(img.content)
                    print('Saved wallpaper {} of {}'.format(index + 1, name))
        except requests.RequestException as e:
            # requests raises its own exception types, not urllib's URLError
            print(f'Download failed, reason: {e}')

    # Main workflow
    def fun(self):
        for data in self.get_page():
            for lists_data in self.get_data(data):
                self.write_data(lists_data)


if __name__ == '__main__':
    g = wallpapers()
    g.fun()
    print('Download finished!')

Notes on the code:

1. The image URLs and image names in the returned JSON are URL-encoded, so they have to be decoded with requests.utils.unquote.

requests.utils.unquote(url)  # decode

requests.utils.quote(url)  # encode
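As a small illustration of the decoding step, the sketch below uses urllib.parse directly; as far as I know, requests.utils.quote and requests.utils.unquote are the same standard-library functions re-exported by requests. The sample entry and the helper decode_entry are made up for the example and only mimic the shape of one element of the List array.

```python
from urllib.parse import quote, unquote


def decode_entry(entry):
    """Return a copy of a JSON entry with every string value URL-decoded."""
    return {key: unquote(value) if isinstance(value, str) else value
            for key, value in entry.items()}


# Hypothetical entry: the values are encoded here so that the example
# does not need a network request.
sample = {
    'sProdName': quote('王者荣耀'),
    'sProdImgNo_1': quote('http://example.com/img/1.jpg/200', safe=''),
}

if __name__ == '__main__':
    print(sample)                # percent-encoded values
    print(decode_entry(sample))  # readable values
```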

2. Using the enumerate function

enumerate wraps an iterable in a sequence of (index, value) pairs, so the index and the value are available at the same time.

li = ['锄禾日当午', '汗滴禾下土', '谁知盘中餐', '粒粒皆辛苦']
for i in enumerate(li):
    print(i)

'''
Output:
    (0, '锄禾日当午')
    (1, '汗滴禾下土')
    (2, '谁知盘中餐')
    (3, '粒粒皆辛苦')

'''

The line for index, jpg_url in enumerate(range(1, 9)): pairs each image URL with an index, which makes it easy to name the files when they are saved later.
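Since range(1, 9) already produces the numbers 1 to 8, the same pairing can also be written with enumerate's start argument, which avoids adding 1 when the file is named. This is only an alternative sketch: the field names follow the sProdImgNo_N pattern used in the code, and file_names is a hypothetical helper.

```python
# The eight image fields read from each entry
FIELD_NAMES = ['sProdImgNo_{}'.format(n) for n in range(1, 9)]


def file_names(fields):
    """Map each field name to a file name numbered from 1."""
    return {field: '{}.jpg'.format(number)
            for number, field in enumerate(fields, start=1)}


if __name__ == '__main__':
    for field, file_name in file_names(FIELD_NAMES).items():
        print(field, '->', file_name)
```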

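Known bugs:

1. Some wallpapers on the 王者荣耀 site share the same name (see the figure below), so later images overwrite earlier ones. I will study some more and then improve the code.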
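One possible way around the overwriting, sketched under the assumption that it is enough to give every repeated name its own folder: keep a counter per name and append a suffix from the second occurrence on. The helper make_unique_namer is hypothetical and is not part of the script above.

```python
from collections import Counter


def make_unique_namer():
    """Return a function that appends _2, _3, ... to names it has seen before."""
    seen = Counter()

    def unique(name):
        seen[name] += 1
        if seen[name] == 1:
            return name  # first occurrence keeps its name
        return '{}_{}'.format(name, seen[name])

    return unique


if __name__ == '__main__':
    unique = make_unique_namer()
    for name in ['a', 'b', 'a']:
        print(unique(name))
```

The folder name could then be built from unique(name) instead of name. A name that itself ends in _2 could still collide with a generated one, so this is a starting point rather than a complete fix.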

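2. Because the requests are sent too frequently, the site blocks the scraper partway through. The best fix is probably to route requests through a proxy server. I do not know how to do that yet and will improve the code after looking into it.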
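Before reaching for a proxy, simply slowing down may already help. The sketch below pauses between requests and retries a failed one after a longer wait. fetch_politely is a hypothetical helper, the delay values are guesses, and the fetch function is passed in so that the example runs without network access.

```python
import time


def fetch_politely(urls, fetch, delay=1.0, retries=2):
    """Call fetch(url) for each URL, pausing between calls and retrying failures."""
    results = []
    for url in urls:
        for attempt in range(retries + 1):
            try:
                results.append(fetch(url))
                break
            except Exception:
                if attempt == retries:
                    raise  # give up after the last retry
                time.sleep(delay * (attempt + 1))  # wait longer after each failure
        time.sleep(delay)  # fixed pause before the next URL
    return results


if __name__ == '__main__':
    # Stand-in fetch function; a real one would download the URL
    print(fetch_politely(['page-0', 'page-1'], str.upper, delay=0.1))
```

With requests, the fetch argument could be something like lambda u: requests.get(u, headers=headers). requests.get also accepts a proxies dictionary, which is the mechanism the note above refers to; it is left out here because it needs a real proxy address.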

金鞍少年

