Scraping the Maoyan Top 100 Board: Source Code - 代码天地


Category: Other | 2018-10-27 01:46:09

Video tutorial link:
https://edu.hellobi.com/course/156/lessons

import requests
from multiprocessing import Pool
import re
import json
from requests.exceptions import RequestException

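def get_one_page(url):
    # Send a browser-like User-Agent: many sites (Maoyan included) may reject
    # the default python-requests one. The exact UA string is only an example.
    headers = {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/70.0.3538.77 Safari/537.36")
    }
    try:
        # timeout keeps a stalled connection from hanging the script forever
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        # Connection errors, timeouts, etc.: treat the page as unavailable
        return None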


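def parse_one_page(html):
    # Note: match data-src here. The browser's "Inspect Element" view differs from
    # "View Page Source", and requests only receives the raw page source.
    # Raw strings (r'...') keep \d from being treated as a string escape.
    pattern = re.compile(r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
                         + r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
                         + r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)
    items = re.findall(pattern, html)
    for item in items:
        yield {
            "index": item[0],
            "image": item[1],
            "title": item[2],
            "actor": item[3].strip()[3:],  # drop the leading "主演：" (3 characters)
            "time": item[4].strip()[5:],   # drop the leading "上映时间：" (5 characters)
            "score": item[5] + item[6]     # integer part + fraction part, e.g. "9." + "5"
        }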

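def write_to_file(content):
    # Append one JSON object per line; ensure_ascii=False keeps Chinese text readable.
    # The with block closes the file automatically, so no explicit f.close() is needed.
    with open("猫眼top100.txt", "a", encoding="utf-8") as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')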
        
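def main(offset):
    # Each page lists 10 films; offset selects the page (0, 10, ..., 90)
    url = 'http://maoyan.com/board/4?offset=' + str(offset)
    html = get_one_page(url)
    if html is None:
        # Request failed: skip this page instead of crashing inside re.findall
        return
    for item in parse_one_page(html):
        write_to_file(item)


if __name__ == '__main__':
    # Sequential version: fetch the 10 pages in order
    for i in range(10):
        main(i * 10)

    # Multiprocess alternative. Pages finish in no guaranteed order, so the
    # lines in the output file may not be sorted by rank.
    # pool = Pool()
    # pool.map(main, [i * 10 for i in range(10)])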
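To check the regular expression without touching the network, the parsing step can be run against a small hand-written HTML fragment. The fragment below is hypothetical: it only assumes one `<dd>` entry shaped the way the pattern expects (`board-index`, `data-src`, `name`, `star`, `releasetime`, `integer`/`fraction`). Maoyan's real markup may differ or may have changed since 2018, and the title, actor names and poster URL are placeholders. The pattern is the same one used in `parse_one_page`, written as raw strings.

```python
import re

# Same regular expression as parse_one_page, written as raw strings.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,
)

# Hypothetical sample: one <dd> entry shaped the way the pattern expects.
# Title, actors and poster URL are placeholders, not real Maoyan data.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/1" title="Example Film" class="image-link">
    <img data-src="https://example.com/poster.jpg" alt="" class="board-img" />
  </a>
  <div class="movie-item-info">
    <p class="name"><a href="/films/1" title="Example Film">Example Film</a></p>
    <p class="star">
        主演：Actor A,Actor B
    </p>
    <p class="releasetime">上映时间：2000-01-01</p>
  </div>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
"""


def parse_one_page(html):
    """Yield one dict per <dd> entry matched by PATTERN."""
    for index, image, title, star, release, integer, fraction in PATTERN.findall(html):
        yield {
            "index": index,
            "image": image,
            "title": title,
            "actor": star.strip()[3:],     # strip the 3-character "主演：" prefix
            "time": release.strip()[5:],   # strip the 5-character "上映时间：" prefix
            "score": integer + fraction,   # "9." + "5" -> "9.5"
        }


if __name__ == "__main__":
    for item in parse_one_page(SAMPLE_HTML):
        print(item)
```

Running it prints a single dict with `'actor': 'Actor A,Actor B'`, `'time': '2000-01-01'` and `'score': '9.5'`. Because `re.findall` silently returns an empty list when the markup no longer matches, a small offline check like this is a quick way to tell whether a scrape that produced nothing failed at the request step or at the parsing step.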

Reposted from blog.csdn.net/yj13811596648/article/details/83044340
