Python Web Scraping (Part 3): Requests Examples

1. Example 1: Scraping a JD.com product page

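import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an exception for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding guessed from the body
    print(r.text[:1000])  # first 1000 characters of the page
except requests.RequestException:
    print("Scraping failed")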
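The try/except pattern above recurs in every example, so it can be wrapped in a small reusable function. The following is a minimal sketch rather than part of the original example: the name `get_html_text`, the 10-second timeout, and the returned error string are assumptions chosen for illustration.

```python
import requests


def get_html_text(url, timeout=10):
    """Return the decoded body of `url`, or an error message on failure.

    Hypothetical helper: the name, timeout and error string are illustrative.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # turn 4xx/5xx responses into exceptions
        r.encoding = r.apparent_encoding  # use the encoding guessed from the body
        return r.text
    except requests.RequestException:     # connection, timeout, HTTP and URL errors
        return "Scraping failed"
```

Calling `get_html_text("https://item.jd.com/2967929.html")[:1000]` would then reproduce the behaviour of the script above.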

2. Example 2: Scraping an Amazon product page

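With the approach from Example 1, the site recognizes the request as coming from a Python program rather than a human visitor and refuses to return the data. The request therefore has to present itself as a browser by setting the User-Agent header.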

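import requests

url = "https://www.amazon.cn/gp/product/B01M8L5z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present the request as coming from a browser
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])  # characters 1000 to 2000 of the page
except requests.RequestException:
    print("Scraping failed")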
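To see why the site can tell a script from a browser, it helps to look at the User-Agent header that Requests sends by default. The sketch below prepares two requests without sending them, so it needs no network access; the variable names are illustrative.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5z3Y"

with requests.Session() as session:
    # Build the requests without sending them.
    default_req = session.prepare_request(requests.Request("GET", url))
    browser_req = session.prepare_request(
        requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
    )

default_ua = default_req.headers["User-Agent"]  # e.g. "python-requests/<version>"
browser_ua = browser_req.headers["User-Agent"]  # header names are case-insensitive

print(default_ua)
print(browser_ua)
```

The default value names the library itself, which is what a server can filter on; the custom header replaces it.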

3. Example 3: Baidu keyword search

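import requests

keyword = "Python"
try:
    kv = {'wd': keyword}  # Baidu takes the search term in the "wd" parameter
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=10)
    print(r.request.url)  # the URL that was actually requested
    r.raise_for_status()
    print(len(r.text))  # length of the returned page
except requests.RequestException:
    print("Scraping failed")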
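The `params` argument also takes care of URL encoding, which matters once the keyword contains spaces or non-ASCII characters. As a minimal sketch, the request below is prepared but never sent, so the final URL can be inspected offline; the keyword is a made-up example.

```python
import requests

# Hypothetical keyword containing a space and non-ASCII characters.
keyword = "Python 爬虫"

# Prepare the request without sending it to see the final URL.
prepared = requests.Request(
    "GET", "http://www.baidu.com/s", params={"wd": keyword}
).prepare()

print(prepared.url)
```

The printed URL contains the keyword in percent-encoded form, so the query string never has to be assembled by hand.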

4. Example 4: Downloading and saving an image

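import os
import requests

url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = "F:/picture/"
path = root + url.split('/')[-1]  # name the local file after the last URL segment
try:
    if not os.path.exists(root):  # create the target directory if it is missing
        os.makedirs(root)
    if not os.path.exists(path):  # download only if the image is not saved yet
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        with open(path, 'wb') as f:  # write the raw bytes; the with block closes the file
            f.write(r.content)
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Scraping failed")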
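Taking the file name with `url.split('/')[-1]` works for this URL, but it would also pick up any query string that follows the file name. A slightly more defensive variant, sketched here with the standard library only, parses the URL first. The helper name `local_path_for`, the relative `picture` directory, and the `?size=large` suffix are assumptions for illustration.

```python
import os
from urllib.parse import urlparse


def local_path_for(url, root):
    """Build a local file path from the last path segment of `url`.

    Hypothetical helper: unlike url.split('/')[-1], it ignores any
    query string or fragment that follows the file name.
    """
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError("URL does not end in a file name: " + url)
    return os.path.join(root, filename)


url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
path = local_path_for(url + "?size=large", "picture")  # query string is an assumption
print(os.path.basename(path))
```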

5. Example 5: Automatic lookup of an IP address's location

Site: http://www.ip138.com

Query format: http://www.ip138.com/ips138.asp?ip=ipaddress

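import requests

url = "http://www.ip138.com/ips138.asp?ip="
try:
    r = requests.get(url + '202.204.80.112', timeout=10)  # append the IP to the query URL
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-2500:-1000])  # a slice near the end of the page
except requests.RequestException:
    print("Scraping failed")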
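Since the address is simply appended to the query URL, it can be worth checking that it is well formed before sending anything. The sketch below uses the standard `ipaddress` module for that; the helper name `build_query_url` is an assumption, and the base URL is the query format quoted in this section.

```python
import ipaddress

BASE_URL = "http://www.ip138.com/ips138.asp?ip="  # query format quoted in this section


def build_query_url(ip):
    """Return the lookup URL for `ip`; raise ValueError if `ip` is malformed."""
    ipaddress.ip_address(ip)  # raises ValueError for input that is not an IP address
    return BASE_URL + ip


print(build_query_url("202.204.80.112"))
```

The returned string can be passed directly to `requests.get` in place of the concatenation used above.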

菜鸟之志
