Fetching data with a Python web scraper

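Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_44580977/article/details/102056198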
The urllib.request module
from urllib import request
resp = request.urlopen("http://image.baidu.com/")
print(resp.read().decode())

<!DOCTYPE html>              <!--STATUS OK-->  <head>   <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta name="description" content="百度图片使用世界前沿的人工智能技术,为用户甄选海量的高清美图,用更流畅、更快捷、更精准的搜索体验,带你去发现多彩的世界。"> <meta http-equiv="X-UA-Compatible" content="IE=Edge"/> <meta name="baidu-site-verification" content="2ltGWMzql9"/>  <script>
    var bdimgdata = {
        logid: '11007013867272265913',
        sid: 'dc1c38881068b98784a4a5fc83d5a92f6b2743ee',
        wh: window.screen.width + 'x' + window.screen.height,
        sampid: '-1',
        protocol: window.location.protocol.replace(':', ''),
        spat: 0 + '-' + ''
    }
......   

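Use the read() method to get the response body. The body comes back as bytes, so call decode() to turn it into a string (decode() assumes UTF-8 unless told otherwise).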
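Not every page is UTF-8, so a bare decode() can fail on, for example, GBK-encoded sites. Below is a minimal sketch, assuming the server declares its charset in the Content-Type header; the helper name `decode_body` is hypothetical and not part of the original article.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str = "") -> str:
    """Decode response bytes using the charset named in a Content-Type value.

    Falls back to UTF-8 when no charset is declared (an assumption of this sketch).
    """
    msg = Message()
    msg["Content-Type"] = content_type or "text/html"
    charset = msg.get_content_charset() or "utf-8"
    # errors="replace" keeps going on stray bytes instead of raising
    return raw.decode(charset, errors="replace")


# Offline demonstration with locally encoded bytes (no network needed)
gbk_bytes = "百度图片".encode("gbk")
print(decode_body(gbk_bytes, "text/html; charset=gbk"))
```

With urllib.request, the header value can be read from the response as resp.headers.get("Content-Type", "") and passed in together with resp.read().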

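The urllib3 library

The urllib3 library is the recommended choice. It is a third-party package, so install it first (pip install urllib3).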

import urllib3
http = urllib3.PoolManager()
resp_dat = http.request('GET', "http://image.baidu.com/")
print(resp_dat.data.decode())
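urllib3 returns a response object even for HTTP error statuses such as 404, so it can help to check the status before decoding. The following is a rough sketch only; the function name `fetch_text` and its `pool` parameter are hypothetical, added here so the logic can be exercised without network access.

```python
def fetch_text(url, pool=None, encoding="utf-8"):
    """Return the decoded body of `url`, or raise if the status is not 200."""
    if pool is None:
        # Third-party import done lazily so that a custom pool can be injected
        import urllib3
        pool = urllib3.PoolManager()
    resp = pool.request("GET", url)
    if resp.status != 200:
        raise RuntimeError("GET %s returned HTTP %d" % (url, resp.status))
    return resp.data.decode(encoding)
```

Calling fetch_text("http://image.baidu.com/") with no pool argument would use a fresh urllib3.PoolManager, matching the snippet above.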
Practical example

Scraping stock information from Eastmoney (东方财富网)

import re

import urllib3

# Fetch the industry-sector board data, one page at a time
http = urllib3.PoolManager()

pages = 4
conts = []
# Each record starts with "BK"; capture everything up to the closing quote
pattern = re.compile(r'BK(.*?)"')
for p in range(1, pages + 1):
    url = "http://nufm.dfcfw.com/EM_Finance2014NumericApplication/JS.aspx?cb=jQuery1124012582582823807198_1554554782636&type=CT&token=4f1862fc3b5e77c150a2b985b12db0fd&sty=FPGBKI&js=({data:[(x)],recordsFiltered:(tot)})&cmd=C._BKHY&st=(ChangePercent)&sr=-1&p=%d" % p
    url += "&ps=20&_=1554554783027"
    try:
        resp_dat = http.request('GET', url)
        text = resp_dat.data.decode()
        conts.extend(pattern.findall(text))
        print(text)
    except Exception as e:
        # resp_dat may not exist if the request itself failed, so report only the error
        print("page %d failed: %s" % (p, e))

print(conts)
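To see what the regular expression captures without hitting the server, it can be run against a local string. The sketch below uses a made-up payload: the callback name, field order, and values are hypothetical placeholders shaped only to match what the pattern and the later index lookups expect, not a real Eastmoney response.

```python
import re

# Hypothetical payload for illustration only; not real market data
SAMPLE = ('jQuery123({data:["1,BK0001,SectorA,1.23,100,2.5,10|2,'
          '000001,2,StockA,9.99,4.56"],recordsFiltered:1})')

PATTERN = re.compile(r'BK(.*?)"')


def parse_records(text):
    """Return one list of comma-separated fields per 'BK...' record."""
    return [record.split(',') for record in PATTERN.findall(text)]


records = parse_records(SAMPLE)
print(records[0][1], records[0][8], records[0][10])
```

The non-greedy `(.*?)` stops at the first closing quote, so each quoted record yields one match, and anything before `BK` in the record is dropped.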
# Extract selected fields from each record
import pandas as pd

# Column names: sector name, sector change %, total market cap, turnover rate,
# advancers/decliners, leading stock, leading stock change %
columns = [u'板块名称', u'BK涨跌幅', u'总市值', u'换手率', u'涨跌家数', u'领涨股票', u'SK涨跌幅']
# Positions of those fields within each comma-separated record
field_idx = [1, 2, 3, 4, 5, 8, 10]

rows = []
for bk_dat in conts:
    fields = bk_dat.split(',')
    rows.append([fields[i] for i in field_idx])

# Build the table directly from the rows instead of filling a zero matrix cell by cell
df = pd.DataFrame(rows, columns=columns)
df.to_csv("table-bk.csv", columns=df.columns, index=True, encoding='gb2312')
