Python crawler: batch downloading pictures

Worked overtime today, exhausting!!

Out of boredom I wrote a picture crawler in Python, and it works quite nicely, ha ha.

First, the code (Python version: 2.7.9):

__author__ = 'bloodchilde'
 
 
import urllib2
import re
import os
 
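class Spider(object):
    def __init__(self):
        # Listing pages are siteUrl + "index_N.html"
        self.siteUrl="http://sc.chinaz.com/biaoqing/"
        # Send a browser User-Agent instead of urllib2's default "Python-urllib/2.7"
        self.user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'
        self.headers = { 'User-Agent' : self.user_agent }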
 
    def getPage(self,pageIndex):
        # Fetch one listing page and decode its HTML as UTF-8
        url = self.siteUrl+"index_"+str(pageIndex)+".html"
        request = urllib2.Request(url,headers = self.headers)
        response = urllib2.urlopen(request)
        return response.read().decode("utf-8")
 
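    def getContents(self,pageIndex):
        page = self.getPage(pageIndex)

        # Two capture groups per emoticon entry: the title attribute of the <a> tag
        # and the src2 attribute of the <img> tag (re.S lets . match newlines)
        pattern = re.compile('''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>''',re.S)

        items = re.findall(pattern,page)

        # Each match is a (title, src2) tuple: folder name and image URL
        contents=[]

        for item in items:
            contents.append([item[0],item[1]])
        return contents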
 
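    def mk_dir(self,path):
        # Create the folder (and any missing parents) unless it already exists;
        # returns True only if it was created
        isExist = os.path.exists(path)

        if not isExist:
            os.makedirs(path)
            return True
        else:
            return False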
 
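    def downImage(self,url,dirname):
        # Request the picture itself; the response body is the raw image data
        request = urllib2.Request(url,headers = self.headers)
        response = urllib2.urlopen(request)
        imageContents = response.read()

        # The last segment of the URL is the file name, including its suffix
        imageName = url.split(u"/")[-1]

        print imageName

        # One folder per emoticon set, named after its title (change this base path to your own)
        path = u"C:/Users/bloodchilde/Desktop/image_python/"+dirname

        self.mk_dir(path)

        imagePath = path+u"/"+imageName

        # Write in binary mode so the image bytes are stored unchanged
        with open(imagePath, 'wb') as f:
            f.write(imageContents)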
 
    def downLoadAllPicture(self,PageIndex):
        contents = self.getContents(PageIndex)

        # Unpack each [title, imageUrl] pair instead of shadowing the built-in list
        for dirname, imageUrl in contents:
            self.downImage(imageUrl,dirname)
 
 
 
 
demo = Spider()

# Crawl listing pages 3 through 99 (range() excludes the end value 100)
for page in range(3,100):
    demo.downLoadAllPicture(page)
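
As written, the final loop has no error handling: urllib2 raises an exception for a failed request (an HTTP 404 on one picture, say), and that single exception ends the whole run. Below is a minimal sketch of a more forgiving loop that runs under both Python 2.7 and 3. `download_page` stands for any per-page function such as demo.downLoadAllPicture; the function name `crawl` and the one-second pause between pages are my own assumptions, not part of the original.

```python
import time


def crawl(pages, download_page, delay=1.0, sleep=time.sleep):
    """Call download_page(page) for each page, skipping failures instead of stopping.

    Returns a list of (page, error) pairs for the pages that failed,
    so they can be retried later.
    """
    failures = []
    for page in pages:
        try:
            download_page(page)
        except Exception as exc:  # network and file errors alike: log and keep going
            print("page {} failed: {}".format(page, exc))
            failures.append((page, exc))
        sleep(delay)  # assumed politeness pause between pages
    return failures
```

With the original class, `failed = crawl(range(3, 100), demo.downLoadAllPicture)` would replace the final for loop. Catching Exception this broadly is a trade-off that suits a throwaway script. Wrapping the self.downImage call the same way would skip individual pictures rather than whole pages.
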
 

The results are as follows:

[Screenshot: download results]


With that many pictures downloaded, let's analyze how the program works:

First of all, my target page is:

http://sc.chinaz.com/biaoqing/index_3.html

Starting from this page, the program downloads the emoticons listed on it and on the following pages, up to index_99.html.
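
In Python 3, urllib2's Request and urlopen live in urllib.request. A rough Python 3 equivalent of getPage might look like the sketch below. It assumes the site still uses the index_N.html naming shown above and still serves UTF-8. The names `page_url` and `get_page` and the 10-second timeout are my own choices, not from the original.

```python
import urllib.request

SITE_URL = "http://sc.chinaz.com/biaoqing/"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"}


def page_url(page_index):
    """Build a listing-page URL, e.g. index_3.html for page 3."""
    return "{}index_{}.html".format(SITE_URL, page_index)


def get_page(page_index, timeout=10):
    """Fetch one listing page and decode it as UTF-8 (assumed encoding)."""
    request = urllib.request.Request(page_url(page_index), headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same pages as the original loop: range(3, 100) covers 3..99.
PAGE_URLS = [page_url(i) for i in range(3, 100)]
```

Only `get_page` touches the network. Building the URL list separately makes it easy to check exactly which pages a run will visit.
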

The approach:

1. Fetch the source code of the web page.

2. Parse the source with a regular expression to get the URLs of the pictures to download.

3. Send a request to each picture URL; the response body is the picture's content (contents).

4. From the same picture URL, take the last path segment as the picture's file name, suffix included (imageName).

5. Create a local file named imageName and write contents into it.
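
Steps 3 to 5 can be sketched in Python 3 as one function. To keep it testable, the network call is passed in as `fetch`, which is any callable that takes a URL and returns the raw bytes (for example a thin wrapper around urllib.request.urlopen). The names `save_image` and `safe_name` are hypothetical. `safe_name` is my own addition, not part of the original: Windows rejects characters such as `?` or `:` in folder names, and the titles come straight from the page.

```python
import os
import re


def safe_name(title):
    """Hypothetical helper: replace characters Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or "untitled"


def save_image(title, image_url, root, fetch):
    """Steps 3-5: download image_url and store it as root/<title>/<file name>."""
    folder = os.path.join(root, safe_name(title))
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists

    # Step 4: the last path segment of the URL is the file name, suffix included.
    image_name = image_url.rstrip("/").split("/")[-1]

    # Step 3: the response body is the picture's binary content.
    contents = fetch(image_url)

    # Step 5: write the bytes to a local file in binary mode.
    path = os.path.join(folder, image_name)
    with open(path, "wb") as f:
        f.write(contents)
    return path
```

Injecting `fetch` keeps the file-handling logic independent of how the bytes are downloaded, so a fake fetcher can stand in for the network while testing.
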

Open http://sc.chinaz.com/biaoqing/index_3.html, view the page source, and find the snippet to be matched:

[Screenshot: the HTML snippet]

The corresponding regular expression is:

'''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>'''

From this snippet we capture title and src2: title is used as the folder name, and src2 as the URL of the target picture.
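
To see what findall returns, the same regular expression can be run against a small snippet. The HTML below is a hypothetical, simplified stand-in shaped the way the pattern expects. It is not copied from the real page. Each match comes back as a (title, src2) tuple, which getContents then turns into a two-element list. The sketch uses Python 3.

```python
import re

PATTERN = re.compile(
    '''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>''',
    re.S,  # let . match newlines, since each entry spans several lines
)

# Hypothetical sample markup: two emoticon entries in the expected shape.
SAMPLE_HTML = """
<div class='num_1'>
<p><a href='/biaoqing/1.htm' target='_blank' title='cat'><img src2="http://example.com/a/cat.gif" alt="cat"></a></p>
</div>
<div class='num_1'>
<p><a href='/biaoqing/2.htm' target='_blank' title='dog'><img src2="http://example.com/b/dog.png"></a></p>
</div>
"""

items = PATTERN.findall(SAMPLE_HTML)
for title, image_url in items:
    print(title, image_url)
```

Because re.S lets `.*?` run across line breaks, an entry that does not quite fit this shape is not simply skipped. If one entry's `<img>` lacked src2, for instance, the lazy match could continue into the next entry and pair one title with another entry's image. That is one reason regex-based scrapers tend to break when a page layout changes.
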
----------------
Disclaimer: This is an original article by CSDN blogger "Little Wei", licensed under CC 4.0 BY-SA. When reproducing it, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/dai_jing/article/details/46661969

Origin: blog.csdn.net/qq_30007885/article/details/102521390