Python crawler: batch downloading pictures

Working overtime today. It's exhausting!

Out of boredom I wrote a crawler in Python that grabs pictures, and I'm quite pleased with it, ha ha.

Here is the code first (Python version: 2.7.9):

__author__ = 'bloodchilde'

import urllib
import urllib2
import re
import os

class Spider:
    def __init__(self):
        self.siteUrl = "http://sc.chinaz.com/biaoqing/"
        self.user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'
        self.headers = {'User-Agent': self.user_agent}

    def getPage(self, pageIndex):
        # Fetch one listing page and return its source as unicode text.
        url = self.siteUrl + "index_" + str(pageIndex) + ".html"
        request = urllib2.Request(url, headers=self.headers)
        response = urllib2.urlopen(request)
        return response.read().decode("utf-8")

    def getContents(self, pageIndex):
        # Extract [title, picture URL] pairs from the page source.
        page = self.getPage(pageIndex)

        pattern = re.compile('''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>''', re.S)

        items = re.findall(pattern, page)

        contents = []

        for item in items:
            contents.append([item[0], item[1]])
        return contents

    def mk_dir(self, path):
        # Create the folder if it is missing; return True only if it was created.
        isExists = os.path.exists(path)

        if not isExists:
            os.makedirs(path)
            return True
        else:
            return False

    def downImage(self, url, dirname):
        # Download one picture and save it in a folder named after its title.
        imageUrl = url
        request = urllib2.Request(imageUrl, headers=self.headers)
        response = urllib2.urlopen(request)
        imageContents = response.read()

        # The last segment of the URL is the file name, suffix included.
        urlArr = imageUrl.split(u"/")
        imageName = str(urlArr[-1])

        print imageName

        path = u"C:/Users/bloodchilde/Desktop/image_python/" + dirname

        self.mk_dir(path)

        imagePath = path + u"/" + imageName

        # Write the raw bytes in binary mode.
        f = open(imagePath, 'wb')
        f.write(imageContents)
        f.close()

    def downLoadAllPicture(self, PageIndex):
        # Download every picture found on one listing page.
        contents = self.getContents(PageIndex)

        for item in contents:
            dirname = item[0]
            imageUrl = item[1]
            self.downImage(imageUrl, dirname)

demo = Spider()

# Crawl listing pages 3 through 99.
for page in range(3, 100):
    demo.downLoadAllPicture(page)

Results are as follows:

With so many pictures downloaded, let's go straight to analyzing how the program works:

First of all, my target page is:

http://sc.chinaz.com/biaoqing/index_3.html

The program's job is to download the emoticon pictures from this page.
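As a small illustration of how the page address is put together, here is a sketch that builds the URL for a given page index and attaches the User-Agent header without sending anything. It is written for Python 3 (urllib.request), not the Python 2.7 urllib2 used in the listing above, and the helper names page_url and build_request are made up for this example. Whether the site still serves pages at these addresses is not checked here.

```python
from urllib.request import Request

SITE_URL = "http://sc.chinaz.com/biaoqing/"
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"


def page_url(page_index):
    """Build the listing-page URL the same way getPage does (hypothetical helper)."""
    return "{}index_{}.html".format(SITE_URL, page_index)


def build_request(page_index):
    """Prepare a request carrying a browser-like User-Agent; nothing is sent yet."""
    return Request(page_url(page_index), headers={"User-Agent": USER_AGENT})


request = build_request(3)
print(request.full_url)  # http://sc.chinaz.com/biaoqing/index_3.html
```

Passing the prepared request to urllib.request.urlopen would perform the actual download of the page source.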

The approach:

1. Fetch the source code of the web page

2. Parse the source code to obtain the URLs of the pictures to download (this is where the regular expression comes in)

3. Send a request to each picture URL obtained in step 2; the body of the response is the picture's content, contents

4. The picture's name, including its suffix, can also be taken from that URL: imageName

5. Create a local file named imageName and write contents into it
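Steps 3 to 5 can be sketched roughly as follows. This is a Python 3 illustration rather than the Python 2.7 code from the listing, and the function names, the base_dir parameter and the injectable fetch argument are assumptions added for this example so that it can be tried without touching the network.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

USER_AGENT = "Mozilla/5.0"  # placeholder value, an assumption for this sketch


def fetch_bytes(url):
    """Step 3: request the picture URL; the response body is the picture itself."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read()


def image_name_from_url(url):
    """Step 4: the last path segment is the file name, suffix included."""
    return posixpath.basename(urlsplit(url).path)


def save_image(contents, base_dir, dirname, image_name):
    """Step 5: create the folder if needed and write the bytes in binary mode."""
    folder = os.path.join(base_dir, dirname)
    os.makedirs(folder, exist_ok=True)
    image_path = os.path.join(folder, image_name)
    with open(image_path, "wb") as f:
        f.write(contents)
    return image_path


def download_image(url, base_dir, dirname, fetch=fetch_bytes):
    """Run steps 3 to 5 for one picture; fetch can be swapped out for testing."""
    return save_image(fetch(url), base_dir, dirname, image_name_from_url(url))


# Offline demonstration: a fake fetch stands in for the real network call.
demo_dir = tempfile.mkdtemp()
saved = download_image(
    "http://example.com/img/cat.gif", demo_dir, "cats", fetch=lambda url: b"GIF89a"
)
print(saved)
```

Calling download_image without the fetch argument would use fetch_bytes and go to the network.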

Open http://sc.chinaz.com/biaoqing/index_3.html and view its source; the code segment we need to handle looks like this:

The corresponding regular expression is:

'''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>'''

From this snippet we extract title and src2: title is used as the folder name, and src2 as the URL of the target picture.
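To see what the expression captures, here is a small Python 3 sketch that applies it to a made-up fragment. The HTML in SAMPLE_HTML is hypothetical markup written only to fit the pattern; it is not the actual source of the chinaz page.

```python
import re

# The same expression as above, split across two lines for readability.
PATTERN = re.compile(
    r'''<div.*?class='num_1'.*?>.*?<p>.*?<a.*?href='.*?'.*?target='_blank'.*?'''
    r'''title='(.*?)'.*?><img.*?src2="(.*?)".*?>.*?</a>.*?</p>.*?</div>''',
    re.S,  # let "." match newlines, since one entry spans several lines
)

# Hypothetical markup shaped to fit the pattern (not copied from the real page).
SAMPLE_HTML = """
<div class='num_1'>
  <p><a href='/biaoqing/1.htm' target='_blank' title='Happy cat'><img src2="http://example.com/img/cat.gif" alt="cat"></a></p>
</div>
<div class='num_1'>
  <p><a href='/biaoqing/2.htm' target='_blank' title='Sad dog'><img src2="http://example.com/img/dog.gif" alt="dog"></a></p>
</div>
"""

# findall returns one (title, src2) tuple per matched entry.
items = PATTERN.findall(SAMPLE_HTML)
for title, src2 in items:
    print(title, "->", src2)
# Happy cat -> http://example.com/img/cat.gif
# Sad dog -> http://example.com/img/dog.gif
```

The lazy quantifiers (.*?) keep each match inside a single entry, so the two capture groups line up with one title and one src2 per picture.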
----------------
Disclaimer: This is an original article by CSDN blogger "Little Wei", published under the CC 4.0 BY-SA license. When reposting, please include the original source link and this statement.
Original link: https://blog.csdn.net/dai_jing/article/details/46661969
