1. Crawling a Jingdong (JD.com) product page
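Item 1 gives no code, so here is a minimal sketch of fetching one product page and printing the start of its text. It assumes Python's standard-library urllib.request, since the notes do not name a library. The function name fetch_page and the 1000-character cut-off are illustrative choices, not part of the original. Pass it the URL of the product page you want to crawl.

```python
import urllib.request
from urllib.error import URLError


def fetch_page(url: str, timeout: float = 10.0, limit: int = 1000) -> str:
    """Fetch url and return the first `limit` characters of its decoded text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Decode with the charset the server declares, else fall back to UTF-8.
            charset = resp.headers.get_content_charset() or "utf-8"
            text = resp.read().decode(charset, errors="replace")
        return text[:limit]
    except URLError as exc:
        # HTTPError (4xx/5xx, e.g. a rejected crawler) and connection
        # failures both derive from URLError.
        return f"fetch failed: {exc}"
```

Returning an error string instead of raising keeps a crawl loop running when one page fails. Swap in your own logging or retries if you need them.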
2. Crawling page information from sites that restrict crawlers
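Many sites restrict crawling, and the restriction is fairly invisible: the server inspects the request headers, and if the request looks like it comes from a crawler rather than a browser, it is denied.
If you view the headers your request actually sent, you can see why it may be declined: an HTTP library announces itself in the User-Agent header (requests, for example, sends python-requests/<version>).
So we build a key-value pair that changes the header information sent with the request to the URL: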
kv = {'User-Agent': 'Mozilla/5.0'}
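A minimal sketch of the header change, again assuming the standard-library urllib.request rather than whatever library the notes used. It shows the User-Agent urllib sends by default and builds (without sending) a request that carries the kv override instead. The helper names default_user_agent and build_request are hypothetical, and www.example.com is a placeholder URL.

```python
import urllib.request

# Header override from the notes: present the request as coming from a browser.
kv = {"User-Agent": "Mozilla/5.0"}


def default_user_agent() -> str:
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def build_request(url: str, headers: dict) -> urllib.request.Request:
    """Attach custom headers to a request for url without sending it."""
    return urllib.request.Request(url, headers=headers)


print(default_user_agent())  # Python-urllib/<major>.<minor>
req = build_request("https://www.example.com/", kv)
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Passing req to urllib.request.urlopen sends it; a header set on the request takes precedence over the opener's default User-Agent.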
3. Submitting keywords to Baidu / 360 search
Baidu keyword search interface:
http://www.baidu.com/s?wd=keyword
360 keyword search interface:
http://www.so.com/s?q=keyword
So we can construct the URL by putting the keyword into the query parameter (wd for Baidu, q for 360) and fetch the results page for any keyword.
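A minimal sketch of constructing the two search URLs, assuming the standard library's urllib.parse.urlencode to escape the keyword. The table SEARCH_ENDPOINTS and the function search_url are hypothetical names; the base URLs and parameter names come from the interfaces above.

```python
from urllib.parse import urlencode

# Search interfaces from the notes: keyword goes in "wd" (Baidu) or "q" (360).
SEARCH_ENDPOINTS = {
    "baidu": ("http://www.baidu.com/s", "wd"),
    "360": ("http://www.so.com/s", "q"),
}


def search_url(engine: str, keyword: str) -> str:
    """Build the search URL for engine with the keyword URL-encoded."""
    base, param = SEARCH_ENDPOINTS[engine]
    return f"{base}?{urlencode({param: keyword})}"


print(search_url("baidu", "Python"))  # http://www.baidu.com/s?wd=Python
print(search_url("360", "web crawler"))  # http://www.so.com/s?q=web+crawler
```

urlencode turns spaces into "+" and percent-encodes non-ASCII keywords as UTF-8, so Chinese keywords can be passed directly. With the requests library, requests.get(base, params={param: keyword}) builds an equivalent query string.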