路飞学城: Python Web Scraping Hands-On Intensive Training, Chapter 2

I: Learning reflections

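     I had never studied Flask before, so the videos left me a bit lost. The first assignment was fairly simple, and with the teacher's guidance I got it done. This assignment is harder. It is also exam week, so I have had no time to catch up on Flask, and I doubt I will finish the assignment. Sigh. For now I can only write the key points down carefully and go through them again patiently once exams are over.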

II: Key points

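  1. tag = soup.find('a')    # first <a> tag, or None; soup is assumed to be an existing BeautifulSoup object
    print(tag)
    # Filter by tag name, attributes, and text together; class_ is shorthand for attrs={'class': ...}.
    # Newer bs4 releases call the text= argument string=.
    tag = soup.find(name='a', attrs={'class': 'sister'}, recursive=True, text='Lacie')
    tag = soup.find(name='a', class_='sister', recursive=True, text='Lacie')
    print(tag)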
    
    tags = soup.find_all('a')
    print(tags)
    
    tags = soup.find_all('a', limit=1)    # stops after the first match, but still returns a list
    print(tags)
    
    tags = soup.find_all(name='a', attrs={'class': 'sister'}, recursive=True, text='Lacie')
    # tags = soup.find_all(name='a', class_='sister', recursive=True, text='Lacie')
    print(tags)
    
    
    ####### Lists: match any value in the list #######
    v = soup.find_all(name=['a','div'])           # tags named a or div
    print(v)
    
    v = soup.find_all(class_=['sister0', 'sister'])
    print(v)
    
    v = soup.find_all(text=['Tillie'])            # returns the matching strings (NavigableString), not tags
    print(v, type(v[0]))
    
    
    v = soup.find_all(id=['link1','link2'])
    print(v)
    
    v = soup.find_all(href=['link1','link2'])
    print(v)
    
    ####### Regex #######
    import re
    rep = re.compile('p')                    # matches any tag name containing 'p'
    rep = re.compile('^p')                   # replaces the pattern above: only names starting with 'p'
    v = soup.find_all(name=rep)
    print(v)
    
    rep = re.compile('sister.*')
    v = soup.find_all(class_=rep)               
    print(v)
    
    rep = re.compile('http://www.oldboy.com/static/.*')
    v = soup.find_all(href=rep)
    print(v)
    
    
    # A function filter: keep tags that have both a class and an id attribute
    def func(tag):
        return tag.has_attr('class') and tag.has_attr('id')
    v = soup.find_all(name=func)
    print(v)
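
The snippets above never show how `soup` is built, so here is a minimal runnable sketch of the same filters. It assumes the page resembles the "three sisters" sample from the Beautiful Soup documentation, which the ids and class names in my notes seem to target; the `HTML` string, the variable names, and the helper `has_class_and_id` are my own stand-ins, and `string=` is used in place of the older `text=`.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample document (a stand-in, not the page from the course).
HTML = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

# html.parser ships with Python, so no extra parser is required.
soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first match; find_all() returns a list of matches.
first_link = soup.find("a")
lacie = soup.find(name="a", attrs={"class": "sister"}, string="Lacie")
limited = soup.find_all("a", limit=2)

# A list matches any of its values; a regex is applied with search().
by_list = soup.find_all(name=["a", "b"])
by_regex = soup.find_all(name=re.compile("^b"))
by_id = soup.find_all(id=["link1", "link2"])


def has_class_and_id(tag):
    """Function filter: True for tags carrying both attributes."""
    return tag.has_attr("class") and tag.has_attr("id")


by_func = soup.find_all(has_class_and_id)

print(first_link["id"], lacie["id"], len(limited))
print([t.name for t in by_list])
print([t.name for t in by_regex])
print([t["id"] for t in by_id])
print([t["id"] for t in by_func])
```

Only the three links carry both `class` and `id`, so the function filter skips the `<p>` tags even though they have a class.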
    
    ## get: read an attribute value
    tag = soup.find('a')
    v = tag.get('id')
    print(v)
    
    
    tag = soup.find('body')
    v = tag.index(tag.find('div'))    # position of the div among body's direct children; ValueError if it is not a direct child
    print(v)
    
    tag = soup.find('body')
    for i,v in enumerate(tag):        # iterating a tag walks its direct children
        print(i,v)
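
To check how `get`, `index`, and iteration behave, here is a small sketch on a made-up fragment. The `HTML` string and variable names are assumptions of mine; the fragment is written on one line so that no whitespace strings end up among the children of `<body>`.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line fragment: body has exactly two children, <p> and <div>.
HTML = '<body><p id="intro" class="lead">Hi</p><div id="main"><a href="/x">x</a></div></body>'
soup = BeautifulSoup(HTML, "html.parser")

p = soup.find("p")
p_id = p.get("id")        # plain attribute: a string
p_class = p.get("class")  # class is multi-valued: a list of strings
missing = p.get("href")   # absent attribute: None instead of an exception

body = soup.find("body")
div_position = body.index(body.find("div"))  # index among body's direct children

# Iterating over a tag yields its direct children in order.
children = [(i, child.name) for i, child in enumerate(body)]

print(p_id, p_class, missing)
print(div_position, children)
```

On a real page the children usually include newline strings between tags, so the index of the `div` is often larger than a count of tags alone would suggest.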
    
    
    
    ## select: CSS selectors
    soup.select("title")
    
    soup.select("p:nth-of-type(3)")
    
    soup.select("body a") 
    
    soup.select("html head title")
    
    tag = soup.select("span,a")
    
    soup.select("head > title")
    
    soup.select("p > a")
    
    soup.select("p > a:nth-of-type(2)")
    
    soup.select("p > #link1")
    
    soup.select("body > a")
    
    soup.select("#link1 ~ .sister")  # '#' selects by id, here id=link1
    
    soup.select("#link1 + .sister")
    
    soup.select(".sister")
    
    soup.select("[class~=sister]")
    
    soup.select("#link1")
    
    soup.select("a#link2") 
    
    soup.select('a[href]')
    
    soup.select('a[href="http://example.com/elsie"]')
    
    soup.select('a[href^="http://example.com/"]')  
    
    soup.select('a[href$="tillie"]')  
    
    soup.select('a[href*=".com/el"]')
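
The selectors above can be tried out with a sketch like the following. It again assumes a document shaped like the "three sisters" sample from the Beautiful Soup documentation; the `HTML` string and the variable names are my own, and `select` relies on the soupsieve package that recent bs4 releases install alongside it.

```python
from bs4 import BeautifulSoup

# Assumed sample document (a stand-in, not the page from the course).
HTML = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
soup = BeautifulSoup(HTML, "html.parser")

titles = soup.select("title")                      # by tag name
third_p = soup.select("p:nth-of-type(3)")          # third <p> among its siblings
child_links = soup.select("p > a")                 # direct children only
second_link = soup.select("p > a:nth-of-type(2)")
direct_in_body = soup.select("body > a")           # links sit inside <p>, so none match
later_siblings = soup.select("#link1 ~ .sister")   # all following siblings
next_sibling = soup.select("#link1 + .sister")     # only the immediately following one
by_prefix = soup.select('a[href^="http://example.com/"]')
by_suffix = soup.select('a[href$="tillie"]')
by_substring = soup.select('a[href*=".com/el"]')

print(titles[0].get_text())
print([a["id"] for a in child_links])
print([a["id"] for a in later_siblings], [a["id"] for a in next_sibling])
print(len(by_prefix), by_suffix[0]["id"], by_substring[0]["id"])
```

The difference between `~` and `+` is the one I keep mixing up: `~` returns every later sibling that matches, `+` only the one right after.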


Reposted from www.cnblogs.com/andydong/p/9281578.html