Python 3 development interview question (Counter in collections) 6.7

'''
Write a Python script that analyzes the file xx.log and counts the number of visits per domain

Contents of xx.log:
https://www.sogo.com/ale.html
https://www.qq.com/3asd.html
https://www.sogo.com/teoans.html
https://www.bilibili.com/2
https://www.sogo.com/asd_sa.html
https://y.qq.com/
https://www.bilibili.com/1
https://dig.chouti.com/
https://www.bilibili.com/imd.html
https://www.bilibili.com/

Output:
4 www.bilibili.com
3 www.sogo.com
1 www.qq.com
1 y.qq.com
1 dig.chouti.com

'''

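Once we have the problem, we start by analyzing the requirements:

1. First, get the data, that is, the domain names

To extract them we can use a regular expression or, since every URL has the same shape, cut each line apart with split

2. Count how many times each domain is visited

A module from Python's standard library (collections) can do the counting

3. Finally, print the result in the required format

The built-in function sorted takes care of the ordering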
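To make step 1 concrete, here is a small sketch that pulls the domain out of one sample line in three ways. The first two are the options mentioned above; the third, `urllib.parse.urlsplit`, is not part of the original solution and is shown only as an alternative from the standard library. The variable names are chosen for illustration.

```python
import re
from urllib.parse import urlsplit

url = "https://www.sogo.com/ale.html"  # one sample line from xx.log

# Option 1: split on "/" gives ['https:', '', 'www.sogo.com', 'ale.html']
domain_by_split = url.split("/")[2]

# Option 2: capture everything between "https://" and the next "/"
domain_by_regex = re.findall(r"https://(.*?)/", url)[0]

# Option 3 (an alternative, not used in the original post): let the
# standard library parse the URL and read the host name from the result
domain_by_urlsplit = urlsplit(url).hostname

print(domain_by_split, domain_by_regex, domain_by_urlsplit)
# www.sogo.com www.sogo.com www.sogo.com
```

The split and regex versions assume every line starts with a scheme followed by `//` and contains a `/` after the host, which holds for the sample log.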

Then comes the easiest part, writing the code:

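# Approach 1: regular expression + Counter
import re
from collections import Counter
with open("xx.log", "r", encoding="utf-8") as f:
    data = f.read()
    # capture everything between "https://" and the next "/"
    res = re.findall(r"https://(.*?)/", data)
    dic = Counter(res)

# sort the (domain, count) pairs by count, highest first
ret = sorted(dic.items(), key=lambda x: x[1], reverse=True)

for k, v in ret:
    print(v, k)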

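# Approach 2: split + a plain dict
dic = {}
with open("xx.log", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # "https://www.qq.com/x".split("/") -> ['https:', '', 'www.qq.com', 'x']
        domain = line.split("/")[2]
        if domain not in dic:
            dic[domain] = 1
        else:
            dic[domain] += 1
# sort the (domain, count) pairs by count, highest first
ret = sorted(dic.items(), key=lambda x: x[1], reverse=True)
for k, v in ret:
    print(v, k)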
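As a possible third variant, the counting and the sorting can both be left to `Counter`, since `most_common()` already returns the pairs ordered from highest to lowest count. This is only a sketch: the function name `count_domains` and the in-memory list `SAMPLE_LINES` (standing in for xx.log) are made up for illustration, and `urlsplit` is used here in place of the regex or split.

```python
from collections import Counter
from urllib.parse import urlsplit

# The ten lines of xx.log from the problem statement, kept in memory
# so the sketch runs without the file (hypothetical stand-in).
SAMPLE_LINES = [
    "https://www.sogo.com/ale.html",
    "https://www.qq.com/3asd.html",
    "https://www.sogo.com/teoans.html",
    "https://www.bilibili.com/2",
    "https://www.sogo.com/asd_sa.html",
    "https://y.qq.com/",
    "https://www.bilibili.com/1",
    "https://dig.chouti.com/",
    "https://www.bilibili.com/imd.html",
    "https://www.bilibili.com/",
]


def count_domains(lines):
    """Return (domain, count) pairs, most visited first."""
    # Skip blank lines, then take the host part of each URL.
    domains = (urlsplit(line.strip()).hostname for line in lines if line.strip())
    # most_common() with no argument returns every pair, highest count first;
    # domains with equal counts stay in first-seen order.
    return Counter(domains).most_common()


for domain, count in count_domains(SAMPLE_LINES):
    print(count, domain)
```

To run it against the real file, pass the open file object instead of the list, for example `count_domains(f)` inside a `with open("xx.log", encoding="utf-8") as f:` block.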

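This problem tests several things: the re module, anonymous functions (lambda), the built-in function sorted, and Counter from collections

Each of these has its own post in the basics series,

so here we will just talk about Counter from collections

Let's open the source code directly

The purpose of the Counter class is to keep track of how many times each value occurs. It is a dict subclass (so, since Python 3.7, it remembers insertion order) that stores its data as key-value pairs, with the elements as keys and their counts as values. A count can be any integer (including zero and negative numbers)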
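A short sketch of what "zero and negative counts" looks like in practice. The fruit names are invented for illustration; `subtract` is a real `Counter` method that lowers counts without dropping entries.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)

# subtract() lowers the counts and is allowed to go below zero
stock.subtract({"apples": 1, "pears": 2, "plums": 1})

print(stock["apples"])  # 2
print(stock["pears"])   # -1
print(stock["plums"])   # -1

# A key that was never counted reads as 0 instead of raising KeyError,
# and reading it does not add it to the counter
print(stock["kiwis"])       # 0
print("kiwis" in stock)     # False
```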

Now look at the usage examples in the source:

>>> c = Counter('abcdeabcdabcaba') # count elements from a string (creates the counter object)

>>> c.most_common(3) # three most common elements
[('a', 5), ('b', 4), ('c', 3)]

>>> c.most_common(4) # four most common elements
[('a', 5), ('b', 4), ('c', 3), ('d', 2)]

>>> sorted(c) # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements())) # list elements with repetitions
'aaaaabbbbcccdde'

Not sure what elements is? Then keep reading the source:

def elements(self):
    '''Iterator over elements repeating each as many times as its count.

In other words, the iterator yields each element as many times as its count
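One detail worth seeing in action: `elements()` silently leaves out any element whose count is less than one. A minimal sketch, with the counter contents made up for illustration:

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1, d=1)

# 'a' appears twice, 'd' once; 'b' (count 0) and 'c' (count -1) are skipped
repeated = list(c.elements())
print(repeated)  # ['a', 'a', 'd']
```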

>>> sum(c.values()) # total of all counts
15

>>> c['a'] # count of letter 'a'
5
>>> for elem in 'shazam': # update counts from an iterable
...     c[elem] += 1 # by adding 1 to each element's count
>>> c['a'] # now there are seven 'a' (the 5 from before plus the 2 in 'shazam')
7
>>> del c['b'] # remove all 'b'
>>> c['b'] # now there are zero 'b'
0

>>> d = Counter('simsalabim') # make another counter
>>> c.update(d) # add in the second counter

>>> c['a'] # now there are nine 'a'
9

>>> c.clear() # empty the counter
>>> c
Counter()
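Note that `update` on a Counter adds to the existing counts, whereas `update` on an ordinary dict overwrites the values. A small side-by-side sketch (variable names are illustrative):

```python
from collections import Counter

counts = Counter("aab")   # a: 2, b: 1
counts.update("abc")      # counts are added: a: 3, b: 2, c: 1
print(counts["a"], counts["b"], counts["c"])  # 3 2 1

plain = {"a": 2, "b": 1}
plain.update({"a": 1})    # the value is replaced: a: 1
print(plain)  # {'a': 1, 'b': 1}
```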

Note: If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:

>>> c = Counter('aaabbc')
>>> c['b'] -= 2 # reduce the count of 'b' by two
>>> c.most_common() # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
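If those leftover zero entries are unwanted, one option is the unary plus operator, which builds a new Counter that keeps only the positive counts. A sketch continuing the same example:

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2            # 'b' stays in the counter with a count of 0

cleaned = +c           # new Counter with only the positive counts
print('b' in c)        # True
print('b' in cleaned)  # False
print(cleaned.most_common())  # [('a', 3), ('c', 1)]
```

The original counter `c` is left unchanged; `del c['b']` would be the in-place alternative.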

Those are roughly the main uses; to go further, read through the source yourself
