Simple, easy-to-understand English word-frequency counting in Python

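1. Requirements and analysis:

Goal: count how often each English word appears in a text file and print the ten most frequent words. Special characters: lowercase the text, then replace every special symbol with a space before splitting on whitespace, so that "hamlet," and "Hamlet" are counted as the same word.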
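
The replace-with-space step can also be done in a single pass with `str.maketrans` and `str.translate` instead of one `str.replace` call per symbol. This is a minimal sketch, not the author's code. It assumes that ASCII punctuation (`string.punctuation`) is the set of symbols to remove, and the names `PUNCT_TO_SPACE` and `clean_text` are hypothetical.

```python
import string

# Hypothetical table: map every ASCII punctuation character to a space.
# Both arguments to maketrans must have the same length.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase the text and turn each punctuation character into a space."""
    return text.lower().translate(PUNCT_TO_SPACE)


if __name__ == "__main__":
    sample = "To be, or not to be: that is the question!"
    print(clean_text(sample).split())
```

The original symbol list also contains the apostrophe, and so does `string.punctuation`. As a result, both approaches split a contraction such as "don't" into "don" and "t".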

2. Source code:

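def getText():
    # Read the whole file; "with" closes it automatically
    with open(r"../lib/words_count.txt", "r", encoding="utf-8") as f:
        txt = f.read()
    txt = txt.lower()                      # count words case-insensitively
    for ch in '''~!`@#$%^&*()_+{}[]"';:-=<>,.?/|\\''':    # replace special characters with spaces
        txt = txt.replace(ch, " ")
    return txt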

def main():
    hamletTxt = getText()
    words = hamletTxt.split()              # split into words on whitespace
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1    # tally each word
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
    for word, count in items[:10]:         # print the top ten (fewer if the text is short)
        print('{0:<10}{1:>5}'.format(word, count))


if __name__ == "__main__":
    main()
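
The counting and sorting in `main()` can be condensed with `collections.Counter` from the standard library. This is a hedged sketch rather than the author's version. It reuses the same punctuation-to-space idea on an inline sample string instead of reading `../lib/words_count.txt`, and `top_words` is a hypothetical helper name. It also needs Python 3.9+ for the `list[tuple[str, int]]` annotation.

```python
import string
from collections import Counter

# Same cleaning idea as getText(): lowercase, then map punctuation to spaces.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def top_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return up to n (word, count) pairs, most frequent first."""
    words = text.lower().translate(PUNCT_TO_SPACE).split()
    # Counter replaces the manual counts.get(word, 0) + 1 bookkeeping;
    # most_common(n) sorts by count in descending order.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    for word, count in top_words(sample, 3):
        print("{0:<10}{1:>5}".format(word, count))
```

`most_common(n)` returns at most `n` pairs. A text with fewer than ten distinct words is therefore handled without any extra length checks.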
3. Notes:

Please credit the source when reposting!


Reposted from blog.csdn.net/weixin_43386443/article/details/105362255