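To complete this task, we need the following Python knowledge:
1. Using the jieba module;
2. Reading and writing txt text files in Python;
3. Using Python's four built-in containers (list, tuple, dict, and set);
4. Using the openpyxl module.
The code is as follows: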
import jieba

# Read the whole source text
with open('./toefl100.txt', 'r', encoding='utf-8') as f:
    contents = f.read()

# Split the text into tokens with jieba
content_list = list(jieba.cut(contents))

# Keep alphabetic tokens longer than 4 characters, lowercased
word_list = []
for content in content_list:
    if content.isalpha() and len(content) > 4:
        word_list.append(content.lower())

# Count how often each unique word occurs
word_set = set(word_list)
word_dict = {}
max_lengh = 0  # length of the longest word seen
for word in word_set:
    count = 0
    if len(word) > max_lengh:
        max_lengh = len(word)
    for element in word_list:
        if word == element:
            count += 1
    word_dict[word] = count
print(word_dict)
from openpyxl import Workbook

# A new Workbook already contains one default sheet, so use it directly
workbook = Workbook()
worksheet = workbook.active

# Write words to column A and counts to column B, most frequent first
row_num = 1
for key in sorted(word_dict, key=word_dict.__getitem__, reverse=True):
    worksheet['A' + str(row_num)] = key
    worksheet['B' + str(row_num)] = word_dict[key]
    row_num += 1

workbook.save(filename='Peterwords.xlsx')
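The counting step above compares every unique word against the whole word list, which gets slow on long texts. As a possible alternative, here is a minimal sketch using collections.Counter from the standard library. The function name count_words, the min_length parameter, and the sample tokens are made up for illustration. The sketch also assumes that only English words are wanted, so it adds an isascii() check (Python 3.7+), because isalpha() alone also accepts Chinese characters.

```python
from collections import Counter


def count_words(tokens, min_length=5):
    """Count lowercase ASCII-alphabetic tokens with at least min_length characters."""
    words = (
        token.lower()
        for token in tokens
        if token.isascii() and token.isalpha() and len(token) >= min_length
    )
    return Counter(words)


# Hypothetical tokens standing in for the output of jieba.cut(contents)
sample_tokens = ["Campus", " ", "campus", "library", ",", "exam", "词汇", "Library", "campus"]
word_counts = count_words(sample_tokens)

# most_common() yields (word, count) pairs sorted by descending count
for word, count in word_counts.most_common():
    print(word, count)
```

The pairs returned by most_common() could then be written to the worksheet row by row, in place of the sorted() loop over word_dict.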
After running the code, we get an Excel file named "Peterwords.xlsx". Opening it shows the finished TOEFL word-frequency table, as shown below: