[Algorithm] Word Frequency Count for Short English Passages

Contents

Problem

Sample passages

Sample output

Algorithm analysis

Source code


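Problem

    1. Given three short English passages, count how many times each word occurs in each passage.

    2. Words are separated by spaces, newlines, or punctuation; case is ignored.

    3. Print the 5 most frequent words, each with its count.

    4. Order the output by count first (highest first), then alphabetically for words with equal counts.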

    5. Do not count prepositions, articles, conjunctions, adverbs, or pronouns.
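
Rule 2 leaves the exact set of punctuation open. The following is a minimal sketch, assuming every non-letter character counts as a separator (so digits and apostrophes also split words). The name tokenize is a hypothetical helper for illustration and is not part of the original program.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not in the original program): split `text` into
// lower-case words, treating every non-letter character as a separator.
std::vector<std::string> tokenize(const std::string& text)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text)
    {
        if (std::isalpha(c))
        {
            // Still inside a word: append the lower-case letter.
            current += static_cast<char>(std::tolower(c));
        }
        else if (!current.empty())
        {
            // A separator ends the current word.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
    {
        words.push_back(current);   // flush the last word if the text does not end with a separator
    }
    return words;
}
```

For example, tokenize("No one can, no ONE.") yields no, one, can, no, one.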

Sample passages

test1.txt

In the flood of darkness, hope is the light. It brings comfort, faith, and confidence. 
It gives us guidance when we are lost, and gives support when we are afraid.
And the moment we give up hope, we give up our lives. 
The world we live in is disintegrating into a place of malice and hatred, where we need hope and find it harder. 
In this world of fear, hope to find better, but easier said than done, the more meaningful life of faith will make life meaningful.

test2.txt

No one can help others as much as you do. 
No one can express himself like you. 
No one can express what you want to convey. 
No one can comfort others in your own way. 
No one can be as understanding as you are. 
No one can feel happy, carefree, and no one can smile as much as you do. 
In a word, no one can show your features to anyone else.

test3.txt

Keep faith and hope for the future. 
Make your most sincere dreams, and when the opportunities come, they will fight for them. 
It may take a season or more, but the ending will not change. Ambition, best, become a reality. 
An uncertain future, only one step at a time, the hope can realize the dream of the highest. 
We must treasure the dream, to protect it a season, let it in the heart quietly germinal. 
However, we have to gently protect our hearts deep expectations, slowly dream, will achieve new life.

Sample output

test1.txt: 
hope 4
faith 2
find 2
give 2
gives 2

test2.txt: 
can 8
no 8
one 8
as 6
do 2

test3.txt: 
dream 3
future 2
hope 2
protect 2
season 2

Algorithm analysis

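1. Read the file: load the file contents into a string variable, text.

2. Split the string: tokenize text on the delimiters " ,.\n" (space, comma, period, newline).

3. Count with a map: insert each token that passes the filter into a map and increment its count. A map keeps its keys sorted, so the words end up in alphabetical order.

4. Copy the map entries into a vector and sort them by frequency with stable_sort(). Because the sort is stable, words with equal counts stay in alphabetical order.

5. Print the 5 most frequent words with their counts; the vector is already sorted at this point.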
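
Steps 3 to 5 can be sketched in isolation on an in-memory word list. This is a minimal sketch rather than the program's actual code: topWords is a hypothetical helper, and it assumes the words have already been lower-cased and filtered.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return the `n` most frequent words as (count, word) pairs.
std::vector<std::pair<int, std::string>> topWords(const std::vector<std::string>& words,
                                                  std::size_t n)
{
    // Step 3: std::map keeps its keys sorted, so iteration is alphabetical.
    std::map<std::string, int> counts;
    for (const std::string& w : words)
    {
        ++counts[w];
    }

    // Step 4: copy into a vector (already alphabetical) ...
    std::vector<std::pair<int, std::string>> ranked;
    for (const auto& e : counts)
    {
        ranked.emplace_back(e.second, e.first);
    }
    // ... and sort by count only; stable_sort keeps the alphabetical order of ties.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<int, std::string>& l,
                        const std::pair<int, std::string>& r) {
                         return l.first > r.first;
                     });

    // Step 5: keep at most n entries.
    if (ranked.size() > n)
    {
        ranked.resize(n);
    }
    return ranked;
}
```

The alphabetical tie-break is never stated in the comparator. It comes entirely from the map's iteration order combined with the stability of stable_sort, which is why replacing stable_sort with plain sort here would make the order of ties unspecified.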

Source code

#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <cctype>
#include <cstring>
#include <cstdlib>
#include <iterator>
#include <algorithm>
#include <vector>
using namespace std;

// Words excluded from the count: prepositions, articles, conjunctions, adverbs, pronouns
vector<string> g_delWord = {
    "to", "in", "on", "for", "of", "from", "between", "behind", "by", "about", "at", "with", "than",
    "a", "an", "the", "this", "that",
    "and", "but", "or", "so", "yet",
    "often", "very", "then", "therefore",
    "i", "you", "we", "he", "she", "my", "your", "his", "her", "our", "us", "it",
    "am", "is", "are",
    "when", "where", "who", "what",
    "will", "would"
};

struct compare
{
    // Orders (count, word) pairs by count only, highest count first
    bool operator()(const pair<int, string>& l, const pair<int, string>& r) const
    {
        return l.first > r.first;
    }
};

int main()
{
    for (int i = 1; i <= 3; ++i)
    {
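        // Build the file name: test1.txt, test2.txt, test3.txt
        string fileName = "test";
        fileName += to_string(i);
        fileName += ".txt";

        // Read the file contents
        fstream file;
        file.open(fileName, ios::in);   // open read-only; ios::out is write-only, ios::app appends
        if (!file.is_open())
        {
            cerr << "cannot open " << fileName << endl;
            continue;
        }
        char text[4096] = {0};              // zero-filled so the buffer is always null-terminated for strtok
        file.read(text, sizeof(text) - 1);  // reads at most 4095 bytes; longer files are truncated
        // cout << fileName << ": " << endl;
        // cout << text << endl << endl;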

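        // Split the text into words and count them in a map
        map<string, int> mWords;
        const char* s = " ,.\n";    // delimiters: space, comma, period, newline
        char* p = strtok(text, s);
        while (p)
        {
            string word(p);
            string lwrWord;
            // Convert to lower case; the unsigned char parameter keeps tolower well-defined
            transform(word.begin(), word.end(), back_inserter(lwrWord),
                      [](unsigned char c) { return static_cast<char>(::tolower(c)); });

            // Skip excluded words (prepositions, articles, conjunctions, adverbs, pronouns)
            if (find(g_delWord.begin(), g_delWord.end(), lwrWord) == g_delWord.end())
            {
                mWords[lwrWord]++;       // operator[] inserts the key with a count of 0 if it is missing, then returns a reference to the count
            }
            p = strtok(NULL, s);
        }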

        // Debug: print the map contents
        // int cnt = 0;
        // for (const auto& e: mWords)
        // {
        //     cout << "(" << e.first << ", " << e.second << ")    ";
        //     ++cnt;
        //     if (cnt % 5 == 0)
        //     {
        //         cout << endl;
        //     }
        // }
        // cout << endl <<endl;

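        // Copy the map entries into a vector of (count, word) pairs
        vector< pair<int, string> > vWords;     // the space in "> >" is only needed by pre-C++11 compilers, which parse ">>" as the shift operator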
        for (const auto& e: mWords)
        {
            vWords.push_back(make_pair(e.second, e.first));
        }

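        // Sort by count, highest first. sort() is not stable, so it would need a comparator that also
        // breaks ties by word; stable_sort keeps the alphabetical order inherited from the map.
        stable_sort(vWords.begin(), vWords.end(), compare());
        cout << fileName << ": " << endl;
        for (size_t j = 0; j < 5 && j < vWords.size(); ++j)
        {
            cout << vWords[j].second << " " << vWords[j].first << endl;
        }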
        cout << endl;
    }

    return 0;
}
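
The sorting comment in the source mentions a custom sort rule as an alternative to stable_sort. One way this could look is sketched below. ByCountThenWord and rankWords are hypothetical names introduced here for illustration; the comparator states both ordering rules explicitly, so it no longer relies on the map's alphabetical order or on a stable sort.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alternative to `compare`: encode both ordering rules explicitly.
struct ByCountThenWord
{
    bool operator()(const std::pair<int, std::string>& l,
                    const std::pair<int, std::string>& r) const
    {
        if (l.first != r.first)
        {
            return l.first > r.first;   // higher count first
        }
        return l.second < r.second;     // equal counts: alphabetical order
    }
};

// Hypothetical helper: sort (count, word) pairs in place with the comparator above.
void rankWords(std::vector<std::pair<int, std::string>>& words)
{
    std::sort(words.begin(), words.end(), ByCountThenWord());
}
```

With this comparator the input order of the vector does not matter, so the counts could just as well come from an unordered container.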

Reposted from blog.csdn.net/phoenixFlyzzz/article/details/130475119