Implementing a Simple Code Word Counter (Part 3)

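In the previous article we implemented a simple program that counts the words in a piece of code: it splits the code on delimiters such as spaces and counts how often each word occurs. But consider camelCase identifiers, where several words are joined together and every word except the first starts with an uppercase letter. We want to break these identifiers apart as well, because they may contain the keywords we care about. For example, suppose the code contains many identifiers such as countVec, countLink, countInt, and countDouble. With the plain approach each of them has a count of 1, which tells us nothing about the code. Once they are split, we get 4 occurrences of count, which gives us good reason to believe the code has something to do with counting.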
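To make the problem concrete, here is a minimal sketch of delimiter-only splitting. It uses only the standard library rather than boost::split, and splitOnDelimiters is a hypothetical helper name that does not appear in the original program. The cast to unsigned char is also an addition, made because passing a negative char to std::isalnum is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Same rule as the article's isDelimiter: anything that cannot be part of a name.
// The unsigned char cast is an addition that avoids undefined behaviour
// for negative char values.
bool isDelimiter(char c)
{
    auto const uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || c == '_');
}

// Hypothetical helper (not in the original program): split on delimiters only,
// so a camelCase identifier stays in one piece.
std::vector<std::string> splitOnDelimiters(std::string const& code)
{
    auto symbols = std::vector<std::string>{};
    auto first = std::find_if_not(begin(code), end(code), isDelimiter);
    while (first != end(code))
    {
        auto const last = std::find_if(first, end(code), isDelimiter);
        symbols.emplace_back(first, last);
        first = std::find_if_not(last, end(code), isDelimiter);
    }
    return symbols;
}
```

Calling splitOnDelimiters("countVec countLink countInt countDouble") yields the four identifiers unchanged, so the word count never appears on its own in the result.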

Previously we used:

auto symbols = std::vector<std::string>{};
boost::split(symbols, code, isDelimiter);
symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));

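That is, we split on delimiters such as spaces. Now we need to change the program to meet the following two requirements:
1. Determine the extent of a single word: split at the uppercase letters inside an identifier, and still split at the delimiters between identifiers.
2. Loop to find the next word.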

Determining the extent of a word

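We can use two iterators: beginWord points to the first letter of the word, and endWord points one past its last letter. Here a "word" means a piece that ends at the next uppercase letter or the next delimiter: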

auto const beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c){ return isDelimiter(c) || isupper(c); });

Once the range is known, we store the word in words for later: words.emplace_back(beginWord, endWord)
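The two find calls above can be wrapped in a small function to try them out. This is only a sketch: firstWord is a hypothetical helper that is not part of the original program, and the check against end(code) is an added guard, since std::next(beginWord) would be invalid when the input contains no word at all.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Same rule as the article's isDelimiter, with an added unsigned char cast.
bool isDelimiter(char c)
{
    auto const uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || c == '_');
}

// Hypothetical helper: returns the first word of `code`,
// or an empty string if `code` contains only delimiters.
std::string firstWord(std::string const& code)
{
    auto const beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
    if (beginWord == end(code)) return {};  // added guard: nothing to split

    // Start searching one past beginWord so a leading capital does not end the word.
    auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c) {
        return isDelimiter(c) || std::isupper(static_cast<unsigned char>(c));
    });
    return std::string(beginWord, endWord);
}
```

For instance, firstWord("  countVec = 0;") returns "count" and firstWord("WordCount") returns "Word". The search for the end starts at std::next(beginWord), which is why a word that itself begins with a capital letter is not cut off immediately.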

Looping to find the next word

auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
while (beginWord != end(code))
{
    auto endWord = std::find_if(std::next(beginWord), end(code), [](char c){ return isDelimiter(c) || isupper(c); });
    words.emplace_back(beginWord, endWord);
    beginWord = std::find_if_not(endWord, end(code), isDelimiter);
}
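To check the loop against the example from the introduction, the following sketch combines it with a word-counting step and runs it on the four count identifiers. The two functions mirror getCaseWordsFromCode and countWords from the complete listing. The unsigned char casts are an addition made here for safety with negative char values.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

bool isDelimiter(char c)
{
    auto const uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || c == '_');
}

// Split at delimiters and in front of every uppercase letter.
std::vector<std::string> getCaseWordsFromCode(std::string const& code)
{
    auto words = std::vector<std::string>{};
    auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
    while (beginWord != end(code))
    {
        auto const endWord = std::find_if(std::next(beginWord), end(code), [](char c) {
            return isDelimiter(c) || std::isupper(static_cast<unsigned char>(c));
        });
        words.emplace_back(beginWord, endWord);
        // Skip any delimiters; an uppercase letter starts the next word directly.
        beginWord = std::find_if_not(endWord, end(code), isDelimiter);
    }
    return words;
}

// Count how often each word occurs.
std::map<std::string, std::size_t> countWords(std::vector<std::string> const& words)
{
    auto wordCount = std::map<std::string, std::size_t>{};
    for (auto const& word : words)
    {
        ++wordCount[word];
    }
    return wordCount;
}
```

With the input "countVec countLink countInt countDouble", countWords(getCaseWordsFromCode(...)) maps count to 4 and each of Vec, Link, Int, and Double to 1. Two consequences of this splitting rule are worth keeping in mind: a run of capitals such as "HTTPServer" is split into H, T, T, P, and Server, and the comparison is case-sensitive, so count and Count are tallied separately.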

Here is the complete code:

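#include<algorithm>   // std::find_if, std::find_if_not, std::sort, std::max_element
#include<cctype>      // isalnum, isupper
#include<cstdlib>     // system
#include<iostream>
#include<iomanip>
#include<string>
#include<map>
#include<utility>     // std::pair
#include<vector>
#include<iterator>
#include<boost/algorithm/string.hpp>   // only needed by the commented-out boost::split version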

using WordCount = std::vector<std::pair<std::string, size_t>>;
WordCount getWordCount(std::string const& code);

bool isDelimiter(char c)
{
    auto const isAllowedInName = isalnum(c) || c == '_';
    return !isAllowedInName;
}

std::map<std::string, size_t> countWords(std::vector<std::string> const& words)
{
    auto wordCount = std::map<std::string, size_t>{};
    for (auto const& word : words)
    {
        ++wordCount[word];
    }
    return wordCount;
}


std::vector<std::string> getCaseWordsFromCode(std::string const& code)
{
    auto words = std::vector<std::string>{};
    auto beginWord = std::find_if_not(begin(code), end(code), isDelimiter);
    while (beginWord != end(code))
    {
        auto endWord = std::find_if(std::next(beginWord), end(code), [](char c) { return isDelimiter(c) || isupper(c); });
        words.emplace_back(beginWord, endWord);
        beginWord = std::find_if_not(endWord, end(code), isDelimiter);
    }
    return words;
}


WordCount getWordCount(std::string const& code)
{
    /*auto symbols = std::vector<std::string>{};
    boost::split(symbols, code, isDelimiter);
    symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));*/

    auto const symbols = getCaseWordsFromCode(code);

    auto const wordCount = countWords(symbols);

    auto sortedWordCount = WordCount(begin(wordCount), end(wordCount));  // convert the map into a vector of pairs so it can be sorted
    std::sort(begin(sortedWordCount), end(sortedWordCount), [](auto const& p1, auto const& p2) { return p1.second > p2.second; });

    return sortedWordCount;
}

//void print(WordCount const& entries)
//{
//  for (auto const& entry : entries)
//  {
//      std::cout << std::setw(30) << std::left << entry.first << '|' << std::setw(10) << std::right << entry.second << '\n';
//  }
//}

void print(WordCount const& entries)
{
    if (entries.empty()) return;
    auto const longestWord = *std::max_element(begin(entries), end(entries), [](auto const& p1, auto const& p2) { return p1.first.size() < p2.first.size(); });
    auto const longestWordSize = longestWord.first.size();
    for (auto const& entry : entries)
    {
        std::cout << std::setw(longestWordSize + 1) << std::left << entry.first << '|' << std::setw(10) << std::right << entry.second << '\n';
    }
}

static constexpr auto code = R"(
bool isDelimiter(char c)
{
auto const isAllowedInName = isalnum(c) || c == '_';
return !isAllowedInName;
}
std::map<std::string, size_t> countWords(std::vector<std::string> const& words)
{
auto wordCount = std::map<std::string, size_t>{};
for (auto const& word : words)
{
++wordCount[word];
}
return wordCount;
}
WordCount getWordCount(std::string const& code)
{
auto symbols = std::vector<std::string>{};
boost::split(symbols, code, isDelimiter);
symbols.erase(std::remove(begin(symbols), end(symbols), ""), end(symbols));
auto const wordCount = countWords(symbols);
auto sortedWordCount = WordCount(begin(wordCount), end(wordCount));
std::sort(begin(sortedWordCount), end(sortedWordCount), [](auto const& p1, auto const& p2){ return p1.second > p2.second; });
return sortedWordCount;
}
})";

int main()
{
    print(getWordCount(code));
    system("pause");
}

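As the output in the figure below shows, the word Count appears most often, so the program we analyzed is probably a counting program, which matches what we set out to do.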

[Figure: screenshot of the program's output]


Reposted from www.cnblogs.com/yunlambert/p/10173987.html