How to find total count of Words, total count of Vowels, total count of Special Character in a text file using java 8

Nikhi K. Bansal :

I have a text file and i want to check
- total words count in file
- total vowels count in file
- total special character in file

By using Java 8 Streams.

i want output as a Map in a single iteration if possible i.e

{"totalWordCount":10,"totalVowelCount":10,"totalSpecialCharacter":10}

i tried below code

    Long wordCount=Files.lines(child).parallel().flatMap(line -> Arrays.stream(line.trim().split(" ")))
                            .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
                            .filter(word -> !word.isEmpty())
                            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).values().stream().reduce(0L, Long::sum)

but it is giving me only total word count, i am thinking if its possible to return a single map which contain output as above with all count.

Holger :

If we only had to count special characters and vowels, we could use something like this:

Map<String,Long> result;
try(Stream<String> lines = Files.lines(path)) {
    result = lines
        .flatMap(Pattern.compile("\\s+")::splitAsStream)
        .flatMapToInt(String::chars)
        .filter(c -> !Character.isAlphabetic(c) || "aeiou".indexOf(c) >= 0)
        .mapToObj(c -> "aeiou".indexOf(c)>=0? "totalVowelCount": "totalSpecialCharacter")
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

First we flatten the stream of lines to a stream of words, then to a stream of characters, to group them by their type. This works smoothly as “special character” and “vowel” are mutual exclusive. In principle, the flattening to words could have been omitted if we just extend the filter to skip white-space characters, but here, it helps getting to a solution counting words.

Since words are a different kind of entity than characters, counting them in the same operation is not that straight-forward. One solution is to inject a pseudo character for each word and count it just like other characters at the end. Since all actual characters are positive, we can use -1 for that:

Map<String,Long> result;
try(Stream<String> lines = Files.lines(path)) {
    result = lines.flatMap(Pattern.compile("\\s+")::splitAsStream)
        .flatMapToInt(w -> IntStream.concat(IntStream.of(-1), w.chars()))
        .mapToObj(c -> c==-1? "totalWordCount": "aeiou".indexOf(c)>=0? "totalVowelCount":
                Character.isAlphabetic(c)? "totalAlphabetic": "totalSpecialCharacter")
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

This adds a "totalAlphabetic" category in addition to the others into the result map. If you do not want that, you can insert a .filter(cat -> !cat.equals("totalAlphabetic")) step between the mapToObj and collect steps. Or use a filter like in the first solution before the mapToObj step.

As an additional note, this solution does more work than necessary, because it splits the input into lines, which is not necessary as we can treat line breaks just like other white-space, i.e. as a word boundary. Starting with Java 9, we can use Scanner for the job:

Map<String,Long> result;
try(Scanner scanner = new Scanner(path)) {
    result = scanner.findAll("\\S+")
        .flatMapToInt(w -> IntStream.concat(IntStream.of(-1), w.group().chars()))
        .mapToObj(c -> c==-1? "totalWordCount": "aeiou".indexOf(c)>=0? "totalVowelCount":
                Character.isAlphabetic(c)? "totalAlphabetic": "totalSpecialCharacter")
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

This will split the input into words in the first place without treating line breaks specially. This answer contains a Java 8 compatible implementation of Scanner.findAll.

The solutions above consider every character which is neither white-space nor alphabetic as “special character”. If your definition of “special character” is different, it should not be too hard to adapt the solutions.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=150592&siteId=1