Don Don - 7 months ago 15
Java Question

Stream of Strings isn't sorted?

I would like to find the set of all words in a file. This set should be sorted.
Upper and lower case don't matter.
Here is my approach:

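import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static Set<String> setOfWords(String fileName) throws IOException {
    Set<String> wordSet;
    Stream<String> stream = Files.lines(Paths.get(fileName));

    wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))  // split each line into words
            .flatMap(Arrays::stream)
            .sorted()
            .map(String::toLowerCase)
            .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}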


Test file:

This is a file with
five lines.It has two sentences,
and the word file is contained
in multiple lines of this file.
This file can be used for testing?

When printing the set, I get the following output:

Set of words:
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word


Can anybody tell me why the set is not sorted in its natural order (lexicographic, for Strings)?

Thanks in advance

Answer

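Since String's natural ordering is case-sensitive (it compares character values, so every uppercase ASCII letter sorts before every lowercase one, and "This" comes before "a"), you should map to lower case before sorting.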
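A small sketch to make the case effect visible. The class name `CaseOrderDemo` and its two helpers are hypothetical, chosen only for illustration. It lowercases with `Locale.ROOT` rather than the default locale, which is a deliberate choice here (not part of the original answer) so that, for example, a Turkish default locale cannot turn "I" into a dotless "ı".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CaseOrderDemo {
    // Sorts using String's natural, case-sensitive ordering (compareTo on char values).
    static List<String> sortNatural(List<String> words) {
        return words.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    // Lowercases every word first (locale-independent), then sorts.
    static List<String> lowerThenSort(List<String> words) {
        return words.stream()
                .map(word -> word.toLowerCase(Locale.ROOT))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "This", "be", "It");
        System.out.println(sortNatural(words));   // [It, This, a, be]
        System.out.println(lowerThenSort(words)); // [a, be, it, this]
    }
}
```

'I' (73) and 'T' (84) are smaller than 'a' (97), which is why the capitalized words jump to the front when sorting happens before lowercasing.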

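Besides that, `Collectors.toSet()` makes no guarantee about the type or iteration order of the Set it returns (in practice it is a HashSet), so whatever order `sorted()` produced is lost. You should collect the output into an ordered collection instead, such as a List or some SortedSet implementation. If you use a SortedSet, there's no need to call `sorted()`, since the set keeps itself sorted anyway: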

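List<String> words = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)   // lowercase BEFORE sorting
            .distinct()                 // a List keeps duplicates, so drop repeated words explicitly
            .sorted()
            .collect(Collectors.toList());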
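The SortedSet alternative mentioned above can be sketched with `Collectors.toCollection(TreeSet::new)`, which both removes duplicates and keeps the words in natural order. This is a minimal sketch assuming Java 8+; the class `WordSet` and the helper `sortedWords` are hypothetical names, and the empty-token filter, `Locale.ROOT`, and try-with-resources are additions that are not in the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordSet {
    // Splits lines into words, lowercases them, and collects into a TreeSet,
    // which deduplicates and iterates in natural (lexicographic) order.
    static SortedSet<String> sortedWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!.:()]")))
                .filter(word -> !word.isEmpty())           // drop empty tokens from adjacent delimiters
                .map(word -> word.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static SortedSet<String> setOfWords(String fileName) throws IOException {
        // try-with-resources closes the file even if an exception is thrown
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return sortedWords(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(setOfWords(args[0]));
    }
}
```

No `sorted()` call is needed here: the TreeSet orders elements as they are inserted, so the pipeline only has to lowercase before collecting.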