Java TreeSet with length Comparator bug?

Georgios F. :

I have the below code which creates a TreeSet using a comparator based on string length.

public class TreeSetComparator {
    public static void main(String[] args) {
        SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
        sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
        System.out.println(sortedSet);
    }
}

To my surprise the output of the above is

[aa]

While I would expect

[aa, bb]

or

[bb, aa]

The "bb" part disappears, which seems to be contrary to the SortedSet contract. The comparator is supposed to only sort the elements and not determine their uniqueness, which is normally determined by equals.

On the other hand, if I enhance the comparator to always return non-zero for unequal items like below, only then do I get the correct results.

    SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
    sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
    System.out.println(sortedSet);

The output now is [aa, bb] as I would expect.

Is the above a bug in the TreeSet implementation?

My environment is as follows:

mvn --version                                                                                                                                            21:40:22
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"

UPDATE

Here is a related post along with suggestions on how to fix the issue in a future version of Java: https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html

Marvin :

This it not a bug. At least not in TreeSet.

From the javadoc, emphasis by me:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

So because "aa" and "bb" both have a length of 2 they are deemed equal by compareTo and thus by the TreeSet.

By definition, consistent with equals means:

The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=71922&siteId=1