Paul Gestwicki's Blog: Two kinds of equality: Thinking about Java's equals method

Saturday, November 28, 2015

Two kinds of equality: Thinking about Java's equals method

My planning document for this coming Monday's CS222 class read something like, "Do something with equals/hashcode/toString." I took some time this morning to flesh that out into a real lesson plan, and I came across some interesting tidbits along the way. This started as a Facebook post, but I realized I wanted something more archival, so I've upgraded to the blog.

StackOverflow led me to EqualsVerifier, a recent open-source project to automate testing of the contract between equals and hashcode in Java. It looks interesting, although I have not plumbed into its documented possible false-positives. The same post led me to EqualsTester, which has been part of Guava for quite some time, although I never came across it before. I use Guava in practically every Java project, and so it looks like EqualsTester is going to have to go into my back of tricks. Writing tests for equals is kind of a drag and—in retrospect—clearly automatable, although I never really thought about automating it before, so I never looked for library support.

The most fascinating thing I came across, however, is this 2009 article by Odersky et al., which I understand to be an updated extract from Programming in Scala. The article describes four common pitfalls when writing equality-testing methods in Java, and as they point out, three of the four are covered in Bloch's classic Effective Java. That fourth one, though, hit me hard, forcing me to recognize that I had been conflating two distinct concepts in my Java programs. The first I will call content equality, which is when two objects should be considered equal because they represent the same concept in the problem domain. The second I will awkwardly call JRE equality, because it's the kind of equality that the equals method contract really specifies.

It's easy to illustrate this with an example. Ignoring all other design considerations for a moment, we can create a class like this:

public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }
    
}

It might be reasonable then to expect a test like this to pass:

@Test
    public void testEquals() {
        Achievement a1 = new Achievement("Blogging");
        Achievement a2 = new Achievement("Blogging");
        assertTrue(a1.equals(a2));
    }

If you've been doing Java for a while, you know this will fail with the default equals() implementation. Let's take the same approach that most of my students tend toward: let Eclipse generate the equals and hashcode methods for us!

public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Achievement other = (Achievement) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }
    
}

OK, now the test passes, so we are happy, right? Well, maybe not. What if we add this test?

@Test
    public void testCollectionsIntegration() {
        Set<Achievement> set = new HashSet<>();
        Achievement a = new Achievement("Blogging");
        set.add(a);
        a.setName("Writing");
        assertTrue(set.contains(a));
    }

This test fails, because we've hit what Odersky et al. list as Common Equality Pitfall #3: Define equals in terms of mutable fields. By changing the name of the achievement object, we alter its hash code, which means it "disappeared" from the set.

At the surface, it looks like there's no way out: you cannot make both tests pass. This makes it a beautiful error! The problem was never in the code, it's in how we think about the problem. The word "equals" is overloaded, as any student of programming languages knows, but that doesn't mean we can walk away from our fuzzy English understanding of the concept. The problem is that the seemingly innocuous unit test I introduced first assumes that Java's equals method should represent content equality, but that's not true. That is, the method is not about our tacit understanding of "equality": it's about making a complex runtime environment work in predictable ways.

It is fascinating that modern IDEs like Eclipse make it so incredibly easy to write these methods incorrectly. Indeed, the format of the equals method provided by Eclipse looks a lot like the template the Bloch himself provides in Effective Java.

The solution suggested by Odersky et al. is simple and elegant: because there are distinct concepts for equals, have two implementations. The equals method inherited from Object will continue to do what it needs to do, and we introduce a new method to represent content equality. Following their example, we might introduce a method called equalContents, which we could test using code almost identical to our earlier misconceived test:

    @Test
    public void testEqualContent() {
        Achievement a1 = new Achievement("Blogging");
        Achievement a2 = new Achievement("Blogging");
        assertTrue(a1.equalsContent(a2));
    }

This leads to a simple implementation of our domain class:

public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }
    
    public boolean equalsContent(Achievement other) {
        if (other==null) return false;
        if (other==this) return true;
        return this.name.equals(other.name);
    }
    
}

Nice and clean, without all that crufty tool-generated code to boot, and both testEqualContent and testCollectionsIntegration pass.

I know I cannot bring a cohort of sophomores with me on this adventure on Monday, so they will only be dipping their toes into it.

6 comments:

MichaelEGRNovember 28, 2015 at 2:39 PM
As an interesting aside on hashCode and equality is the Enum conundrum on Android versus the actual JDK specification for Enum->hashCode. The Android SDK version has been broken since day one because of assuming a really bad hashCode implementation for Enum from Apache Harmony.

The JDK specification for Enum is to return the Object identity hashCode as seen here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/lang/Enum.java?av=f#137

On Android however even with the latest 6.0 SDK source code and all other versions here is the Enum hashCode implementation:

public final int hashCode() {
return ordinal + (name == null ? 0 : name.hashCode());
}

This Android version is tragically flawed and causes massive problems when using Enums for keys in most Map implementations (HashMap, etc.). There are potential performance issues by invoking the name.hashCode() method for the first time (not so bad). The tragic defect though is that two separate Enums on Android that happen to have the same name at the same ordinal spot will have the same hashCode value causing clashing in most Map implementations.

One must use an IdentityHashMap to force the object identity hashCode to be used for Enums on Android to avoid clashes. In the entry object (IdentityHashMapEntry) hashCode is handled as the following:

public int hashCode() {
return System.identityHashCode(key)
^ System.identityHashCode(value);
}

It's shocking that this has not been caught for like 8+ years of Android.

I ran into this problem with my component architecture implementation in TyphonRT around '12 when I added the ability to store multiple components of the same type by a specialized enum. On the desktop things chugged along fine, but when loading a large runtime on Android things failed and components where overwritten. I had to switch to IdentityHashMap for storing components.
ReplyDelete
Replies
Chris JohnsonNovember 29, 2015 at 7:27 AM
Mutability seems to be the source of all our troubles! But I don't see how equalsContent fixes the problem with HashSet.contains and the slippery hash code. The achievement's entry in the set is still keyed on the original hash code, but the contains will look using the modified one.
ReplyDelete
Replies

Add comment