Saturday, November 28, 2015

Two kinds of equality: Thinking about Java's equals method

My planning document for this coming Monday's CS222 class read something like, "Do something with equals/hashcode/toString." I took some time this morning to flesh that out into a real lesson plan, and I came across some interesting tidbits along the way. This started as a Facebook post, but I realized I wanted something more archival, so I've upgraded to the blog.

StackOverflow led me to EqualsVerifier, a recent open-source project to automate testing of the contract between equals and hashcode in Java. It looks interesting, although I have not plumbed into its documented possible false-positives. The same post led me to EqualsTester, which has been part of Guava for quite some time, although I never came across it before. I use Guava in practically every Java project, and so it looks like EqualsTester is going to have to go into my back of tricks. Writing tests for equals is kind of a drag and—in retrospect—clearly automatable, although I never really thought about automating it before, so I never looked for library support.

The most fascinating thing I came across, however, is this 2009 article by Odersky et al., which I understand to be an updated extract from Programming in Scala. The article describes four common pitfalls when writing equality-testing methods in Java, and as they point out, three of the four are covered in Bloch's classic Effective Java. That fourth one, though, hit me hard, forcing me to recognize that I had been conflating two distinct concepts in my Java programs. The first I will call content equality, which is when two objects should be considered equal because they represent the same concept in the problem domain. The second I will awkwardly call JRE equality, because it's the kind of equality that the equals method contract really specifies.

It's easy to illustrate this with an example. Ignoring all other design considerations for a moment, we can create a class like this:

public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }
    
}


It might be reasonable then to expect a test like this to pass:
@Test
    public void testEquals() {
        Achievement a1 = new Achievement("Blogging");
        Achievement a2 = new Achievement("Blogging");
        assertTrue(a1.equals(a2));
    }

If you've been doing Java for a while, you know this will fail with the default equals() implementation. Let's take the same approach that most of my students tend toward: let Eclipse generate the equals and hashcode methods for us!
public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Achievement other = (Achievement) obj;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }
    
}

OK, now the test passes, so we are happy, right? Well, maybe not. What if we add this test?
@Test
    public void testCollectionsIntegration() {
        Set<Achievement> set = new HashSet<>();
        Achievement a = new Achievement("Blogging");
        set.add(a);
        a.setName("Writing");
        assertTrue(set.contains(a));
    }

This test fails, because we've hit what Odersky et al. list as Common Equality Pitfall #3: Define equals in terms of mutable fields. By changing the name of the achievement object, we alter its hash code, which means it "disappeared" from the set.

At the surface, it looks like there's no way out: you cannot make both tests pass. This makes it a beautiful error! The problem was never in the code, it's in how we think about the problem. The word "equals" is overloaded, as any student of programming languages knows, but that doesn't mean we can walk away from our fuzzy English understanding of the concept. The problem is that the seemingly innocuous unit test I introduced first assumes that Java's equals method should represent content equality, but that's not true. That is, the method is not about our tacit understanding of "equality": it's about making a complex runtime environment work in predictable ways.

It is fascinating that modern IDEs like Eclipse make it so incredibly easy to write these methods incorrectly. Indeed, the format of the equals method provided by Eclipse looks a lot like the template the Bloch himself provides in Effective Java. 

The solution suggested by Odersky et al. is simple and elegant: because there are distinct concepts for equals, have two implementations. The equals method inherited from Object will continue to do what it needs to do, and we introduce a new method to represent content equality. Following their example, we might introduce a method called equalContents, which we could test using code almost identical to our earlier misconceived test:

    @Test
    public void testEqualContent() {
        Achievement a1 = new Achievement("Blogging");
        Achievement a2 = new Achievement("Blogging");
        assertTrue(a1.equalsContent(a2));
    }

This leads to a simple implementation of our domain class:

public class Achievement {

    private String name;
    
    public Achievement(String name) {
        this.name=name;
    }
    
    public void setName(String name) {
        this.name=name;
    }
    
    public boolean equalsContent(Achievement other) {
        if (other==null) return false;
        if (other==this) return true;
        return this.name.equals(other.name);
    }
    
}

Nice and clean, without all that crufty tool-generated code to boot, and both testEqualContent and testCollectionsIntegration pass.

I know I cannot bring a cohort of sophomores with me on this adventure on Monday, so they will only be dipping their toes into it.

6 comments:

  1. As an interesting aside on hashCode and equality is the Enum conundrum on Android versus the actual JDK specification for Enum->hashCode. The Android SDK version has been broken since day one because of assuming a really bad hashCode implementation for Enum from Apache Harmony.

    The JDK specification for Enum is to return the Object identity hashCode as seen here:
    http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/lang/Enum.java?av=f#137

    On Android however even with the latest 6.0 SDK source code and all other versions here is the Enum hashCode implementation:

    public final int hashCode() {
    return ordinal + (name == null ? 0 : name.hashCode());
    }

    This Android version is tragically flawed and causes massive problems when using Enums for keys in most Map implementations (HashMap, etc.). There are potential performance issues by invoking the name.hashCode() method for the first time (not so bad). The tragic defect though is that two separate Enums on Android that happen to have the same name at the same ordinal spot will have the same hashCode value causing clashing in most Map implementations.

    One must use an IdentityHashMap to force the object identity hashCode to be used for Enums on Android to avoid clashes. In the entry object (IdentityHashMapEntry) hashCode is handled as the following:

    public int hashCode() {
    return System.identityHashCode(key)
    ^ System.identityHashCode(value);
    }

    It's shocking that this has not been caught for like 8+ years of Android.

    I ran into this problem with my component architecture implementation in TyphonRT around '12 when I added the ability to store multiple components of the same type by a specialized enum. On the desktop things chugged along fine, but when loading a large runtime on Android things failed and components where overwritten. I had to switch to IdentityHashMap for storing components.

    ReplyDelete
    Replies
    1. Wow, I was not familiar with that story! What a weird bug to let linger, when the solution seems so simple. I wonder if there is some other, more difficult piece of the system that relies upon that non-standard approach?

      Thanks for sharing!

      Delete
    2. Sadly I think Hanlon's Razor may apply here.. ;P

      Here is the last commit to Apache Harmony in '11 long after the code was incorporated into Android:
      https://github.com/apache/harmony/blob/java6/classlib/modules/luni/src/main/java/java/lang/Enum.java#L102

      My take is that because the Android / Google team always has villainized enums which continues today (see the fireside chat video from the Android Dev Summit for instance) and the fact they never incorporated or use enums internally to the Android SDK for the most part that they never ran into the issue. My guess is that no CTS tests were ever written for enums due to an avoidance by the Android team.

      My take is that enums are perfectly usable on Android and all the talk from the Android team is fluff and at worst misguided. It shows how biases in usage affects testing.

      This is akin to the horrible performance of annotations on Android because the Android team eschewed using them. IE Class.getAnnotations() has / had horrible performance on Android due to no caching (which the JDK always did, but Apache Harmony didn't). I haven't actually verified if this has been fixed on 5.0 / 6.0; though it affected all versions of Android prior. In that case with TyphonRT under the covers of my API I do annotation caching and any usage in my platform is accelerated on all versions of Android.

      Basically the Android team is lazy when it comes to identifying errors in the SDK where usage of Java does not match their development practices. I doubt there are any instances where under the hood Android itself relies on this behavior.


      Delete
  2. Mutability seems to be the source of all our troubles! But I don't see how equalsContent fixes the problem with HashSet.contains and the slippery hash code. The achievement's entry in the set is still keyed on the original hash code, but the contains will look using the modified one.

    ReplyDelete
    Replies
    1. Agreed on mutability. That's precisely what I was thinking of when I hedged with "ignoring other design considerations." I really just wanted to buck the system and use some example besides students, employees, or banks.

      Notice that the final Achievement class is simply inheriting the default hashcode implementation. As I understand it, the Java API does not require any particular implementation, and so the behavior will be JVM-specific. Using Oracle Java 8, the set containment test passes, although it is troublesome that this is not necessarily portable.

      Delete
    2. Ah, I didn't grasp that hashCode was removed. This makes sense. However, Oracle's own docs say this of the Object.hashCode that Achievement now inherits: "This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language."

      I'm not sure the Set containment test would pass if a different but equivalent achievement reference was checked.

      Delete