Tuesday, February 16, 2016

Equality vs Identity?

When storing objects in a Set, it is important that the same object can never be added twice. That is the core definition of a Set. In Java, two methods are used to determine whether two referenced objects are the same or whether they can both exist in the same Set: equals() and hashCode(). In this article I will explain the difference between equality and identity and also discuss some of the advantages each has over the other.

Java offers a standard implementation of both these methods. The standard equals() method is defined as an "identity" comparison. This means that it compares the two references to determine whether they point to the same object. Two objects with identical contents that are stored in different locations in memory will therefore be deemed unequal. This comparison is done using the == operator, as can be seen if you look at the source code of the Object class.


public boolean equals(Object obj) {
    return (this == obj);
}

The hashCode() method is implemented by the virtual machine as a native operation so it is not visible in the code. The Javadoc describes it as typically being derived from the internal address of the object, but the exact strategy is left to the virtual machine and some implementations generate the value in other ways. What matters here is that the default hash code follows the identity of the object rather than its contents.
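To make the default behaviour concrete, here is a minimal sketch. The Point class is a hypothetical example that deliberately does not override equals() or hashCode(), so it inherits the identity-based versions from Object.

```java
final class IdentityDemo {

    // Hypothetical value holder that does NOT override equals() or hashCode().
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    public static void main(String[] args) {
        final Point a = new Point(1, 2);
        final Point b = new Point(1, 2); // same contents, different object
        final Point c = a;               // same object, second reference

        System.out.println(a.equals(b)); // false
        System.out.println(a.equals(c)); // true

        // The inherited hashCode() is the identity hash code of the object.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
    }
}
```

Even though a and b hold the same coordinates, only a and c are considered the same by the inherited methods.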
One thing many programmers choose to do when designing classes is to override these methods with a different equality definition where, instead of comparing the references, you look at the values of the two instances to see if they can be considered equal. Here is an example of that:


import java.util.Objects;
import static java.util.Objects.requireNonNull;

public final class Person {

    private final String firstname;
    private final String lastname;

    public Person(String firstname, String lastname) {
        this.firstname = requireNonNull(firstname);
        this.lastname  = requireNonNull(lastname);
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 83 * hash + Objects.hashCode(this.firstname);
        hash = 83 * hash + Objects.hashCode(this.lastname);
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final Person other = (Person) obj;
        if (!Objects.equals(this.firstname, other.firstname)) {
            return false;
        } else return Objects.equals(this.lastname, other.lastname);
    }
}

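This comparison is called "equality" (compared to the previous "identity"). As long as two persons have the same first and last name, they will be considered equal. This can, for example, be used to sort out duplicates from a stream of input. Remember that if you override the equals() method, you should always override the hashCode() method as well! Hash-based collections such as HashSet look up the hash code first and only call equals() on objects that land in the same bucket, so equal objects must return equal hash codes.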
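As a rough sketch of the duplicate filtering mentioned above, the following example pushes a list through a LinkedHashSet. The Name class and the withoutDuplicates helper are hypothetical stand-ins made up for this illustration; they follow the same equals()/hashCode() recipe as Person.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class DuplicateFilterDemo {

    // Hypothetical value class with value-based equals() and hashCode().
    static final class Name {
        private final String first;
        private final String last;

        Name(String first, String last) {
            this.first = Objects.requireNonNull(first);
            this.last  = Objects.requireNonNull(last);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Name)) return false;
            final Name other = (Name) obj;
            return first.equals(other.first) && last.equals(other.last);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals(): same fields, same hash.
            return Objects.hash(first, last);
        }

        @Override
        public String toString() {
            return first + " " + last;
        }
    }

    // Removes duplicates (according to equals/hashCode), keeping first-seen order.
    static Set<Name> withoutDuplicates(List<Name> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        final List<Name> input = Arrays.asList(
            new Name("Ben", "Carlsson"),
            new Name("Adam", "Johnsson"),
            new Name("Ben", "Carlsson"));

        System.out.println(withoutDuplicates(input)); // [Ben Carlsson, Adam Johnsson]
    }
}
```

If hashCode() were left out, the two "Ben Carlsson" instances would most likely end up in different buckets and both would stay in the set.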

Equality

Now, if you choose equality over identity, there are some things you will need to think about. The first thing you must ask yourself is: are two instances of this class with the same properties necessarily the same? In the case of Person above, I would say no. It is very likely that you will someday have two people in your system with the same first and last name. Even if you continue to add more personal information like birthday or favorite color, you will sooner or later have a collision. On the other hand, if your system is handling cars and each car contains a reference to a "model", it can be safely assumed that if two cars are both black Tesla Model S, they are the same model even if the objects are stored in different places in memory. That is an example of a case where equality can be good.


import java.util.Objects;
import static java.util.Objects.requireNonNull;

public final class Car {
    
    public static final class Model {
        
        private final String name;
        private final String version;
        
        public Model(String name, String version) {
            this.name    = requireNonNull(name);
            this.version = requireNonNull(version);
        }

        @Override
        public int hashCode() {
            int hash = 5;
            hash = 23 * hash + Objects.hashCode(this.name);
            hash = 23 * hash + Objects.hashCode(this.version);
            return hash;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final Model other = (Model) obj;
            if (!Objects.equals(this.name, other.name)) {
                return false;
            } else return Objects.equals(this.version, other.version);
        }
    }
    
    private final String color;
    private final Model model;
    
    public Car(String color, Model model) {
        this.color = requireNonNull(color);
        this.model = requireNonNull(model);
    }
    
    public Model getModel() {
        return model;
    }
}

Two cars are only considered the same if they are the same object in memory. Their models, on the other hand, are considered the same as long as they have the same name and version. Here is an example of this:


final Car a = new Car("black", new Car.Model("Tesla", "Model S"));
final Car b = new Car("black", new Car.Model("Tesla", "Model S"));

System.out.println("Is a and b the same car? " + a.equals(b));
System.out.println("Is a and b the same model? " + a.getModel().equals(b.getModel()));

// Prints the following:
// Is a and b the same car? false
// Is a and b the same model? true

Identity

One risk of choosing equality over identity is that it can be an invitation to allocate more objects than necessary on the heap. Just look at the car example above. For every car we create we also allocate space in memory for a model. Even though Java interns string literals so that they are not duplicated, it is still wasteful to allocate objects that will always be the same. A short trick to turn the inner object into something that can be compared by identity, and at the same time avoid unnecessary object allocation, is to replace it with an enum:


import static java.util.Objects.requireNonNull;

public final class Car {
    
    public enum Model {
        
        TESLA_MODEL_S ("Tesla", "Model S"),
        VOLVO_V70     ("Volvo", "V70");
        
        private final String name;
        private final String version;
        
        Model(String name, String version) {
            this.name    = name;
            this.version = version;
        }
    }
    
    private final String color;
    private final Model model;
    
    public Car(String color, Model model) {
        this.color = requireNonNull(color);
        this.model = requireNonNull(model);
    }
    
    public Model getModel() {
        return model;
    }
}

Now we can be sure that each model will only ever exist in one place in memory and can therefore safely be compared using identity comparison. An issue with this, however, is that it really limits our extensibility. Before, we could define new models on the fly without modifying the source code in the Car.java file, but now we have locked ourselves into an enum that should generally be kept unmodified. If that flexibility is desired, an equals comparison is probably better for you.
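The guarantee that an enum constant exists only once can be checked with a small sketch. The Model enum below is a stripped-down, standalone copy used only for illustration.

```java
final class EnumIdentityDemo {

    // Simplified copy of the Model enum, without the name and version fields.
    enum Model { TESLA_MODEL_S, VOLVO_V70 }

    public static void main(String[] args) {
        // Looking a constant up by name returns the one shared instance.
        final Model first  = Model.valueOf("TESLA_MODEL_S");
        final Model second = Model.TESLA_MODEL_S;

        System.out.println(first == second);          // true
        System.out.println(first == Model.VOLVO_V70); // false
    }
}
```

Since no second instance of a constant can be created, comparing with == is both safe and cheap here.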

A finishing note: if you have overridden the equals() and hashCode() methods of a class and later want to store it in a Map based on identity, you can always use the IdentityHashMap structure. It compares its keys by reference (using == and the identity hash code), even if the equals() and hashCode() methods have been overridden.
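Here is a small sketch of the difference, using String as a readily available class that overrides equals() and hashCode(). The sizeAfterTwoEqualKeys helper is a made-up name for this example.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityMapDemo {

    // Inserts two keys that are equal but not identical and reports the map size.
    static int sizeAfterTwoEqualKeys(Map<String, Integer> map) {
        final String first  = new String("key"); // explicit new: two distinct objects
        final String second = new String("key");
        map.put(first, 1);
        map.put(second, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap relies on equals()/hashCode(): the second put overwrites the first.
        System.out.println(sizeAfterTwoEqualKeys(new HashMap<>()));         // 1

        // IdentityHashMap relies on reference equality: both keys are kept.
        System.out.println(sizeAfterTwoEqualKeys(new IdentityHashMap<>())); // 2
    }
}
```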

Wednesday, February 10, 2016

Cleaner Responsibilities - Get rid of equals, compareTo and toString

Have you ever looked at the javadoc of the Object-class in Java? Probably. You tend to end up there every now and then when digging your way down the inheritance tree. One thing you might have noticed is that it has quite a few methods that every class must inherit. The favorite methods to implement yourself rather than stick with the original ones are probably .toString(), .equals() and .hashCode() (why you should always implement both of the latter is described well by Per-Åke Minborg in this post).

But these methods are apparently not enough. Many people mix in additional interfaces from the standard libraries like Comparable and Serializable. But is that really wise? Why does everyone want to implement these methods on their own so badly? Well, implementing your own .equals() and .hashCode() methods will probably make sense if you are planning on storing your objects in something like a HashMap and want to control hash collisions, but what about compareTo() and toString()?

In this article I will present an approach to software design that we use on the Speedment open source project where methods that operate on objects are implemented as functional references stored in variables rather than by overriding Java's built-in methods. There are several advantages to this. Your POJOs will be shorter and more concise, common operations can be reused without inheritance and you can switch between different configurations in a flexible manner.

Original Code

Let us begin by looking at the following example. We have a typical Java class named Person. In our application we want to print out every person from a Set in the order of their firstname followed by lastname (in case two persons share the same firstname).

Person.java
import java.util.Objects;

public class Person implements Comparable<Person> {
    
    private final String firstname;
    private final String lastname;
    
    public Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname  = lastname;
    }

    public String getFirstname() {
        return firstname;
    }

    public String getLastname() {
        return lastname;
    }
    
    @Override
    public int hashCode() {
        int hash = 7;
        hash = 83 * hash + Objects.hashCode(this.firstname);
        hash = 83 * hash + Objects.hashCode(this.lastname);
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final Person other = (Person) obj;
        if (!Objects.equals(this.firstname, other.firstname)) {
            return false;
        }
        return Objects.equals(this.lastname, other.lastname);
    }

    @Override
    public int compareTo(Person that) {
        if (this == that) return 0;
        else if (that == null) return 1;

        int comparison = this.firstname.compareTo(that.firstname);
        if (comparison != 0) return comparison;

        comparison = this.lastname.compareTo(that.lastname);
        return comparison;
    }

    @Override
    public String toString() {
        return firstname + " " + lastname;
    }
}
Main.java
import java.util.HashSet;
import java.util.Set;

public class Main {
    public static void main(String... args) {
        final Set<Person> people = new HashSet<>();
        
        people.add(new Person("Adam", "Johnsson"));
        people.add(new Person("Adam", "Samuelsson"));
        people.add(new Person("Ben", "Carlsson"));
        people.add(new Person("Ben", "Carlsson"));
        people.add(new Person("Cecilia", "Adams"));
        
        people.stream()
            .sorted()
            .forEachOrdered(System.out::println);
    }
}
Output
run:
Adam Johnsson
Adam Samuelsson
Ben Carlsson
Cecilia Adams
BUILD SUCCESSFUL (total time: 0 seconds)

Person implements several methods here to control the output of the stream. The hashCode() and equals() methods make sure that duplicate persons can't be added to the set. The compareTo() method is used by the sorted action to produce the desired order. Finally, the overridden toString() method controls how each Person is printed when System.out.println() is called. Do you recognize this structure? You can find it in almost every Java project out there.

Alternative Code

Instead of putting all functionality into the Person class, we can try and keep it as clean as possible and use functional references to handle these decorations. We remove all the boilerplate with equals, hashCode, compareTo and toString and instead we introduce two static variables, COMPARATOR and TO_STRING.

Person.java
import java.util.Comparator;
import java.util.function.Function;

public class Person {
    
    private final String firstname;
    private final String lastname;
    
    public Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname  = lastname;
    }

    public String getFirstname() {
        return firstname;
    }

    public String getLastname() {
        return lastname;
    }
    
    public final static Comparator<Person> COMPARATOR =
        Comparator.comparing(Person::getFirstname)
            .thenComparing(Person::getLastname);
    
    public final static Function<Person, String> TO_STRING =
        p -> p.getFirstname() + " " + p.getLastname();
}
Main.java
import java.util.Set;
import java.util.TreeSet;

public class Main {
    public static void main(String... args) {
        final Set<Person> people = new TreeSet<>(Person.COMPARATOR);
        
        people.add(new Person("Adam", "Johnsson"));
        people.add(new Person("Adam", "Samuelsson"));
        people.add(new Person("Ben", "Carlsson"));
        people.add(new Person("Ben", "Carlsson"));
        people.add(new Person("Cecilia", "Adams"));
        
        people.stream()
            .map(Person.TO_STRING)
            .forEachOrdered(System.out::println);
    }
}
Output
run:
Adam Johnsson
Adam Samuelsson
Ben Carlsson
Cecilia Adams
BUILD SUCCESSFUL (total time: 0 seconds)

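The nice thing with this approach is that we can now replace the order and the formatting of the print without changing our Person class. Note that the TreeSet uses the comparator rather than equals() to detect duplicates, which is why the second Ben Carlsson is still left out. This will make the code more maintainable and easier to reuse, not to mention faster to write.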
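To illustrate how the order and formatting can be swapped out, here is a sketch with a second configuration. BY_LASTNAME and LASTNAME_FIRST are hypothetical names invented for this example, and the nested Person is a minimal stand-in for the class above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AlternativeOrderingDemo {

    // Minimal stand-in for the Person class above: fields and getters only.
    static final class Person {
        private final String firstname;
        private final String lastname;

        Person(String firstname, String lastname) {
            this.firstname = firstname;
            this.lastname  = lastname;
        }

        String getFirstname() { return firstname; }
        String getLastname()  { return lastname; }
    }

    // Hypothetical alternative configuration, defined without touching Person.
    static final Comparator<Person> BY_LASTNAME =
        Comparator.comparing(Person::getLastname)
            .thenComparing(Person::getFirstname);

    static final Function<Person, String> LASTNAME_FIRST =
        p -> p.getLastname() + ", " + p.getFirstname();

    // Sorts and deduplicates with BY_LASTNAME, then formats with LASTNAME_FIRST.
    static List<String> render() {
        final Set<Person> people = new TreeSet<>(BY_LASTNAME);
        people.add(new Person("Adam", "Johnsson"));
        people.add(new Person("Adam", "Samuelsson"));
        people.add(new Person("Ben", "Carlsson"));
        people.add(new Person("Ben", "Carlsson")); // dropped: comparator returns 0
        people.add(new Person("Cecilia", "Adams"));

        return people.stream()
            .map(LASTNAME_FIRST)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        render().forEach(System.out::println);
        // Adams, Cecilia
        // Carlsson, Ben
        // Johnsson, Adam
        // Samuelsson, Adam
    }
}
```

Both configurations can live side by side, and the calling code picks whichever one fits.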

Wednesday, February 3, 2016

Definition of the Trait Pattern in Java

In this article I will present the concept of traits and give you a concrete example of how they can be used in Java to achieve less redundancy in your object design. I will begin by presenting a fictional case where traits could be used to reduce repetition and then finish with an example implementation of the trait pattern using Java 8.

Suppose you are developing a message board software and you have identified the following as your data models: “topics”, “comments” and “attachments”. A topic has a title, a content and an author. A comment has a content and an author. An attachment has a title and a blob. A topic can have multiple comments and attachments. A comment can also have multiple comments, but no attachments.

Soon you realise that no matter how you implement the three models, there will be code repetition in the program. If you, for example, want to write a method that adds a new comment to a post, you will need to write one method for commenting on topics and one for commenting on comments. A method that summarizes a discussion by printing out the discussion tree will have to take into consideration that a node can be either a topic, a comment or an attachment.

Since the inception of Java over 20 years ago, object-oriented programming has been the heart and soul of the language, but during this time, other languages have experimented with other tools for organizing the structure of a program. One such tool that we use in Speedment Open Source is something called “Traits”. A trait is a kind of “micro interface” that describes some characteristic of a class design that can be found in many different components throughout the system. By referring to the traits instead of the implementing class itself you can keep the system decoupled and modular.

Let’s look at how this would change our example with the message board.

Now the different traits of each entity have been separated into different interfaces. This is good. Since Java allows us to have multiple interfaces per class, we can reference the interfaces directly when writing our business logic. In fact, the classes will not have to be exposed at all!
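One possible way to write down such a separation is sketched below. Apart from HasComments and HasContent, which appear later in this article, the interface names are assumptions made for this illustration, and the sketch is simplified: it leaves out the attachment blob and the self-type parameter.

```java
import java.util.ArrayList;
import java.util.List;

final class MessageBoardTraitsDemo {

    // Hypothetical trait interfaces, one per characteristic.
    interface HasTitle       { String getTitle(); }
    interface HasContent     { String getContent(); }
    interface HasAuthor      { String getAuthor(); }
    interface HasComments    { List<Comment> getComments(); }
    interface HasAttachments { List<Attachment> getAttachments(); }

    static final class Attachment implements HasTitle {
        private final String title;
        Attachment(String title) { this.title = title; }
        @Override public String getTitle() { return title; }
    }

    static final class Comment implements HasContent, HasAuthor, HasComments {
        private final String content;
        private final String author;
        private final List<Comment> comments = new ArrayList<>();

        Comment(String content, String author) {
            this.content = content;
            this.author  = author;
        }

        @Override public String getContent()        { return content; }
        @Override public String getAuthor()         { return author; }
        @Override public List<Comment> getComments() { return comments; }
    }

    static final class Topic
            implements HasTitle, HasContent, HasAuthor, HasComments, HasAttachments {
        private final String title;
        private final String content;
        private final String author;
        private final List<Comment> comments = new ArrayList<>();
        private final List<Attachment> attachments = new ArrayList<>();

        Topic(String title, String content, String author) {
            this.title   = title;
            this.content = content;
            this.author  = author;
        }

        @Override public String getTitle()                 { return title; }
        @Override public String getContent()               { return content; }
        @Override public String getAuthor()                { return author; }
        @Override public List<Comment> getComments()       { return comments; }
        @Override public List<Attachment> getAttachments() { return attachments; }
    }

    // Written once against the trait: works for topics and comments alike.
    static int countReplies(HasComments node) {
        int total = 0;
        for (Comment reply : node.getComments()) {
            total += 1 + countReplies(reply);
        }
        return total;
    }

    public static void main(String[] args) {
        final Topic topic = new Topic("Traits", "What is a trait?", "Alice");
        final Comment reply = new Comment("A kind of micro interface.", "Bob");
        reply.getComments().add(new Comment("Thanks!", "Alice"));
        topic.getComments().add(reply);

        System.out.println(countReplies(topic)); // 2
        System.out.println(countReplies(reply)); // 1
    }
}
```

The countReplies method never mentions Topic or Comment in its signature, so it does not need to be duplicated per model.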

Traits have existed for many years in other programming languages such as Scala, PHP, Groovy, and many more. To my knowledge there is no consensus regarding what is considered a trait between different languages. On the Wikipedia page regarding traits it says that:

“Traits both provide a set of methods that implement behaviour to a class and require that the class implement a set of methods that parameterize the provided behaviour”

The following properties are named as distinctive for traits:

  • traits can be combined (symmetric sum)
  • traits can be overridden (asymmetric sum)
  • traits can be expanded (alias)
  • traits can be excluded (exclusion)

Since Java 8 you can actually fulfill most of these criteria using interfaces. You can, for example, cast an implementing class of an unknown type to a union of traits using the and (&) operator, which satisfies the symmetric sum criterion. A good example of this is described here. By creating a new interface and using default implementations you can override some methods to fulfill the asymmetric sum criterion. Aliases can be created in a similar way. The only problem is exclusion. Currently Java has no way of removing a method from inheritance, so there is no way to prevent a child class from accessing a method defined in a trait.
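The asymmetric sum and alias criteria could be approximated roughly as follows. HasName and HasLoudName are hypothetical interfaces made up for this sketch and are not part of any library.

```java
import java.util.Locale;

final class TraitOverrideDemo {

    // Hypothetical base trait: requires getName(), provides describe().
    interface HasName {
        String getName();

        default String describe() {
            return "Name: " + getName();
        }
    }

    // Hypothetical derived trait.
    interface HasLoudName extends HasName {

        // Asymmetric sum: the derived trait replaces the inherited behaviour.
        @Override
        default String describe() {
            return getName().toUpperCase(Locale.ROOT) + "!";
        }

        // Alias: a second name for the same behaviour.
        default String shout() {
            return describe();
        }
    }

    public static void main(String[] args) {
        // getName() is the only abstract method, so a lambda can implement each trait.
        final HasName plain    = () -> "speedment";
        final HasLoudName loud = () -> "speedment";

        System.out.println(plain.describe()); // Name: speedment
        System.out.println(loud.describe());  // SPEEDMENT!
        System.out.println(loud.shout());     // SPEEDMENT!
    }
}
```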

If we return to the message board example, we could, for example, require the implementing class to have a method getComments, while all additional logic for adding, removing and streaming over comments could be put in the interface.


import java.util.List;

public interface HasComments<R extends HasComments<R>> {
    
    // one method that parameterizes the provided behaviour
    List<Comment> getComments();

    // two methods that implement the behaviour
    @SuppressWarnings("unchecked") // safe as long as R is the implementing type itself
    default R add(Comment comment) {
        getComments().add(comment);
        return (R) this;
    }

    @SuppressWarnings("unchecked")
    default R remove(Comment comment) {
        getComments().remove(comment);
        return (R) this;
    }
}
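The reason for the self-referencing type parameter R is easier to see in use. In this sketch, Topic, Comment and setTitle are hypothetical and exist only to show that the default methods return the implementing type, so calls can be chained.

```java
import java.util.ArrayList;
import java.util.List;

final class FluentTraitDemo {

    // Hypothetical comment type for this sketch.
    static final class Comment {
        final String text;
        Comment(String text) { this.text = text; }
    }

    // Standalone copy of the trait with only the add method.
    interface HasComments<R extends HasComments<R>> {
        List<Comment> getComments();

        @SuppressWarnings("unchecked") // safe as long as R is the implementing type
        default R add(Comment comment) {
            getComments().add(comment);
            return (R) this;
        }
    }

    // Hypothetical implementation: only the getter has to be written.
    static final class Topic implements HasComments<Topic> {
        private final List<Comment> comments = new ArrayList<>();
        private String title = "";

        @Override
        public List<Comment> getComments() { return comments; }

        Topic setTitle(String title) {
            this.title = title;
            return this;
        }

        String getTitle() { return title; }
    }

    public static void main(String[] args) {
        // add() returns Topic rather than HasComments, so setTitle() can follow.
        final Topic topic = new Topic()
            .add(new Comment("First!"))
            .add(new Comment("Nice article"))
            .setTitle("Traits in Java");

        System.out.println(topic.getTitle() + ": " + topic.getComments().size());
        // Traits in Java: 2
    }
}
```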

If we have an object and we want to cast it to a symmetric sum of HasComments and HasContent, we could do it using the and (&) operator:


final Object obj = ...;
Optional.of(obj)
    .map(o -> (HasComments<?> & HasContent<?>) o)
    .ifPresent(sum -> {/* do something */});
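Keep in mind that such a cast throws a ClassCastException if the object does not implement both traits. A more defensive variant could check with instanceof first, as in the following sketch. The simplified HasComments and HasContent interfaces and the summarize helpers here are assumptions made for this example only.

```java
import java.util.Optional;

final class IntersectionCastDemo {

    // Simplified, hypothetical versions of the two traits.
    interface HasComments { int commentCount(); }
    interface HasContent  { String getContent(); }

    static final class Comment implements HasComments, HasContent {
        private final String content;
        private final int replies;

        Comment(String content, int replies) {
            this.content = content;
            this.replies = replies;
        }

        @Override public int commentCount()  { return replies; }
        @Override public String getContent() { return content; }
    }

    // Accepts any type that is the symmetric sum of both traits.
    static <T extends HasComments & HasContent> String summarize(T node) {
        return node.getContent() + " (" + node.commentCount() + " comments)";
    }

    // Only performs the intersection cast after checking both traits.
    static Optional<String> trySummarize(Object obj) {
        if (obj instanceof HasComments && obj instanceof HasContent) {
            return Optional.of(summarize((HasComments & HasContent) obj));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(trySummarize(new Comment("Nice post", 3)));
        // Optional[Nice post (3 comments)]
        System.out.println(trySummarize("just a string"));
        // Optional.empty
    }
}
```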

That was all for this time!

PS: If you want to read more about traits as a concept, I really recommend the paper Traits: Composable Units of Behaviour from 2003 by N. Schärli et al.