Monday, November 21, 2016

Hack Speedment into Your Own Personal Code Generator

Spire and Duke dressed as Neo from Matrix

Speedment is an Open Source toolkit that can be used to generate Java entities and managers for communicating with a database. This is great if you need an Object Relational Mapping of the domain model, but in some cases, you might want to generate something completely different using your database as the template. In this article I am going to show you a hack that you can use to take over that code generator in Speedment and use it for your own personal purposes. At the end of the article we will have a completely blank code generator that we can program to do our bidding!

Background

Speedment is designed to operate as a plugin to Maven. By invoking various new Maven goals we can instruct Speedment to connect to a database, generate source code and also remove any generated files from our project. It also contains a graphical user interface that makes it easy to configure the generation job based on metadata collected from our database. Now, imagine all this information that we can collect from analyzing that metadata. We know what tables exist, we know of all the constraints they have and what types the individual columns have. There are probably millions of use-cases where we can benefit from automatically generating stuff from that metadata. Following the steps in this article, we can do all those things.

Screenshot of the Speedment User Interface

Step 1: Set Up a Regular Speedment Project

Create a new Maven Project and add the following to the pom.xml-file:

pom.xml

<properties>
    <speedment.version>3.0.1</speedment.version>
    <mysql.version>5.1.39</mysql.version>
</properties>


<dependencies>
    <dependency>
        <groupId>com.speedment</groupId>
        <artifactId>runtime</artifactId>
        <version>${speedment.version}</version>
        <type>pom</type>
    </dependency>
</dependencies>


<build>
    <plugins>
        <plugin>
            <groupId>com.speedment</groupId>
            <artifactId>speedment-maven-plugin</artifactId>
            <version>${speedment.version}</version>
            <dependencies>
                <dependency>
                    <groupId>mysql</groupId>
                    <artifactId>mysql-connector-java</artifactId>
                    <version>${mysql.version}</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>

We have added Speedment as a runtime dependency and configured the Maven plugin to use the standard MySQL JDBC driver to connect to our database. Great! You now have access to a number of new Maven goals. For an example, if we wanted to launch the Speedment UI we could do that by running:


mvn speedment:tool

If we did that now, Speedment would launch in normal mode allowing us to connect to a database and from it generate entities and managers for communicating with that database using Java 8 streams. That is not what we want to do this time around. We want to hack it so that it does exactly what we need it to do. We therefore continue to modify the pom.

Step 2: Modify the Plugin Declaration

Speedment is built up in a modular fashion with different artifacts responsible for different tasks. All the pre-existing generator tasks are located in an artifact called "com.speedment.generator:generator-standard". That is where we are going to strike! By removing that artifact from the classpath we can prevent Speedment from generating anything we don’t want it to.

We modify the pom as follows:


...
<plugin>
    <groupId>com.speedment</groupId>
    <artifactId>speedment-maven-plugin</artifactId>
    <version>${speedment.version}</version>
    <dependencies>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.version}</version>
        </dependency>
       
        <!-- Add this: -->
        <dependency>
            <groupId>com.speedment</groupId>
            <artifactId>tool</artifactId>
             <version>${speedment.version}</version>
             <type>pom</type>
             <exclusions>
                 <exclusion>
                     <groupId>com.speedment.generator</groupId>
                     <artifactId>generator-standard</artifactId>
                 </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</plugin>
...

What is that? We exclude a dependency by adding one? How can that even work? Well, Speedment is designed to include as little code as possible unless it is explicitly needed by the application. The "com.speedment:tool-artifact" is a dependency to the maven plugin already, and by mentioning it in the <dependencies>-section of the maven plugin we can append settings to its configuration. In this case, we say that we want the plugin to have access to the tool except we don’t want the standard generator.

There is a problem here though. If we try and launch the speedment:tool goal, we will get an exception. The reason for this is that Speedment expects the standard translators to be on the classpath.

Here is where the second ugly hack comes in. In our project, we create a new package called com.speedment.generator.standard and in it define a new java file called StandardTranslatorBundle.java. As it turns out, that is the only file that Speedment really needs to work. We give it the following content:

StandardTranslatorBundle.java

package com.speedment.generator.standard;


import com.speedment.common.injector.InjectBundle;
import java.util.stream.Stream;


public final class StandardTranslatorBundle implements InjectBundle {
    @Override
    public Stream<Class<?>> injectables() {
        return Stream.empty();
    }
}

Next, we need to replace the excluded artifact with our own project so that the plugin never realizes that the files are missing. We return to the pom.xml-file and add our own project to the <dependencies>-section of the speedment-maven-plugin. The full pom file looks like this:

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
    http://maven.apache.org/xsd/maven-4.0.0.xsd">
    
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.github.pyknic</groupId>
  <artifactId>speedment-general-purpose</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>
    
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <speedment.version>3.0.1</speedment.version>
  </properties>
    
  <dependencies>
    <dependency>
      <groupId>com.speedment</groupId>
      <artifactId>runtime</artifactId>
      <version>${speedment.version}</version>
      <type>pom</type>
    </dependency>
  </dependencies>
    
  <build>
    <plugins>
      <plugin>
        <groupId>com.speedment</groupId>
        <artifactId>speedment-maven-plugin</artifactId>
        <version>${speedment.version}</version>
        <dependencies>
          <dependency>
            <groupId>com.speedment</groupId>
            <artifactId>tool</artifactId>
            <version>${speedment.version}</version>
            <type>pom</type>
            <exclusions>
              <exclusion>
                <groupId>com.speedment.generator</groupId>
                <artifactId>generator-standard</artifactId>
              </exclusion>
            </exclusions>
          </dependency>
          <dependency>
            <groupId>com.github.pyknic</groupId>
            <artifactId>speedment-general-purpose</artifactId>
            <version>1.0.0-SNAPSHOT</version>
          </dependency>   
          <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.39</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
</project>

If we now build our project and then run the goal speedment:tool, we should be able to launch the graphical user interface without problem. If we connect to the database and then press "Generate", nothing will happen at all! We have successfully hacked Speedment into doing absolutely nothing!

Step 3: Turn Speedment Into What You Want It To Be

Now when we have a fresh, clean Speedment, we can start turning it into the application we want it to be. We still have a powerful user interface where we can configure the code generation based on a database model. We have an expressive library of utilities and helper classes to work with generated code. And above all, we have a structure for analyzing the database metadata in an object-oriented fashion.

To learn more about how to write your own code generation templates and hook them into the platform, check out this article. You should also check out the Speedment GitHub page to learn how the existing generators work (the ones we just disabled) and maybe get some inspiration on how you can build your own.

Until next time, continue hacking!

Monday, November 14, 2016

Auto-Generate Optimized Java Class Specializations

Spire and Duke are playing with specialized geometric shapes

If you visited JavaOne this year you might have attended my presentation on “How to Generate Customized Java 8 Code from your Database”. In that talk I showcased how the Speedment Open Source toolkit is used to generate all kinds of Java code using a database as the domain model. One thing we didn’t have time to go into though is the fact that Speedment is not only making code generation easier, it is also itself made up by generated code. In this article I will show you have we set up Speedment to generate specialized versions of many classes to reduce the memory footprint of performance critical parts of the system.

Background

As you might know, Java has a number of built-in value types. These are bytes, shorts, integers, longs, floats, doubles, booleans and characters. Primitive value types are different from ordinary objects primarily in that they can be allocated directly on the memory stack, reducing the burden on the Garbage Collector. A problem with not inheriting from Object is that they can’t be put into collections or passed as parameters to methods that take object parameters without being wrapped. Typical wrapper classes are therefore “Integer”, “Double”, “Boolean” etc.

Wrapping a value type is not always a bad thing. The JIT (Just-In-Time) compiler is very good at optimizing away wrapper types if they can safely be replaced with primitive values, but that is not always possible. If this occur in a performance critical section of your code like in an inner loop, this can heavily impact the performance of the entire application.

That’s what happened to us when working on Speedment. We had special predicates and functions that contained metadata about their purpose. That metadata needed to be analyzed very rapidly inside an inner loop, but we was slowed down by the fact that most of this metadata was wrapped inside generic types so that they could be instantiated dynamically.

A common solution to this problem is to create a number of “specializations” of the classes that contain the wrapper types. The specializations are identical to the original class except that they use one of the primitive value types instead of a generic (object only) type. A good example of specializations are the various Stream interfaces that exist in Java 8. In addition to “Stream” we also have an “IntStream”, a “DoubleStream” and a “LongStream”. These specializations are more efficient for their particular value type since they don’t have to rely on wrapping types in objects.

The problem with specialization classes is that they add a lot of boilerplate to the system. Say that the parts that need to be optimized consist of 20 components. If you want to support all the 8 primitive variations that Java has you suddenly have 160 components. That is a lot of code to maintain. A better solution would be to generate all the extra classes.

Template Based Code Generation

The most common form of code generation in higher languages is template based. This means that you write a template file and then do keyword replacement to modify the text depending on what you are generating. Good examples of these are Maven Archetypes or Thymeleaf. A good template engine will have support for more advanced syntax like repeating sections, expressing conditions etc. If you want to generate specialization classes using a template engine you would replace all occurrences of “int”, “Integer”, “IntStream” with a particular keyword like “${primitive}”, “${wrapper}”, “${stream}” and then specify the dictionary of words to associate with each new value type.

Advantages of template based code generation is that it is easy to setup and easy to maintain. I think most programmers that read this could probably figure out how to write a template engine fairly easy. A disadvantage is that a templates are difficult to reuse. Say that you have a basic template for a specialized, but you want floating types to also have an additional method. You could solve this with a conditional statement, but if you want that extra method to also exist in other places, you will need to duplicate code. A typical example of code that often need to be duplicated is hashCode()-methods or toString(). This is where model based code generation is stronger.

Model Based Code Generation

In model based code generation, you build an abstract syntax tree over the code you want generated and then render that tree using a suitable renderer. The syntax tree can be mutated depending on the context that it is being used in, for an example by adding or removing methods to implement a certain interface. The main advantage of this is higher flexibility. You can dynamically take an existing model and manipulate which methods and fields to include. The downside is that model based code generation generally take a bit longer to set up.

Case Study: Speedment Field Generator

At Speedment, we developed a code generator called CodeGen that uses the model based approach to automatically generate field specializations for all the primitive value types. In total around 300 classes are generated on each build.

Speedment CodeGen uses an abstract syntax tree built around the basic concepts of object oriented design. You have classes, interfaces, fields, methods, constructors, etc that you use to build the domain model. Below the method level you still need to write templated code. To define a new main class, you would write:


import com.speedment.common.codegen.model.Class; // Not java.lang.Class

...

Class createMainClass() {
  return Class.of("Main")
    .public_().final_()
    .set(Javadoc.of("The main entry point of the application")
      .add(AUTHOR.setValue("Emil Forslund"))
      .add(SINCE.setValue("1.0.0"))
    )
    .add(Method.of("main", void.class)
      .public_().static_()
      .add(Field.of("args", String[].class))
      .add(
        "if (args.length == 0) " + block(
          "System.out.println(\"Hello, World!\");"
        ) + " else " + block(
          "System.out.format(\"Hi, %s!%n\", args[0]);"
        )
      )
    );
}

This would generate the following code:


/**
 * The main entry point of the application.
 * 
 * @author Emil Forslund
 * @since  1.0.0
 */
public final class Main {
  public static void main(String[] args) {
    if (args.length == 0) {
      System.out.println("Hello, World!");
    } else {
      System.out.format("Hi, %s!%n", args[0]);
    }
  }
}

The entire model doesn’t have to be generated at once. If we for an example want to generate a toString()-method automatically, we can define that as an individual method.


public void generateToString(File file) {
  file.add(Import.of(StringBuilder.class));
  file.getClasses().stream()
    .filter(HasFields.class::isInstance)
    .filter(HasMethods.class::isInstance)
    .map(c -> (HasFields & HasMethods) c)
    .forEach(clazz -> 
      clazz.add(Method.of("toString", void.class)
        .add(OVERRIDE)
        .public_()
        .add("return new StringBuilder()")
        .add(clazz.getFields().stream()
          .map(f -> ".append(\"" + f.getName() + "\")")
          .map(Formatting::indent)
          .toArray(String[]::new)
        )
        .add(indent(".toString();"))
      )
    );
}

Here you can see how the Trait Pattern is used to abstract away the underlying implementation from the logic. The code will work for Enum as well as Class since both implement both the traits “HasFields” and “HasMethods”.

Summary

In this article I have explained what specialization classes are and why they are sometimes necessary to improve performance in critical sections of an application. I have also showed you how Speedment use model based code generation to automatically produce specialization classes. If you are interested in generating code with these tools yourself, go ahead and check out the latest version of the generator at GitHub!