Transilvania JUG

Sponsors

Partners

05
Feb
2014
4

Even faster Java Expression Evaluator

I’ve been reading the How to write one of the fastest expression evaluators in Java article (also published over at JCG) and thought to myself – there is an even faster way!

Thus I whipped up a Caliper benchmark which you can check out on my GitHub account.

The benchmark compiles the performance of the following libraries:

Before getting into the results, a couple of general words about the state of (open source) expression evaluator libraries in Java (I couldn’t test closed source ones because they impose restrictions – like “only 15 expressions evaluated” – which make it impossible to benchmark them):

  • None of them are in Maven central (or even in Maven for that matter – with the exception of Janino, but there you still need a different repo to get the latest version)
  • Their development is mostly stalled
  • They are lacking in features and/or are buggy

The benchmark consist in evaluating the 2 + (7-5) * 3.14159 * x + sin(0) expression (similar to the original article). Unfortunately the exponentiation part had to be eliminated because not all libraries support it. Also, x had to be fixed to 0 because parsii erroneously always considers it zero.

Most of the libraries (JEPLite and MathEval being the exception) separate the idea of “compiling” and “evaluating” the expression. This can be very useful if we wish to evaluate the same expression for a multitude of values of the contained variable. Thus the benchmark tests two cases: compile + evaluate and only evaluate (depending on your usecase one or the other is more relevant). Also, Jeval is by default excluded from the benchmark since it performs so poorly that it eclipses all the other results. If you wish to see just how poorly it performs, uncomment the relevant methods from BenchmarkExpressionEvaluation.

Without further ado, here are the results (smaller is better):

expression_evaluator_results

As you can see, the custom implementation beats parsii by a factor of 10x! So how does it work? You can always check out the source code, but I will also explain: It uses Janino to compile a class of the form:

import static java.lang.Math.*; // to get sin, cos, etc

public static final class JaninoCompiledFastexpr1234 implements UnaryDoubleFunction {
    public double evaluate(double x) {
        return (/* your expression here */);
    }
}

which you can later invoke. The advantages of this approach are:

  • raw speed – as you can see from the benchmark, the Java parser / compiler is very well optimized and hard to beat, even when rolling your own parser for a subset of cases
  • raw speed the second time – the JIT helps us immensely and gives optimizations like constant folding or vectorization for “free”
  • brevity – the whole proof of concept is implemented in 60 lines, probably less lines than this article contains
  • “garbage free” – after the initial compilation no allocations are made in the heap, but rather all parameters are passed on the stack (since they are primitives)

Of course the solutions is not without its drawbacks:

  • depending on the source of the expression we could be compiling and executing arbitrary Java code sent to us by a malicious user. Not a pretty prospect. We try to mitigate this in two ways:
    • whitelist the set of characters allowed in the expression
    • execute all expressions under a special classloader which uses a ProtectionDomain that has no permissions
  • the above isn’t wholly satisfactory however:
    • the compiler itself can have bugs (it has happened in the past)
    • neither of the safety measures efficiently prevent the potential denial of service conditions (like creating infinite loops or consuming all the memory)
  • also, this is just a proof of concept – for a more complete solution one would need to add more interfaces (like BinaryDoubleFunction, TernaryDoubleFunction and also UnaryLongFunction, etc) as well as some settings related to the naming of the parameters.

Still, this is a great example as to the power of the Java ecosystem and what can be achieved by thinking  a little outside of the box.

If you run the benchmark for yourself, you might find a couple of interesting things:

  • the “Janino” test continuously warns about recompilation. This is expected if you think a little bit about it – since at every invocation it “compiles” the expression. The same is true for JaninoFastexpr
  • both Janino and JaninoPrecompiled complain about excessive garbage collection. This is because they use auto-boxing (varargs) to pass the parameters which isn’t very efficient (generates a lot of garbage)

This is it folks, enjoy the source code and hopefully this will inspire people to find better solutions to their problems!

Update: as Andreas pointed out I was using parsii wrong (hence the variable being always considered zero). Also parsii is now on Maven Central (as every open source Java library should be!). Finally, as he pointed out, the Janino compiled solution won’t give you very good error messages in case your expression is syntactically wrong – not much we can do there. There is the possibility of doing hand-parsing first and then generating bytecode (works also as a better security concern). An other possible security measure would be to decompile the generated class (or look at the AST directly) to ensure that it only contains “whitelisted” operations. The Janino documentation gives some examples to get you started.

Update 2: Ashwin pointed me towards MVEL. It is a really promising projects, mainly because it has the option to generate bytecode for the expression and use it in subsequent evaluations. I added it to the testsuite and here are the results. Unfortunately there are some issues with the library:

  • Its development seems to have stalled. The main site isn’t updated anymore, development has moved to GitHub, the documentation examples don’t always work (recommendation: look at the unittests instead for working examples) and Maven Central has a newer version than the codehaus.org repository (which is a good think, but some – like myself – might be induced in error because the codehaus repository is linked from the project’s site).
  • It has some weird quirks, like x * (7-5) * 3.14159 works, but (7-5) * 3.14159 * x results in a parse error
  • It is much more than an expression evaluation library (in fact some unittests run a complete implementation of Quicksort in MVEL) which is nice, but it results in less-optimal code for the “expression evaluation” use-case (because of autoboxing, map-based variable lookups, etc)

The results are not too great but this library definitely shows some potential and if people would give it some love I’m sure it could be made the top-performing evaluation library.

4 Responses to Even faster Java Expression Evaluator

  1. Hi Attila-Mihaly,

    nice work! However, just a few minor things: parsii can handle variables correctly but you need to attach the scope to the expression: parsii.eval.Parser.parse(“2 + (7-5) * 3.14159 * x + sin(0)”, scope);

    Also, it is now (since 2 days) available via maven: com.scireum.parsii-1.0

    It is amazing how fast a compiled version really is (as you can see, once parsed, parsii doesn’t do very much around it). One other benefit of parsii and Janino is, that an invalid expression results in a proper error message (instead of a “java.lang.Security” exception for fastexpr). However, I understand that you code is a prototypical implementation and this issue could be fixed easily. I’d say the most elegant approach would be parsing “by hand” and then manually generating bytecodes for the JVM (there are good libs for it). This would given you raw speed + security (no System.exit(0)….).

    Also thanks for introducing me to caliper – didn’t knew it yet – Really cool and easy to use.

  2. You left out MVEL from your list.

    • Ashwin, thank you for bringing it to my attention. I added it to the benchmark suite and updated the article. The library shows great promise, but unfortunately it didn’t live up to it. I’m sure though that with some love it can be made to perform much better.

  3. Mike Brock says:

    As the author of MVEL, I should note that it’s not quite true that MVEL is no longer maintained. It is. It’s actively maintained by Red Hat and myself, as well as contributions from MuleSoft.

    It was recently updated for Java 8.

    I am not questioning the veracity of your benchmark. But it’s worth mentioning that MVEL is highly optimized. But not necessarily for the case you are testing. It is optimized for adding dynamic, integrated scripting support in Java applications.

    Where it excels, is it’s ability to interact with injected state from the application in an expressive and fast way.

    It is known that MVEL’s math expressions do not match the performance of native Java. For one thing, the type coercion rules around expressions are very similar to JavaScript and a lot of the internals are focused on producing predicable and correct results by doing the necessary numeric type narrowing. This means, certain operations will be auto-expanded to use the Java BigDecimal and BigInteger classes internally.

    Where MVEL *really* excels from a performance perspective is optimizing for field access and method dispatch into Java code. This is what it’s truly designed for and what it’s primarily used for.

    ie. injecting a ‘Person’ instance into an expression as ‘p’ and doing something like: p.name or "Undefined" — as say, an expression in template construction. The JIT is optimized for removing the overhead of calling into Java code in these cases, to bypass reflection.

    That said, it does do expression reduction operations on mathematical expressions as a first-pass optimization. But as I’ve said, it’s numeric type system is somewhat more dynamic than Java’s — this adds some intractable overhead. At least not without expensive static analysis.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>