Feb
2014
Even faster Java Expression Evaluator
I’ve been reading the How to write one of the fastest expression evaluators in Java article (also published over at JCG) and thought to myself – there is an even faster way!
Thus I whipped up a Caliper benchmark which you can check out on my GitHub account.
The benchmark compiles the performance of the following libraries:
- parsii
- Jeval
- JEPLite
- MathEval
- expr
- Janino ExpressionEvaluator
- A hand-rolled solution based on Janino
Before getting into the results, a couple of general words about the state of (open source) expression evaluator libraries in Java (I couldn’t test closed source ones because they impose restrictions – like “only 15 expressions evaluated” – which make it impossible to benchmark them):
- None of them are in Maven central (or even in Maven for that matter – with the exception of Janino, but there you still need a different repo to get the latest version)
- Their development is mostly stalled
- They are lacking in features and/or are buggy
The benchmark consist in evaluating the 2 + (7-5) * 3.14159 * x + sin(0)
expression (similar to the original article). Unfortunately the exponentiation part had to be eliminated because not all libraries support it. Also, x had to be fixed to 0 because parsii erroneously always considers it zero.
Most of the libraries (JEPLite and MathEval being the exception) separate the idea of “compiling” and “evaluating” the expression. This can be very useful if we wish to evaluate the same expression for a multitude of values of the contained variable. Thus the benchmark tests two cases: compile + evaluate and only evaluate (depending on your usecase one or the other is more relevant). Also, Jeval is by default excluded from the benchmark since it performs so poorly that it eclipses all the other results. If you wish to see just how poorly it performs, uncomment the relevant methods from BenchmarkExpressionEvaluation.
Without further ado, here are the results (smaller is better):
As you can see, the custom implementation beats parsii by a factor of 10x! So how does it work? You can always check out the source code, but I will also explain: It uses Janino to compile a class of the form:
import static java.lang.Math.*; // to get sin, cos, etc public static final class JaninoCompiledFastexpr1234 implements UnaryDoubleFunction { public double evaluate(double x) { return (/* your expression here */); } }
which you can later invoke. The advantages of this approach are:
- raw speed – as you can see from the benchmark, the Java parser / compiler is very well optimized and hard to beat, even when rolling your own parser for a subset of cases
- raw speed the second time – the JIT helps us immensely and gives optimizations like constant folding or vectorization for “free”
- brevity – the whole proof of concept is implemented in 60 lines, probably less lines than this article contains
- “garbage free” – after the initial compilation no allocations are made in the heap, but rather all parameters are passed on the stack (since they are primitives)
Of course the solutions is not without its drawbacks:
- depending on the source of the expression we could be compiling and executing arbitrary Java code sent to us by a malicious user. Not a pretty prospect. We try to mitigate this in two ways:
- whitelist the set of characters allowed in the expression
- execute all expressions under a special classloader which uses a ProtectionDomain that has no permissions
- the above isn’t wholly satisfactory however:
- the compiler itself can have bugs (it has happened in the past)
- neither of the safety measures efficiently prevent the potential denial of service conditions (like creating infinite loops or consuming all the memory)
- also, this is just a proof of concept – for a more complete solution one would need to add more interfaces (like BinaryDoubleFunction, TernaryDoubleFunction and also UnaryLongFunction, etc) as well as some settings related to the naming of the parameters.
Still, this is a great example as to the power of the Java ecosystem and what can be achieved by thinking a little outside of the box.
If you run the benchmark for yourself, you might find a couple of interesting things:
- the “Janino” test continuously warns about recompilation. This is expected if you think a little bit about it – since at every invocation it “compiles” the expression. The same is true for JaninoFastexpr
- both Janino and JaninoPrecompiled complain about excessive garbage collection. This is because they use auto-boxing (varargs) to pass the parameters which isn’t very efficient (generates a lot of garbage)
This is it folks, enjoy the source code and hopefully this will inspire people to find better solutions to their problems!
Update: as Andreas pointed out I was using parsii wrong (hence the variable being always considered zero). Also parsii is now on Maven Central (as every open source Java library should be!). Finally, as he pointed out, the Janino compiled solution won’t give you very good error messages in case your expression is syntactically wrong – not much we can do there. There is the possibility of doing hand-parsing first and then generating bytecode (works also as a better security concern). An other possible security measure would be to decompile the generated class (or look at the AST directly) to ensure that it only contains “whitelisted” operations. The Janino documentation gives some examples to get you started.
Update 2: Ashwin pointed me towards MVEL. It is a really promising projects, mainly because it has the option to generate bytecode for the expression and use it in subsequent evaluations. I added it to the testsuite and here are the results. Unfortunately there are some issues with the library:
- Its development seems to have stalled. The main site isn’t updated anymore, development has moved to GitHub, the documentation examples don’t always work (recommendation: look at the unittests instead for working examples) and Maven Central has a newer version than the codehaus.org repository (which is a good think, but some – like myself – might be induced in error because the codehaus repository is linked from the project’s site).
- It has some weird quirks, like
x * (7-5) * 3.14159
works, but(7-5) * 3.14159 * x
results in a parse error - It is much more than an expression evaluation library (in fact some unittests run a complete implementation of Quicksort in MVEL) which is nice, but it results in less-optimal code for the “expression evaluation” use-case (because of autoboxing, map-based variable lookups, etc)
The results are not too great but this library definitely shows some potential and if people would give it some love I’m sure it could be made the top-performing evaluation library.
Update 3: the author of MVEL has some follow-up comments related to the benchmark. You can find them here.
Pingback: Java数学表达式计算(Expression Evaluator) – 源码巴士