Cool work! I'm interested in running benchmarks to reproduce the results reported in the paper. Are there any plans to release the evaluation code soon? Thanks!