Moving to Criterion for benchmarking

TL;DR: I moved the benchmarking code over to criterion and wanted to ask if I can make a PR.

If I understand correctly, it’s been a while since Zola has been benchmarked (not a criticism, just an observation). I haven’t done much benchmarking before, but it’s something I want to learn more about, so I thought I’d take a look at the Zola benchmarks.

As I understand it, the standard way to do benchmarking in Rust is the criterion crate, which handles the statistics and supports performance regression testing, which is quite nice. Additionally, the current libtest benchmarking code trips up clippy and rust-analyzer (for me at least), so I moved the benchmarks over to criterion and made a few necessary fixes along the way. I’ve also added a Makefile.toml so you can run the benchmarks with cargo make full-bench or cargo make small-bench (the latter skips the larger test cases) using cargo-make; this sets up and cleans up the test sites using the gen.py script. The idea is to make it easier for people to run the benchmarks themselves, for example if they are considering adopting Zola.
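
To give a sense of the shape of the port, here’s roughly what one of the moved benchmarks looks like. The `Site` calls and the test-site path are written from memory, so treat this as a sketch of the structure rather than the exact code in my branch:

```rust
use std::path::Path;

use criterion::{criterion_group, criterion_main, Criterion};
use site::Site;

// End-to-end build of one of the test sites generated by gen.py.
// The Site::new/load/build calls mirror how the old libtest benches drove Zola,
// but the exact paths and signatures here are approximate.
fn bench_small_blog(c: &mut Criterion) {
    let path = Path::new("benches/small-blog");
    c.bench_function("build_small_blog", |b| {
        b.iter(|| {
            let mut site = Site::new(path, "config.toml").expect("failed to create site");
            site.load().expect("failed to load site");
            site.build().expect("failed to build site");
        })
    });
}

criterion_group!(benches, bench_small_blog);
criterion_main!(benches);
```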

In terms of the configuration of criterion, I think it could be a lot better, especially the measurement times etc., but I thought it would be a good idea to get the move to criterion merged first. If people are open to it, I can spend some time tuning the configuration afterwards.
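
Concretely, the knobs I’d want to tune are things like sample counts and measurement time for the heavier site builds, which criterion exposes per benchmark group. The values below are placeholders to show where the knobs live, not what I’d actually propose yet:

```rust
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

fn bench_huge_blog(c: &mut Criterion) {
    c.bench_function("build_huge_blog", |b| {
        b.iter(|| {
            // Build the huge test site here, same shape as the smaller benches.
        })
    });
}

criterion_group! {
    name = benches;
    // Fewer samples and a longer measurement window for the big, slow sites.
    config = Criterion::default()
        .sample_size(10)
        .measurement_time(Duration::from_secs(60))
        .warm_up_time(Duration::from_secs(5));
    targets = bench_huge_blog
}
criterion_main!(benches);
```

Criterion’s baseline flags (`cargo bench -- --save-baseline <name>` to record a run and `-- --baseline <name>` to compare against it) should also cover most of the regression-testing side once the configuration is pinned down.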

For the time being, I’ve only implemented the benchmarks. I have run them, but haven’t done any serious examination of the results. If people are interested I’d be willing to do a more extensive writeup on this and the current state of Zola performance.

I’m not sure whether or how this would best work, but maybe it’s possible to set up the benchmarking as an integration test for stable/major releases or something? Just something to think about.

Thoughts?

I don’t mind switching to criterion, but as you said the benchmarks haven’t been run in a long time, and I’m not sure they are very representative: they are more of an “integration” benchmark, so they don’t tell us which part slowed down.
If we can create more granular benchmarks, those would be more useful than the current ones, I think.

cargo-make

Let’s avoid adding more tools though; just criterion + an explanation in the README is enough.

maybe it’s possible to set up the benchmarking as an integration test for stable/major releases or something?

It wouldn’t be reliable to run on CI but it could be useful to see obvious regressions.

Let’s avoid adding more tools though; just criterion + an explanation in the README is enough.

Fair enough

If we can create more granular benchmarks, those would be more useful than the current ones, I think.

I’m assuming that means you’d want more of the individual components benchmarked? I’m willing to take a crack at that once I have time (that probably won’t be for a while though).
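
Just to check I’m picturing the right thing: per-component benches, e.g. measuring markdown rendering on its own instead of a full site build? A rough sketch of the shape below; `render_markdown` is a stand-in for whatever the real internal entry point in the rendering component is, not an actual Zola function:

```rust
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for the real rendering entry point, just so the sketch is self-contained.
fn render_markdown(input: &str) -> String {
    input.to_uppercase()
}

// Benchmark one component in isolation rather than the whole build pipeline,
// so a regression here points at rendering specifically.
fn bench_markdown_rendering(c: &mut Criterion) {
    let input = "# A heading\n\nSome *markdown* with a [link](https://example.com).\n".repeat(500);
    c.bench_function("markdown_render_long_post", |b| {
        b.iter(|| render_markdown(black_box(input.as_str())))
    });
}

criterion_group!(granular, bench_markdown_rendering);
criterion_main!(granular);
```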

In any case, while we can’t really run benchmarks retroactively (not reasonably, at least, imo), I think it would be good to get some baselines. Even if you can’t tell which part slowed you down, knowing that performance is comparable already gives you information.

Out of curiosity, I might also look into profiling memory usage with Valgrind, since it doesn’t look like Rust will get native tooling for that any time soon. I’ll share if I find anything, but I’m guessing you wouldn’t be interested in having that in the repo, right? Given your cautious attitude re adding more tools.

Yep it would be nice

Probably not in the repo. E.g. I’m using heaptrack for that rather than Valgrind, so I’d rather avoid having too many tools in there since people have different preferences.