Incremental rebuilds

Hey folks, I’m new. I just spent all day switching my 2005-vintage blog from Nikola, written in Python, to Zola. I like Zola; I had few issues configuring it and making it do what I like.

One thing that disappointed me a little was the performance of rebuilds. It takes 7 to 8 seconds to regenerate all 233 pages of my blog. It doesn’t help that zola serve has no debouncing, so every subconscious save on my part triggers a 7-second rebuild (though I see there has been recent work on this). But even with debouncing it may be a little bit slow - we’ll see.
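To make the debouncing idea concrete, here is a minimal Python sketch. It is not how zola serve watches files; the function name `coalesce_events` and the 1-second quiet window are assumptions made up for illustration. It only shows the principle: saves that arrive close together collapse into one rebuild.

```python
def coalesce_events(timestamps, quiet_window=1.0):
    """Group save-event timestamps (in seconds) into bursts.

    A new burst starts whenever the gap since the previous event is larger
    than quiet_window. Each burst would trigger a single rebuild.
    (Hypothetical helper, not part of Zola.)
    """
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= quiet_window:
            bursts[-1].append(t)  # still inside the current burst
        else:
            bursts.append([t])    # quiet period elapsed: start a new burst
    return bursts


saves = [0.0, 0.3, 0.8, 5.0, 5.2]
rebuilds = coalesce_events(saves)
print(len(saves), "saves ->", len(rebuilds), "rebuilds")
```

With a 7-second build, the window length is a trade-off: too short and habitual double saves still queue two builds, too long and every edit feels delayed.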

Nikola (which I have been using since 2013) has fast rebuilds because it does some clever change tracking. Sufficiently clever change tracking is very difficult to implement, and I don’t expect it to happen any time soon, but I thought I’d start a conversation about it, not hindered by any practical knowledge.

The problem is that templates can pick up information from a wide variety of places. Adding a page potentially affects the archive page, the tags page, the feed page, and the landing page. To incrementally rerender only those pages that are affected, you’d need to do some analysis of the data dependencies of each template, and that’s definitely difficult.
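As a rough illustration of what such dependency tracking would have to compute, here is a small Python sketch. The `reads` table and every file name in it are hypothetical; neither Zola nor Nikola is claimed to store anything in this shape. Given a record of which sources each output page read, it inverts the mapping to find which outputs a change touches.

```python
from collections import defaultdict

# Hypothetical record of which source files each output page read while rendering.
reads = {
    "archive.html": {"posts/a.md", "posts/b.md"},
    "tags.html": {"posts/a.md", "posts/b.md"},
    "posts/a.html": {"posts/a.md"},
    "posts/b.html": {"posts/b.md"},
}


def invert(reads):
    """Build a mapping: source file -> set of outputs that depend on it."""
    dependents = defaultdict(set)
    for output, sources in reads.items():
        for source in sources:
            dependents[source].add(output)
    return dependents


def affected(reads, changed):
    """Return the outputs that must be re-rendered when `changed` sources change."""
    dependents = invert(reads)
    result = set()
    for source in changed:
        result |= dependents.get(source, set())
    return result


print(sorted(affected(reads, {"posts/a.md"})))
# -> ['archive.html', 'posts/a.html', 'tags.html']
```

The lookup itself is cheap. The difficult part, which the sketch skips entirely, is filling in `reads` correctly for arbitrary templates. A brand-new page is also not covered, since it has no entry in the table yet.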

But adding an article to the blog likely does not affect all other blog articles. Perhaps we could come up with a shortcut and have a way to explicitly declare somewhere that we don’t want to rerender all pages? I haven’t worked out the details - it’s tricky but perhaps more doable.

Let’s move on to another idea. As far as I can see, the markdown content doesn’t allow action at a distance: one file cannot include information, not even a title, from another. There is link checking, but that’s just about it, and checking the outgoing internal links of a single markdown file is going to be fast enough.

So markdown content changes cannot affect other content on the blog; correct? So we could introduce a system where we detect whether only markdown was changed, not anything in the configuration block, and if so, only rerender the markdown and the page that includes it. This means that if I save a file while I’m typing, I’m going to see an instant update (if I don’t touch the config block on top).
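A minimal Python sketch of the detection step, assuming the configuration block is TOML front matter delimited by `+++` lines. The helper names are hypothetical and this is not Zola code. It classifies a save as touching the config block, only the markdown body, or nothing.

```python
import hashlib


def split_front_matter(text):
    """Split a page into (front_matter, body), assuming +++ delimiters."""
    if text.startswith("+++"):
        parts = text.split("+++", 2)  # ['', front matter, body]
        if len(parts) == 3:
            return parts[1], parts[2]
    return "", text  # no front matter found: treat everything as body


def digest(s):
    """Hash a string so only digests need to be kept between builds."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def classify_change(old_text, new_text):
    """Return 'front-matter', 'body-only', or 'none'."""
    old_fm, old_body = split_front_matter(old_text)
    new_fm, new_body = split_front_matter(new_text)
    if digest(old_fm) != digest(new_fm):
        return "front-matter"  # config block changed: full rebuild
    if digest(old_body) != digest(new_body):
        return "body-only"     # candidate for the fast path
    return "none"


old = '+++\ntitle = "Hello"\n+++\nFirst draft.\n'
new = '+++\ntitle = "Hello"\n+++\nSecond draft.\n'
print(classify_change(old, new))
```

Only the "body-only" case would be eligible for a shortcut. Whether rerendering just that one page is actually safe depends on what the templates read from it.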

Of course there are issues with this idea: if you include full page content into an atom feed, then the atom feed also needs to be rebuilt each time you save, and that includes all the content of all the pages. But perhaps we can come up with enough restrictions to make something like this work? Anyway, I thought I’d bring it up for discussion.

Is it public somewhere? I’ve seen people with 100k+ pages build their site in 20s so something is wrong there.

Zola used to have incremental rebuilds, but it is very hard to get them working correctly, and at some point a good third of the issues were people not seeing the updated data in zola serve.

But adding an article to the blog likely does not affect all other blog articles.

Is it though? People can have some “related articles” section or whatever and that new article would be shown on other articles. If you don’t care too much about inconsistency when writing, you can use zola serve --fast that will only re-render the page/section affected.

So markdown content changes cannot affect other content on the blog; correct?

Nope, it can. If you show the summary of a page in a section and you edit one of the pages, the section needs to be re-rendered since its content might have changed.
Also, since you can fetch any page/section from any template, it’s impossible to know for sure which template needs to be re-rendered. For a quick re-render there’s zola serve --fast as mentioned before.

Is it public somewhere? I’ve seen people with 100k+ pages build their site in 20s
so something is wrong there.

I was wondering about that after I posted this - it’s an old blog but the number of pages isn’t that huge.

I’ve made it public:

It’s using the tabi theme - I wonder how much the theme can impact things. I tried running a flamegraph over it but I think I need to make a debug build as it wasn’t very informative - any clues there?

Zola used to have incremental rebuilds, but it is very hard to get them working
correctly, and at some point a good third of the issues were people not seeing the
updated data in zola serve.

Yeah, that’s not optimal. I understand why it’s hard.

But adding an article to the blog likely does not affect all other blog articles.

Is it though? People can have some “related articles” section or whatever and that
new article would be shown on other articles. If you don’t care too much about
inconsistency when writing, you can use zola serve --fast that will only re-render
the page/section affected.

Yeah, sorry for being unclear but I do understand that. What I meant is that if there’s a way to mark dependencies or a lack of dependencies you can skip a lot of work. That said, it’s risky as any inconsistency would lead to unpredictable results.

So markdown content changes cannot affect other content on the blog; correct?

Nope, it can. If you show the summary of a page in a section and you edit one of
the pages, the section needs to be re-rendered since its content might have
changed.

Ah, I wasn’t aware of the summary option. I’d cursorily scanned the theme and it wasn’t using this feature so I missed it.

That said, tracking references to page.content and page.summary is a more restricted problem than tracking everything. The catch is that doing it reliably probably requires static analysis of the templates to see how variables flow through them. Anyway, thanks for answering; I knew it was going to be hard but I had hoped the problem could be restricted to make it more tractable. If I have such ideas in the future I shall get back to you. But if I can speed up zola in general it’s not going to matter.
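To show what the naive version of tracking page.content and page.summary would look like, and why it falls short of real static analysis, here is a Python sketch that just scans template text with a regular expression. The function name and the two sample templates are made up for illustration. It knows nothing about includes, macros, variables bound under another name, or pages fetched by path.

```python
import re

# Matches `<name>.content` or `<name>.summary` anywhere in the template text.
FIELD_USE = re.compile(r"\b\w+\.(content|summary)\b")


def body_fields_used(template_source):
    """Return which of 'content' / 'summary' the template text mentions."""
    return {m.group(1) for m in FIELD_USE.finditer(template_source)}


# Hypothetical templates, written only for this example.
section_tpl = "{% for page in section.pages %}{{ page.summary | safe }}{% endfor %}"
page_tpl = "<h1>{{ page.title }}</h1>{{ page.content | safe }}"

print(body_fields_used(section_tpl))  # {'summary'}
print(body_fields_used(page_tpl))     # {'content'}
```

A scan like this can say that a section template reads summaries, so body edits must rerender that section. It cannot prove that a template reads no body text at all, which is the guarantee a shortcut would actually need.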

Also since you can fetch any page/section from any template, it’s impossible to
know for sure which template needs to be re-rendered. For quick re-render there’s
zola serve --fast as mentioned before.

That sounds like a useful option! It may be helpful to add a little bit of text to the cli-usage documentation to explain that you can do this. I realize it doesn’t document all the CLI options, but this sounds like a useful use case to mention.

To debug why my build is slow I’ve created an issue: