Memory consumption for large sites

Hi, gigantic site guy here. I’ve been quiet for a while but zola has been happily churning away on our massive site. The site is up to 575,000 pages and it takes about 2 minutes for a fresh build, and I’m very happy with that performance. (No, it’s not an AI generated site, if anyone was wondering.)

Memory consumption is getting pretty high, though, topping 20 GB towards the end of the build. It’s fine on workstations, but pipelines with lower memory quotas are starting to be a problem.

What I’ve tried so far:

  • Analyze memory usage. I tried heaptrack and massif; both went OOM trying to measure it. I eventually got heaptrack partially working by saving the raw allocations and writing a basic streaming analyzer, since heaptrack’s own analyzer (heaptrack_interpret) processes the whole dump in bulk.
  • Look for optimizations in tera. I found some enums in tera with large variants, and a Rust enum is as big as its largest variant. I tried boxing the larger variants, e.g. `Math(MathExpr)` becoming `Math(Box<MathExpr>)`. This cut tera’s memory by about 44% in tera-only tests I did, but had very little effect on the total memory used by our zola build.
  • Look for optimizations in zola. I think the main thing that would help is less `collect()` and more streaming, but overhauling the build architecture would be silly just to help me, probably the only zola user on earth with this problem. I tried converting a few isolated areas to streaming, like `get_all_orphan_pages` (we have a LOT of orphans), but it didn’t move the needle.
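For anyone curious, the enum-boxing idea above can be sketched like this (with an invented `MathExpr` stand-in; tera’s real AST types are different):

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large tera AST node; the real type differs.
struct MathExpr {
    lhs: [u64; 8],
    rhs: [u64; 8],
}

// A Rust enum is as large as its largest variant (plus a discriminant),
// so one big variant inflates every value of the enum...
#[allow(dead_code)]
enum ExprInline {
    Ident(u32),
    Math(MathExpr),
}

// ...while boxing the big variant shrinks its payload to one pointer.
#[allow(dead_code)]
enum ExprBoxed {
    Ident(u32),
    Math(Box<MathExpr>),
}

fn main() {
    // Typically 136 vs 16 bytes on a 64-bit target.
    println!("inline: {} bytes", size_of::<ExprInline>());
    println!("boxed:  {} bytes", size_of::<ExprBoxed>());
}
```

The trade-off is one extra heap allocation and pointer chase per boxed node, which is usually a good deal when most values are the *small* variants.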

After trying that stuff, I’m here looking for advice.

Can you (keats et al) suggest areas of zola that I could look at optimizing?

Maybe a few tactical places that would benefit from replacing a `Vec<T>` with an `impl Iterator<Item = T>`? Or places where I could add a manual `drop()` for parts of the site tree once they are no longer needed (the raw file content, for example, could be freed after rendering)?
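As an illustration of the `Vec<T>` → `impl Iterator` idea (purely a sketch with invented types and field names, not zola’s actual `get_all_orphan_pages`):

```rust
// Hypothetical minimal page type; field names are invented.
struct Page {
    path: String,
    has_section: bool,
}

// Before: every matching reference is gathered into a fresh Vec,
// which lives as long as the caller keeps it around.
fn orphans_vec(pages: &[Page]) -> Vec<&Page> {
    pages.iter().filter(|p| !p.has_section).collect()
}

// After: pages are yielded one at a time; a caller that just loops
// or counts never materializes the whole result set.
fn orphans_iter(pages: &[Page]) -> impl Iterator<Item = &Page> {
    pages.iter().filter(|p| !p.has_section)
}

fn main() {
    let pages = vec![
        Page { path: "a.md".into(), has_section: true },
        Page { path: "b.md".into(), has_section: false },
    ];
    assert_eq!(orphans_vec(&pages).len(), orphans_iter(&pages).count());
    println!("first orphan: {}", orphans_iter(&pages).next().unwrap().path);
}
```

The win is modest per call site, but it compounds when the same list is re-collected in several places.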

Thanks for any pointers. :slight_smile:


> The site is up to 575,000 pages and it takes about 2 minutes for a fresh build

Haha, the biggest test site I was generating had 100k pages; guess I need to up that.

I’m starting to implement GitHub - Keats/tera2 in Zola (it won’t be ready anytime soon) along with some internal refactoring, but it should allow much lower memory usage and better perf. The main driver for those improvements is the new `tera::Value`, which stores everything in `Arc`, so cloning those will be cheap, unlike right now.
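A tiny sketch of why `Arc`-backed values make cloning cheap (the actual tera2 `Value` type will of course look different):

```rust
use std::sync::Arc;

fn main() {
    // A large payload, standing in for a template value.
    let big = Arc::new(vec![0u8; 10_000_000]);

    // Cloning the Arc copies a pointer and bumps a refcount; the
    // 10 MB buffer is shared between the two handles, not duplicated.
    let cheap = Arc::clone(&big);
    assert_eq!(Arc::strong_count(&big), 2);
    assert!(std::ptr::eq(big.as_ptr(), cheap.as_ptr()));

    println!("both handles point at the same {} bytes", big.len());
}
```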

If you can think of good perf improvements for Tera itself, you can have a look at the repo: GitHub - Keats/tera2

I’ll ping you on this issue as soon as it is ready to test.

How are the pages distributed? Across tons of sections or a single one? Are you using taxonomies/pagination?


The tera2 changes sound exciting!

The content is 70% orphans. No taxonomies and no pagination. As for sections, we have 3 sections whose only purpose is to set a template for items in that section. Most content has no section and hard-codes its template path in each item’s front matter. We have 12 GB of stuff in static. Some macros, but no recursive macros. `load_data` is used on about 200,000 pages, if that matters, but the data being loaded is not large. Is there anything else that would be useful to know?

From the analysis I’ve done so far, it seems like most of the memory consumption comes from storing so many Pages in bulk, with their raw_content and content at the same time. If it were possible to process 1,000 pages at a time, for example, I bet memory consumption would go down a lot.

Macros/shortcodes are gone in tera2 so hopefully it won’t be too annoying to migrate.

The main issue is that a user can request anything from a template, so you do need them all loaded.


I just checked and we have no macros at all now! And there are no shortcodes, so that migration will be easy for us. :sweat_smile:

I’ve been daydreaming about the idea of streaming/batching the site build in Zola, for Pages at least. This code from build.rs is currently:

```rust
site.load()?; // Load all .md files
messages::notify_site_size(&site);
messages::warn_about_ignored_pages(&site);
site.build() // aliases, orphans, sitemap, template rendering, etc etc etc
```

Would become something like this:

```rust
site.walk()?; // walk the file tree to find all md files but do not read them yet, just get their `PathBuf`s
messages::notify_site_size(&site);
// messages::warn_about_ignored_pages(&site); // may need to move later: we may not know whether a page is ignored until it's read from disk
site.load_and_build() // <-- more below
```

load_and_build would act like this:

  1. Check the list of md files produced by walk() for sections, and read only those at this point, to get metadata like templates.
  2. Load all templates in template directories.
  3. Take the list of md files produced by walk() and split them into N chunks where N is the number of cores.
  4. Start up N threads and give each one a chunk.
  5. Each thread loops through the PathBufs and:
    1. Read the file.
    2. Determine its section, from step 1.
    3. Process & render the file with its template or its section template.
    4. Write the resulting HTML.
  6. Perform other tasks, like render themes CSS, render 404 page, site map, taxonomies, etc.

Obviously that list is missing a lot of steps, and some data, like taxonomies and the sitemap, would have to be built up bit by bit and would only be fully known once the thread pool has finished. Aliases may pose a problem.

I’ve implemented this exact walk → chunk → thread-pool pattern in the code that preprocesses content and feeds it into Zola for our site, and it’s been very good for both performance and memory usage. Zola is already the fastest SSG I’ve tried, and the wall-clock time is fantastic even with such a large site, but I think a change like this would go a long way towards reducing the high memory usage at this scale. It’s nice memory-wise because there are only ever N files in memory at a given time, one for each thread.
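For reference, the walk → chunk → thread-pool pattern described above can be sketched with scoped threads. This is a toy sketch; `render_page` is an invented stand-in for the real per-page pipeline (read the file, resolve its template, render, write HTML):

```rust
use std::path::PathBuf;
use std::thread;

// Invented stand-in: pretend the "rendered size" is the path length.
fn render_page(path: &PathBuf) -> usize {
    path.as_os_str().len()
}

// Split the PathBufs from the walk into N chunks and give each scoped
// thread one chunk. Only ~N pages are in flight at once, so peak memory
// tracks the thread count instead of the page count.
fn build_in_chunks(files: &[PathBuf], n_threads: usize) -> usize {
    let n = n_threads.max(1);
    let chunk_size = ((files.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        files
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(render_page).sum::<usize>()))
            .collect::<Vec<_>>() // spawn all threads before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum()
    })
}

fn main() {
    let files: Vec<PathBuf> = (0..10).map(|i| PathBuf::from(format!("p{i}.md"))).collect();
    println!("total rendered bytes: {}", build_in_chunks(&files, 4));
}
```

In a real build, each thread would also accumulate its contribution to shared outputs (sitemap entries, taxonomy terms) and those would be merged after the join, as the paragraph above suggests.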

Still, I’m very aware that our site is probably the only one out there with this problem, so I wanted to get advice from you on whether you like this idea before I start working on it.


I have an email from you from 4 years ago but I don’t know if the email is still valid (@redhat.com).

You can try zola with tera2 on the tera2 branch. You will also need to check out Zola tweaks by Keats · Pull Request #94 · Keats/tera2 · GitHub next to it. It’s far from finished, but it does contain most of the memory-usage optimizations I had in mind for Tera2, so I’m curious to see if it changes anything. If you don’t do anything fancy in the templates, I’m guessing not much, but who knows.

Is it possible to share your site privately, or a script to generate something similar? It would be very useful to have a real site at that scale to profile.

I’m 99.9% sure for example we could remove raw_content entirely without any issues.

Edit: I’ve just pushed a commit clearing it after processing.
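For anyone following along, clearing `raw_content` after rendering looks roughly like this (a sketch with an invented minimal `Page`; zola’s real struct has many more fields):

```rust
// Hypothetical minimal Page, not zola's actual type.
struct Page {
    raw_content: String, // the markdown source
    content: String,     // the rendered HTML
}

fn render(page: &mut Page) {
    // Stand-in for real markdown rendering.
    page.content = format!("<p>{}</p>", page.raw_content);
    // Once rendered, the raw markdown is dead weight; replacing it with
    // an empty String frees its heap allocation immediately, instead of
    // holding both strings for every page until the end of the build.
    page.raw_content = String::new();
}

fn main() {
    let mut page = Page { raw_content: "hello".into(), content: String::new() };
    render(&mut page);
    assert_eq!(page.content, "<p>hello</p>");
    assert_eq!(page.raw_content.capacity(), 0); // the source buffer is gone
}
```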