Proposal for sharding of markdown content

  • Feature Name: Markdown shards
  • Start Date: 2025-07-26
  • Preliminary patch: github

Summary

Enable markdown to be broken into “shards” that can be individually accessed from Tera templates. Doing this allows the power of Tera templates to be applied to the main page content.

Motivation

Tera templates are great. Really powerful. You can do some intricate things with navigation, taxonomy etc. in Zola using them. When it comes to dealing with the main content of the page, however, you just have {{page.content | safe}} as one big block, meaning that you can’t use that power for most of what you write.

This is fine for blog style pages, where you just want to wrap some big, simple chunk of markdownable content in some navigation, but it is limiting for more complex pages. You end up doing ugly kludges using shortcodes and archetypes to kinda do templates within templates within templates, and all that lovely Tera power is nowhere to be seen.

Various work-arounds have been talked about in the forums and in github issues. The main ones are abusing [extra] in the front matter, using load_data to load markdown structured inside YAML or suchlike and using Tera’s split filter. These are effective, but something feels fundamentally “off” and hacky about all of them.

It turns out you can go a very long way to providing the flexibility to use the main markdown content with Tera’s templates with a very minimal, fully backwards compatible, change to the codebase.

Guide-level explanation

The fundamental issue here was brought over from Hugo: the building of a page is triggered by the presence of a markdown file in the content file hierarchy. This markdown file is, however, not really a markdown file. It’s really a toml file (or yaml in Hugo’s case) with a monolithic “blessed” markdown asset appended. In many ways it would be better if the trigger was actually the presence of a toml file, and the main content was in a file referenced from the toml, just as the template is.

We are where we are, however, and by stealing a trick from multipart email, we can get a huge amount more flexibility out of that monolithic asset at almost no cost. Here’s what the markdown for a page using shards looks like:

+++
title = "Shard demo"
shard_marker = "-*-"
+++

Everything that goes here gets rendered out as normal and is available to the
templates as page.content

If a user doesn't opt in by specifying a shard_marker, everything else gets
rendered as normal and they wouldn't notice the feature.

-*- foo

The rendered html from this markdown is available to the templates as
page.shards.foo[0]

-*- foo

The rendered html from this markdown is available to the templates as
page.shards.foo[1]

-*- bar

The rendered html from this markdown is available to the templates as
page.shards.bar[0]

Your template can then do this sort of thing:

<h1>{{page.title|safe}}</h1>
<div class="main_content">{{page.content|safe}}</div>
<div class="columns">
    <div class="foo">
    {% for b in page.shards.foo %}
    <div class="foo_block">{{b|safe}}</div>
    {% endfor %}
    </div>
    <div class="bar">
    {% for b in page.shards.bar %}
    <div class="bar_block">{{b|safe}}</div>
    {% endfor %}
    </div>
</div>

If you don’t add that shard_marker = "xyz" to the front matter, you wouldn’t know the feature was there. Moreover, because you are selecting what the marker is, you can select something that you are sure is not in your actual markdown. Multipart email uses this trick: Content-Type: multipart/alternative; boundary="=-=-=" to solve the same problem.

Reference-level explanation

The code modification is very simple. In components/content/src/page.rs we modify the render_markdown function so that it checks for the existence of self.meta.shard_marker.

If that doesn’t exist, it proceeds as it did before, giving full backward compatibility.

If it does exist, it splits the raw content on the marker and processes the first string in exactly the same way as before to expose the page.content variable. The remaining strings are split again on the first new line, and everything between the shard marker and the new line is used as the shard identifier. The shard content is then processed just as the main content was, and pushed to a vector keyed to that identifier in a hash map. This all amounts to about 15 lines of new code.

The rest of the modifications are very minor: make shard_marker a legitimate entry in the front matter, give the page in the library the new HashMap<String, Vec<String>> and expose this to the tera templates – another 5 lines of code.
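The splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the preliminary patch: the function name split_shards is hypothetical, the optional marker argument stands in for self.meta.shard_marker, and the markdown rendering of each piece is left out, so the strings returned here are still raw markdown.

```rust
use std::collections::HashMap;

/// Splits raw page content into the main body and named shards.
///
/// Hypothetical helper for illustration only; it is not part of Zola.
/// `marker` plays the role of the optional `shard_marker` front matter entry.
pub fn split_shards(raw: &str, marker: Option<&str>) -> (String, HashMap<String, Vec<String>>) {
    let mut shards: HashMap<String, Vec<String>> = HashMap::new();

    // No marker (or an empty one): behave exactly as before, everything is main content.
    let marker = match marker {
        Some(m) if !m.is_empty() => m,
        _ => return (raw.to_string(), shards),
    };

    // The first piece is the main content; every later piece is one shard.
    let mut parts = raw.split(marker);
    let main = parts.next().unwrap_or("").to_string();

    for part in parts {
        // Everything between the marker and the first newline is the identifier.
        let (id, body) = part.split_once('\n').unwrap_or((part, ""));
        shards
            .entry(id.trim().to_string())
            .or_default()
            .push(body.to_string());
    }

    (main, shards)
}
```

Called with Some("-*-") on the body of the example page, this would return the text before the first marker as the main content, plus a map with two entries under foo and one under bar. Called with None, it returns the input unchanged and an empty map, which is the opt-out path.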

Drawbacks

Adds complexity to the documentation. It is somewhat hard to explain why you might need this feature until you find that you need it.

Rationale and alternatives

Another suggestion in the github issues was to add an entire new TOML block, which would potentially give even more flexibility – new meta data could be introduced with each shard. I think that this would make for a much more complex implementation and wouldn’t gain enough to be worth it, since any shard level meta data could just be added to [extra].

Another alternative would be to switch to using TOML files as the trigger for page creation, with content introduced as asset files referenced in the TOML files. This would be a huge change though, and would drastically break backward compatibility.

The shards could also be nestable in the same fashion as multi-part email is. This might add some more structure to the data than the simple hash map of vectors proposed, but I don’t believe the complexity would be worth the additional benefit.

A final option is to make sharded markdown something that load_data could parse. This actually has a lot going for it in terms of not touching the main codebase, but is probably a lot less likely to be discovered.

Not doing this is also fine. There are workarounds that, although ugly, work.

Prior art

The main prior art is multipart email, which solves a similar problem: how to break a simple text file that can contain pretty much any text into meaningful sections.

Unresolved questions

Do we want to do this just for pages, or do we need to do it for sections too? You can access the shards of each page from the sections anyway, so I don’t know whether you would gain much.

Future possibilities

Can’t really think of anything for this. It’s a pretty self contained change.

Strongly in favor of this!

Sorry but I don’t really get this tbh, can you help me understand? Let me explain my thought process and hopefully you can help me understand where I’m wrong: a md doc is only ever rendered within the context of a template. A template can be one-off (like a homepage) or reusable (like a blogpost).

If it’s a one-off template, then you would just use one of the various existing ways to load fragments of markdown (like load_data or whatever).

If the template is meant to be reused then that’s a different story, but then I have a hard time understanding the target user. It’s someone who will write a bunch of markdown that’s all going to be structured exactly the same way?

Take the prior art example you gave for email. It’s quite a bit different actually. Yes email has fragments of different documents encoded within it, each with an addressable boundary, but the use case is to just render the email verbatim in another client. There isn’t some kind of template that consumes that email’s different parts to alter how it’s laid out. Even if there were then that template would be useless the next email because each message would be different.

If my thinking is wrong, would you be able to address my confusion and provide a straightforward interpretation of how you think this would be useful? Is it to make one-off templates better? Or is the use case that a lot of md docs are all going to be written exactly the same way, with the intent of having one template that will extract their “shards” and sprinkle them throughout the html?

The closest thing I can think of to where there’s a use case of the latter is for multilingual sites, where you have the same doc in many languages, but Zola already has a solution to that.

The point is that you don’t need to structure the markdown in any particular way; the shards can be in any order and the template will deal with how to arrange them. The template can also do different things if a particular shard is available or not. It’s probably easiest to illustrate with an example. Here’s a template I use for the home page of a website:

{% extends "base.html" %}
{% block header %}
<link rel="preload" as="image" href="images/hero.webp" fetchpriority="high">
{% endblock %}
{% block content %}
<div class="j-page__content" style="background-image: url(images/hero.webp);">
{{ page.content.main.hero[0] }}
{{ page.content.main.blurb[0] }}
{% if page.content.main.card %}
<div class="j-featured">
  <h1>FEATURED</h1>
  <hr>
  <div class="j-featured__cards">
    {% for card in page.content.main.card %}
    {{ card }}
    {% endfor %}
  </div>
</div>
{% endif %}
</div>
{% endblock %}

and here’s sharded data passed to it:

<!-- shard:hero -->
<div class="j-hero-title">
  <h1>CRAFTED BY TRADITION</h1>
  <h2>DESIGNED FOR TODAY</h2>
</div>
<!-- shard:blurb -->
<div class="j-homepage-blurb">
  <h2>Discover the beauty of craftsmanship that stands the test of time</h2>

  <p>Luxury headwear created by blending timeless craftsmanship with modern
    design. Using traditional straw techniques alongside carefully sourced
    materials, Emily honours heritage and endeavors to renew endangered crafts.
    All pieces are carefully crafted by hand to add a touch of elegance, while
    making a positive impact on the environment.</p>
</div>
<!-- shard:card -->
<div class="j-featured__card"
     style="background-image: url(images/featured-ss25.webp);">
  <a href="collections/ss25/index.html">Explore the SS25 Collection</a>
</div>
<!-- shard:card -->
<div class="j-featured__card"
     style="background-image: url(images/featured-about.webp);">
  <a href="about.html">About Emily Hurst</a>
</div>

These are actually from a little toy SSG that I wrote in Python, so the shards are HTML and the template is Jinja, but you get the idea. Adding extra feature cards or changing the blurb no longer needs you to touch the template, and equally moving things around in the template no longer requires you to mess around with the text content.

This may not sound like much of a win – you can, as you say, do this in other ways – but I think this is a particularly clean way of setting up pages.

This example was, of course, for a “one-off” template – the home page – but the ability to create different pages by supplying the template with different shards is even more powerful with “reusable” templates.

The prior art was just to show how you could go about sharding a blob of text that can contain absolutely anything. Email defines a marker in the header, so that if your standard marker is used somewhere in the email, you can use a different one to mark the boundaries. In my toy implementation, there is no problem because the <!-- shard:xyzzy --> format is very specific, and it needs the presence of one on the first line of the file to trigger sharding.
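For comparison, the comment-style marker format described above could be parsed along these lines. This is a sketch in Rust rather than the Python of the toy SSG, and it is not that implementation: the names shard_name and parse_comment_shards are made up for illustration, and the rule that a marker must occupy a whole line is an assumption.

```rust
use std::collections::HashMap;

/// Returns the shard name if `line` is a marker of the form `<!-- shard:NAME -->`.
/// Hypothetical helper; assumes the marker is the only thing on the line.
fn shard_name(line: &str) -> Option<&str> {
    let name = line
        .trim()
        .strip_prefix("<!-- shard:")?
        .strip_suffix("-->")?
        .trim();
    if name.is_empty() { None } else { Some(name) }
}

/// Parses `text` into shards keyed by name.
///
/// Returns `None` when the first line is not a marker, meaning sharding is
/// not triggered and the caller should treat the file as plain content.
pub fn parse_comment_shards(text: &str) -> Option<HashMap<String, Vec<String>>> {
    let mut lines = text.lines();
    // Sharding only happens if the very first line is a marker.
    let mut current = shard_name(lines.next()?)?.to_string();
    let mut body = String::new();
    let mut shards: HashMap<String, Vec<String>> = HashMap::new();

    for line in lines {
        if let Some(name) = shard_name(line) {
            // A new marker closes the shard collected so far.
            shards
                .entry(current)
                .or_default()
                .push(std::mem::take(&mut body));
            current = name.to_string();
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    // Store the final shard, which has no marker after it.
    shards.entry(current).or_default().push(body);
    Some(shards)
}
```

Because repeated names accumulate in a vector, the two card shards in the home page example would end up as two entries under the same key, in file order.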

When would you want to use markdown AND to not structure it in a particular way? If you’re writing a markdown file, you’re doing so because you have an idea that is meant to be expressed from top to bottom and is supposed to be readable outside of any rendering. If you have things that you want to associate with that file but not in any particular order, then that is what extra is for. If you don’t mean for it to be read on its own and it has to be rendered to be understood then wouldn’t any other dictionary like format be better? Eg toml. But I don’t understand when I’d want markdown AND no structure.

But let’s look at your example.

“These are actually from a little toy SSG that I wrote in Python, so the shards are HTML and the template is Jinja, but you get the idea.”

I’m not sure if your example makes sense for me still, because in your example (with HTML) this would just be done with an include statement in Tera (or blocks). The moment the situation changes to markdown it becomes a different story altogether.

But let’s say for whatever reason that I need to do it in markdown. I would first want to consider: is this something I need to do once? Or is it something I plan on doing many times?

If it’s once then I would just reach for extra in the front matter, right? The markdown could even just be saved as a string in the template itself.

On the other hand, if I need to render a page multiple times then it sounds much more useful. But still I have a hard time imagining, outside of i18n, such a scenario. But you mentioned that scenario in your response so I will try:

“the ability to create different pages by supplying the template with different shards is even more powerful with “reusable” templates.”

What kind of scenario were you imagining where someone would have many markdown files with different content but overlapping shard keys?

I guess one example I can think of would be maybe if you had like a tl;dr section in your document and you wanted to exclude it from being rendered. I don’t think this would be very easy in normal Zola. You could put it in extra but really the whole point is you want it as part of the document. Someone reading the plaintext of the document might be happy to have it there but you may not want it literally rendered, for example.

I guess I could imagine the same for a few other cases where maybe my document has like an aside that should be rendered outside the normal flow of the document.

The thing though is to make something like that useful at the reusable template level just doesn’t make sense to me. So I think I’m missing the example of how this would be done better in the case of reusable templates, to better understand.

Even in the examples I gave I think it would make more sense to do what I described using shortcodes. It’s within the document itself that the specific context is available for how it should be rendered. So I would have a short code called ignore, for example, that goes around the tl;dr example I gave that just hides it. Or if there was an aside then I would just have an aside short code that uses the aside html element.

I guess it’s possible that you could have a lot of documents, let’s say posts, that always go a certain way and it gets aggravating typing the short code manually for each one, since you could just write the template once to handle them all. So in that case what you’re saying is the document would be annotated with these “shard keys” in all the parts that you want the template to handle? Let’s use my tl;dr as the example.

What I want to know next is if there’s anything really easier or better about manually typing “shard:tldr” vs manually “{% tldr %}”? They both have beginning and ending regions (just like email boundaries) and they take up about the same number of characters.

So that leads us to the one key difference! Shortcodes are rendered within the normal document flow and your shards proposal would allow for random access. I do think that is neat and could be potentially useful, especially if you need to change other parts of the layout based on the content. Maybe for example there’s an author shard, and outside of the rendering of the document you have some logic in your template that will load a little bio of the author with one name vs the bio of a different author with a different name. But the whole point is it could happen outside the normal flow of the article rendering.

Did I understand this correctly? Is that about what you had in mind?

Also this is random but sharding is primarily a database term. I’ve never heard it in the context of a document, but I could be wrong there. However emails are broken into message parts, so maybe part could be good too? Accessible via page/section.part[key]?

In the context of an SSG, I view markdown as just a poor man’s way of writing HTML without writing HTML. The whole point is that it is going to be rendered. Surrendering your document structure to markdown works well in some cases, but I have found it tends to get pretty messy pretty quickly.

Yep. You can do markdown shards in [extra] and get the same effect. I think that is a pretty ugly solution though. Same for load_data with a TOML file where the markdown is placed into multiline strings. It works, but it’s ugly.

Again, you can do this, but your templates then become your content. Indeed, IIRC Hugo suggests that your home page is just a template, with the markdown file simply a TOML file with an .md extension. It’s an ugly solution, once again.

Essentially, any time your web page is not just a simple linear document with a bit of navigation wrapped around it, you’re currently going to be reaching for workarounds. I had a website with multiple pages that consisted of a title, description and then a series of “image rows”, where each image row consisted of an image on one side of the page and some descriptive text on the other. The sides alternated.

In order to make this work, I created a short code for an image row with the image on the left, another for an image row with the image on the right, and then carefully stacked those short codes inside a <div> in the markdown file. I then had to precisely copy this structure for all the other files. If I wanted to change the structure, I had to change all the files.

Contrast this to when shards are available: I put all the text in shards with the same identifier to create a vector of shards, and I put all the image urls in a vector in [extra]. My template can now do all the work of making the image rows, and if I want to change how they work, I just change one template. And yes, I could have just put all my markdown shards in [extra] as well, which would have probably been a better solution…

Personally I have grown to think of shortcodes as the last resort of a desperate man. You generally don’t need them with shards available. However, and this is an important however, there is nothing stopping you using them if you want to. Nothing about this proposal stops you from using precisely the document creation process that you are currently using. It’s fully opt-in, adding a string to the bow of those that want it for the cost of 1 if statement for those that don’t want it and 20 lines of Rust for those that do.

I guess document fragment is the more commonly used term. If you want to s/shard/fragment/g I have no objection.