Thoughts about asset processing

Earlier I posted two separate feature requests, both about how assets are processed. The first was about inlining small assets as base64-encoded data URLs, and the second was about hashing the contents of an asset and including the hash in the filename to improve caching.

Thinking about the implementation a bit more, I came to the conclusion that I want to discuss the overall design of how assets are handled in Zola with @keats and other contributors.

In my understanding, the current system is relatively simple: the build result ends up in public, and it is sourced from content (markdown compilation + asset co-location), sass (sass compilation), static (copying assets as-is), plus templates & config that are used in the compilation.

At the moment, besides specialised steps such as markdown and sass compilation that form the “core” of Zola, there is hardly any asset processing, except for resize_image. The results of the compilation are placed automatically and directly in public. resize_image, on the other hand, is handled differently: the results of the resize are placed in static, the contents of which are then copied to public. This poses (almost) no problems as long as the asset processing is a single-step process.

Actually, there is one problem even with simple processing: one might want to keep the original high-quality images, from which the resized ones are produced, in the repo without publishing them. At the moment, the originals are published too if they are kept inside content. This is fixable using the ignored_content directive, but it feels a bit ad hoc.

However, if one wishes to do multi-step processing, this all seems a bit too simplistic. Let’s say I want to first resize an image and then inline it as a data URL. This is a two-step process. (One could provide a helper that does it as a single step, but that would lose compositionality and make the tools very special-cased; in my mind, they wouldn’t bear their weight.)
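To make the inlining step concrete, here is a minimal sketch of turning a file's bytes into a base64 data URL. The names base64_encode and data_url are hypothetical and not part of Zola; the encoder is hand-rolled only to keep the example free of dependencies, and a real implementation would more likely use an existing base64 crate.

```rust
// Standard base64 alphabet (RFC 4648), with '=' padding.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: encodes bytes as padded base64.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, padding with zeros.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Missing input bytes are represented by '=' in the output.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: builds a `data:` URL for the given MIME type and bytes.
fn data_url(mime: &str, bytes: &[u8]) -> String {
    format!("data:{mime};base64,{}", base64_encode(bytes))
}
```

A composable pipeline would feed the output of the resize step into a function like data_url, rather than baking both into one special-cased helper.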

At the moment, the result of resize_image is the processed image in the static/processed_images dir and an absolute URL pointing to where the image should end up when the site gets deployed. However, for the steps to be composable, the steps between the original source file and the end result in public need to be well-behaved and somewhat standardised.

So, here’s what I perceive to be desirable changes to the system:

First of all, I don’t think that placing processed_images in static is ideal. I think there should be a clear separation between the “source” files (including unprocessed versions of assets), the intermediate steps, and the end result. Why? Because the roles of these files are different. The source files are the most valuable to have in the repository, because they are the “source of truth”. The end result is what gets published, so the deploying system must understand that. And the middle steps are neither of those, but any composable helper/processing functions should be able to find and output them. Placing the resized images under static mixes up the “source of truth” kind of assets and the “middle step” assets. I think there should be a separate directory for the intermediate steps. (temp would be an obvious choice, but I have no hard preferences about the name as long as it’s understandable.)

Secondly, and this applies only if we want the hash-renaming of assets, the asset filenames cease to be predictable, so I imagine that referring to assets from templates would then happen via a function call. For example, {{ asset("content/test.jpg") }} would resolve to /assets/test-3d7995745c319258643d2511b5d53fd644bc214b339db364ee2bfebd971ad4c1.jpg (in the markdown part this could be automatic). Unlike resize_image, it is a general mechanism, so one could use it for any kind of asset. I imagine that it would be the “final step” of the asset processing: besides hashing, it would mark a file in temp as one that will get published and copied directly to public.
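As a rough sketch of the naming scheme only (not of how Zola would implement it), the published path could be derived from the source filename and a hash of the file contents. The function names content_hash and hashed_asset_path are hypothetical, and the hash here is a 64-bit FNV-1a used as a dependency-free stand-in; the example filename above suggests SHA-256, which would need an external crate.

```rust
use std::path::{Path, PathBuf};

/// Stand-in content hash (64-bit FNV-1a). This is an assumption made to keep
/// the sketch dependency-free; it is NOT the SHA-256 implied by the post.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a prime
    }
    hash
}

/// Hypothetical helper: builds `/assets/<stem>-<hash>.<ext>` for a source file.
fn hashed_asset_path(source: &Path, contents: &[u8]) -> PathBuf {
    let stem = source.file_stem().and_then(|s| s.to_str()).unwrap_or("asset");
    let hash = content_hash(contents);
    let file_name = match source.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{stem}-{hash:016x}.{ext}"),
        None => format!("{stem}-{hash:016x}"),
    };
    Path::new("/assets").join(file_name)
}
```

Because the name depends only on the contents, an unchanged file keeps its URL across builds, while any change to the contents yields a new URL.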

Third, because calling asset would be required for publishing the asset as a file (I’m imagining this for the case where the user has opted in to this “asset pipeline” feature), not all assets in temp are published. Indeed, if all helper functions that do asset processing accept a path relative to the project root, place the end result in temp, and output its path, e.g. temp/asset-af1d7ccd937811502e3b63f2daa6a5e7694feb1d4152237156717e769fc33aba.txt, the helpers become trivially composable and, provided that they also use the hash-enabled filenames, cacheable.
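The "path in, path out" convention could look roughly like the sketch below. Everything here is an assumption for illustration: run_step is a hypothetical name, the cache key is the step name plus a hash of the input bytes (one possible reading of "hash-enabled filenames"), and FNV-1a again stands in for a real cryptographic hash.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Stand-in hash (64-bit FNV-1a), used only to keep the sketch dependency-free.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical processing step: reads `input`, applies `transform`, and writes
/// the result into `temp_dir` as `<step>-<hash of input>.<ext>`. If that file
/// already exists, the transform is skipped and the cached path is returned.
fn run_step(
    temp_dir: &Path,
    input: &Path,
    step: &str,
    transform: impl Fn(&[u8]) -> Vec<u8>,
) -> io::Result<PathBuf> {
    let bytes = fs::read(input)?;
    let ext = input.extension().and_then(|e| e.to_str()).unwrap_or("bin");
    let output = temp_dir.join(format!("{step}-{:016x}.{ext}", fnv1a(&bytes)));
    if !output.exists() {
        fs::create_dir_all(temp_dir)?;
        fs::write(&output, transform(&bytes))?;
    }
    Ok(output)
}
```

Since every step returns a path that the next step can take as input, chaining is just nested or sequential calls, e.g. passing the result of a resize step to an inlining step.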

Finally, I think that colocated assets are hard to access from templates at the moment. It would be helpful if, instead of an array of colocated assets, pages had a HashMap<Filename, Filepath>. Then it would be easier to refer to just the filename in markdown when calling shortcodes, and still be able to easily get the full path of the original source asset in order to process or publish it.
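A minimal sketch of that lookup structure, assuming the page currently holds a list of asset paths. The function name index_colocated_assets is hypothetical, and String and PathBuf stand in for the Filename and Filepath types mentioned above.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical helper: indexes a page's colocated assets by bare filename,
/// so a shortcode can look up the full source path from just "photo.jpg".
fn index_colocated_assets(assets: &[PathBuf]) -> HashMap<String, PathBuf> {
    assets
        .iter()
        .filter_map(|path| {
            // Skip entries without a UTF-8 filename component.
            let name = path.file_name()?.to_str()?.to_owned();
            Some((name, path.clone()))
        })
        .collect()
}
```

With such a map, a shortcode that receives only a filename can resolve it to the full source path before handing it to a processing or publishing step.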

So, what do you think about the overall picture of this design?

Some additional details that I thought about but am not entirely sure of:

  1. asset would take in any path in the repo, in the case one wants to publish assets without any processing, so temp wouldn’t be special from the viewpoint of this function.
  2. static would make sense as a directory that gets published regardless of whether asset was called for its files. However, files published from there would keep their original file names. (Of course, because of point 1, one could also publish a file in static with asset, in which case the filename would contain the hash.)
  3. temp could be made to enjoy special treatment in the sense that Zola would trust the hash in its filenames to be correct without checking the contents.
  4. How to reconcile sass with this system? If one wants “full unification” without sass getting any special treatment, I would guess that, having opted in to the asset pipeline, the user would link to the generated css like this: {{ asset(sass("sass/styles.scss")) }}. If one wants to keep some special treatment, here are some alternatives:
    • Provide a variable sass that already contains the result path of the compilation in temp and use it like {{ asset(sass) }}.
    • Provide a variable sass that already contains the result path of the compilation AND the “asset export” in public.
  5. Another kind of asset processing I’m thinking of, besides image processing and inlining, is web font optimisation, but it might go well beyond the scope of what Zola wants to do.

We already have the url_for function with a cachebust option, which can append a ?h=<sha256> query string to the URL. Does that solve the hashing for things coming from static? It also works for getting a link to a given piece of content, not just assets.

The point of having the processed assets in the static folder is that they are committed to the repo and do not need to be reprocessed unless they change. Putting them in a tmp folder would be annoying for me.

I’m not sure I like putting more asset processing in templates though. Images are a natural fit there because that’s where you are using them, but sass?