I earlier posted two separate feature requests, both concerning how assets are processed. The first was about inlining small assets as base64-encoded data URLs, and the second was about hashing the contents of an asset and including the hash in the filename to improve caching.
Thinking about the implementation a bit more, I came to the conclusion that I want to discuss the overall design of how assets are handled in Zola with @keats and other contributors.
In my understanding, the current system is relatively simple: the build result ends up in `public`, and it is sourced from `content` (markdown compilation + asset co-location), `sass` (sass compilation), `static` (copying assets as-is), plus templates & config that are used in the compilation.
At the moment, besides specialised steps such as markdown and sass compilation that form the “core” of Zola, there is hardly any asset processing, except for `resize_image`. The results of the compilation are placed directly in `public`. `resize_image`, on the other hand, is handled differently: the results of the resize are placed in `static`, the contents of which are then copied to `public`. This poses (almost) no problems as long as the asset processing is a single-step process.
Actually, there is one problem even with simple processing: one might want to keep the original high-quality images, from which the resized ones are produced, in the repo without publishing them. At the moment, the originals, if kept inside `content`, are published too. This is fixable using the `ignored_content` directive, but feels a bit ad hoc.
However, if one wishes to do multi-step processing, this all seems a bit too simplistic: let’s say I want to first resize an image, and then inline it as a data URL. This is a two-step process. (One could provide a helper that does it as a single step, but that would lose compositionality and make the tools very special-cased; in my mind, they wouldn’t pull their weight.)
At the moment, the result of `resize_image` is the processed image in the `static/processed_images` dir and an absolute URL pointing to where the image should end up when the site gets deployed. However, in order for the steps to be composable, the steps between the original source file and the end result in `public` need to be well-behaved and somewhat standardised.
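To make the "inline it as a data URL" step from the example above concrete, here is a minimal std-only sketch. The names `base64_encode`, `data_url` and `inline_file` are hypothetical and not part of Zola, and the MIME type is passed in by the caller rather than detected. The point is only that the step takes a plain file path, so it can consume the output of any earlier step such as a resize.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded base64. Hand-rolled only to keep the sketch
/// free of external crates; a real implementation would use a library.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Missing input bytes are represented by '=' padding.
        out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Build a `data:` URL from raw bytes and a caller-supplied MIME type.
fn data_url(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}

/// Hypothetical pipeline step: read any file (for example the output of a
/// previous resize step) and return it as a data URL.
fn inline_file(path: &Path, mime: &str) -> io::Result<String> {
    Ok(data_url(mime, &fs::read(path)?))
}
```

With something like this, inlining a resized image would just be `inline_file(&resized_path, "image/jpeg")`, where `resized_path` is whatever path the previous step returned.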
So, here’s what I perceive to be desirable changes to the system:
First of all, I don’t think that placing `processed_images` in `static` is ideal. I think there should be a clear separation between the “source” files (including unprocessed versions of assets), the intermediate steps, and the end result. Why? Because the roles of these files are different. The source files are the most valuable to have in the repository, because they are the “source of truth”. The end result is what gets published, so the deploying system must understand that. And the middle steps are neither of those, but any composable helper/processing functions should be able to find and output them. Placing the resized images under `static` mixes up the “source of truth” kind of assets and the “middle step” assets. I think there should be a separate directory for the intermediate steps. (`temp` would be an obvious choice, but I have no hard preferences about the name as long as it’s understandable.)
Secondly, and this applies if we want to do the hash-renaming of assets: the asset filenames cease to be predictable, so I imagine that referring to assets from templates would then happen via a function call. For example, `{{ asset("content/test.jpg") }}` would resolve to `/assets/test-3d7995745c319258643d2511b5d53fd644bc214b339db364ee2bfebd971ad4c1.jpg`. (In the markdown part this could be automatic.) Unlike `resize_image`, it is a general mechanism, so one could use it for any kind of asset. I imagine that it would be the “final step” of the asset processing that, besides hashing, would mark a file in `temp` as one that will get published and copied directly to `public`.
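As a rough illustration of what such a final step might do, here is a minimal sketch. Everything in it is an assumption for discussion: `publish_asset`, `hashed_name` and `content_hash` are hypothetical names, the `assets` subdirectory is taken from the example URL above, and the hash is a 64-bit FNV-1a used only as a std-only stand-in for a real cryptographic hash such as SHA-256 (so the names come out shorter than in the example).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// 64-bit FNV-1a as lowercase hex. ASSUMPTION: a stand-in for SHA-256,
/// chosen only so that the sketch needs no external crates.
fn content_hash(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    format!("{:016x}", h)
}

/// Turn `test.jpg` plus its contents into `test-<hash>.jpg`.
fn hashed_name(path: &Path, bytes: &[u8]) -> String {
    let stem = path.file_stem().and_then(|s| s.to_str()).unwrap_or("asset");
    let hash = content_hash(bytes);
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{}-{}.{}", stem, hash, ext),
        None => format!("{}-{}", stem, hash),
    }
}

/// Hypothetical "final step": copy the file into `<public_dir>/assets`
/// under its hashed name and return the URL a template would receive.
fn publish_asset(source: &Path, public_dir: &Path) -> io::Result<String> {
    let bytes = fs::read(source)?;
    let name = hashed_name(source, &bytes);
    let out_dir = public_dir.join("assets");
    fs::create_dir_all(&out_dir)?;
    fs::write(out_dir.join(&name), &bytes)?;
    Ok(format!("/assets/{}", name))
}
```

Because the name depends only on the file stem and the contents, publishing an unchanged file twice yields the same URL, while any change to the contents yields a new one. That is what makes far-future cache headers safe.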
Third, because calling `asset` would be required for publishing the asset as a file (I’m imagining this for the case where the user has opted in to this “asset pipeline” feature), not all assets in `temp` are published. Indeed, if all helper functions that do asset processing accept a path relative to the project root, place the end result in `temp`, and output the resulting path, e.g. `temp/asset-af1d7ccd937811502e3b63f2daa6a5e7694feb1d4152237156717e769fc33aba.txt`, the helpers become trivially composable and, provided that they also use the hash-enabled filenames, cacheable.
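A minimal sketch of that helper contract, path in and path out, could look like the following. The names `run_step` and `cache_key` are hypothetical, the actual processing is passed in as a closure, and the cache key is my own assumption: a hash over the step name and the input bytes (FNV-1a again, as a std-only stand-in for a real hash), so that an existing output file means the work can be skipped.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a over several byte slices. ASSUMPTION: a stand-in for a
/// real hash such as SHA-256, used here only to stay std-only.
fn cache_key(parts: &[&[u8]]) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for part in parts {
        // Feed a separator after each part so that ("ab", "c") and
        // ("a", "bc") do not produce the same key.
        for &b in part.iter().chain([0xffu8].iter()) {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    format!("{:016x}", h)
}

/// Hypothetical helper contract: take an input path, write the result to
/// `temp_dir` under a hash-derived name, and return the output path.
/// If that file already exists, the transform is not run again.
fn run_step(
    step_name: &str,
    input: &Path,
    temp_dir: &Path,
    transform: impl Fn(&[u8]) -> Vec<u8>,
) -> io::Result<PathBuf> {
    let bytes = fs::read(input)?;
    let key = cache_key(&[step_name.as_bytes(), bytes.as_slice()]);
    let ext = input.extension().and_then(|e| e.to_str()).unwrap_or("bin");
    let output = temp_dir.join(format!("asset-{}.{}", key, ext));
    if !output.exists() {
        fs::create_dir_all(temp_dir)?;
        fs::write(&output, transform(&bytes))?;
    }
    Ok(output)
}
```

Since every step returns a path that the next step accepts, chaining is just passing the result along, for example `run_step("inline", &run_step("resize", src, temp, resize)?, temp, inline)`. Only the paths handed to the final `asset` call would be published.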
Finally, I think that colocated assets are hard to access from templates at the moment; it would be helpful if, instead of an array of colocated assets, the pages had a `HashMap<Filename, Filepath>`. Then it would be easier to refer just to the filename in markdown while calling shortcodes, and still be able to easily get the full path of the original source asset in order to process or publish it.
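To show the shape I have in mind, here is a minimal sketch that builds such a map from a list of colocated asset paths. The function name `index_colocated_assets` is hypothetical, and plain `String` and `PathBuf` stand in for the `Filename` and `Filepath` types above.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Build a filename -> full path lookup from a page's colocated assets.
/// Paths without a valid UTF-8 file name are skipped. If two paths share
/// a file name, the later one wins.
fn index_colocated_assets(paths: &[PathBuf]) -> HashMap<String, PathBuf> {
    paths
        .iter()
        .filter_map(|p| {
            let name = p.file_name()?.to_str()?.to_string();
            Some((name, p.clone()))
        })
        .collect()
}
```

A shortcode could then receive just `cat.jpg` from the markdown, look it up in the page's map, and hand the full source path to a processing helper.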
So, what do you think about the overall picture of this design?