[RFC] i18n


#1

Summary

This feature aims to make Zola support multilingual sites for both content and templates.
Note that content and template translations are mostly orthogonal and can be implemented separately.

Motivation

Some sites are in multiple languages and want to share templates and content while translating only parts of the UI or of the actual Markdown content. Some sites also have some of their content in multiple languages.

Guide-level explanation

Overview

Configuration

The default language is set by default_language in config.toml, an option that already exists.
To support many languages, a new lang or translations directory can be added, containing one file per language. The format is still undecided, but we don’t want to invent something new, so we should go for something standard. See https://github.com/getzola/zola/pull/111#issuecomment-410634780 for some discussion on that.

A user should probably also have to declare which languages they want on their site so we can warn on invalid ones (e.g. a typo in a filename should not create a new language in Zola).

Url

The default language base url will be equal to config.base_url.

Other languages will be available at {config.base_url}/{language}.

There will be a config option (force_i18n_redirect or something similar) to always redirect to a URL with a language code in it, i.e. base_url would redirect to {base_url}/{default_language}.
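The URL rules above can be sketched in Python. This is only an illustration of the proposal, not Zola code: the function names language_root and root_redirect are hypothetical, and force_i18n_redirect is the tentative option name from the paragraph above. The sketch also assumes that, when the redirect is forced, the default language is served under its own prefix.

```python
from typing import Optional


def language_root(base_url: str, lang: str, default_language: str,
                  force_i18n_redirect: bool = False) -> str:
    """Return the root URL under which content for `lang` is served."""
    base = base_url.rstrip("/")
    # The default language lives at base_url unless a language prefix is forced.
    if lang == default_language and not force_i18n_redirect:
        return base
    return f"{base}/{lang}"


def root_redirect(base_url: str, default_language: str,
                  force_i18n_redirect: bool) -> Optional[str]:
    """Return where the bare base_url should redirect to, or None for no redirect."""
    if not force_i18n_redirect:
        return None
    return f"{base_url.rstrip('/')}/{default_language}"
```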

File organisation

Translations of the same content must share the same file name.

The language is defined in the extension prefix: {name}.{language_code}.{extension}

The language code can be omitted for the default language.


content

├── _index.md
├── _index.fr.md
├── about.md
├── about.fr.md
└── some_section
   ├── _index.md
   └── _index.fr.md
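A minimal Python sketch of the naming rule, for illustration only. The function name split_language is hypothetical, the sketch assumes file names contain no dots other than the ones around the language code and extension, and it raises an error on an unknown code where the RFC only asks for a warning.

```python
from pathlib import PurePosixPath
from typing import Set, Tuple


def split_language(path: str, default_language: str,
                   languages: Set[str]) -> Tuple[str, str]:
    """Split a content path into (path without language code, language)."""
    p = PurePosixPath(path)
    stem_parts = p.stem.rsplit(".", 1)  # "about.fr" -> ["about", "fr"]
    if len(stem_parts) == 2:
        name, code = stem_parts
        # A typo such as "about.frr.md" must not silently create a language.
        if code not in languages:
            raise ValueError(f"{path}: unknown language code {code!r}")
        return str(p.with_name(name + p.suffix)), code
    # No language code in the name: the file belongs to the default language.
    return str(p), default_language
```

With this rule, about.md and about.fr.md both map to the key about.md, which is what later allows the two files to be linked as translations of each other.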

Templates

All templates will get an additional parameter, lang, holding the current language (en, fr, etc.), and a new global Tera function will be added to get the i18n values out of config.toml.


{{ trans(key="title", lang=lang) }}

If lang is not provided, it will look up the key in the default language map.
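The lookup rule can be modelled in Python as below. This is a sketch of the behaviour described here, not the Tera function itself: the nested dictionary layout and the example strings are assumptions, and the RFC does not say what happens when a key is missing, so the sketch simply lets the KeyError propagate.

```python
from typing import Dict, Optional

# Assumed layout: language code -> {key -> translated string}.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "en": {"title": "My site"},
    "fr": {"title": "Mon site"},
}


def trans(translations: Dict[str, Dict[str, str]], key: str,
          default_language: str, lang: Optional[str] = None) -> str:
    """Look up `key` for `lang`, using the default language map when lang is omitted."""
    table = translations[lang if lang is not None else default_language]
    return table[key]  # a missing key raises KeyError in this sketch
```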

get_page, get_section and get_url will also have to take an optional lang parameter.

Each page/section will have a languages map pointing to all the other language versions of the same content.

Content

Internal links in the Markdown text will resolve to the current language and fall back to the actual file if no translation is found.
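One way to read this rule, sketched in Python under stated assumptions: links are written against the default-language file name (e.g. about.md), files is the set of known content paths, and resolve_internal_link is a hypothetical helper name, not an existing Zola function.

```python
from typing import Set


def resolve_internal_link(target: str, lang: str, default_language: str,
                          files: Set[str]) -> str:
    """Prefer the version of `target` in `lang`; otherwise use the file as written."""
    stem, dot, ext = target.rpartition(".")
    if dot and lang != default_language:
        candidate = f"{stem}.{lang}.{ext}"  # "about.md" -> "about.fr.md"
        if candidate in files:
            return candidate
    if target in files:
        return target
    raise KeyError(f"broken internal link: {target}")
```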

Reference-level explanation

Loading

We need to revamp the whole content loading:

  • load every page/section and record its language if the filename has one; otherwise we can set it to config.default_language automatically
  • if we have several languages, we reconcile pages/sections based on filename, storing a map {lang -> key} on each page/section that points to the other versions of the same content
  • we populate sections as usual, except that we filter pages/subsections by language, e.g. a _index.fr.md will only get .fr.md pages
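The three steps above can be sketched in Python as follows. This is a simplified model rather than the actual loader: content is represented only by relative paths, file names are assumed to contain no extra dots, and key_and_lang, link_translations and pages_for_section are hypothetical names.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def key_and_lang(path: str, default_language: str) -> Tuple[str, str]:
    """Map "blog/post.fr.md" to ("blog/post.md", "fr"); no code means the default."""
    directory, filename = posixpath.split(path)
    parts = filename.split(".")
    if len(parts) == 3:
        name, lang, ext = parts
    else:
        name, ext = parts
        lang = default_language
    return posixpath.join(directory, f"{name}.{ext}"), lang


def link_translations(paths: Iterable[str],
                      default_language: str) -> Dict[str, Dict[str, str]]:
    """For each path, build the {lang -> path} map of its other language versions."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for path in paths:
        key, lang = key_and_lang(path, default_language)
        groups[key][lang] = path
    links: Dict[str, Dict[str, str]] = {}
    for versions in groups.values():
        for lang, path in versions.items():
            links[path] = {l: p for l, p in versions.items() if l != lang}
    return links


def pages_for_section(section_path: str, paths: Iterable[str],
                      default_language: str) -> List[str]:
    """Pages in the section's directory that share the section's language."""
    section_dir = posixpath.dirname(section_path)
    _, section_lang = key_and_lang(section_path, default_language)
    return sorted(
        p for p in paths
        if posixpath.dirname(p) == section_dir
        and not posixpath.basename(p).startswith("_index.")
        and key_and_lang(p, default_language)[1] == section_lang
    )
```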

Template and rendering

Rendering will be done language by language, and lang will be passed in every Tera context.

RSS

A separate RSS file will be rendered for each language:

  • {base_url}/rss.xml

  • {base_url}/fr/rss.xml

and so on.

Taxonomies

How do we handle them? Are users going to use the same taxonomy name (let’s say authors) for content in English, French and Italian, or would people create an auteurs taxonomy in French and so on? The issue with putting everything in the same taxonomy is that the URL would look like fr/authors, with no way of changing it short of making the taxonomy config more complex.

Drawbacks

This change requires changing pretty much all the logic in Zola to account for multiple languages for a feature used by a minority of people. Not having it is a deal breaker for those people.

Rationale and Alternatives

This design is similar to Hugo and Lektor and is the simplest to use from a user point of view, as far as I know.

Alternative design: do nothing

A user could create a fr section and mirror the content of the default language for example.

This is even simpler than the solution in the RFC but it doesn’t work well with posts with assets as those will need to be duplicated or put in the static folder.

i18n in the templates could be solved by the user adding a lang attribute to the extra table in the front-matter of sections or pages and using that in the templates.

Pros

  • Easy to implement: pretty much nothing to do

  • Easier to understand (imo)

Cons

  • No RSS per language

  • Harder to get started for a user

  • No way to link articles from different languages unless it is done manually in the front-matter

Unresolved questions

RSS

How do we render an RSS feed for only one given language if generate_rss is set to true? Do we even care about that feature?

Content with url set

Some content can have a path hardcoded in the front-matter: what should we do with multiple language versions of such content?

Prepending the language code to the URL is the obvious solution, but it is not intuitive and implicitly changes the meaning of the url field, so it’s not an option.
I think having 2 files with the same url in the front-matter should not be allowed.


#2

I have started implementing it, and it is a bit trickier than I thought!
Braindump / rubber ducking incoming.

Permalinks will be prefixed by the language: a file named content.fr.md at the root of content will be available at $base_url/fr/content unless the page defines its own path in the front-matter. The issue with that is that you cannot have a multi-host language setup like the one shown in https://gohugo.io/content-management/multilingual/#configure-multilingual-multihost

I can see the use case for that, but it would make things more complex (again) for a small set of users. Anyone needing that? I’m inclined to say no personally.
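As a rough Python sketch of that permalink rule (the function name and parameters are hypothetical, and content_path is assumed to be the file path with the language code and extension already stripped):

```python
from typing import Optional


def permalink(base_url: str, content_path: str, lang: str, default_language: str,
              path_override: Optional[str] = None) -> str:
    """Build a page permalink, prefixed by the language unless a path is hardcoded."""
    base = base_url.rstrip("/")
    # A path set in the front-matter is used as-is, without a language prefix.
    if path_override is not None:
        return f"{base}/{path_override.strip('/')}"
    prefix = "" if lang == default_language else f"/{lang}"
    return f"{base}{prefix}/{content_path.strip('/')}"
```

Because the language only ever appears as a path prefix under a single base_url, this scheme has no place for a per-language host, which is the multi-host limitation mentioned above.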

In terms of code, all the loading/populating sections will need to be done for each configured language and the default one.

I am still not sure what to do for the taxonomies.

By default, pages with the same location/name outside of the language code will be linked together. What about something like https://gohugo.io/content-management/multilingual/#bypassing-default-linking ? That does seem overkill.


#3

Pushed most of the content translation to GitHub.

The main points still missing are:

  • taxonomies
  • list of translations for the same content
  • per language RSS

#4

RSS added, listing translation of content is next.

Has anyone got some ideas on how it should work for taxonomies?


#5

As someone who is using Zola for a few static sites for a very small business that need to be multilingual I have to say I am super excited about this. The improvements suggested in this RFC will make my life so much easier.

I would suggest just using the same taxonomies for all languages and not worry about translating the url. Browsers are now fading away the path part of the url and only highlighting the domain name, so translating the path isn’t as important as it used to be.

One thing I want to ask about is if there will be a good way to access the url of other language versions of the current section/page in a template? I am mainly thinking about being able to make a language switcher that switches to another language version of the current page.


#6

Yep, that’s what “listing translation of content is next” is about. Each page/section will get a list of TranslatedContent with the lang, permalink and title.
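To make the language switcher use case concrete, here is a small Python model of that list. Only the type name TranslatedContent and its three fields come from the description above; the switcher_entries helper, its behaviour and the example values in the usage note are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TranslatedContent:
    lang: str
    permalink: str
    title: str


def switcher_entries(current_lang: str,
                     translations: List[TranslatedContent]) -> List[TranslatedContent]:
    """Entries a language switcher would show: every version except the current one."""
    others = [t for t in translations if t.lang != current_lang]
    return sorted(others, key=lambda t: t.lang)
```

A template would then loop over these entries and emit one link per language, using permalink as the target and lang or title as the label.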


#7

Translated content list is now on the next branch: https://github.com/getzola/zola/pull/567

Please try it when you have time. Only missing the taxonomies now.


#8

I’m also interested in multilingual websites. I think translated URLs could be useful when used non-internally. Like on printed stuff or when being shared.

I’m going to try the next branch in the next few days.


#9

You should be able to set the slug in the front-matter to have translated URLs. It is using the filename as source of truth for translations after all.

Unless you meant taxonomies?


#10

I thought taxonomies were related to translated URLs. I just read https://gohugo.io/content-management/taxonomies/ and I think I was mistaken.


#11

It’s still an open question: if you have taxonomies, you probably want to translate those as well.
If, for example, you have a version in French and the tags are “travel” instead of “voyage”, it might be confusing.

My idea:

  • each taxonomy can have an optional lang code specifying the language it applies to
  • when rendering a taxonomy list, render it per language, so /tags will have travel but not voyage and fr/tags will have voyage but not travel. Setting travel on a French article would error since there are no tags of that name for that language.
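The per-language rendering rule can be illustrated with a short Python sketch. It models only the grouping of terms by language, not the validation error, and the (path, lang, tags) tuple layout and the terms_by_language name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Page = Tuple[str, str, List[str]]  # (path, lang, tags): an assumed layout


def terms_by_language(pages: Iterable[Page]) -> Dict[str, Dict[str, List[str]]]:
    """Group taxonomy terms per language so each language renders its own list."""
    index: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for path, lang, tags in pages:
        for tag in tags:
            index[lang][tag].append(path)
    return {lang: dict(terms) for lang, terms in index.items()}
```

With an English page tagged travel and a French page tagged voyage, the English map contains only travel and the French map only voyage, which matches the /tags and fr/tags split described in the idea above.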

#12

I just pushed the per-language taxonomies.

I think it’s all there for the multilingual content, let me know if I missed something. If you have ideas on how to improve it as well, I’ll take them.

Regarding the translation of the templates themselves, I think I’ll need more data: what level of “power” do you need? Is the current trans function + strings in config.toml good enough?
Do we need to add something like https://projectfluent.org/ to the templates? Etc…