[RFC] i18n


#1

Summary

This feature aims to make Zola support multilingual sites for both content and templates.
Note that content and template translations are mostly orthogonal and can be implemented separately.

Motivation

Some sites are in multiple languages and want to share templates/content while only translating bits of the UI or the actual markdown content. Some sites also have only some of their content in multiple languages.

Guide-level explanation

Overview

Configuration

The default language is set by default_language in config.toml, an option which already exists.
To support many languages, a new lang or translations directory can be added, containing one file per language. The file format is still undecided, but we don’t want to invent something new, so it should be an existing standard. See https://github.com/getzola/zola/pull/111#issuecomment-410634780 for some discussion on that.

A user should probably also have to define which languages they want on their site so we can warn on invalid languages (e.g. a typo in a filename should not create a new language in Zola).

URL

The default language base url will be equal to config.base_url.

Other languages will be available at {config.base_url}/{language}.

There will be an option (force_i18n_redirect or something similar) in the config to always redirect to a URL with a language code in it, i.e. base_url would redirect to {base_url}/{default_language}.
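A minimal Python sketch of the URL rule above, for illustration only. The function name language_root and the force_prefix flag (standing in for the proposed force_i18n_redirect option) are hypothetical and not part of Zola.

```python
def language_root(base_url: str, lang: str, default_language: str,
                  force_prefix: bool = False) -> str:
    """Return the root URL under which pages of `lang` would be served.

    Assumption: `force_prefix` models the proposed redirect option, where
    even the default language lives under {base_url}/{default_language}.
    """
    base = base_url.rstrip("/")
    if lang == default_language and not force_prefix:
        # The default language keeps the plain base URL.
        return base
    # Every other language (or a forced prefix) gets its code appended.
    return f"{base}/{lang}"
```

With force_prefix left off, the default language stays at base_url and only the other languages get a prefix.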

File organisation

Translations of the same content must share the same file name.

The language is defined in the extension prefix: {name}.{language_code}.{extension}

The language code can be omitted for the default language.


content
├── _index.md
├── _index.fr.md
├── about.md
├── about.fr.md
└── some_section
    ├── _index.md
    └── _index.fr.md
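To make the naming convention concrete, here is a small Python sketch of how a file name could be split into its language-neutral name and language code. The helper split_language is hypothetical. Treating an unknown code as part of the name is an assumption; a real implementation might warn instead.

```python
from pathlib import PurePosixPath


def split_language(path, languages, default_language):
    """Split '{name}.{language_code}.{extension}' into (neutral path, lang).

    Assumption: a suffix that is not a configured language is treated as
    part of the file name rather than as a new language.
    """
    p = PurePosixPath(path)
    # For "about.fr.md" the stem is "about.fr" and the suffix is ".md".
    name, dot, code = p.stem.rpartition(".")
    if dot and code in languages:
        return str(p.with_name(name + p.suffix)), code
    # No language code in the file name: fall back to the default language.
    return str(p), default_language
```

Usage: split_language("some_section/_index.fr.md", {"en", "fr"}, "en") gives the neutral path some_section/_index.md together with the code fr.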

Templates

All templates will get an additional parameter: lang for the current language (en, fr, etc) and a new global Tera function will be added to get the i18n values out of the config.toml.


{{ trans(key="title", lang=lang) }}

If lang is not provided, it will look up the key in the default language map.
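The lookup rule can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Zola's implementation; the TRANSLATIONS table and the signature of this trans function are made up for the example.

```python
# Hypothetical translation tables, keyed by language code.
TRANSLATIONS = {
    "en": {"title": "My site"},
    "fr": {"title": "Mon site"},
}


def trans(key, default_language, lang=None, translations=TRANSLATIONS):
    """Look up `key` for `lang`, using the default language map if no lang is given."""
    table = translations[lang if lang is not None else default_language]
    # A missing key raises KeyError here; how Zola should react is not specified above.
    return table[key]
```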

get_page, get_section and get_url will also have to take an optional lang parameter.

Each page/section will have a languages map pointing to all the other language versions of the same content.

Content

Internal links in the markdown text will refer to the current lang and fall back to the actual file if nothing is found.
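A rough Python sketch of that fallback, assuming link targets are written as plain content paths with an extension (e.g. about.md). The function resolve_internal_link and the known_files set are hypothetical.

```python
def resolve_internal_link(target, current_lang, default_language, known_files):
    """Prefer the current-language version of `target`, else the file itself.

    Returns None when neither file exists.
    """
    if current_lang != default_language:
        stem, _, ext = target.rpartition(".")
        localized = f"{stem}.{current_lang}.{ext}"
        if localized in known_files:
            return localized
    # Nothing found for the current language: fall back to the actual file.
    return target if target in known_files else None
```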

Reference-level explanation

Loading

We need to revamp the whole content loading:

  • load every page/section and record its language: the one from the filename if there is one, otherwise config.default_language
  • if we have several languages, we try to reconcile pages/sections based on filename, storing a {lang -> key} map on each page/section that points to the other versions of the same content
  • we populate sections as usual except we filter by language for pages/subsections, e.g. a _index.fr.md will only get .fr.md pages
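The three steps above could look roughly like this in Python. It is a sketch under simplifying assumptions (files are plain path strings, and all names here are invented for the example), not the actual Zola loading code.

```python
from collections import defaultdict


def canonical_and_lang(path, languages, default_language):
    """Return (path without language code, language) for a content file."""
    stem, _, ext = path.rpartition(".")
    name, dot, code = stem.rpartition(".")
    if dot and code in languages:
        return f"{name}.{ext}", code
    return path, default_language


def load(files, languages, default_language):
    """Step 1 and 2: assign a language to each file and link translations."""
    lang_of = {}
    versions_of = defaultdict(dict)  # canonical path -> {lang: file}
    for path in files:
        canonical, lang = canonical_and_lang(path, languages, default_language)
        lang_of[path] = lang
        versions_of[canonical][lang] = path
    translations = {}
    for versions in versions_of.values():
        for lang, path in versions.items():
            # Each file points to the other versions of the same content.
            translations[path] = {l: p for l, p in versions.items() if l != lang}
    return lang_of, translations


def pages_in_language(lang_of, lang):
    """Step 3: a section of language `lang` only sees files of that language."""
    return sorted(path for path, file_lang in lang_of.items() if file_lang == lang)
```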

Template and rendering

Rendering will be done language by language and the lang will be passed in every Tera context.

RSS

A separate RSS file will be rendered for each language:

  • {base_url}/rss.xml

  • {base_url}/fr/rss.xml

and so on.

Taxonomies

How do we handle them? Will users use the same taxonomy name (let’s say authors) for content in English/French/Italian, or would people create an auteurs taxonomy in French and so on? The issue with putting everything in the same taxonomy is that the URL would look like fr/authors, with no way of changing the URL apart from making the taxonomy config more complex.

Drawbacks

This change requires changing pretty much all the logic in Zola to account for multiple languages for a feature used by a minority of people. Not having it is a deal breaker for those people.

Rationale and Alternatives

This design is similar to Hugo and Lektor and is the simplest to use from a user point of view, as far as I know.

Alternative design: do nothing

A user could create a fr section and mirror the content of the default language for example.

This is even simpler than the solution in the RFC but it doesn’t work well with posts with assets as those will need to be duplicated or put in the static folder.

i18n in the templates could be solved by the user adding a lang attribute in the extra of the front-matter of sections or pages and using that.

Pros

  • Easy to implement: pretty much nothing to do

  • Easier to understand (imo)

Cons

  • No RSS per language

  • Harder to get started for a user

  • No way to link articles from different languages unless it is done manually in the front-matter

Unresolved questions

RSS

How do we render an RSS feed only for a given language if generate_rss is set to true? Do we even care about that feature?

Content with url set

Some content can have a path hardcoded in the front-matter: what to do with multiple language versions of such content?

Prepending the language code to the URL is the obvious solution, but it is not intuitive and implicitly changes the meaning of the url field, so it’s not an option.
I think having 2 files with the same url in front-matter should not be allowed.
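If such duplicates were rejected, the check could be as simple as the following Python sketch. The input shape (a mapping from content file to its hardcoded front-matter path, or None) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def find_path_collisions(front_matter_paths):
    """Return {path: [files]} for every hardcoded path used by more than one file."""
    owners = defaultdict(list)
    for file, path in front_matter_paths.items():
        if path is not None:
            # Normalise slashes so "/about-us/" and "about-us" count as the same path.
            owners[path.strip("/")].append(file)
    return {path: sorted(files) for path, files in owners.items() if len(files) > 1}
```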


#2

I have started implementing it and it is a bit trickier than I thought!
Braindump / rubber ducking incoming.

Permalinks will be prefixed by the language: a file named content.fr.md at the root of content will be available at $base_url/fr/content unless the page defines its own path in the front-matter. The issue with that is that you cannot have a multi-host language setup like the one shown in https://gohugo.io/content-management/multilingual/#configure-multilingual-multihost

I can see the use case for that but it would make things more complex (again) for a small set of users. Anyone needing that? I’m inclined to say no personally.
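For reference, the permalink rule described in this post (language prefix unless the front-matter sets a path) can be sketched in Python as follows. The function and its parameters are hypothetical, and the sketch assumes the default language gets no prefix.

```python
def permalink(base_url, file_path, lang, default_language, front_matter_path=None):
    """Compute a page permalink from its file path and language."""
    base = base_url.rstrip("/")
    if front_matter_path is not None:
        # A path set in the front-matter is used as-is, without a language prefix.
        return f"{base}/{front_matter_path.strip('/')}"
    stem = file_path.rsplit(".", 1)[0]  # drop the extension
    suffix = f".{lang}"
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]  # drop the language code
    prefix = "" if lang == default_language else f"/{lang}"
    return f"{base}{prefix}/{stem}"
```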

In terms of code, all the loading/populating sections will need to be done for each configured language and the default one.

I am still not sure what to do for the taxonomies.

By default, pages with the same location/name outside of the language code will be linked together. What about something like https://gohugo.io/content-management/multilingual/#bypassing-default-linking ? That does seem overkill.


#3

Pushed most of the content translation to GitHub.

The main points still missing are:

  • taxonomies
  • list of translations for the same content
  • per language RSS

#4

RSS added, listing translation of content is next.

Has anyone got some ideas on how it should work for taxonomies?


#5

As someone who is using Zola for a few static sites for a very small business that need to be multilingual I have to say I am super excited about this. The improvements suggested in this RFC will make my life so much easier.

I would suggest just using the same taxonomies for all languages and not worry about translating the url. Browsers are now fading away the path part of the url and only highlighting the domain name, so translating the path isn’t as important as it used to be.

One thing I want to ask about is if there will be a good way to access the url of other language versions of the current section/page in a template? I am mainly thinking about being able to make a language switcher that switches to another language version of the current page.


#6

Yep, that’s what “listing translation of content is next” is about. Each page/section will get a list of TranslatedContent with the lang, permalink and title.
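As a rough picture of the shape of that data, here is a Python sketch. Only the three field names come from the description above; the helper other_versions and everything else are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedContent:
    lang: str
    permalink: str
    title: str


def other_versions(current_lang, versions):
    """Keep every version of a page/section except the one currently rendered."""
    return [v for v in versions if v.lang != current_lang]
```

A language switcher would then loop over other_versions(...) and emit one link per entry.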


#7

Translated content list is now on the next branch: https://github.com/getzola/zola/pull/567

Please try it when you have time. Only missing the taxonomies now.


#8

I’m also interested in multilingual websites. I think translated URLs could be useful when used non-internally. Like on printed stuff or when being shared.

I’m going to try the next branch in the next few days.


#9

You should be able to set the slug in the front-matter to have translated URLs. It is using the filename as source of truth for translations after all.

Unless you meant taxonomies?


#10

I thought taxonomies were related to translated URLs. I just read https://gohugo.io/content-management/taxonomies/ and I think I was mistaken.


#11

It’s still an open question: if you have taxonomies, you probably want to translate those as well.
If, for example, you have a version in French and the tags are “travel” instead of “voyage”, it might be confusing.

My idea:

  • each taxonomy can have an optional lang code to which it applies
  • when rendering a taxonomy list, render it per language so /tags will have travel but not voyage and fr/tags will have voyage and not travel. Setting travel on a French article would error since there is no tags of that name for that language.
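The per-language rendering part of this idea can be sketched in Python by grouping terms by the language of the page that uses them. The input shape and the function name are assumptions; the erroring behaviour mentioned above is not modelled here.

```python
from collections import defaultdict


def terms_by_language(pages):
    """Build {lang: {taxonomy: {term: [page paths]}}}.

    `pages` is an iterable of (path, lang, {taxonomy: [terms]}) tuples.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for path, lang, taxonomies in pages:
        for taxonomy, terms in taxonomies.items():
            for term in terms:
                # A term only shows up in the list of its page's language.
                index[lang][taxonomy][term].append(path)
    return index
```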

#12

I just pushed the per-language taxonomies.

I think it’s all there for the multilingual content, let me know if I missed something. If you have ideas on how to improve it as well, I’ll take them.

Regarding the translation of the templates themselves, I think I’ll need more data: what level of “power” do you need? Is the current trans function + strings in config.toml good enough?
Do we need to add something like https://projectfluent.org/ to the templates? Etc…


#13

Has anyone used it? Any feedback?


#14

I pulled and built the next branch of Zola and started trying to migrate a multilingual Zola site I maintain to the new functionality. Generally this is a huge improvement. Co-locating different language versions next to each other makes keeping track of them much simpler. And having proper tools for getting translations of content means I can get rid of a huge amount of gnarly logic from my templates.

Here are the misunderstandings/issues I’ve come across thus far:

The way I have templates set up is that I have a _layout.html that is then extended by index.html, section.html and page.html. Because I need a language switcher everywhere I put it in _layout.html, but that makes the code awkward because it has to work for both pages and sections:

<div class="lang-picker">
    {% if section %}
        <span>{{trans(key=section.lang|default(value=config.default_language))}}</span>
        {% for t in section.translations %}
            <a href="{{t.permalink|safe}}">{{trans(key=t.lang|default(value=config.default_language))}}</a>
        {% endfor %}
    {% else %}
        <span>{{trans(key=page.lang|default(value=config.default_language))}}</span>
        {% for t in page.translations %}
            <a href="{{t.permalink|safe}}">{{trans(key=t.lang|default(value=config.default_language))}}</a>
        {% endfor %}
    {% endif %}
</div>

It would be nice if there was some way to not have to duplicate the code, once for sections and again for pages.

Another thing that is awkward is the constant dance with t.lang|default(value=config.default_language) to handle the default language case. Would it be possible to just set lang to default_language if no language is specified in the filename?

There is a weird double trailing slash thing happening when linking to the non-default language front page. If I use the language switcher above to switch to the fi page the path ends up as: /fi// .

I can’t figure out how to link to the current language version of a specific page. In my case I want clicking on the logo to take the user back to the current language version of the front page. Problem is that get_url(path='./_index.md') always takes me to the English version, even if I am on a /fi/something page. Maybe I’m missing something?

I’m not sure I like having translations in config.toml. It’s a weird mixing of config and content. Also, if there are a lot of translations the file will get huge. I would probably prefer to have a translations directory with a file per language.

These are the things I’ve noticed thus far, but I haven’t migrated everything on the site yet. This site doesn’t use any taxonomies so I can’t really evaluate how well that works.

Regarding project fluent, in my experience you don’t get very far with translations before you at least want to do something with plurals. Just dealing with plurals in different languages can get really complex real quick and you find yourself wanting something fully featured, like project fluent. Looking at it, it seems like a really nice system in Rust, so I would recommend going with that if it is an option.

Despite my issues above, don’t mistake my basic reaction to this. This is really great.


#15

Thanks for the feedback!

For the lang picker, you can make it a macro that takes an object and since they have the same interface it should work fine.
Something like

<div class="lang-picker">
    {% if section %}
      {{ macros::lang_picker(obj=section) }}
    {% else %}
      {{ macros::lang_picker(obj=page) }}
    {% endif %}
</div>

Good idea.

Sounds like a bug!

Hmm, I think get_url will have to take an optional lang or you could have that logo in a block and override it on each lang layout to point to _index.fi.md instead.

100% agreed, trans was added as a stopgap a few versions ago but is not going to work for real translations. As you said, you need plural handling for most serious i18n work. I am just waiting to see if static sites are complex enough that they actually do need something like Fluent or if trans is enough for 99% of the sites.


#16

@johansigfrids

Would it be possible to just set lang to default_language if no language is specified in the filename?

This is now the case, for taxonomies as well.

I can’t reproduce the /fi//, do you have an example repo somewhere?


#17

<div class="lang-picker">
    {% if section %}
      {{ macros::lang_picker(obj=section) }}
    {% else %}
      {{ macros::lang_picker(obj=page) }}
    {% endif %}
</div>

That looks like a cleaner solution. It makes me wish for short-circuiting boolean expressions in Tera:

<div class="lang-picker">
      {{ macros::lang_picker(obj=(section or page)) }}
</div>

Hmm, I think get_url will have to take an optional lang or you could have that logo in a block and override it on each lang layout to point to _index.fi.md instead.

An optional lang argument similar to how trans works sounds like a great idea. That way it defaults to current language but you can override it if you want to cross link languages. Maybe for get_page and get_section as well?

As you said, you need plural handling for most serious i18n work. I am just waiting to see if static sites are complex enough that they actually do need something like Fluent or if trans is enough for 99% of the sites.

90% might be realistic, but I think 99% might be optimistic. Even for a simple blog someone will want to do Category/Categories to show their taxonomy for the blog post. I suppose one can hack around it by sticking logic in the template instead, but that can get messy.

This is now the case, for taxonomies as well.

Great!

I can’t reproduce the /fi//, do you have an example repo somewhere?

I put together a repro here: https://github.com/johansigfrids/zola_double_slash . Using the lang links at the top leads to /fi//.


#18

You can simulate it like so:

<div class="lang-picker">
      {{ macros::lang_picker(obj=section | default(value=page)) }}
</div>

Right now taxonomies are language specific so Category/Categories will correspond to only one language. Definitely an issue if you want to have en-us and en-gb for example.

Fixed, thanks for the repro!

I have to think about it a bit, I thought people would use template blocks more rather than relying on the lang parameter.


#19

Any other feedback in the meantime?


#20

I’m trying the next branch. It’s the first time I’m trying Zola so I probably did something wrong, but my index files don’t seem to be used when I browse “/”, “/fr” or “/en”.

Instead I think I’m seeing all the articles of the selected language. Do I need to set something?

config.toml

theme = "feather"

default_language = "fr"
languages = [
    {code = "fr"}, # there will be a RSS feed for French content
    {code = "en"}, # there won't be a RSS feed for Italian content
]

.
├── config.toml
├── content
│   ├── an-article.en.md
│   ├── an-article.md
│   ├── _index.en.md
│   └── _index.md
├── sass
├── static
├── templates
└── themes
    └── feather
        ├── CODE_OF_CONDUCT.md
        ├── config.toml
        ├── content
[...]