I know you’ve previously discussed the inability to output custom fields in the Rust-generated elasticlunr.js JSON index. I accept custom search fields would be mission creep and Zola needs to stick to what it does best. But... would it be possible to specify a trim size for the page data?
For example, at its simplest, a config.toml variable such as search_truncate = 200 would mean the first 200 characters are used to build the stem words and held in the index JSON data; you simply skip the rest. If the page is shorter than 200 characters, you just roll with what you’ve got.
You change none of the current implementation other than looking up a configuration variable that determines whether to trim the page content.
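To make the idea concrete, here is a minimal sketch of what such a truncation step might look like on the Rust side. The function name and placement are hypothetical, not Zola’s actual API:

```rust
/// Hypothetical helper for the proposed `search_truncate` option.
/// Returns at most `max_chars` characters of `content`, cutting on a
/// UTF-8 character boundary so the slice never panics.
fn truncate_for_index(content: &str, max_chars: usize) -> &str {
    match content.char_indices().nth(max_chars) {
        // Content is longer than the limit: cut at the boundary.
        Some((byte_idx, _)) => &content[..byte_idx],
        // Content is shorter than the limit: roll with what you've got.
        None => content,
    }
}
```

A call like `truncate_for_index(&page_content, 200)` gives the first 200 characters, and `max_chars = 0` falls out naturally as an empty body, which is exactly the title-only behaviour mentioned below.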
I think one of Zola’s USPs is the built-in search. Implementing search for a static website can be a major pain, or worse still, cost money.
Implementing this basic limitation does a few things:
It makes search possible for websites with a small number of pages, even when those pages contain a lot of text.
It gives people who have hundreds of pages the flexibility to reduce the file size of search_index.en.js, keeping it usable before they have to look for alternative search engine options.
It opens up the possibility of setting search_truncate = 0, which would effectively make the search title-only.
In theory it doesn’t break any of the current implementation; it all works exactly the same way, except the data you generate the search index from is trimmed first.
At this point you’re thinking either “omg, these guys just don’t get it, it’s impossible” or, hopefully, “this one is doable”.
I think there are many ways to improve search, from limiting the amount of text to selecting only title/description or some random field in [extra].
I need to create a GH issue for that this weekend.
Great, thanks. We currently have a 5 MB JSON file (around 250,000 words), which when gzipped is around 500 KB over the network. This is fine on desktop, but interestingly, on mobile the browser’s JavaScript engine takes ages (5+ seconds on older phones) to decompress and scan/load the database; it’s quite slow. So having options to tweak the size and how much is indexed would be really valuable.
I use Zola for a local knowledge base, and I’ve found only one solution along the lines we discussed there: building my own version of the Zola binary that puts just title & description in the search index. It works despite some constraints; for example, I have to use a separate Debian virtual server, with my self-built Zola and nginx, to serve the HTML files. It would be super useful to have a setting for the search fields in config.toml.
I’ve noticed the discussion on Zola’s GitHub has started to talk about a completely new search engine, which, when I took a look at it, does not even provide a standalone JavaScript library for static websites. I’m not sure if I’m reading this the wrong way, but I’d like to make sure that Zola’s search evolves, rather than being replaced.
I (and I am sure lots of other people) have spent quite a bit of time and effort getting the current search functionality to work. It works fine; in fact, it’s an elegant solution. The only downside is the lack of any ability to optimise the JSON index file size to suit different use cases (like mobile).
To be clear, I am happy with the way search is implemented, though improvements would be welcome. Please don’t move away from a static-website-compatible implementation, and please consider carefully the impact of breaking changes; I’m sure you will.
No, the goal for any search engine in Zola would be to have something that works without a server, so Tantivy is out. Then it’s just a matter of how good the results are; if we can have the same UX for search as right now but with a better search engine, why not?
I know this is probably not a huge priority for everyone, but over on https://adeptenglish.com/ we now have an index file that’s 6 MB (600 KB over the wire).
On desktop this is no big deal, and on mobile the 600 KB transfer is not great but OK; it’s the time taken to decompress the JavaScript back to its 6 MB that is killing older mobile phones. Google Lighthouse benchmarking is hideous.
It’s getting to the point where we are going to switch it off.
All it needs, in the short term, is the ability to truncate the amount of text taken from page.content, so we can index the first 300-500 words and ignore the other 2,000+ words in the article.
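If a word-based limit like this is preferred over a character count, a sketch along these lines would do; the helper is hypothetical, and the whitespace splitting is deliberately naive (real word segmentation is language-dependent):

```rust
/// Keep only the first `max_words` whitespace-separated words of `content`.
fn truncate_words(content: &str, max_words: usize) -> String {
    content
        .split_whitespace()
        .take(max_words)
        .collect::<Vec<_>>()
        .join(" ")
}
```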
What do you mean? Are you asking me to raise a PR? I don’t know how to do that, but I’ll have a go if you show me how.
I’ve been looking at the search_index.en.js JSON output, and interestingly enough I was able to reduce the file size just by removing the full URL and replacing it with a /.
For example, using Notepad++ I just globally replaced "https://adeptenglish.com/" with "/".
On a local search index, the original was 6,713,339 bytes (before gzip over the wire).
There appears to be no functional change, i.e. it seems to work fine, and it’s around a 20% saving for free. Your mileage will vary with domain length. This is a huge saving and should be trivial to implement? What do you think?
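For anyone who wants to automate the same replacement as a post-build step until something lands in Zola itself, a minimal sketch; the domain is the one from the example above, and the `public/search_index.en.js` path assumes Zola’s default output directory:

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Assumes Zola's default output directory and English index name.
    let path = "public/search_index.en.js";
    let index = fs::read_to_string(path)?;
    // Same global replacement as the Notepad++ experiment above:
    // swap the absolute domain prefix for a root-relative "/".
    let trimmed = index.replace("https://adeptenglish.com/", "/");
    fs::write(path, trimmed)
}
```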
I’m still very keen to see an improvement in the search, but I’ll take a quick win.
It means if anyone wants to implement it, I’ll merge it. It’s on the TODO list for the next release, but it will be faster if someone else implements it, since the list is long (0.12 · GitHub) and I don’t have a lot of time right now.
For this change, we need to add some options in the config and then just change what we are passing to the search index in https://github.com/getzola/zola/blob/master/components/search/src/lib.rs depending on those config options.
Using the path instead of the permalink is a really good idea too.
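Sketching what that could look like in components/search/src/lib.rs — the struct shapes and option names here are hypothetical stand-ins for Zola’s internal types, purely to show where the config options would bite:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for Zola's internal config and page types.
struct SearchConfig {
    include_description: bool,
    truncate_content_chars: Option<usize>, // None = index the full body
}

struct Page {
    title: String,
    description: String,
    path: String,    // site-relative, e.g. "/my-article/"
    content: String, // rendered plain text
}

/// Build the fields fed to the elasticlunr index for one page,
/// honouring the proposed config options.
fn build_index_fields(config: &SearchConfig, page: &Page) -> HashMap<&'static str, String> {
    let mut fields = HashMap::new();
    fields.insert("title", page.title.clone());
    if config.include_description {
        fields.insert("description", page.description.clone());
    }
    let body = match config.truncate_content_chars {
        Some(n) => page.content.chars().take(n).collect(),
        None => page.content.clone(),
    };
    fields.insert("body", body);
    // Use the site-relative path instead of the full permalink, cutting
    // the repeated domain prefix from every entry.
    fields.insert("id", page.path.clone());
    fields
}
```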
Looking forward to the new release with the new search features. Thanks.
Meanwhile, and sorry if this is stating the obvious, what worked for me was to defer loading of /search_index.en.js and only initialise the search when/if a user starts typing in the search field.