
Issue/vivo 3606 : add language-specific sorting and label fields to search index

backups requested to merge github/fork/brianjlowe/issue/VIVO-3606 into main

Created by: brianjlowe

Issue VIVO-3606:

What does this pull request do?

  • Populates a sort field and label field in the search index for each language tag found in the list of labels for an individual.
  • If RDFService.languageFilter = true in runtime.properties, the sort field that corresponds to the current locale is used to sort the individual lists in the VClassGroup-based browse pages (People, Organizations, Events, etc.). The original nameLowercasedSingleValue field is used as a secondary sort in case this field is not populated.
  • Alpha (A*) browse lookups match documents where either the locale-specific sort field starts with the selected letter, OR the locale-specific sort field does not exist at all but the default nameLowerCaseSingleValued starts with the selected letter. This should prevent content from disappearing entirely when the locale-specific sort field is not available, though individuals may still appear under the wrong letter (unchanged from current behavior).
  • When displaying the results of an autocomplete request, the label corresponding to the current locale is displayed if available. If not, the original field nameRaw is used instead.
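
To make the field naming and the alpha-browse fallback above concrete, here is a minimal, hypothetical sketch. It is not the code in this PR: the class and method names are invented for illustration, the Solr-style query syntax is an assumption, and the base field name follows the spelling used in the nameLowercasedSingleValue_de-DE_s example in this description. A real query may also need escaping for the hyphen in the field name.

```java
import java.util.Locale;

public class LocaleSortFieldSketch {

    // Base field name as spelled in this PR description (assumption).
    static final String BASE_SORT_FIELD = "nameLowercasedSingleValue";

    /** Builds the locale-specific field name, e.g. nameLowercasedSingleValue_de-DE_s. */
    static String sortField(Locale locale) {
        return BASE_SORT_FIELD + "_" + locale.toLanguageTag() + "_s";
    }

    /**
     * Builds a Solr-style filter (assumed syntax) that matches documents whose
     * locale-specific sort field starts with the letter, OR documents that lack
     * that field entirely but whose default sort field starts with the letter.
     */
    static String alphaFilter(Locale locale, char letter) {
        String localized = sortField(locale);
        String prefix = String.valueOf(Character.toLowerCase(letter));
        return localized + ":" + prefix + "*"
                + " OR (+" + BASE_SORT_FIELD + ":" + prefix + "*"
                + " -" + localized + ":[* TO *])";
    }

    public static void main(String[] args) {
        Locale german = Locale.forLanguageTag("de-DE");
        System.out.println(sortField(german));
        System.out.println(alphaFilter(german, 'Y'));
    }
}
```

With this shape, an individual such as 'Yanny' that has no de-DE label is still matched through the second clause when the letter Y is selected.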

Note that this approach is intended only to enable the minimum sorting / autocomplete functionality needed by production i18nized sites. It has (inter alia) the following key limitations:

  • There is only one level of fallback (e.g. from nameLowercasedSingleValue_de-DE_s to nameLowercasedSingleValue). There is no attempt to try de or de-AT if de-DE is not available, which means that improperly sorted values may appear if the appropriate label is not available. Similarly, content may continue to appear under the wrong alpha heading.
  • Because all sort fields use the dynamic string type _s, no language-specific collation is enabled. All languages sort by (lowercased) Unicode values rather than by more complex rules. It would be nice to add collation in the future, but doing so will require the ability to modify the Solr schema according to the languages in use.
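
The collation limitation can be illustrated with a small sketch. This is an analogy in plain Java, not the Solr behavior itself: Java's natural String ordering stands in for sorting by raw Unicode values, and java.text.Collator stands in for the language-specific collation that this PR does not provide. The sample labels and the class name are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class UnicodeSortSketch {

    /** Sorts by raw UTF-16 code unit values, similar in spirit to a plain string field. */
    static List<String> sortedByCodeUnit(List<String> labels) {
        List<String> copy = new ArrayList<>(labels);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Sorts with locale-aware collation rules. */
    static List<String> sortedByCollator(List<String> labels, Locale locale) {
        List<String> copy = new ArrayList<>(labels);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e4hnlich" is "ähnlich"; the escape avoids source-encoding issues.
        List<String> labels = List.of("zebra", "\u00e4hnlich", "apfel");
        // Raw ordering puts the umlaut word after "zebra" (U+00E4 > 'z').
        System.out.println(sortedByCodeUnit(labels));
        // German collation treats the umlaut as a variant of 'a'.
        System.out.println(sortedByCollator(labels, Locale.GERMAN));
    }
}
```

Under the approach in this PR, labels behave like the first ordering: accented initial letters sort after all unaccented ASCII letters.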

What's new?

  • SelectQueryDocumentModifierDynamicTargetField extends SelectQueryDocumentModifier for queries that return the name of the search index field to modify in the ?targetField variable.
  • Two new document modifiers are added to home/rdf/display/everytime to populate the i18nized sort and label fields.
  • AutocompleteController/IndividualListController/SearchQueryUtils are modified to take advantage of the new fields.

How should this be tested?

  • In runtime.properties, enable RDFService.languageFilter = true and add en_US, es, fr_CA, and de_DE as the selectable locales.
  • Load sorttest.n3.txt (attached).
  • Switch between the four locales and observe that the items in the People tab sort and alpha-filter properly, displaying the language-appropriate label in parentheses except in the case of 'Yanny' when de_DE is selected. In the latter case, Yanny (en-US) will be displayed instead.
  • Observe that Yanny is still browsable on the People tab when the locale is set to de_DE, even though there is no de-DE label for the individual.
  • Add a publication to the DB. Add an author. Autocomplete on the author names 'Alpha', 'Bravo', 'Charlie' or 'Delta'. Note that all four individuals are returned when you type any one of these names. This is because the autocomplete edge n-gram field is not language-specific; making it so is out of scope for this PR. The improvement in this PR is that the labels in the autocomplete dropdown change according to the currently selected locale.

Interested parties

@VIVO-project/vivo-committers

sorttest.n3.txt
