Fredhopper Discovery Platform 26.4 is fully compatible with the previous release, FAS 26.3.
No special actions are required when upgrading from FAS 26.3 to FAS 26.4.
Lucene upgrade
This release includes an upgrade to Lucene 10.4.0.
No immediate re-index is required after upgrading, as Lucene 10.x continues to support index formats created with Lucene 9.x.
Existing indexes will continue to function without changes.
Warning: If you downgrade from FAS 26.4 to FAS 26.3, a full re-index is required. This is due to the Lucene version change: FAS 26.4 uses Lucene 10.x, whereas FAS 26.3 uses Lucene 9.x. Lucene 9.x cannot read indexes created or modified by Lucene 10.x.
Search and language processing changes
The following updates have been made to stemming and tokenization components:
German stem filters
GermanStemFilterPorter2 (ID: german-stem-filter-porter-2) is now equivalent to GermanStemFilterPorter1 (ID: german-stem-filter-porter-1).
Both filters now use Lucene’s org.tartarus.snowball.ext.GermanStemmer.
Czech stem filter
CzechStemFilter (ID: czech-stem-filter) now uses Lucene’s Light Stemmer for Czech (CzechStemmer).
The following modes are no longer supported: "LIGHT" and "AGGRESSIVE".
English stem filter (Lovins)
EnglishStemFilterLovins (ID: english-stem-filter-lovins) is now an alias of EnglishStemmer, because the underlying Lovins stemmer was removed in Lucene 10: https://github.com/apache/lucene/pull/13230.
Note: This filter was not detected in production use and will be removed in a future FAS release.
IKTokenizer removal
IKTokenizer (ID: ik-tokenizer) has been removed because it depended on an unsupported third-party library tied to Lucene 8.5 (https://github.com/magese/ik-analyzer-solr/).
Note: This tokenizer was not found to be used in production environments.
Excerpt from Lucene 10.4.0 migration guide
Snowball Dependency Update
The "German2" stemmer has been merged into the "German" stemmer, so "German2" is no longer available. In Lucene APIs (TokenFilter, TokenFilterFactory) that accept a String, "German2" will now automatically map to "German" to maintain compatibility. If your code was using the German2Stemmer, you should update it to use the GermanStemmer instead. For more details, refer to the German Stemming Algorithm Variant - Snowball.
Romanian Analysis Update
The RomanianAnalyzer now supports modern Unicode for Romanian text and normalizes cedilla characters to comma-based forms. Both variations are still in use, so it's recommended to re-index any Romanian documents.
Persian Analysis Update
PersianAnalyzer now includes the PersianStemFilter (LUCENE-10312), which may alter analysis results. If you need to maintain the exact behavior from version 9.x, consider cloning the PersianAnalyzer from version 9.x or creating a custom analyzer using CustomAnalyzer.
Kuromoji User Dictionary Validation
The Kuromoji user dictionary now strictly checks that concatenated segments match the surface form exactly. This change prevents unexpected runtime errors. Any invalid entries will trigger an exception when loading the dictionary file.
Japanese Tokenizer Behavior Change
The JapaneseTokenizer no longer outputs original compound tokens by default when the mode is not set to NORMAL (LUCENE-9123). Instead, it outputs decompounded tokens (e.g., "株式" and "会社" from "株式会社"). To restore the original behavior and include the compound token, set the discardCompoundToken option to false. Note that when set to false, SynonymFilter or SynonymGraphFilter may not work as expected (LUCENE-9173).
English Stopwords in StandardAnalyzer
By default, the StandardAnalyzer no longer removes English stopwords (LUCENE-7444). To revert to the previous behavior, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument to the constructor.
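Regarding the Kuromoji user dictionary validation described in the excerpt above: because invalid entries now cause an exception at load time, it may help to pre-check custom dictionary lines before upgrading. The following is a minimal sketch of such a check, not Lucene's own implementation. The class and method names are hypothetical, and it assumes the usual comma-separated layout (surface form, space-separated segmentation, readings, part of speech) without quoted fields.

```java
// Hypothetical helper for pre-checking Kuromoji user dictionary lines.
// Assumption: fields are comma-separated and contain no quoted commas.
final class UserDictionaryCheck {
    private UserDictionaryCheck() {}

    /**
     * Returns true when the space-separated segments in the second field,
     * concatenated in order, are exactly equal to the surface form in the
     * first field.
     */
    static boolean segmentsMatchSurface(String entry) {
        String[] fields = entry.split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least surface and segmentation: " + entry);
        }
        String surface = fields[0].trim();
        // Collapse any run of whitespace between segments before joining.
        String[] segments = fields[1].trim().split("\\s+");
        String concatenated = String.join("", segments);
        return surface.equals(concatenated);
    }
}
```

For example, a hypothetical entry whose surface form is 株式会社 and whose segmentation is "株式 会社" passes this check, while an entry whose segments concatenate to a different string than its surface form does not and should be corrected before the dictionary is loaded.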
Configurable parallel search executor
The system.xml property /com/fredhopper/search/fred/FredSearchEngine@search-executor-thread-count allows you to specify a custom thread pool to use in Lucene as the parallel search executor, which can improve search performance on big indices and servers with many CPU cores.
The value can be:
an absolute thread count <n>, e.g. 3;
a decimal number followed by C, to specify a value relative to the number of CPU cores, e.g. 0.5C.
The default value is 0, which disables the use of a parallel search executor.
Example in system.xml:
<?xml version="1.0" encoding="UTF-8"?>
<preferences xmlns="http://java.sun.com/dtd/preferences.dtd">
  <root type="system">
    <map />
    <node name="com">
      <map />
      <node name="fredhopper">
        <map />
        <node name="search">
          <map/>
          <node name="fred">
            <map/>
            <node name="FredSearchEngine">
              <map>
                <!--
                  Size of the thread-pool used to perform search in parallel within a Lucene segment.
                  When terminated with "C", the number part is multiplied with the number of CPU
                  cores. Floating point values are only accepted together with "C".
                  If set to 0, the custom executor is not used.
                  Example values: "0", "1C", "0.5C"
                -->
                <entry key="search-executor-thread-count" value="0.5C" />
              </map>
            </node>
          </node>
        </node>
      </node>
    </node>
  </root>
</preferences>
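To illustrate how a search-executor-thread-count value might translate into an actual pool size, here is a minimal Java sketch of the rules stated above. It is not the FAS implementation: the class and method names are hypothetical, and rounding the scaled value to the nearest integer is an assumption, since the rounding behavior is not documented here.

```java
// Hypothetical illustration of the documented value rules; not FAS code.
final class SearchExecutorThreadCount {
    private SearchExecutorThreadCount() {}

    /**
     * Resolves a value such as "3", "1C" or "0.5C" to a thread count.
     * A result of 0 means the parallel search executor is disabled.
     */
    static int resolve(String spec, int cpuCores) {
        String s = spec.trim();
        if (s.endsWith("C")) {
            // Relative form: the number part is multiplied with the core count.
            double factor = Double.parseDouble(s.substring(0, s.length() - 1));
            if (!Double.isFinite(factor) || factor < 0) {
                throw new IllegalArgumentException("Invalid relative value: " + spec);
            }
            // Assumption: round to the nearest whole thread.
            return Math.toIntExact(Math.round(factor * cpuCores));
        }
        // Absolute form: integers only, so "0.5" without "C" is rejected
        // with a NumberFormatException.
        int n = Integer.parseInt(s);
        if (n < 0) {
            throw new IllegalArgumentException("Negative thread count: " + spec);
        }
        return n;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (String spec : new String[] {"0", "3", "1C", "0.5C"}) {
            System.out.println(spec + " -> " + resolve(spec, cores));
        }
    }
}
```

Under these assumptions, "0.5C" on a server with 8 CPU cores would resolve to 4 threads, while "0" leaves the custom executor unused.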