Fredhopper is designed for multilingual businesses and supports 40+ languages. This article gives an overview of the supported linguistic features and links to concrete best practices for optimizing Fredhopper Search.
Best practices
The following pages collect best practice patterns to configure Fredhopper Search:
- Fredhopper search passes
- Optimize spell corrections
- Step 1 - Create a standard search configuration
- Step 2 - Create an advanced Search Configuration
- Add stemming exceptions
- Example use of a Search Profile
Languages
Fredhopper currently provides language support for:
- Arabic
- Bulgarian
- Catalan
- Chinese (Traditional)
- Czech
- Danish
- Dutch
- English
- Farsi
- Finnish
- French
- German
- Greek
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Latvian
- Lithuanian
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Spanish
- Swedish
- Thai
- Turkish
Linguistic Analysis Pipeline
Fredhopper uses advanced natural language processing techniques to unlock the meaning of unstructured text.
Fredhopper's uniquely powerful and modular Linguistic Analysis Pipeline analyses and, if necessary, transforms and expands all content and queries in order to cover grammatical forms and improve linguistic relevancy.
The Linguistic Analysis Pipeline includes components for functions such as tokenization, decompounding, stemming, entity extraction and part-of-speech analysis. Each component performs a particular linguistic analysis task. It may modify, remove, or add information.
Importantly, the pipeline also offers the flexibility to plug in custom components that can be activated without code changes. A custom component could, for example, be a part number analyser that automatically normalises spelling variations of your specific part numbers.
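The modular design described above can be illustrated with a minimal Python sketch. It is not Fredhopper's implementation or API: the names tokenize, join_part_numbers, remove_stop_words and analyse are hypothetical, and the stop-word list is an assumption made for the example. Each component receives a list of tokens and returns a new list, so it can modify, remove, or add information, and a custom component is activated simply by adding it to the pipeline list.

```python
import re
from typing import Callable, List

# A component takes a token list and returns a new token list.
Component = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    """Split raw text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def join_part_numbers(tokens: List[str]) -> List[str]:
    """Hypothetical custom component: merge a letters-only token with a
    following digits-only token, so ['ds', '400'] becomes ['ds400']."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i].isalpha() and tokens[i + 1].isdigit():
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop a few words assumed to carry no meaning for this example."""
    stop_words = {"a", "for", "the"}
    return [t for t in tokens if t not in stop_words]


def analyse(text: str, components: List[Component]) -> List[str]:
    """Tokenize the text, then run every component in order."""
    tokens = tokenize(text)
    for component in components:
        tokens = component(tokens)
    return tokens


# Activating the custom component only requires listing it here.
PIPELINE: List[Component] = [join_part_numbers, remove_stop_words]
```

With this pipeline, analyse("Sony DS-400", PIPELINE) and analyse("Sony DS 400", PIPELINE) both produce the tokens sony and ds400. The sketch deliberately stays simple: it does not split a fully joined string such as SonyDS400, which a production analyser would also need to handle.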
All character sets are converted internally to UTF-8. Therefore, no analysis feature depends on the native encoding of the content.

Linguistic Features
Fredhopper provides several advanced linguistic features:
- Segmentation i.e. the tokenization of queries and text into words or tokens. Fredhopper includes tokenizers for all supported languages, including a comprehensive Asian-language tokenizer specific to Japanese, Chinese and Korean.
- Synonym Expansion i.e. a custom dictionary of related words or phrases, including:
- Two-way synonyms: A two-way relationship between two words. A TV is a television and a television is a TV. Example: TV = television.
- One-way synonyms: A one-way relationship between two words. Jeans are trousers, but not all trousers are jeans. Example: trousers > jeans > 501.
- Weighted synonyms: A similar but not identical relationship between words. A city bike is close to a mountain bike but not the same. Searching for a city bike finds both city bikes and mountain bikes, but city bikes appear first. Example: city bike ~ mountain bike.
Business managers can edit and manage synonyms through a powerful interface in the Business Manager.
- Stemming and Lemmatization i.e. a function that automatically reduces words to one or more base forms. Stemming is intended to allow words with a common root form (such as the singular and plural forms of nouns or the various tenses of verbs) to be considered interchangeable in search operations. For example, search results for the word shirt will include the plural form shirts, while a search for fishing will also include its word root fish. Fredhopper provides stemmers using statistical and morphological analysis for a variety of Asian, European and Middle Eastern languages.
- Decomposition or compound splitting is not an issue in English, since almost all compounds are separated by whitespace, e.g. Computer Science or DVD Player. In languages such as German and Dutch, compounds are often written as one word, and decomposition is critical to delivering relevant search results by, for example, splitting Damenhose into Damen and Hose. The decomposition feature automatically detects compounds requiring decomposition.
- Spell Correction and "Did you mean?" features enable search queries to return expected results when the spelling used in query terms does not match the spelling used in the result text (that is, when the user misspells search terms). Fredhopper automatically creates Spell Correction dictionaries from the source data during indexing. As a result, the dictionary includes data-specific terms (such as the names of brands or features) that a standard dictionary would not contain.
- Model Number Normalisation i.e. the ability to automatically handle spelling variations of combinations of brand and product model numbers. For example, searches for Sony DS 400, Sony DS-400, Sony DS400 and SonyDS400 all return the same results.
- Linguistic Relevance Ranking i.e. the formula assigning a linguistic match rate to each result. This ranking can be used in combination with commercial ranking features to control the order in which search results are displayed. Fredhopper's relevance ranking formula puts more emphasis on linguistic aspects that matter in commercial environments and removes the noise generated by commercially irrelevant linguistic aspects. In practice, you will find that linguistic match rates will be more accurate and simpler. This enables you to have commercial ranking aspects play a more important role and drive conversion and sales. For example, a search for iPod cable will return the same linguistic match rate for an item named iPod cable and an item with the name cable for iPod.
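The three synonym types listed above can be sketched in Python as follows. This is an illustration only, not Fredhopper's data model: the expand function and the three tables are hypothetical, the weight 0.5 is an assumed value, and the sketch assumes that a one-way chain such as trousers > jeans > 501 means a search for the broader term also matches every narrower term, but not the reverse. The weighted pair is applied in the one direction the text describes.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym tables built from the examples in this article.
TWO_WAY: List[Tuple[str, str]] = [("tv", "television")]
# (broader, narrower) pairs for the chain trousers > jeans > 501.
ONE_WAY: List[Tuple[str, str]] = [("trousers", "jeans"), ("jeans", "501")]
# (query term, related term, weight); the weight 0.5 is an assumption.
WEIGHTED: List[Tuple[str, str, float]] = [("city bike", "mountain bike", 0.5)]


def expand(query: str) -> Dict[str, float]:
    """Map each term to search for onto a weight; 1.0 means a full match."""
    q = query.lower()
    terms: Dict[str, float] = {q: 1.0}

    # Two-way synonyms apply in both directions.
    for a, b in TWO_WAY:
        if q == a:
            terms[b] = 1.0
        elif q == b:
            terms[a] = 1.0

    # One-way synonyms: follow the chain from broader to narrower terms.
    frontier = [q]
    while frontier:
        current = frontier.pop()
        for broader, narrower in ONE_WAY:
            if broader == current and narrower not in terms:
                terms[narrower] = 1.0
                frontier.append(narrower)

    # Weighted synonyms add the related term with a lower weight,
    # so exact matches can be ranked first.
    for a, b, weight in WEIGHTED:
        if q == a:
            terms.setdefault(b, weight)

    return terms
```

Under these assumptions, expand("trousers") includes jeans and 501, while expand("jeans") includes 501 but not trousers. A ranking step could then multiply each item's match score by the weight of the term it matched, which keeps city bikes ahead of mountain bikes for the query city bike.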
Note: Fredhopper has no support for wildcards in search. Please consult Fredhopper if you have specific search requirements.
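The idea behind Spell Correction dictionaries built from the source data, described in the feature list above, can be illustrated with a small Python sketch. It is a simplified stand-in rather than Fredhopper's algorithm: build_dictionary and did_you_mean are hypothetical names, the item texts are invented for the example, and the similarity matching uses difflib from the Python standard library with an assumed cutoff of 0.75.

```python
import difflib
from typing import Iterable, Optional, Set


def build_dictionary(item_texts: Iterable[str]) -> Set[str]:
    """Collect every word that occurs in the indexed item texts."""
    words: Set[str] = set()
    for text in item_texts:
        words.update(text.lower().split())
    return words


def did_you_mean(term: str, dictionary: Set[str]) -> Optional[str]:
    """Suggest the closest indexed word, or None if the term is already
    known or nothing is similar enough."""
    term = term.lower()
    if term in dictionary:
        return None
    # The cutoff of 0.75 is an assumed similarity threshold.
    matches = difflib.get_close_matches(term, sorted(dictionary), n=1, cutoff=0.75)
    return matches[0] if matches else None


# Invented catalogue texts; brand names end up in the dictionary
# because they occur in the data.
DICTIONARY = build_dictionary(["Sony Bravia television", "Philips Ambilight TV"])
```

Because the dictionary comes from the catalogue itself, a misspelled brand or feature name such as sonny or ambilite can be matched to sony or ambilight, even though neither word would appear in a general-purpose dictionary.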