This article will introduce the Fredhopper Data Manager and its concepts and architecture.
Introduction
The functionality of any eCommerce site is largely defined by the data available to it: think of complex ranking schemes or the filters available to drill down into the catalog. Aggregating and loading data from a variety of sources is usually a complex and time-consuming task. When several data sources need to be integrated into a single source, an ETL tool is required.
ETL
ETL stands for Extract, Transform and Load:
- Collect data from a variety of data sources (EXTRACT)
- Move and modify data (TRANSPORT and TRANSFORM) while cleansing, normalizing, aggregating and enriching the data
- Store data (LOAD) in the final target destination
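The three phases above can be sketched in a few lines of Python. This is only an illustration of the ETL idea, not how the Fredhopper Data Manager works internally; the two sample sources, the field names and the in-memory target are all assumptions made up for the example.

```python
import csv
import io
import json

# Hypothetical sample sources: a CSV export with prices and a JSON feed with stock levels.
PRICES_CSV = "sku,price\nA1, 19.99\nB2,5.00\n"
STOCK_JSON = '[{"sku": "A1", "stock": 3}, {"sku": "B2", "stock": 0}]'


def extract():
    """EXTRACT: read raw records from both sources."""
    prices = list(csv.DictReader(io.StringIO(PRICES_CSV)))
    stock = json.loads(STOCK_JSON)
    return prices, stock


def transform(prices, stock):
    """TRANSFORM: cleanse, normalize and enrich the price rows with stock data."""
    stock_by_sku = {row["sku"]: row["stock"] for row in stock}
    products = []
    for row in prices:
        sku = row["sku"].strip()
        products.append({
            "sku": sku,
            "price": float(row["price"].strip()),  # cleanse stray whitespace
            "in_stock": stock_by_sku.get(sku, 0) > 0,  # enrich from second source
        })
    return products


def load(products, target):
    """LOAD: store the result in the target (here a plain dict keyed by SKU)."""
    for product in products:
        target[product["sku"]] = product
    return target


catalog = load(transform(*extract()), {})
```

Keeping the phases in separate functions mirrors the separation the acronym describes: each phase can be changed without touching the other two.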
The Fredhopper Data Manager is a full-blown ETL toolkit that allows you to read data from a variety of sources, including ERP systems, databases, Excel sheets, CSV files and XML files.
The Fredhopper Data Manager toolkit
The Fredhopper Data Manager toolkit is based on the successful commercial open source ETL tool Pentaho Data Integration. Fredhopper's contribution to the toolkit lies in the standardized eCommerce transformation steps that it develops and maintains. These include the possibility to integrate data from analytics providers like Google, Coremetrics, Bazaarvoice or Omniture. Integrating this data into the Fredhopper application allows you to use it to configure complex promotions or ranking schemes. Together with the 100+ standard transformations in the Fredhopper Data Manager, this gives you the power to get the most out of your data.
Metadata, model-driven approach
The Fredhopper Data Manager uses a metadata, model-driven approach. This allows you to:
- Tell it WHAT to do, not HOW
- Create complex transformations with zero code
- Graphically design transformations and jobs
There is a complete separation of user interface, data, and metadata. The simple graphical user interface keeps the learning curve short.
Fredhopper Data Manager architecture
The Fredhopper Data Manager is a Java application and can run on any operating system that has Java installed. It consists of three applications:
Spoon
A graphically oriented end-user tool to model transformations and jobs. Spoon is used to design transformations as well as to test them in a visual manner. Typically, this involves no actual programming at all; rather, it is a purely declarative task that results in a model. Jobs and transformations are stored in XML files or in a repository.
Pan
A command line tool that executes transformations modeled with Spoon.
Kitchen
A command line tool that executes jobs created with Spoon.
The command line tools Pan and Kitchen know how to read and interpret the models created by Spoon. These tools actually execute the implied ETL processes. This is done all in one go: there is no intermediate code generation or compilation involved.
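The idea of interpreting a model directly, rather than generating code from it, can be sketched as follows. The model format below (a list of dictionaries with "filter" and "rename" steps) is purely hypothetical and much simpler than the XML that Spoon saves; it only illustrates the principle of telling the engine what to do and letting it decide how.

```python
# Hypothetical model: a list of step descriptions, as a design tool might save it.
MODEL = [
    {"type": "filter", "field": "stock", "min": 1},
    {"type": "rename", "from": "title", "to": "name"},
]


def apply_step(step, rows):
    """Interpret one step description against a list of rows."""
    if step["type"] == "filter":
        return [r for r in rows if r[step["field"]] >= step["min"]]
    if step["type"] == "rename":
        return [
            {(step["to"] if k == step["from"] else k): v for k, v in r.items()}
            for r in rows
        ]
    raise ValueError(f"unknown step type: {step['type']}")


def run_model(model, rows):
    """Execute the model directly; no code is generated or compiled."""
    for step in model:
        rows = apply_step(step, rows)
    return rows


rows = [{"title": "Lamp", "stock": 2}, {"title": "Desk", "stock": 0}]
result = run_model(MODEL, rows)
```

Changing the behavior means editing the model data, not the engine code, which is what makes a purely declarative design tool possible.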
Fredhopper Data Manager concepts
In this section we will introduce the most important concepts in the Fredhopper Data Manager.
Transformation
A transformation is the flow from input data to output data. The flow can consist of an unlimited number of steps. Steps in transformations are asynchronous: each step transforms input rows into output rows, but the rows can flow through the transformation at their own pace. A step is done when all of its input rows have been transformed into output rows, but each row that comes out of a step proceeds immediately to the next step and does not wait for the other input rows. In a running transformation, almost all steps are executing simultaneously.
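The row-by-row flow can be illustrated with chained Python generators. This is a simplified, single-threaded sketch: the steps here take turns rather than truly running at the same time, and all function names are made up for the example. What it does show is that each row moves on to the next step as soon as it is ready, without waiting for the rows behind it.

```python
events = []  # records the order in which steps touch rows


def read_rows(values):
    """First step: emit rows one at a time."""
    for value in values:
        events.append(("read", value))
        yield value


def double(rows):
    """Middle step: transform each row as soon as it arrives."""
    for row in rows:
        events.append(("double", row))
        yield row * 2


def write_rows(rows, sink):
    """Last step: store each row as soon as it arrives."""
    for row in rows:
        events.append(("write", row))
        sink.append(row)


sink = []
write_rows(double(read_rows([1, 2])), sink)
```

After this runs, `events` shows the first row being read, doubled and written before the second row is even read.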
Job
Jobs consist of job entries, such as transformations or FTP downloads, that are placed in a flow of control. Steps in jobs are executed sequentially: all work done inside a step must complete before the next step is entered and executed. This allows, for example, for advanced error notification in case any of the steps fails.
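The sequential flow of control in a job, including a notification on failure, can be sketched like this. The `run_job` function and the three entries are hypothetical stand-ins, not part of the Fredhopper Data Manager; in the real tool the flow is drawn in Spoon rather than written as code.

```python
def run_job(entries, notify):
    """Run job entries one after another; stop and notify on the first failure."""
    completed = []
    for name, action in entries:
        try:
            action()  # must finish before the next entry starts
        except Exception as error:
            notify(f"{name} failed: {error}")
            return completed, False
        completed.append(name)
    return completed, True


def download():
    pass  # stands in for, say, an FTP download


def transform():
    raise RuntimeError("input file is empty")  # simulated failure


def publish():
    pass  # never reached in this run


messages = []
completed, ok = run_job(
    [("download", download), ("transform", transform), ("publish", publish)],
    messages.append,
)
```

Because each entry completes before the next one starts, a failure is known at a well-defined point and the later entries are simply never entered.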
Step
A step denotes a particular kind of action that is performed on data, like reading data or filtering data. Steps are easily created by dragging them from the design panel and dropping them on the graphical model view.
Once a step is created, it can be opened by double-clicking it. A dialog window appears that can be used to parameterize the step and specify its exact behavior. Most steps have multiple property pages, according to the different categories of properties applicable to steps of that type.
Typically there are three types of steps:
- Input steps process some kind of 'raw' resource, such as a file or a database query, and create an output stream of records from it.
- Transforming steps process input streams and perform a particular action on them, often adding new fields or even new records. The result is then fed to one or more output streams.
- Output steps are like the reverse of input steps: they accept records and store them in some external resource, such as an XML file or a database table.
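The three step types can be sketched as three small Python functions. The CSV layout, the field names and the size-splitting logic are assumptions chosen for the example; the point is only the division of roles between reading a raw resource, adding fields and records, and writing to an external resource.

```python
import csv
import io


def csv_input(text):
    """Input step: turn a raw CSV resource into a stream of records."""
    yield from csv.DictReader(io.StringIO(text))


def split_sizes(records):
    """Transforming step: emit one record per size and add a 'variant' field."""
    for record in records:
        for size in record["sizes"].split("|"):
            yield {"sku": record["sku"], "size": size,
                   "variant": f"{record['sku']}-{size}"}


def csv_output(records, stream):
    """Output step: store the records in an external resource (a CSV stream)."""
    writer = csv.DictWriter(stream, fieldnames=["sku", "size", "variant"],
                            lineterminator="\n")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


target = io.StringIO()
written = csv_output(split_sizes(csv_input("sku,sizes\nA1,S|M\nB2,L\n")), target)
```

Here the transforming step produces more records than it receives: two input records become three output records, each with a new field.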
Hop
In the graphical representation of the model, lines are visible that form connections between the steps. In the Fredhopper Data Manager, these connections are called hops. Hops between steps behave like pipelines. Records may flow through them from one step to the other. The records travel in a stream-like manner, and steps may buffer records until the next step in line is ready to accept them.
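The buffering behavior of a hop can be approximated with a bounded queue between two threads. This is a sketch under assumptions, not the actual implementation: the buffer size of 2, the sentinel object and the function names are all invented for the illustration.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def producer(hop, values):
    """Upstream step: put() blocks whenever the hop's buffer is full."""
    for value in values:
        hop.put(value)
    hop.put(DONE)


def consumer(hop, received):
    """Downstream step: takes records from the hop as soon as they arrive."""
    while True:
        record = hop.get()
        if record is DONE:
            break
        received.append(record.upper())


hop = queue.Queue(maxsize=2)  # buffer size is an arbitrary choice
received = []
steps = [
    threading.Thread(target=producer, args=(hop, ["a", "b", "c", "d"])),
    threading.Thread(target=consumer, args=(hop, received)),
]
for step in steps:
    step.start()
for step in steps:
    step.join()
```

The bounded queue lets both steps run at the same time while keeping the faster one from running arbitrarily far ahead of the slower one.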
Hops can be created by placing the mouse pointer above the source step, holding down the Shift key, and then dragging (with the left mouse button and the Shift key held down) to the destination step.