Introduction
This document extends the API reference documentation. It describes the fail-safe setup of the Query API end-points of Attraqt Fredhopper Services and provides guidelines for integrating this API.
We believe that these insights will help you better understand the capabilities of the service and use it more effectively.
Scope and prerequisites
The document covers the essential technical details of the setup of the Query API end-points.
Readers should have a good understanding of the service, its purpose and its functionality.
Technical information
Query API
The Query API accepts requests, usually sent from the front-end, and responds with the results.
The diagram below gives an overview of the infrastructure setup (see the Infrastructural Components overview section below), shows the path of a query request (see the Mechanisms section below) and indicates the fail-safe components.
Infrastructural Components overview
Front-end
By front-end, we mean the complete set of components on the customer's side involved in the integration of the Query API.
This includes the application servers and the underlying infrastructure, proxies and other network appliances.
Internet
The front-end connects to the API end-point via the Internet.
DNS authority
We register the DNS names for the API end-points at a third-party DNS service provider.
To ensure high availability, each DNS record for the Query API end-point contains a set of public IP addresses of our load balancers and follows the round-robin DNS principle. In short, each time the DNS name is resolved, it yields an IP address different from the one returned the previous time. To minimize the chance of issues related to DNS caching, we set the TTL to 30 seconds.
Note that the DNS name defines which service instance the request will be sent to.
Load balancer
Each of the load balancers fulfills the following functions:
- Authentication: only authenticated requests are accepted.
- Load balancing: once a request is authenticated, the load balancer passes it to the correct service instance.
We can scale out the load balancing tier depending on the required capacity. This tier consists of multiple identical components for both capacity and redundancy purposes: all load balancers have identical configuration and functionality and are placed in different geographical locations.
When a load balancer needs to undergo maintenance, its IP will be removed from the DNS record, and the maintenance will start once traffic stops flowing to it. The IP is added back when the maintenance is over.
Note that by default, the nature of the load balancing tier is multi-tenant.
Service instance
Each service instance consists of a single indexer and a set of query servers. We can adjust the capacity of the service instance based on the load, which means scaling out the query server tier.
This tier consists of multiple identical components for both capacity and redundancy purposes: the query servers are identical and are placed in different geographical locations.
During the reconfiguration of the service instance, each of its components is removed from the records in the load balancing tier. This happens sequentially, ensuring continuous availability of the service instance.
Note that each service instance is dedicated to a customer account.
Mechanisms
Query flow
- The front-end composes the query and prepares it to be sent to the Query API end-point.
- The Query API end-point DNS name gets resolved. The DNS authority answers with the external IP of one of the load balancers.
- The front-end sends the query to the IP address, using the Host header to indicate which service instance to use and supplying the account credentials.
- The load balancer receives the query, authenticates the request (assume: successfully), and uses the Host header to decide which service instance the request should be forwarded to. Once decided, the query is passed to the active query server with the fewest open connections.
- The query server processes the query, logs it and sends the response back to the load balancer.
- The load balancer logs the query and sends the response back to the front-end.
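Seen from the front-end, the third step of the flow can be sketched as follows. This is a minimal illustration, not a reference client: it assumes HTTP Basic authentication (the document only states that a username and password are required), reuses the /fredhopper/query path from the curl example in this document, and uses a hypothetical function name and placeholder query string.

```python
import base64
import urllib.request


def build_query_request(ip, host, username, password, query_string):
    """Build (but do not send) a query request aimed at one load balancer IP.

    Assumption: credentials are passed with HTTP Basic authentication.
    """
    url = f"http://{ip}/fredhopper/query?{query_string}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    # The Host header tells the load balancer which service instance to use.
    request.add_header("Host", host)
    # Credentials go out with the very first request.
    request.add_header("Authorization", f"Basic {token}")
    return request
```

The returned object can be passed to urllib.request.urlopen together with a timeout. Because the Host header is set explicitly, the request carries the service hostname even though the URL contains an IP address.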
Integration guideline
We would like to stress the importance of a correct integration of the Query API. The best practices below help to achieve this.
Response sizes
- Use compression for every request; this considerably reduces bandwidth usage and transfer time.
- Use (FAS) configuration to control the amount of information included in the response.
- Avoid setting fh_view_size above 50, as this results in large responses that might take too long to transfer.
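The points above can be sketched on the client side as follows. The sketch assumes that compression is negotiated with the standard Accept-Encoding: gzip request header and signalled with a Content-Encoding: gzip response header; the helper names are hypothetical.

```python
import gzip

MAX_VIEW_SIZE = 50  # upper bound for fh_view_size suggested in this document


def compression_headers():
    """Request headers that ask the server for a gzip-compressed response."""
    return {"Accept-Encoding": "gzip"}


def decode_body(raw_body, content_encoding):
    """Return the response body as bytes, decompressing it if gzip was used."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


def cap_view_size(requested):
    """Keep the requested fh_view_size at or below the suggested maximum."""
    return min(requested, MAX_VIEW_SIZE)
```

Note that urllib.request does not decompress responses automatically, so a helper like decode_body has to be applied to the bytes returned by response.read().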
DNS resolving and caching
- Do not use HTTP Keep-Alive, as it might result in unexpected time-outs and closed connections.
- Resolve the DNS name each time the front-end sends a query request. If this is not possible because of limitations of the front-end infrastructure or the specifics of the application, we suggest the following workaround:
- Resolve the DNS record into the full list of IP addresses (e.g. using nslookup, dig or similar tools)
- Implement the Round-robin DNS by sending every new request to the next IP in the list (iterating over the list infinitely).
Make sure to set the Host header when sending queries using an IP address, e.g.
curl --header 'Host: query.published.live1.fas.eu1.fredhopperservices.com' http://123.45.67.89/fredhopper/query
This sends a query to the end-point of fas:live1 in the EU region. More information regarding the hostname can be found here.
If the Host header is omitted, the request will result in a 404 response code.
- Refresh the list regularly (every 5 to 10 minutes) or whenever errors occur.
- Make sure that the DNS caching respects the advertised TTL.
- If IP filtering policies apply, we advise resolving the query end-point regularly (once every 4-6 hours) and adding the returned IP addresses to the white-list.
- Test DNS with the command below; the output should rotate through all IP addresses:
while true; do dig +short query.published.live1.fas.eu1.fredhopperservices.com | head -n 1; sleep 1; done
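The workaround described above (resolve the full IP list, iterate over it, refresh it periodically or on errors) could look roughly like the sketch below. The class and function names are hypothetical, the 300-second default is simply the lower end of the 5 to 10 minute range, and the resolver and clock are injectable so that the logic can be exercised without network access.

```python
import socket
import time


def resolve_all(hostname, port=80):
    """Resolve hostname into a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class RoundRobinResolver:
    """Hands out the resolved IPs one after another, refreshing periodically."""

    def __init__(self, hostname, resolve=resolve_all, refresh_seconds=300,
                 clock=time.monotonic):
        self.hostname = hostname
        self._resolve = resolve
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._ips = []
        self._index = 0
        self._resolved_at = None

    def refresh(self):
        """Re-resolve the DNS name; call this directly after a request error."""
        ips = self._resolve(self.hostname)
        if ips:  # keep the previous list if the lookup returned nothing
            self._ips = ips
        self._resolved_at = self._clock()

    def next_ip(self):
        """Return the next IP in the list, wrapping around at the end."""
        expired = (self._resolved_at is None
                   or self._clock() - self._resolved_at >= self._refresh_seconds)
        if expired:
            self.refresh()
        if not self._ips:
            raise RuntimeError(f"no IP addresses resolved for {self.hostname}")
        ip = self._ips[self._index % len(self._ips)]
        self._index += 1
        return ip
```

Each request would then go to the address returned by next_ip(), with the Host header set to the original DNS name as in the curl example.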
Fail-safety
- Set a timeout for each query. The timeout can differ between query types.
- Implement a single retry for responses with an HTTP 5xx status code.
- Do not send search terms longer than 40 characters; front-ends should cap long $s= terms.
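A minimal sketch of these three rules, assuming a plain HTTP GET issued with the Python standard library. The function names are hypothetical, and the sender is injectable so that the retry logic can be checked without a live end-point. Timeouts and connection errors are raised as exceptions and are deliberately not retried here, since the guideline only covers 5xx responses.

```python
import urllib.error
import urllib.request

MAX_TERM_LENGTH = 40  # maximum search term length given in this document


def cap_search_term(term, limit=MAX_TERM_LENGTH):
    """Truncate a search term to the maximum supported length."""
    return term[:limit]


def http_get(url, timeout_seconds):
    """Send one GET request and return (status, body) without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def query_with_single_retry(url, timeout_seconds, send=http_get):
    """Send a query with a per-request timeout; retry once on an HTTP 5xx."""
    status, body = send(url, timeout_seconds)
    if 500 <= status <= 599:
        # Exactly one retry, whatever the second outcome is.
        status, body = send(url, timeout_seconds)
    return status, body
```

Different query types can simply pass different timeout_seconds values, for example a shorter one for suggestions than for full result pages.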
Implementation optimizations
- Don't use lazy authentication; make sure every HTTP query includes the required username and password, to avoid the needless HTTP 401 round trip.
- Use REST/XML instead of SOAP WebServices when implementing the Query API.
- If you do use SOAP, fetch the WSDL file only occasionally; avoid downloading it every second.
- If possible, keep a cache of query responses. Do not refresh all cached query responses at once, as this may saturate the resources of the service instance.
- Do not use streaming processing of responses. Instead, store each response immediately in a container and then process the container. This prevents slow streaming on the front-end from blocking the query servers, which could otherwise result in a deadlock on both sides that is hard to recognize.
- Log the round-trip time and the actual query (fh_location=.... etc) for each query sent, in the format: yyyymmdd hh:mm:ss TZONE <roundtrip time in milliseconds> <IP of response> <HTTP STATUS> <Bytes> <query>. Be sure to rotate these log files daily.
- Extend Fredhopper queries with fh_session=<keyword> parameters for traceability. With proper use of keywords, the Fredhopper Cloud Team can quickly pinpoint queries and potential issues. Suggested keywords are:
- fh_session=<IP>.<DATETIME>.<TRY>, where IP is the IP address of the submitting machine, DATETIME is the time in milliseconds since the epoch, and TRY is a counter starting from 1 (the second attempt is 2, etc.)
- fh_session=frontend_cache_refresh, appended to any front-end cache refresh queries
- fh_session=frontend_monitoring, used for front-end monitoring queries
- fh_session=frontend_userquery, used for front-end user queries
- or a combination, for example: fh_session=<IP>.<DATETIME>.<TRY>.frontend_userquery
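The logging format and the fh_session keywords described above can be produced with two small helpers. This is only a sketch: the function names are hypothetical, and the time zone field is taken from the tzinfo of the supplied datetime.

```python
import time
from datetime import datetime, timezone


def build_fh_session(ip, attempt, purpose=None, now_ms=None):
    """Compose <IP>.<DATETIME>.<TRY>, optionally followed by a purpose keyword."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the epoch
    parts = [ip, str(now_ms), str(attempt)]
    if purpose:
        parts.append(purpose)
    return ".".join(parts)


def format_log_line(when, roundtrip_ms, response_ip, status, size_bytes, query):
    """Format one entry as: yyyymmdd hh:mm:ss TZONE <ms> <IP> <status> <bytes> <query>."""
    stamp = when.strftime("%Y%m%d %H:%M:%S %Z")
    return f"{stamp} {roundtrip_ms} {response_ip} {status} {size_bytes} {query}"


if __name__ == "__main__":
    session = build_fh_session("192.0.2.10", attempt=1, purpose="frontend_userquery")
    print(format_log_line(datetime.now(timezone.utc), 42, "192.0.2.20", 200, 1234,
                          "fh_session=" + session))
```

Daily rotation of the resulting files can be delegated to logging.handlers.TimedRotatingFileHandler with when="midnight" from the standard library.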
Monitoring
- Ensure DNS lookups are fast. An unavailable local DNS server may increase the round-trip time (RTT).
- Set up external monitoring (for instance: Statuscake) so that connectivity and ISP-related issues can be traced easily.
- Ensure a fast and short route from the front-end to our API end-points. Tracing the route can help.
- Monitor your bandwidth and measure the maximum throughput.
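As an illustration of the first point, DNS lookup time can be measured from the front-end host with a few lines of code. The function name is hypothetical, and the measured value includes any caching done by the local resolver, so it is best sampled repeatedly over time.

```python
import socket
import time


def time_dns_lookup(hostname, resolve=socket.gethostbyname, clock=time.perf_counter):
    """Return the duration of a single DNS lookup in milliseconds."""
    start = clock()
    resolve(hostname)
    return (clock() - start) * 1000.0
```

Feeding this value into the existing monitoring makes slow or unavailable local DNS servers visible before they show up as increased query round-trip times.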