DQM scanner guide – Rezolve Ai Support Portal

What the scanner does

The DQM scanner is a configurable, general-purpose web crawler designed to fetch and analyze as much content as possible from a target website. It operates by:

Starting from one or more initial URLs
Parsing each fetched page for navigable links (<a href="...">)
Recursively visiting each unique URL within the defined domain scope
Capturing content once per unique URL—meaning dynamic updates on the same URL are not rescanned
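
The steps above can be pictured with a short Python sketch. This is an illustration only, not the scanner's actual code: the `crawl` function, the `AnchorCollector` class, and the in-memory `SITE` dictionary (used instead of real HTTP requests) are all hypothetical, and scope is simplified to a single host name.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_urls, fetch, allowed_host):
    """Visit each in-scope URL once, breadth-first, and return {url: html}."""
    queue = deque(start_urls)
    captured = {}
    while queue:
        url = queue.popleft()
        if url in captured:
            continue  # content is captured once per unique URL
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched
        captured[url] = html
        parser = AnchorCollector()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).hostname == allowed_host and link not in captured:
                queue.append(link)
    return captured


# Hypothetical three-page site held in memory (assumption for illustration).
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.test/">Ext</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/contact">Contact</a>',
    "https://example.com/contact": "<button onclick=\"go('/hidden')\">Hidden</button>",
}

pages = crawl(["https://example.com/"], SITE.get, "example.com")
print(sorted(pages))
```

In this sketch the external link is never queued because it is out of scope, and `/hidden` is never discovered because it is reachable only through a button.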

Limitations:

The scanner has limited interactive capabilities at the start of the crawl, which it uses to get past simple access barriers such as:

Login forms
Cookie consent banners
Age verification gates

These interactions occur only during the initial phase to enable crawl access—not during the full site scan.

Prerequisites

To enable a successful and complete scan, the following conditions must be in place:

Website accessibility

The scanner must be able to access the website without being blocked by:
- CAPTCHA challenges
- Anti-bot protections (e.g. WAFs with "shields-up" mode)
- Any human-verification or interactive barriers
The scanner:
- Identifies itself as MagusBot 1.0
- Operates from a known range of IP addresses. See the "Do I need to whitelist DQM IP addresses?" article for more information.
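
If your own infrastructure decides which requests skip bot checks, the two identifiers above can be combined in an allow-list rule. The following Python sketch shows one possible shape for that check. The network range is a placeholder documentation address (RFC 5737), not a real DQM range, the `is_scanner_request` helper is hypothetical, and matching on the substring `MagusBot` assumes that token appears in the User-Agent header.

```python
import ipaddress

# Placeholder values (assumptions): replace with the ranges listed in the
# support article on DQM IP addresses. 203.0.113.0/24 is a documentation range.
SCANNER_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SCANNER_UA_TOKEN = "MagusBot"


def is_scanner_request(remote_ip, user_agent):
    """Return True when both the source IP and the user agent match."""
    ip = ipaddress.ip_address(remote_ip)
    from_known_range = any(ip in net for net in SCANNER_NETWORKS)
    return from_known_range and SCANNER_UA_TOKEN in user_agent


print(is_scanner_request("203.0.113.7", "MagusBot 1.0"))   # True
print(is_scanner_request("198.51.100.9", "MagusBot 1.0"))  # False
```

Requiring both conditions is a design choice in this sketch: a user agent string alone is easy to spoof, so the IP range acts as the stronger signal.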

Authentication constraints

The scanner can input credentials into login forms, but it cannot complete login flows that require:

Multi-Factor Authentication (MFA)
Email-based login confirmations (e.g. "Click the link we sent you")

If your login flow includes these, your team must work with us to configure alternative access.

Site structure consistency

The site should support anchor-based navigation (<a href="...">) for link discovery
Avoid relying on:
- <button> elements for navigation
- JavaScript-only routing (common in SPAs)
Any changes to login or navigation processes can disrupt scanning and must be communicated in advance
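
To check a page against these guidelines before a scan, a rough audit along the following lines may help. It is a heuristic sketch using Python's standard `html.parser`. The `NavigationAudit` class and the sample markup are hypothetical, and it will also flag buttons that are legitimately used for forms rather than navigation.

```python
from html.parser import HTMLParser


class NavigationAudit(HTMLParser):
    """Separates crawlable anchors from elements a link-following crawler may miss."""

    def __init__(self):
        super().__init__()
        self.crawlable = []  # href values a crawler can follow
        self.suspect = []    # tag names that may hide navigation

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        is_js_href = href is not None and href.strip().lower().startswith("javascript:")
        if tag == "a" and href and not is_js_href and not href.startswith("#"):
            self.crawlable.append(href)
        elif tag == "button" or "onclick" in attrs or is_js_href:
            self.suspect.append(tag)


# Hypothetical navigation markup (assumption for illustration).
page = """
<nav>
  <a href="/products">Products</a>
  <a href="javascript:void(0)">Menu</a>
  <button onclick="location.href='/pricing'">Pricing</button>
  <div onclick="router.push('/blog')">Blog</div>
</nav>
"""

audit = NavigationAudit()
audit.feed(page)
print(audit.crawlable)  # ['/products']
print(audit.suspect)    # ['a', 'button', 'div']
```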

How the scanner works

Crawl initiation

We support the use of a sitemap to initiate the crawl. Customers can provide a list of URLs up front; the crawler loads each of these pages and scans it for HTML anchor tags (<a href="...">).
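
As an illustration of how a sitemap maps to a list of starting URLs, the Python sketch below reads the `<loc>` entries from a document in the standard sitemaps.org format. The `seed_urls_from_sitemap` helper and the sample sitemap are hypothetical, and how the scanner itself ingests sitemaps is not specified here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def seed_urls_from_sitemap(xml_text):
    """Return the <loc> values of a sitemap, in order, without duplicates."""
    root = ET.fromstring(xml_text)
    seen = set()
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sitemap content (assumption for illustration).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

print(seed_urls_from_sitemap(SAMPLE))
# ['https://example.com/', 'https://example.com/products']
```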

Link discovery

Each discovered link is evaluated to determine:
- If it has already been visited
- If it is within the defined domain/scope
Valid links are added to the crawl queue
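
The evaluation described above could look roughly like the following Python sketch. The `evaluate_link` function is hypothetical. Dropping URL fragments and treating scope as a set of allowed host names are assumptions made for the example, not documented scanner behavior.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def evaluate_link(page_url, href, visited, allowed_hosts):
    """Return the absolute URL to enqueue, or None if the link should be skipped."""
    # Resolve relative links against the current page and drop any #fragment.
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    parts = urlparse(absolute)
    if parts.scheme not in ("http", "https"):
        return None  # e.g. mailto: or javascript: links
    if parts.hostname not in allowed_hosts:
        return None  # outside the defined domain/scope
    if absolute in visited:
        return None  # already visited
    return absolute


visited = {"https://example.com/a/"}
hosts = {"example.com"}
base = "https://example.com/a/"

print(evaluate_link(base, "b.html#top", visited, hosts))  # https://example.com/a/b.html
print(evaluate_link(base, "./", visited, hosts))          # None (already visited)
print(evaluate_link(base, "https://cdn.example.net/x", visited, hosts))  # None (out of scope)
print(evaluate_link(base, "mailto:team@example.com", visited, hosts))    # None (not http/https)
```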

Content capture

The scanner fetches and processes one piece of content per unique URL
It does not handle dynamic content changes on the same URL

Initial phase interactions

At the beginning of the crawl, the scanner can:
- Input text into login fields
- Click through cookie banners
- Respond to age gates
These are intended to unlock protected content, not to interact with the site during deep scanning

Limitations

The crawler does not:
- Trigger UI actions whose outcome is unknown
- Navigate through JavaScript-only paths
- Handle multi-step flows requiring checkboxes or manual confirmation

Example: It cannot proceed through checkout flows requiring acceptance of terms via a checkbox

Best practices

To maximize crawl efficiency and data completeness:

Allow/approve traffic from our IP range and avoid blocking MagusBot 1.0
Disable or relax bot detection measures (e.g. CAPTCHA, rate limiting, behavioral detection) for the scanner, for example by approving our IP range or user agent, so that it is not blocked.
Use anchor tags (<a href="...">) for internal navigation wherever possible
Provide a sitemap if internal links are not easily discoverable
Avoid dynamic, non-anchor-based navigational methods (e.g., JavaScript buttons)

For any questions related to the DQM scanner, please contact Crownpeak Support.