Our Hunter Engine scans robots.txt directives and brute-forces common paths to map site architecture.
Enter a bare domain URL. No need to locate the XML file yourself.
We parse robots.txt and check 20+ common locations (WP, Shopify, etc.).
View the URL map, filter nodes instantly, and export data to CSV.
Welcome to the Nexus Sitemap Hunter, a specialized digital cartography tool designed for SEO professionals, developers, and site architects. In the labyrinth of the modern web, visibility is currency. This tool operates as a high-precision radar, locating the structural blueprints—XML Sitemaps—that dictate how search engines perceive and index your digital territory.
Unlike standard validators that require you to know the exact URL, Nexus employs a dual-layer discovery engine. It first parses robots.txt (the "front door" protocol for bots) for Sitemap directives; if that yields nothing, it initiates a heuristic brute-force scan of over 20 common directory paths used by platforms like WordPress, Shopify, and Magento, as well as custom frameworks.
The Hunter Engine is built on a lightweight PHP backbone, utilizing wp_remote_get for server-side fetching to bypass CORS limitations often encountered by JS-only scanners. It includes native support for:
- Sitemap: directives declared in robots.txt
- Sitemap Index files, automatically distinguished from standard sitemaps
- lastmod extraction for freshness audits
- CSV export of the full URL map
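The dual-layer hunt can be condensed into a few lines. This is a minimal sketch, not the shipped plugin code: it assumes a WordPress context (for wp_remote_get and its helpers), the function name is hypothetical, and only a subset of the 20+ probed paths is shown.

```php
<?php
// Minimal sketch of the dual-layer discovery flow (WordPress context assumed;
// function name hypothetical, path list abbreviated).
function nexus_hunt_sitemaps( $domain ) {
	$base       = 'https://' . rtrim( $domain, '/' );
	$candidates = array();

	// Layer 1: parse robots.txt for explicit "Sitemap:" directives.
	$response = wp_remote_get( $base . '/robots.txt' );
	if ( ! is_wp_error( $response ) && 200 === wp_remote_retrieve_response_code( $response ) ) {
		if ( preg_match_all( '/^\s*Sitemap:\s*(\S+)/mi', wp_remote_retrieve_body( $response ), $m ) ) {
			$candidates = $m[1]; // authoritative: what Googlebot is told to follow
		}
	}

	// Layer 2: heuristic brute-force of common platform paths.
	if ( empty( $candidates ) ) {
		foreach ( array( '/sitemap.xml', '/sitemap_index.xml', '/wp-sitemap.xml' ) as $path ) {
			$probe = wp_remote_get( $base . $path );
			if ( ! is_wp_error( $probe ) && 200 === wp_remote_retrieve_response_code( $probe ) ) {
				$candidates[] = $base . $path;
			}
		}
	}
	return $candidates;
}
```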
Input the root domain of the target entity (e.g., example.com). The system automatically handles protocol normalization (HTTP/HTTPS). Do not input a specific page URL unless you are targeting a subdirectory installation.
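Protocol normalization can be as simple as the sketch below. The helper name is hypothetical; it assumes WordPress's wp_parse_url wrapper and preserves any path component so subdirectory installations still resolve.

```php
<?php
// Illustrative normalization sketch (WordPress context; helper name hypothetical).
// Accepts bare domains or full URLs and resolves them to a canonical https base.
function nexus_normalize_base( $input ) {
	$input = trim( $input );
	// Prepend a scheme if the user typed a bare domain like "example.com".
	if ( ! preg_match( '#^https?://#i', $input ) ) {
		$input = 'https://' . $input;
	}
	$parts = wp_parse_url( $input ); // WP's forgiving wrapper around parse_url()
	if ( empty( $parts['host'] ) ) {
		return false; // not a resolvable domain
	}
	// Preserve any path so subdirectory installations still resolve.
	$path = isset( $parts['path'] ) ? rtrim( $parts['path'], '/' ) : '';
	return strtolower( $parts['scheme'] . '://' . $parts['host'] ) . $path;
}
```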
Upon execution, the terminal log will display the scan progress. If multiple sitemaps are detected (e.g., one declared in robots.txt and another at a default WordPress location), Nexus will present all valid candidates. Select the candidate marked ROBOTS.TXT for the most authoritative data, as this is what Googlebot is explicitly instructed to follow.
Once a sitemap is fetched, the data is rendered in the "Target Intel" interface. You can:
- View the full URL map, including each entry's lastmod date
- Filter nodes instantly to isolate sections of the site
- Export the data to CSV for offline auditing
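As a rough illustration of the export step, the sketch below writes a parsed URL map as CSV. The function name and row shape are assumptions, not the plugin's actual API.

```php
<?php
// Hypothetical CSV export sketch: assumes $urls is an array of rows shaped
// like array( 'loc' => ..., 'lastmod' => ... ) parsed from the sitemap.
function nexus_export_csv( array $urls, $filename = 'sitemap-export.csv' ) {
	header( 'Content-Type: text/csv; charset=utf-8' );
	header( 'Content-Disposition: attachment; filename=' . $filename );
	$out = fopen( 'php://output', 'w' );
	fputcsv( $out, array( 'URL', 'Last Modified' ) ); // header row
	foreach ( $urls as $row ) {
		fputcsv( $out, array( $row['loc'], $row['lastmod'] ) );
	}
	fclose( $out );
}
```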
Error: "Invalid XML Structure"
This usually indicates that the target server is returning an HTML page (such as a 404 error page or a "Coming Soon" placeholder) instead of a valid XML file, even if the URL ends in .xml. Verify the URL manually in your browser.
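You can replicate this check yourself: a response only counts as a sitemap if libxml can actually parse it. The plain-PHP sketch below (function name hypothetical) rejects HTML bodies even when the server returns them with a 200 status.

```php
<?php
// Sketch: distinguish a real XML body from an HTML error/placeholder page.
function nexus_looks_like_xml( $body ) {
	$trimmed = ltrim( $body );
	// Fast reject: HTML error pages usually open with a doctype or <html>.
	if ( 0 === stripos( $trimmed, '<!DOCTYPE html' ) || 0 === stripos( $trimmed, '<html' ) ) {
		return false;
	}
	// Authoritative check: let libxml try to parse it, suppressing warnings.
	libxml_use_internal_errors( true );
	$xml = simplexml_load_string( $body );
	libxml_clear_errors();
	return false !== $xml;
}
```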
Why is the scan blocked?
Some high-security firewalls (Cloudflare, WAFs) may block programmatic requests. Nexus mimics a standard user agent, but aggressive security settings may still reject the handshake. If this occurs, try accessing the sitemap URL directly.
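For context, wp_remote_get accepts a 'user-agent' argument, which is how a scanner can present a browser-like identity instead of WordPress's default. The sketch below is illustrative; the UA string is not the one Nexus actually sends.

```php
<?php
// Sketch: fetch with an explicit user agent and surface firewall rejections.
$response = wp_remote_get( 'https://example.com/sitemap.xml', array(
	'timeout'    => 15,
	'user-agent' => 'Mozilla/5.0 (compatible; NexusSitemapHunter/3.0)',
) );
if ( is_wp_error( $response ) ) {
	// WAF blocks often surface as transport errors rather than HTTP codes.
	error_log( 'Scan blocked: ' . $response->get_error_message() );
} elseif ( 403 === wp_remote_retrieve_response_code( $response ) ) {
	error_log( 'Scan blocked: 403 Forbidden (likely firewall rule)' );
}
```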
Difference: Index vs. Standard?
A Standard Sitemap contains direct URLs to pages. A Sitemap Index is a container that links to other sitemaps. This is necessary because a single XML file is capped at 50,000 URLs or 50MB. Nexus identifies the type automatically.
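The detection hinges on the root element defined by the sitemaps.org protocol: <urlset> for a standard sitemap, <sitemapindex> for an index. A minimal sketch (function name hypothetical):

```php
<?php
// Sketch: classify a sitemap by its root element, per the sitemaps.org schema.
function nexus_sitemap_type( $xml_body ) {
	libxml_use_internal_errors( true );
	$xml = simplexml_load_string( $xml_body );
	libxml_clear_errors();
	if ( false === $xml ) {
		return 'invalid';
	}
	return 'sitemapindex' === $xml->getName() ? 'index' : 'standard';
}
```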
Why does lastmod date matter?
The lastmod tag tells search engines when content was updated. If your lastmod dates are old but you've changed content, Google may not recrawl the page. Nexus exposes this data instantly for audit.
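A staleness audit on that data is straightforward. The sketch below flags URLs whose declared lastmod is older than a cutoff; the function name and threshold are illustrative, and DAY_IN_SECONDS is a WordPress constant (86400).

```php
<?php
// Sketch: flag sitemap URLs whose <lastmod> is older than $max_age_days.
function nexus_stale_urls( $xml_body, $max_age_days = 365 ) {
	libxml_use_internal_errors( true );
	$xml = simplexml_load_string( $xml_body );
	libxml_clear_errors();
	if ( false === $xml ) {
		return array();
	}
	$cutoff = time() - ( $max_age_days * DAY_IN_SECONDS );
	$stale  = array();
	foreach ( $xml->url as $url ) { // <url> entries share the root's default namespace
		if ( isset( $url->lastmod ) && strtotime( (string) $url->lastmod ) < $cutoff ) {
			$stale[] = (string) $url->loc;
		}
	}
	return $stale;
}
```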
NEXUS PROTOCOL // OPTIMIZED FOR SEARCH ENGINE ARCHITECTURE // V3.0