Sitemap & Crawling
Monitor sitemap health and crawl status to ensure search engines can find your content.
This feature is currently in development and will be available in a future release.
Introduction
The Sitemap & Crawling tool will give you complete visibility into how search engines discover and access your website's pages. It will automatically parse your XML sitemap, compare it against the pages actually found on your site, and identify discrepancies that could prevent your content from being indexed. By continuously monitoring crawl health, the tool will ensure that every important page on your site is accessible to search engines and properly represented in your sitemap.
Sitemap Validation
The tool will automatically locate and parse your XML sitemap (or sitemap index) and validate it against industry standards. Validation checks will include:
- Proper XML formatting and encoding
- Valid URL entries with correct protocol and domain
- Appropriate use of lastmod, changefreq, and priority attributes
- Sitemap size compliance (no more than 50,000 URLs and 50 MB uncompressed per sitemap file)
- Sitemap index structure for large sites with multiple sitemaps
- Correct referencing of the sitemap in your robots.txt file
Any validation errors will be flagged with clear explanations and recommended fixes, so you can resolve issues before they impact how search engines process your sitemap.
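To make the validation checks above concrete, here is a minimal sketch in Python using only the standard library. It covers a subset of the checks (well-formed XML, valid protocol and domain, URL count limit); the `validate_sitemap` function and its inputs are illustrative, not the tool's actual implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-sitemap limit from the sitemaps.org protocol

def validate_sitemap(xml_text: str, expected_host: str) -> list[str]:
    """Return a list of validation problems found in a sitemap document."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]

    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"sitemap lists {len(locs)} URLs (limit is {MAX_URLS})")

    for url in locs:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"invalid protocol: {url}")
        elif parsed.netloc != expected_host:
            problems.append(f"foreign domain: {url}")
    return problems

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>ftp://example.com/file</loc></url>
  <url><loc>https://other.com/page</loc></url>
</urlset>"""

print(validate_sitemap(sitemap, "example.com"))
# → ['invalid protocol: ftp://example.com/file', 'foreign domain: https://other.com/page']
```

A production validator would also check `lastmod` date formats, sitemap index structure, and the robots.txt reference; the same parse-then-check pattern extends naturally to those rules.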
URL Discovery Comparison
One of the most valuable features of this tool will be its ability to compare the URLs listed in your sitemap against the URLs actually found by crawling your site. This comparison will reveal several important categories of pages:
Orphaned Pages
Orphaned pages are those that exist on your site but are not included in your sitemap and may not be linked from other pages. These pages are difficult for search engines to discover and are often overlooked during content audits. The tool will identify every orphaned page and recommend whether to add it to your sitemap, redirect it, or remove it entirely.
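At its core, the comparison is set arithmetic over three URL collections. This hedged sketch (the function name and inputs are hypothetical) shows how orphaned and missing pages fall out of simple set differences:

```python
def classify_urls(sitemap_urls, crawled_urls, linked_urls):
    """Compare sitemap entries against URLs discovered by crawling.

    sitemap_urls: URLs listed in the XML sitemap
    crawled_urls: URLs that actually exist on the site
    linked_urls:  URLs reachable via internal links
    """
    sitemap, crawled, linked = map(set, (sitemap_urls, crawled_urls, linked_urls))
    return {
        # Exist on the site but are neither in the sitemap nor internally linked
        "orphaned": crawled - sitemap - linked,
        # Listed in the sitemap but not found on the site (e.g. 404s)
        "missing_from_site": sitemap - crawled,
    }

report = classify_urls(
    sitemap_urls=["/", "/about", "/old-page"],
    crawled_urls=["/", "/about", "/hidden-landing"],
    linked_urls=["/", "/about"],
)
print(report["orphaned"])           # → {'/hidden-landing'}
print(report["missing_from_site"])  # → {'/old-page'}
```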
Missing from Site
URLs that appear in your sitemap but return 404 errors or are otherwise inaccessible will be flagged. Keeping invalid URLs in your sitemap wastes crawl budget and signals poor site maintenance to search engines. The tool will recommend removing these entries or setting up appropriate redirects.
Non-Indexable Pages in Sitemap
Pages that are included in the sitemap but have noindex directives, canonical tags pointing elsewhere, or are blocked by robots.txt will be identified. These conflicting signals confuse search engines and should be resolved by either removing the page from the sitemap or updating its directives.
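The conflicting-signal check can be sketched as a lookup over per-page directives gathered during the crawl. The `page_directives` structure below is a hypothetical crawl output invented for this example:

```python
def conflicting_sitemap_entries(sitemap_urls, page_directives):
    """Flag sitemap URLs whose on-page directives tell search engines
    not to index them -- a conflicting signal worth resolving."""
    conflicts = {}
    for url in sitemap_urls:
        info = page_directives.get(url, {})
        reasons = []
        if "noindex" in info.get("robots", ""):
            reasons.append("noindex directive")
        canonical = info.get("canonical")
        if canonical and canonical != url:
            reasons.append(f"canonical points to {canonical}")
        if reasons:
            conflicts[url] = reasons
    return conflicts

pages = {
    "/pricing": {"robots": "index,follow", "canonical": "/pricing"},
    "/pricing-old": {"robots": "noindex", "canonical": "/pricing"},
}
print(conflicting_sitemap_entries(["/pricing", "/pricing-old"], pages))
# → {'/pricing-old': ['noindex directive', 'canonical points to /pricing']}
```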
Broken Link Detection
During each crawl, the tool will check every internal and external link found on your pages. Broken links will be categorized by HTTP status code and sorted by the number of pages linking to the broken URL. For each broken link, you will see:
- The broken URL and its HTTP status code
- All pages that link to the broken URL
- The anchor text used in each link
- Suggestions for replacement URLs or redirects
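The categorize-and-sort step described above can be sketched as follows, assuming link-check results have already been collected as `(source_page, target_url, status_code, anchor_text)` tuples (a made-up shape for this example):

```python
from collections import defaultdict

def broken_link_report(link_checks):
    """Group broken links by target URL and sort by the number of
    pages linking to each broken URL, most-referenced first."""
    by_target = defaultdict(lambda: {"status": None, "referrers": []})
    for source, target, status, anchor in link_checks:
        if status >= 400:  # treat 4xx/5xx responses as broken
            entry = by_target[target]
            entry["status"] = status
            entry["referrers"].append((source, anchor))
    return sorted(by_target.items(),
                  key=lambda item: len(item[1]["referrers"]), reverse=True)

checks = [
    ("/blog/a", "/old-guide", 404, "read the guide"),
    ("/blog/b", "/old-guide", 404, "full guide"),
    ("/docs", "https://partner.example/api", 500, "partner API"),
    ("/docs", "/pricing", 200, "pricing"),  # healthy link, ignored
]
for url, info in broken_link_report(checks):
    print(url, info["status"], len(info["referrers"]))
# → /old-guide 404 2
# → https://partner.example/api 500 1
```

Keeping the referrer list alongside each broken URL is what makes the report actionable: you immediately see every page and anchor text that needs updating.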
Redirect Chain Detection
Redirect chains occur when a URL redirects to another URL, which redirects to yet another, creating a sequence of redirects that slows down page loading and dilutes link equity. The Sitemap & Crawling tool will map every redirect chain on your site and show you the full path from the initial URL to the final destination. You will receive recommendations to update your links to point directly to the final URL, eliminating unnecessary redirect hops.
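Chain mapping amounts to following redirects until a URL no longer redirects, while guarding against loops. A minimal sketch, assuming the crawler has already recorded observed redirects as a `url -> target` mapping:

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow a mapping of URL -> redirect target and return the full
    chain from the starting URL to its final destination."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        chain.append(nxt)
        if nxt in seen:  # redirect loop detected; stop here
            break
        seen.add(nxt)
    return chain

hops = {"/old": "/newer", "/newer": "/newest", "/newest": "/final"}
print(redirect_chain("/old", hops))
# → ['/old', '/newer', '/newest', '/final']
```

Any chain longer than two entries signals intermediate hops that can be eliminated by linking straight to the final URL.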
Canonical Tag Analysis
Canonical tags tell search engines which version of a page should be treated as the authoritative one. The tool will analyze canonical tags across your entire site, identifying:
- Pages missing canonical tags entirely
- Self-referencing canonical tags (which are correct in most cases)
- Canonical tags pointing to non-existent or redirecting URLs
- Conflicting canonical signals between HTTP headers and HTML tags
- Pages where the canonical URL differs from the URL in the sitemap
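Several of these checks reduce to extracting `<link rel="canonical">` from each page and comparing the target against the page's own URL. A sketch with the standard-library HTML parser (the class and helper below are illustrative, not the tool's implementation):

```python
from html.parser import HTMLParser

class CanonicalExtractor(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            if attrs.get("rel", "").lower() == "canonical" and "href" in attrs:
                self.canonicals.append(attrs["href"])

def check_canonical(page_url, html):
    parser = CanonicalExtractor()
    parser.feed(html)
    if not parser.canonicals:
        return "missing canonical tag"
    if len(parser.canonicals) > 1:
        return "conflicting canonical tags"
    target = parser.canonicals[0]
    return "self-referencing" if target == page_url else f"canonicalized to {target}"

html = '<html><head><link rel="canonical" href="https://example.com/a"></head></html>'
print(check_canonical("https://example.com/a", html))        # → self-referencing
print(check_canonical("https://example.com/a?ref=x", html))  # → canonicalized to https://example.com/a
```

A full analyzer would also read the `Link: <...>; rel="canonical"` HTTP header and flag disagreements between the header and the HTML tag.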
Robots.txt Compliance
The tool will parse your robots.txt file and verify that it is correctly configured. It will check whether important pages are accidentally blocked, confirm that your sitemap is referenced, and identify overly broad disallow rules that might be preventing search engines from crawling valuable content. A visual representation will show exactly which sections of your site are accessible and which are blocked for each major search engine crawler.
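The standard library's `urllib.robotparser` can perform this kind of check directly, as a rough sketch of the idea (the sample robots.txt and URLs are invented; `site_maps()` requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Verify that important pages are not accidentally blocked
for path in ("/", "/pricing", "/admin/settings", "/search"):
    allowed = parser.can_fetch("Googlebot", f"https://example.com{path}")
    print(path, "allowed" if allowed else "blocked")

# Confirm the sitemap is referenced in robots.txt
print(parser.site_maps())  # → ['https://example.com/sitemap.xml']
```

Running each important URL through `can_fetch` for every major crawler's user agent is exactly the kind of sweep that surfaces overly broad disallow rules.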
Continuous Crawl Monitoring
Rather than relying solely on one-time audits, the Sitemap & Crawling tool will continuously monitor your site on a schedule determined by your plan:
| Plan | Crawl Frequency | Pages per Crawl | Alert Notifications |
|---|---|---|---|
| Professional | Weekly | Up to 500 | |
| Agencies | Daily | Up to 5,000 | Email and in-app |
| Enterprise | Real-time | Unlimited | Email, in-app, and webhook |
When new crawl issues arise between scheduled scans, the tool will send proactive alerts so you can address problems before they affect your search engine rankings.
Google Search Console Integration
By connecting your Google Search Console account, the Sitemap & Crawling tool will be able to cross-reference its findings with actual indexation data from Google. This integration will enable you to see which of your pages are actually indexed, compare your sitemap coverage against Google's index, identify pages that Google has chosen not to index along with the reasons why, and track how quickly new pages are being discovered and indexed after publication. This combination of internal crawl data and Google's indexation data will provide the most complete picture possible of your site's search engine accessibility.
Issue Resolution Workflow
Every issue discovered by the Sitemap & Crawling tool will include a recommended resolution path. For sites connected through CMS integrations, many fixes will be applicable directly from within Rankfender. For other sites, detailed instructions will guide your development team through the necessary changes. Each resolved issue will be automatically verified during the next crawl cycle to confirm the fix was applied correctly.