Sitemap & Crawling
Monitor sitemap health and crawl status to ensure search engines can find your content.
This feature is currently in development and will be available in a future release.
Introduction
The Sitemap & Crawling tool will give you complete visibility into how search engines discover and access your website's pages. It will automatically parse your XML sitemap, compare it against the pages actually found on your site, and identify discrepancies that could prevent your content from being indexed. By continuously monitoring crawl health, the tool will ensure that every important page on your site is accessible to search engines and properly represented in your sitemap.
Sitemap Validation
The tool will automatically locate and parse your XML sitemap (or sitemap index) and validate it against industry standards. Validation checks will include:
- Proper XML formatting and encoding
- Valid URL entries with correct protocol and domain
- Appropriate use of lastmod, changefreq, and priority attributes
- Sitemap size compliance (no more than 50,000 URLs and 50 MB uncompressed per sitemap file)
- Sitemap index structure for large sites with multiple sitemaps
- Correct referencing of the sitemap in your robots.txt file
Any validation errors will be flagged with clear explanations and recommended fixes, so you can resolve issues before they impact how search engines process your sitemap.
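To make the validation checks above concrete, here is a minimal sketch in Python using only the standard library. It covers a subset of the checks (well-formed XML, valid protocol and domain, URL count limit); the `validate_sitemap` function and its inputs are illustrative, not the tool's actual implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-sitemap limit from the sitemaps.org protocol

def validate_sitemap(xml_text: str, expected_host: str) -> list[str]:
    """Return a list of validation problems found in a sitemap document."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]

    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"sitemap lists {len(locs)} URLs (limit is {MAX_URLS})")

    for url in locs:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"invalid protocol: {url}")
        elif parsed.netloc != expected_host:
            problems.append(f"foreign domain: {url}")
    return problems

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>ftp://example.com/file</loc></url>
  <url><loc>https://other.com/page</loc></url>
</urlset>"""

print(validate_sitemap(sitemap, "example.com"))
# → ['invalid protocol: ftp://example.com/file', 'foreign domain: https://other.com/page']
```

A production validator would also check `lastmod` date formats, sitemap index structure, and the robots.txt reference; the same parse-then-check pattern extends naturally to those rules.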
URL Discovery Comparison
One of the most valuable features of this tool will be its ability to compare the URLs listed in your sitemap against the URLs actually found by crawling your site. This comparison will reveal several important categories of pages:
Orphaned Pages
Orphaned pages are those that exist on your site but are not included in your sitemap and may not be linked from other pages. These pages are difficult for search engines to discover and are often overlooked during content audits. The tool will identify every orphaned page and recommend whether to add it to your sitemap, redirect it, or remove it entirely.
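At its core, the comparison is set arithmetic over three URL collections. This hedged sketch (the function name and inputs are hypothetical) shows how orphaned and missing pages fall out of simple set differences:

```python
def classify_urls(sitemap_urls, crawled_urls, linked_urls):
    """Compare sitemap entries against URLs discovered by crawling.

    sitemap_urls: URLs listed in the XML sitemap
    crawled_urls: URLs that actually exist on the site
    linked_urls:  URLs reachable via internal links
    """
    sitemap, crawled, linked = map(set, (sitemap_urls, crawled_urls, linked_urls))
    return {
        # Exist on the site but are neither in the sitemap nor internally linked
        "orphaned": crawled - sitemap - linked,
        # Listed in the sitemap but not found on the site (e.g. 404s)
        "missing_from_site": sitemap - crawled,
    }

report = classify_urls(
    sitemap_urls=["/", "/about", "/old-page"],
    crawled_urls=["/", "/about", "/hidden-landing"],
    linked_urls=["/", "/about"],
)
print(report["orphaned"])           # → {'/hidden-landing'}
print(report["missing_from_site"])  # → {'/old-page'}
```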
Missing from Site
URLs that appear in your sitemap but return 404 errors or are otherwise inaccessible will be flagged. Keeping invalid URLs in your sitemap wastes crawl budget and signals poor site maintenance to search engines. The tool will recommend removing these entries or setting up appropriate redirects.
Non-Indexable Pages in Sitemap
Pages that are included in the sitemap but have noindex directives, canonical tags pointing elsewhere, or are blocked by robots.txt will be identified. These conflicting signals confuse search engines and should be resolved by either removing the page from the sitemap or updating its directives.
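The conflicting-signal check can be sketched as a lookup over per-page directives gathered during the crawl. The `page_directives` structure below is a hypothetical crawl output invented for this example:

```python
def conflicting_sitemap_entries(sitemap_urls, page_directives):
    """Flag sitemap URLs whose on-page directives tell search engines
    not to index them -- a conflicting signal worth resolving."""
    conflicts = {}
    for url in sitemap_urls:
        info = page_directives.get(url, {})
        reasons = []
        if "noindex" in info.get("robots", ""):
            reasons.append("noindex directive")
        canonical = info.get("canonical")
        if canonical and canonical != url:
            reasons.append(f"canonical points to {canonical}")
        if reasons:
            conflicts[url] = reasons
    return conflicts

pages = {
    "/pricing": {"robots": "index,follow", "canonical": "/pricing"},
    "/pricing-old": {"robots": "noindex", "canonical": "/pricing"},
}
print(conflicting_sitemap_entries(["/pricing", "/pricing-old"], pages))
# → {'/pricing-old': ['noindex directive', 'canonical points to /pricing']}
```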
Broken Link Detection
During each crawl, the tool will check every internal and external link found on your pages. Broken links will be categorized by HTTP status code and sorted by the number of pages linking to the broken URL. For each broken link, you will see:
- The broken URL and its HTTP status code
- All pages that link to the broken URL
- The anchor text used in each link
- Suggestions for replacement URLs or redirects
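The categorize-and-sort step described above can be sketched as follows, assuming link-check results have already been collected as `(source_page, target_url, status_code, anchor_text)` tuples (a made-up shape for this example):

```python
from collections import defaultdict

def broken_link_report(link_checks):
    """Group broken links by target URL and sort by the number of
    pages linking to each broken URL, most-referenced first."""
    by_target = defaultdict(lambda: {"status": None, "referrers": []})
    for source, target, status, anchor in link_checks:
        if status >= 400:  # treat 4xx/5xx responses as broken
            entry = by_target[target]
            entry["status"] = status
            entry["referrers"].append((source, anchor))
    return sorted(by_target.items(),
                  key=lambda item: len(item[1]["referrers"]), reverse=True)

checks = [
    ("/blog/a", "/old-guide", 404, "read the guide"),
    ("/blog/b", "/old-guide", 404, "full guide"),
    ("/docs", "https://partner.example/api", 500, "partner API"),
    ("/docs", "/pricing", 200, "pricing"),  # healthy link, ignored
]
for url, info in broken_link_report(checks):
    print(url, info["status"], len(info["referrers"]))
# → /old-guide 404 2
# → https://partner.example/api 500 1
```

Keeping the referrer list alongside each broken URL is what makes the report actionable: you immediately see every page and anchor text that needs updating.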
Redirect Chain Detection
Redirect chains occur when a URL redirects to another URL, which redirects to yet another, creating a sequence of redirects that slows down page loading and dilutes link equity. The Sitemap & Crawling tool will map every redirect chain on your site and show you the full path from the initial URL to the final destination. You will receive recommendations to update your links to point directly to the final URL, eliminating unnecessary redirect hops.
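Chain mapping amounts to following redirects until a URL no longer redirects, while guarding against loops. A minimal sketch, assuming the crawler has already recorded observed redirects as a `url -> target` mapping:

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow a mapping of URL -> redirect target and return the full
    chain from the starting URL to its final destination."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        chain.append(nxt)
        if nxt in seen:  # redirect loop detected; stop here
            break
        seen.add(nxt)
    return chain

hops = {"/old": "/newer", "/newer": "/newest", "/newest": "/final"}
print(redirect_chain("/old", hops))
# → ['/old', '/newer', '/newest', '/final']
```

Any chain longer than two entries signals intermediate hops that can be eliminated by linking straight to the final URL.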
Canonical Tag Analysis
Canonical tags tell search engines which version of a page should be treated as the authoritative one. The tool will analyze canonical tags across your entire site, identifying:
- Pages missing canonical tags entirely
- Self-referencing canonical tags (which are correct in most cases)
- Canonical tags pointing to non-existent or redirecting URLs
- Conflicting canonical signals between HTTP headers and HTML tags
- Pages where the canonical URL differs from the URL in the sitemap
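Several of these checks reduce to extracting `<link rel="canonical">` from each page and comparing the target against the page's own URL. A sketch with the standard-library HTML parser (the class and helper below are illustrative, not the tool's implementation):

```python
from html.parser import HTMLParser

class CanonicalExtractor(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            if attrs.get("rel", "").lower() == "canonical" and "href" in attrs:
                self.canonicals.append(attrs["href"])

def check_canonical(page_url, html):
    parser = CanonicalExtractor()
    parser.feed(html)
    if not parser.canonicals:
        return "missing canonical tag"
    if len(parser.canonicals) > 1:
        return "conflicting canonical tags"
    target = parser.canonicals[0]
    return "self-referencing" if target == page_url else f"canonicalized to {target}"

html = '<html><head><link rel="canonical" href="https://example.com/a"></head></html>'
print(check_canonical("https://example.com/a", html))        # → self-referencing
print(check_canonical("https://example.com/a?ref=x", html))  # → canonicalized to https://example.com/a
```

A full analyzer would also read the `Link: <...>; rel="canonical"` HTTP header and flag disagreements between the header and the HTML tag.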
Robots.txt Compliance
The tool will parse your robots.txt file and verify that it is correctly configured. It will check whether important pages are accidentally blocked, confirm that your sitemap is referenced, and identify overly broad disallow rules that might be preventing search engines from crawling valuable content. A visual representation will show exactly which sections of your site are accessible and which are blocked for each major search engine crawler.
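The standard library's `urllib.robotparser` can perform this kind of check directly, as a rough sketch of the idea (the sample robots.txt and URLs are invented; `site_maps()` requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Verify that important pages are not accidentally blocked
for path in ("/", "/pricing", "/admin/settings", "/search"):
    allowed = parser.can_fetch("Googlebot", f"https://example.com{path}")
    print(path, "allowed" if allowed else "blocked")

# Confirm the sitemap is referenced in robots.txt
print(parser.site_maps())  # → ['https://example.com/sitemap.xml']
```

Running each important URL through `can_fetch` for every major crawler's user agent is exactly the kind of sweep that surfaces overly broad disallow rules.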
Continuous Crawl Monitoring
Rather than relying solely on one-time audits, the Sitemap & Crawling tool will continuously monitor your site on a schedule determined by your plan:
| Plan | Crawl Frequency | Pages per Crawl | Alert Notifications |
|---|---|---|---|
| Professional | Weekly | Up to 500 | |
| Agencies | Daily | Up to 5,000 | Email and in-app |
| Enterprise | Real-time | Unlimited | Email, in-app, and webhook |
When new crawl issues arise between scheduled scans, the tool will send proactive alerts so you can address problems before they affect your search engine rankings.
Google Search Console Integration
By connecting your Google Search Console account, the Sitemap & Crawling tool will be able to cross-reference its findings with actual indexation data from Google. This integration will enable you to see which of your pages are actually indexed, compare your sitemap coverage against Google's index, identify pages that Google has chosen not to index along with the reasons why, and track how quickly new pages are being discovered and indexed after publication. This combination of internal crawl data and Google's indexation data will provide the most complete picture possible of your site's search engine accessibility.
Issue Resolution Workflow
Every issue discovered by the Sitemap & Crawling tool will include a recommended resolution path. For sites connected through CMS integrations, many fixes will be applicable directly from within Rankfender. For other sites, detailed instructions will guide your development team through the necessary changes. Each resolved issue will be automatically verified during the next crawl cycle to confirm the fix was applied correctly.