XML sitemaps and SEO: What you need to know
XML sitemaps just turned 18 in June, and in honor of their full-fledged introduction into adulthood, I’m breaking down how XML sitemaps work, how to build one, and why they offer a boatload of SEO benefits.
What is an XML sitemap?
In a nutshell, an XML sitemap is a file in XML format, following the Sitemaps protocol, that tells search engines which URLs on a website are available to crawl.
XML sitemaps were meant to share information about each URL, like:
- When it was last updated.
- How often it changes.
- How important it is in relation to other URLs across your website.
XML sitemaps allow search engines to crawl your website more efficiently, especially when certain URLs are isolated from the rest of your internal linking.
If you’re an SEO professional, you probably already use XML sitemaps built into your Yoast WordPress plugin. But there are plenty of extras that you might be missing.
How to build a dynamic XML sitemap
If you’re using WordPress, leverage SEO plugins like Yoast and RankMath to generate a dynamic XML sitemap.
If you’re not using WordPress, follow these steps to build a dynamic XML sitemap.
Understand your navigational structure
Pull together a list of all the URLs on your website and group them to align with your navigational structure and content groupings.
Partner with your developer team
Share the grouped list of URLs with your dev team and ask for their help creating a dynamic XML sitemap using Python, PHP, or JavaScript.
Here is an example of how to code a dynamic XML sitemap using Python:
import datetime
import xml.etree.ElementTree as ET

# List of URLs (replace these with your actual URLs)
urls = [
    "https://www.example.com/page1",
    "https://www.example.com/page2",
    "https://www.example.com/page3",
    # Add more URLs here
]

# Create the root element
urlset = ET.Element("urlset")
urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

# Loop through URLs and create URL entries
for url in urls:
    url_elem = ET.SubElement(urlset, "url")
    loc_elem = ET.SubElement(url_elem, "loc")
    loc_elem.text = url
    lastmod_elem = ET.SubElement(url_elem, "lastmod")
    # W3C Datetime needs a timezone when a time is included, so use UTC.
    # In production, use each page's real modification time instead of "now".
    lastmod_elem.text = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    changefreq_elem = ET.SubElement(url_elem, "changefreq")
    changefreq_elem.text = "daily"  # You can set this to the desired frequency
    priority_elem = ET.SubElement(url_elem, "priority")
    priority_elem.text = "0.5"  # You can set this to a value between 0 and 1

# Create the XML tree
tree = ET.ElementTree(urlset)

# Save the XML to a file
tree.write("sitemap.xml", encoding="utf-8", xml_declaration=True)
When working with developers, I’ve leveraged Flask and Python together to create a truly dynamic XML sitemap that changes whenever the website’s content changes.
If we can define the dynamic sitemap route based on the website’s content using a function like get_dynamic_urls(), we should be able to retrieve all URLs.
import datetime
import xml.etree.ElementTree as ET

from flask import Flask

app = Flask(__name__)

def get_dynamic_urls():
    # Placeholder: replace with your dynamic URL retrieval logic
    return ["https://www.example.com/page1"]

@app.route('/sitemap.xml', methods=['GET'])
def sitemap():
    urls = get_dynamic_urls()
    urlset = ET.Element("urlset")
    urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        url_elem = ET.SubElement(urlset, "url")
        loc_elem = ET.SubElement(url_elem, "loc")
        loc_elem.text = url
        lastmod_elem = ET.SubElement(url_elem, "lastmod")
        # Timezone-aware UTC timestamp; ideally use the page's real modification time
        lastmod_elem.text = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
        changefreq_elem = ET.SubElement(url_elem, "changefreq")
        changefreq_elem.text = "daily"
        priority_elem = ET.SubElement(url_elem, "priority")
        priority_elem.text = "0.5"
    xml_content = ET.tostring(urlset, encoding="utf-8")
    return xml_content, {'Content-Type': 'application/xml'}
If you’re sharing code across multiple domain properties, ensure the files are coded properly on your HTML pages.
For example, your CSS and JavaScript files should be linked using <link> and <script> tags.
Before sharing the code across different domains, you’ll need to create a centralized server to host the shared XML sitemap rules.
Expect to go through several iterations of your dynamic XML sitemap with your developers before all the rules are confirmed.
Don’t forget to check that the dynamic URLs are actually being retrieved and integrated into your sitemap logic.
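One way to run that check is to parse the generated sitemap and compare its <loc> values against the URLs you expect to see. The following is a minimal sketch, not a definitive test suite: the sample XML, the example.com URLs, and the helper names sitemap_locs and missing_urls are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(xml_bytes):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}

def missing_urls(xml_bytes, expected_urls):
    """Return expected URLs that are absent from the sitemap, sorted."""
    return sorted(set(expected_urls) - sitemap_locs(xml_bytes))

# Hypothetical sitemap output and expected URL list
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/page1</loc></url>
  <url><loc>https://www.example.com/page2</loc></url>
</urlset>"""

expected = ["https://www.example.com/page1", "https://www.example.com/page3"]
print(missing_urls(sample, expected))  # ['https://www.example.com/page3']
```

In practice, the sample bytes would be replaced by the response body of your /sitemap.xml route, and the expected list by URLs pulled from your CMS or crawl.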
Upload to your server
Once your XML sitemap is compressed, upload your dynamic XML sitemap to your website’s root directory.
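If the compression step is handled in Python, the standard library's gzip module is one option. This is a rough sketch under the assumption that the sitemap is already available as bytes; the function name compress_sitemap and the repeated example.com entry are hypothetical.

```python
import gzip

def compress_sitemap(xml_bytes):
    """Gzip-compress sitemap bytes (e.g. to be saved as sitemap.xml.gz)."""
    return gzip.compress(xml_bytes)

# Hypothetical sitemap with 1,000 identical entries, just to show the size difference
sitemap_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + b"<url><loc>https://www.example.com/page1</loc></url>" * 1000
    + b"</urlset>"
)

compressed = compress_sitemap(sitemap_xml)
print(len(sitemap_xml), "bytes uncompressed,", len(compressed), "bytes compressed")
```

Keep in mind that the 50 MB size limit discussed later applies to the uncompressed file, so compression helps with transfer time rather than with the limit itself.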
Submit to search engines
Let search engines know about your XML sitemaps by submitting them through Google Search Console and Bing Webmaster Tools.
Ongoing maintenance
An XML sitemap is not just a “set it and forget it” SEO tactic. It’s essential to update your XML sitemap as your website changes.
15 tips to keep in mind when building an XML sitemap
1. XML sitemaps are not a ranking factor
When asked on X (previously Twitter) if there is a ranking advantage to using XML sitemaps, Google’s Gary Illyes responded:
- “No, not direct anyway.”
2. Google can find your pages without a sitemap
Websites don’t need a sitemap if they are small and linked properly, Daniel Waisberg, Search Advocate at Google, shared in a Google Search Console training video.
However, he followed up by saying a sitemap might be especially helpful if:
- The site is really large.
- The site’s pages are isolated.
- The site is new or changes quickly.
Also, it’s important to note that Google’s John Mueller stated on X (previously Twitter) that having an XML sitemap is a “minimal baseline for any serious website.”
3. Google checks to see if your XML sitemap has changed before reprocessing
Google does check to see if your XML sitemap has changed before it wastes resources parsing and processing the file again, Illyes confirmed.
He stated in an SEO Office Hours session:
- “As soon as you change something in your sitemap, be that the URL element or last mod, the sitemap will be parsed again and generally reprocessed. That doesn’t mean that the URLs will be surely crawled of course they are still subject to the quality evaluations like any other URL. It’s also worth to mention that if you remove a URL from the sitemap because perhaps it doesn’t exist anymore that doesn’t mean it’s automatically going to be dropped from the index or even prioritized for crawling so it can be dropped sooner.”
4. XML sitemaps can get your new content indexed quicker
Mueller confirmed that telling Google your XML sitemap is updated can help get your new content indexed quicker.
He shared:
- “Pinging a sitemap is a bit different than just providing it in the robots.txt — by pinging, you’re actively flagging a change in the sitemap file. If you’re keen on having new content indexed quickly, that’s a good practice.”
5. XML sitemaps must be UTF-8 encoded
If your XML sitemap isn’t using UTF-8, search engines can’t read it.
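A quick sanity check is to confirm that the raw bytes of the file decode cleanly as UTF-8. The sketch below assumes you have the sitemap content as bytes; the helper name is_valid_utf8 and the sample URL are illustrative only.

```python
def is_valid_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

url = "https://www.example.com/café"
print(is_valid_utf8(url.encode("utf-8")))    # True
print(is_valid_utf8(url.encode("latin-1")))  # False
```

When generating the file with ElementTree, passing encoding="utf-8" to tree.write(), as in the earlier example, takes care of the encoding for you.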
6. Keep the file size under 50 MB (uncompressed) or 50,000 URLs
There are size limits to your XML sitemaps. Keep it under 50,000 URLs per sitemap or 50 MB (uncompressed).
If you have a larger file size or more URLs, use multiple XML sitemaps.
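Splitting by URL count can be as simple as slicing the list into chunks. Here is a minimal sketch with a hypothetical list of 120,000 product URLs; it only enforces the 50,000-URL limit, so the 50 MB size limit would still need a separate check on the written files.

```python
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# Hypothetical URL list for a large site
urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```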
7. You can have multiple XML sitemaps
Google lets you submit multiple sitemaps. Splitting up your sitemaps into different sections of your website to measure performance can be useful.
For example, if you’re a large ecommerce website, you can segment your XML sitemaps by product type (e.g., women’s shoes vs. men’s shoes).
Or, if you have multiple languages, you may want to add hreflang annotations alongside each <loc> entry in your XML sitemap.
Alternate-language annotations do not count toward the 50,000 URL max.
Or, if you have a lot of videos or images, you could separate your XML sitemaps into a video XML sitemap and an image XML sitemap.
There is no ranking benefit to having multiple XML sitemaps.
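For the multilingual case mentioned above, hreflang annotations are typically expressed as xhtml:link elements inside each <url> entry, with every language version listing all alternates including itself. The sketch below shows one way to build that with ElementTree; the helper name build_url_entry and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("xhtml", XHTML_NS)  # serialize links with the xhtml: prefix

def build_url_entry(urlset, loc, alternates):
    """Append a <url> with one xhtml:link per language version (including itself)."""
    url_elem = ET.SubElement(urlset, "url")
    ET.SubElement(url_elem, "loc").text = loc
    for lang, href in alternates.items():
        ET.SubElement(
            url_elem,
            f"{{{XHTML_NS}}}link",
            {"rel": "alternate", "hreflang": lang, "href": href},
        )
    return url_elem

urlset = ET.Element("urlset")
urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

# Hypothetical language versions of the same page
alternates = {
    "en": "https://www.example.com/en/shoes",
    "de": "https://www.example.com/de/schuhe",
}
for loc in alternates.values():
    build_url_entry(urlset, loc, alternates)

xml_text = ET.tostring(urlset, encoding="unicode")
print(xml_text)
```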
8. If you have multiple XML sitemaps, remember to use a sitemap index file
If you have multiple XML sitemaps for a large site, use a sitemap index file so you can submit them all at once.
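A sitemap index uses a <sitemapindex> root with one <sitemap> entry per child file. The following is a minimal sketch of generating one with ElementTree; the function name build_sitemap_index and the child sitemap URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (bytes) pointing at each child sitemap URL."""
    index = ET.Element("sitemapindex")
    index.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    for sitemap_url in sitemap_urls:
        sitemap_elem = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_elem, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)

# Hypothetical child sitemaps, segmented by product type
child_sitemaps = [
    "https://www.example.com/sitemap-womens-shoes.xml",
    "https://www.example.com/sitemap-mens-shoes.xml",
]
print(build_sitemap_index(child_sitemaps).decode("utf-8"))
```

The resulting index file is what you would then submit in Google Search Console and Bing Webmaster Tools.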
9. The lastmod tag should be flawless
The <lastmod> tag is one of the most essential parts of your XML sitemap.
For instance, my structured data leverages the <lastmod> value of my XML sitemap.
The <lastmod> tag also signals to search engines whether you’ve refreshed old content or launched new content.
Mueller confirmed this by stating:
- “…If you’re just changing the timestamp footer, sure, the page has technically changed too, but should that be prioritized – probably not.”
Bing has even taken an official stance on the lastmod tag in XML sitemaps, stating:
- “For XML sitemaps, one of the most critical tags you can include in your sitemap is the “lastmod” tag.”
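In code, this usually means deriving <lastmod> from the time the main content actually changed rather than from the time the sitemap was generated. Below is a minimal sketch, assuming your CMS or database can supply a timezone-aware modification time per page; the field name content_modified and the sample records are hypothetical.

```python
from datetime import datetime, timezone

def format_lastmod(modified_at):
    """Format a timezone-aware datetime as a W3C Datetime string in UTC."""
    return modified_at.astimezone(timezone.utc).isoformat(timespec="seconds")

# Hypothetical content records: URL plus the time the main content last changed
pages = [
    {"url": "https://www.example.com/page1",
     "content_modified": datetime(2023, 6, 1, 9, 30, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/page2",
     "content_modified": datetime(2023, 8, 15, 14, 0, tzinfo=timezone.utc)},
]

for page in pages:
    print(page["url"], format_lastmod(page["content_modified"]))
```

The returned string can be assigned directly to lastmod_elem.text in the sitemap-building loop.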
10. XML sitemaps should only include 200 status URLs that are all self-canonicalized
Keep only live URLs that return a 200 status, are indexable, and are self-canonicalized.
If you continue to list 404 or 301-redirected pages, search engines could stop crawling your XML sitemap entirely.
Mueller responded to a comment on X (formerly known as Twitter):
- “We’d stop fetching sitemap files if the URLs are invalid, but if you’re returning content or redirecting (which is kinda recommended), we’ll keep trying them. It shouldn’t cause problems, since overall sitemap files are only a tiny-tiny part of all URLs fetched from a site.”
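A filter along these lines can be applied before URLs are written to the sitemap. This is only a sketch: the crawl_results records, their field names (status, indexable, canonical), and the helper sitemap_eligible are assumptions, and real data would come from your crawler or CMS.

```python
def sitemap_eligible(page):
    """Keep pages that return 200, are indexable, and canonicalize to themselves."""
    return (
        page["status"] == 200
        and page["indexable"]
        and page["canonical"] == page["url"]
    )

# Hypothetical crawl results
crawl_results = [
    {"url": "https://www.example.com/a", "status": 200, "indexable": True,
     "canonical": "https://www.example.com/a"},
    {"url": "https://www.example.com/old", "status": 301, "indexable": True,
     "canonical": "https://www.example.com/a"},
    {"url": "https://www.example.com/gone", "status": 404, "indexable": False,
     "canonical": ""},
    {"url": "https://www.example.com/a?ref=nav", "status": 200, "indexable": True,
     "canonical": "https://www.example.com/a"},
]

sitemap_urls = [page["url"] for page in crawl_results if sitemap_eligible(page)]
print(sitemap_urls)  # ['https://www.example.com/a']
```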
11. Search engines can check your XML sitemap daily
Fabrice Canel from Microsoft shared that Microsoft checks your XML sitemap daily.
Mueller, meanwhile, shared that Google checks your XML sitemap based on how frequently it changes (as indicated by your <lastmod> tag).
12. Page speed indirectly impacts your XML sitemap
We all know how vital page speed is to our SEO efforts. But how does a slow-performing website impact your XML sitemap?
Mueller has confirmed page speed should not impact your XML sitemap by stating:
- “As long as it doesn’t time out, you should be fine. Maybe cache the latest version to prevent unnecessary server load?”
The truth is, I’ve worked on slow-performing websites where the XML sitemap failed to load when search engines crawled the site, causing major ranking fluctuations.
There is an indirect impact on your XML sitemap if your website does not load properly.
This is why it’s important to use Gzip compression to reduce file size and speed up loading times.
13. Remove <priority> tag from your XML sitemaps
Illyes compared the <priority> tag in XML sitemaps to “a bag of noise.”
So if you’re depending on search engines to crawl your URLs in your XML sitemap based on priority, you’ll be waiting a long time.
I recommend removing this tag from your XML sitemaps.
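If the tag is already present in an existing file, it can be stripped programmatically. Here is a rough sketch using ElementTree, assuming the sitemap uses the standard sitemaps.org namespace; the helper name strip_priority and the sample XML are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def strip_priority(xml_bytes):
    """Return the sitemap XML as a string with every <priority> element removed."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xml_bytes)
    for url_elem in root.findall(f"{{{SITEMAP_NS}}}url"):
        for priority in url_elem.findall(f"{{{SITEMAP_NS}}}priority"):
            url_elem.remove(priority)
    return ET.tostring(root, encoding="unicode")

# Hypothetical existing sitemap
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/page1</loc><priority>0.5</priority></url>
  <url><loc>https://www.example.com/page2</loc><priority>0.8</priority></url>
</urlset>"""

print(strip_priority(sample))
```

If you generate the sitemap yourself, the simpler fix is to stop emitting the priority element in the first place.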
14. If you have navigational structure issues, pair your XML sitemap with an HTML sitemap
There have always been mixed feelings about HTML sitemaps.
Mueller shared that HTML sitemaps “should never be needed.”
He continued:
- “I changed my mind on HTML sitemaps over the years, they should never be needed. Sites small & large should always have a clear navigational structure. If you feel the need for a HTML sitemap, spend the time improving your site’s architecture instead. HTML sitemaps fit into the same bin with HTML loading spinners.”
When working on large websites at an enterprise level, there are often multiple layers of decision-makers on a website navigation structure. And oftentimes, SEO best practices are not taken into consideration.
I pair my XML sitemap with an HTML sitemap until I’m confident in our website navigation and internal linking hierarchy.
15. Don’t forget to include all your XML sitemaps in your robots.txt file
Don’t forget to include your XML sitemaps (all of them) as links in your robots.txt file.
Here is an example:
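As a rough sketch, robots.txt references each sitemap on its own line using the Sitemap: directive followed by the absolute URL. The snippet below generates those lines; the function name robots_sitemap_lines and the example.com sitemap URLs are hypothetical.

```python
def robots_sitemap_lines(sitemap_urls):
    """Return one 'Sitemap: <absolute URL>' line per sitemap for robots.txt."""
    return "\n".join(f"Sitemap: {url}" for url in sitemap_urls)

# Hypothetical sitemap URLs
sitemaps = [
    "https://www.example.com/sitemap-index.xml",
    "https://www.example.com/sitemap-videos.xml",
]
print(robots_sitemap_lines(sitemaps))
# Sitemap: https://www.example.com/sitemap-index.xml
# Sitemap: https://www.example.com/sitemap-videos.xml
```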
Don’t worry about where in the robots.txt file these lines go. Their position in the file has no impact.
Mueller confirmed this:
XML sitemaps are the blueprint of your website for search engines
Whether you’re launching a new webpage or just adding a canonical tag to a duplicate page, your XML sitemap is something you should update 24/7.
But do so with caution.
While you shouldn’t be paranoid, your XML sitemap is the blueprint of your house for search engines.
So don’t forget to submit your XML sitemaps to Google Search Console and Bing Webmaster Tools.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.
Source link : Searchengineland.com