Generating sitemap.xml in Jekyll, Without Using Plugins

Christian Specht, posted 11 Aug 2015

After my blog had already been online for a while, I discovered Google Webmaster Tools and sitemaps while reading up on SEO basics.

According to Google's "Build a sitemap" page, a sitemap in its simplest form is just an XML file like this, with one <url><loc> element per URL on the site:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
  <url>
    <loc>http://www.example.com/foo.html</loc> 
  </url>
</urlset>
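To pin down that structure outside of Jekyll, here is a small Python sketch that emits the same shape for any list of absolute URLs: a namespaced <urlset> with one <url><loc> pair per entry. It only illustrates the format; build_sitemap is a hypothetical helper, not part of Jekyll or any sitemap tool.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemap protocol (same as in the XML above)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document: one <url><loc> element per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        loc.text = url  # ElementTree escapes characters such as & when serializing
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["http://www.example.com/foo.html"]))
```

One difference worth knowing: ElementTree entity-escapes characters such as & in the URL text, which the sitemap format requires, whereas Liquid's {{ }} output is not escaped automatically.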

With Jekyll, and depending on the structure of the site, it's relatively easy to generate this file dynamically, so that it is updated automatically whenever you add new posts or pages to the site.

Note that if you're fine with using plugins, there's a plugin for generating sitemaps that works on GitHub Pages.
I'm not using GitHub Pages, but I still wanted a solution without plugins, because so far I've managed to do everything I've tried in Jekyll without them.

The <url> Element

Since the <url><loc>...</loc></url> part needs to be repeated for each link, it will go into an include file:

/_includes/sitemapxml.html

<url>
    <loc>{{ site.url }}{{ include.url }}</loc> 
</url>

site.url refers to the url setting in the site's config file, _config.yml. For this blog, that file contains the following line:

url: http://christianspecht.de

The links in the sitemap must contain the full URL (http://christianspecht.de/foo instead of just /foo), so all links need to be prefixed with the base URL.
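The include builds each address by plain string concatenation, which works because the url setting has no trailing slash and Jekyll page URLs start with /. The same join in Python, as a minimal sketch; absolute_url is a hypothetical name, and the trailing-slash guard is an extra safeguard that the include itself does not have:

```python
def absolute_url(site_url, path):
    """Join the configured base URL and a page URL with exactly one slash.

    Mirrors {{ site.url }}{{ include.url }} from the include, plus a guard
    (not present in the include) against a trailing slash in site_url.
    """
    return site_url.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    print(absolute_url("http://christianspecht.de", "/foo/"))
```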

With this include file, it's already possible to create a simple sitemap, by providing URLs manually:

/sitemap.xml

---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemapxml.html url="/foo/" %}
</urlset>

The generated XML:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
  <url>
    <loc>http://christianspecht.de/foo/</loc> 
  </url>
</urlset>

Being programmers, though, we obviously don't want to enter URLs manually, so we're going to automate this in the next steps.

Getting All URLs, The Simple Way

The easiest way to get all URLs is to loop over site.pages:

---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% for p in site.pages %}{% include sitemapxml.html url=p.url %}
{% endfor %}
</urlset>

Note the line break inside the for loop. There must be exactly one line break after the include, so that the line breaks in the resulting XML file look exactly like the example in the introduction above.

This approach has one big disadvantage: it only works if you actually want all pages listed in the sitemap.
"All" really means all files that contain YAML front matter, including, for example, the sitemap.xml file itself.

In reality, there are probably some pages besides the sitemap that you don't want listed either, so you could explicitly exclude the pages named X, Y and Z inside the loop.
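Stripped of the Liquid syntax, such an exclusion is just a filter applied before each URL is prefixed with site.url. A minimal Python sketch of that logic, where the page list and the excluded URLs are hypothetical examples (in the template itself, the check would sit as a condition, for example an unless tag, around the include inside the for loop):

```python
# Hypothetical stand-in for Jekyll's site.pages: every file with YAML front matter
PAGES = [
    {"url": "/"},
    {"url": "/about/"},
    {"url": "/sitemap.xml"},
    {"url": "/robots.txt"},
    {"url": "/404.html"},
]

# Illustrative exclusion list (the "X, Y and Z" from the text); adjust per site
EXCLUDED_URLS = {"/sitemap.xml", "/robots.txt", "/404.html"}


def sitemap_urls(site_url, pages, excluded):
    """Prefix each page URL with the base URL, skipping excluded pages."""
    return [site_url + page["url"] for page in pages if page["url"] not in excluded]


if __name__ == "__main__":
    for url in sitemap_urls("http://christianspecht.de", PAGES, EXCLUDED_URLS):
        print(url)
```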

It's possible to do it like this, but most of my Jekyll sites have "special" menus which rely on the important URLs being in a data file anyway, so I'll show two more advanced approaches that make use of this.

When Most of the URLs are Blog Posts or in Data Files

This is what I used for the sitemap of this blog.

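All the URLs that exist here belong to one of these three categories:

- a few static pages (/archive/, /projects/ and /tags/)
- the projects listed in the data file site.data.sidebarprojects
- blog posts

So creating the sitemap file for my blog was as simple as this: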

---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemapxml.html url="/archive/" %}
{% include sitemapxml.html url="/projects/" %}
{% include sitemapxml.html url="/tags/" %}
{% for project in site.data.sidebarprojects %}{% include sitemapxml.html url=project.url %}
{% endfor %}{% for post in site.posts %}{% include sitemapxml.html url=post.url %}
{% endfor %}</urlset>

The three pages at the beginning are very unlikely to change, and new projects and blog posts will appear in the sitemap automatically when I add them to the rest of the site.

When You Have A Nested Data File

I'm running another site which uses the "dynamic" tree menu I described in a previous post.

There's a data file with all the URLs anyway, but the URLs are nested, so getting a list with all of them is a bit more complex.
The way to create a sitemap file here is similar to creating the menu: by using a recursive include file.

I'll only show the code here; read the post mentioned above for an in-depth explanation of how this works, since it's exactly the same approach.

/_includes/sitemap.html

{% for item in include.map %}{% include sitemapxml.html url=item.url %}
{% if item.subitems %}{% include sitemap.html map=item.subitems %}{% endif %}{% endfor %}

Again, the line breaks must be exactly as shown here in order to avoid unnecessary empty lines in the finished sitemap file.

And here's the actual sitemap file which uses the include, passing the data file with the menu information:

/sitemap.xml

---
layout: none
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
{% include sitemap.html map=site.data.menu %}</urlset>

I won't show the generated XML here, but it looks exactly like the example at the very beginning of this post.
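The recursion in sitemap.html is a plain depth-first walk over the nested data: output the item's URL, then descend into its subitems. Modeled in Python as a sketch (the menu below is hypothetical; only the url and subitems keys come from the include above), it looks like this:

```python
# Hypothetical nested menu, shaped like the data file described above:
# each item has a "url" and optionally a list of "subitems"
MENU = [
    {"url": "/"},
    {
        "url": "/docs/",
        "subitems": [
            {"url": "/docs/install/"},
            {"url": "/docs/usage/", "subitems": [{"url": "/docs/usage/advanced/"}]},
        ],
    },
    {"url": "/contact/"},
]


def flatten_menu(items):
    """Depth-first walk: emit each item's URL, then recurse into its subitems."""
    urls = []
    for item in items:
        urls.append(item["url"])
        if item.get("subitems"):
            urls.extend(flatten_menu(item["subitems"]))
    return urls


if __name__ == "__main__":
    for path in flatten_menu(MENU):
        print("http://example.com" + path)
```

Parents come before their children, which is the same order in which the recursive include writes the <url> elements.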

Telling Google about the Sitemap

Now there's only one thing missing: we need to tell the search engines about our new sitemap.

Another quote from Google's "Build a sitemap" page (from the very bottom):

Once you've made your sitemap, you can then submit it to Google with the Sitemaps page, or by inserting the following line anywhere in your robots.txt file:

Sitemap: http://example.com/sitemap_location.xml

For my sites, I submitted the sitemaps to Google right away, but I still added the line to robots.txt as well.

Apparently the link to the sitemap file needs to contain the full URL too (not just /sitemap.xml), so I created robots.txt with Jekyll as well, which lets me reuse site.url:

/robots.txt

---
layout: none
---

Sitemap: {{ site.url }}/sitemap.xml
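To double-check the generated file, Python's standard-library robots.txt parser can read the Sitemap line back (RobotFileParser.site_maps() requires Python 3.8 or later). A small sketch, assuming the rendered robots.txt contains only that one line:

```python
from urllib.robotparser import RobotFileParser

# What /robots.txt renders to once Jekyll substitutes site.url
ROBOTS_TXT = "Sitemap: http://christianspecht.de/sitemap.xml\n"


def sitemaps_from_robots(text):
    """Parse robots.txt content and return the Sitemap URLs it declares."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```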

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Christian Specht
Software Developer
Germany

Last Updated 11 Aug 2015
Article Copyright 2015 by Christian Specht