Search-engine optimization is a mystery to me.

But if you build an awesome thing and make a Jupyter book, you may as well help people find it, right?!

I was working on ml.recipes, pythondeadlin.es, and data-science-gui.de, and they have a ton of different pages with content. Things I want people to see.

What's a Sitemap

A sitemap is a file that contains a list of all the pages and URLs on a website. This file supposedly helps search engines to crawl and index a website more effectively.

There are two types of sitemaps: XML and HTML. XML sitemaps are machine-readable and intended for search engines, while HTML sitemaps are intended for human visitors to navigate a website.
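To make the XML flavor less abstract, here is a minimal sketch that builds one with Python's standard library. The page URLs are made-up placeholders, and real sitemaps often carry extra optional fields (like a last-modified date) that I'm leaving out here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    # Register the namespace as the default so tags are written without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, purely for illustration.
pages = ["https://example.org/", "https://example.org/intro.html"]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Each page gets one url element with a loc child holding its address, and that's really all a crawler needs to get started.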

When a search engine crawls a website, it can use a sitemap, if it finds one, to discover pages it might otherwise miss and queue them for indexing. A sitemap doesn't guarantee that every page gets indexed, but it gives each page a better chance of showing up in search results. Pretty useful, right?!

Sitemaps are particularly helpful for websites that have a complex structure, or that contain pages that are difficult to discover through regular crawling, such as pages with few internal links pointing to them. (Pages blocked by robots.txt are a different story: a sitemap won't get those crawled.)

Having a sitemap also helps with SEO (search engine optimization) because it provides search engines with a clear view of the website's structure and content. This makes it easier for search engines to understand what a website is about, and can help to improve the website's visibility in search results.

Sitemaps and Jupyter Books

Jupyter books have a ton of different pages.

So it would be great to have them indexed properly by Google and Bing. But Jupyter Books do not natively build a sitemap!

Luckily, Jupyter Book is built on Sphinx, so we can make use of the Sphinx ecosystem. Namely, extensions!

You can pip install sphinx-sitemap to get the sitemap extension, or add sphinx-sitemap to your requirements.txt if you use GitHub Actions or another CI to deploy your book.

Then, in your _config.yml, find your sphinx entry and add html_baseurl under config and sphinx_sitemap under extra_extensions. If you have no other sphinx entries, it looks basically like this:

sphinx:
  config:
    html_baseurl: 'https://path_to_book/'
  extra_extensions:
    - sphinx_sitemap
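
After rebuilding the book, it's worth a quick sanity check that the sitemap exists and points at the right place. Here is a rough sketch of how I'd do that. It assumes the build output lands in _build/html/sitemap.xml (adjust the path for your setup), reuses the placeholder base URL from the config above, and the helper names are my own.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap given as an XML string."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def urls_outside_base(urls, base_url):
    """Return the URLs that do not start with the expected base URL."""
    return [url for url in urls if not url.startswith(base_url)]


# Assumed locations: change both to match your own book.
sitemap_path = Path("_build/html/sitemap.xml")
base_url = "https://path_to_book/"

if sitemap_path.exists():
    urls = sitemap_urls(sitemap_path.read_text(encoding="utf-8"))
    print(f"{len(urls)} URLs listed in {sitemap_path}")
    for url in urls_outside_base(urls, base_url):
        print("Unexpected URL:", url)
else:
    print(f"No sitemap found at {sitemap_path}; build the book first.")
```

If the listed URLs contain path segments you didn't expect, the sphinx-sitemap documentation on how it assembles URLs is the place to look.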

If you're curious why this isn't part of Jupyter Book from the beginning, you can check out the discussion in this GitHub issue.

Now my pages are a little bit more optimized for the modern web!

Check out an example sitemap here.