Most websites use Gzip to compress data before sending it to you.

The idea is that internet speeds are still much slower than compute, so spending a little compute to decompress a file that was sent to you reduces load times overall compared to an uncompressed site.

So naturally I enabled it on this website.

Compiling static sites with Nikola

This website is generated with a static website generator called Nikola.

It's written in Python, so it's easier for me to make changes than it would be with Hugo, Jekyll, or Atom.

Why static sites?

A website like mine that doesn't have users can handle most dynamic content with JavaScript or smart choices in the design.

That means I don't have to pull from a database and it should be much faster than something like WordPress.

In fact, most WordPress sites use caching plugins to basically create static equivalents of their dynamic pages.

Compile times

But my site has grown significantly.

We have a ton of articles, projects, apps, and other gimmicks.

I had already been working hard to reduce build and deploy times. I got the build time down from sometimes over 20 minutes to less than 5. But deploying to the server still took quite some time.

Publish times

The deployment action I'm using is already quite smart.

Basically, it compares the hash of each local file with the hash of the copy on the server and only uploads the file if those hashes differ.
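The hash comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the deployment action itself: the function names, the choice of SHA-256, and the `remote_hashes` mapping (relative path to hex digest) are all assumptions made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_upload(local_root: Path, remote_hashes: dict[str, str]) -> list[str]:
    """List relative paths whose local hash differs from the remote one.

    `remote_hashes` is a hypothetical stand-in for whatever the server
    reports: a mapping of relative path -> hex digest.
    """
    changed = []
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(local_root).as_posix()
        # A missing remote entry (None) also counts as "needs upload"
        if remote_hashes.get(rel) != file_digest(path):
            changed.append(rel)
    return changed
```

With a scheme like this, a single changed byte in a file is enough to trigger a fresh upload, which matters for what comes next.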

But I noticed that my publish times were slowly approaching 10 minutes!

All those files that were regularly replaced were .gz files.

Gzips!

Even if the HTML didn't change!

Deterministic Gzip compression

Turns out Gzip by default will include the filename and a time stamp of the compression in the file.

If you use the gzip command you can simply omit those using the -n flag.
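To see the two fields in question, here is a small sketch that compresses some bytes in memory and reads the header back. The helper `read_gzip_header` and the file name `index.html` are made up for this example, and the parser is deliberately simplified: it refuses streams that carry an optional extra field.

```python
import gzip
import io
import struct
import time


def read_gzip_header(data: bytes) -> tuple[int, bytes]:
    """Return (mtime, original filename) from the start of a gzip stream."""
    magic, method, flags, mtime = struct.unpack("<2sBBI", data[:8])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    if flags & 0x04:
        # FEXTRA would sit before the name; not handled in this sketch
        raise ValueError("extra field present, not handled here")
    name = b""
    if flags & 0x08:
        # FNAME: a zero-terminated name follows the 10-byte fixed header
        end = data.index(b"\x00", 10)
        name = data[10:end]
    return mtime, name


buffer = io.BytesIO()
# No mtime given, so the current time is written into the header
with gzip.GzipFile(filename="index.html", mode="wb", fileobj=buffer) as outf:
    outf.write(b"<html>unchanged</html>")

mtime, name = read_gzip_header(buffer.getvalue())
print(name, mtime, int(time.time()))
```

The timestamp occupies bytes 4 to 7 of the file, so two runs at different times give different bytes, and different hashes, even for identical input.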

But how do we make reproducible archives with Python?!

Nikola uses the gzip library with the GzipFile object, which doesn't have a "deterministic" -n keyword.

I checked all of the possible tricks, but in the end it's all in the definition of Gzip.

It basically says that a modification time of 0 means no timestamp is available, which is equivalent to a deterministic compression.

That is, a compression that will always produce the same files and hashes if the input didn't change.

So I simply changed:

with gzip.GzipFile(out_path, 'wb+') as outf:

to

with gzip.GzipFile(out_path, 'wb+', mtime=0) as outf:

and it works. The publish time is down from almost 10 minutes to less than 3!

Such a small change.

Such a big impact!
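If you want to convince yourself of the effect, a quick check along these lines should do. It is a sketch under a few assumptions: it compresses into an in-memory buffer rather than writing to `out_path`, the helper `gzip_digest` and the file name are invented for the example, and the two fixed timestamps merely stand in for two builds run at different times.

```python
import gzip
import hashlib
import io


def gzip_digest(payload: bytes, mtime) -> str:
    """Gzip `payload` in memory with the given mtime and hash the result."""
    buffer = io.BytesIO()
    with gzip.GzipFile(
        filename="index.html", mode="wb", fileobj=buffer, mtime=mtime
    ) as outf:
        outf.write(payload)
    return hashlib.sha256(buffer.getvalue()).hexdigest()


html = b"<html>unchanged</html>"

# Same input, two different timestamps: the header differs, so the hash does too
differs = gzip_digest(html, 1_700_000_000) != gzip_digest(html, 1_700_000_600)

# Same input, mtime pinned to 0: the output is byte-for-byte identical
same = gzip_digest(html, 0) == gzip_digest(html, 0)

print(differs, same)
```

Both values should come out as True: the timestamped archives differ, while the ones with mtime=0 match, so a hash-comparing deployment has nothing to re-upload.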

And of course, since Nikola is open source, I could simply open a pull request!