Most websites use Gzip to compress the data they send to you.
The idea is that network transfer is still much slower than computation, so spending a little CPU time to decompress a file that was sent to you reduces overall load times compared to serving it uncompressed.
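To get a feel for the trade-off, here is a small sketch using Python's standard `gzip` module on some made-up, deliberately repetitive HTML. The input is an assumption for illustration; real pages are less repetitive and compress less dramatically, but markup still tends to shrink a lot.

```python
import gzip

# Hypothetical page: repeated tags and class names, typical of generated HTML.
html = ('<div class="post"><p>Hello, world!</p></div>\n' * 200).encode("utf-8")

# These are the bytes that would travel over the network.
compressed = gzip.compress(html)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(compressed) / len(html):.1%}")

# The browser pays a small CPU cost to get the original back.
assert gzip.decompress(compressed) == html
```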
So naturally I enabled it on this website.
Compiling static sites with Nikola
This website is generated with a static website generator called Nikola.
It's written in Python, so compared to Hugo, Jekyll, or Atom, it's easy for me to make changes to it.
Why static sites?
A static site means I don't have to pull from a database, so it should be much faster than something like WordPress.
In fact, most WordPress sites use caching plugins that essentially create static equivalents of their dynamic pages.
But my site has grown significantly.
I had already worked hard to reduce build and deploy times, and got the build time down from sometimes over 20 minutes to less than 5. But deploying to the server still took quite some time.
The deployment action I'm using is already quite smart.
Basically, it compares the hash of each local file with the hash of the copy on the server and only uploads the file if the two differ.
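The skip-if-unchanged logic can be sketched in a few lines of Python. This is not the deployment action's actual code: the helper names `file_hash` and `needs_upload` are hypothetical, SHA-256 is an assumed choice of hash, and the remote hash is assumed to have been fetched from the server already.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def file_hash(path: Path) -> str:
    # Hash the file in chunks so large files don't need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_upload(local_path: Path, remote_hash: Optional[str]) -> bool:
    # Upload if the server has no copy yet, or if its hash differs from ours.
    return remote_hash is None or file_hash(local_path) != remote_hash


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes(b"<html><body>Hello</body></html>")

    # Pretend the server already holds this exact file.
    server_hash = file_hash(page)

    unchanged = needs_upload(page, server_hash)  # same bytes: skip
    brand_new = needs_upload(page, None)         # not on server: upload

    page.write_bytes(b"<html><body>Hello again</body></html>")
    changed = needs_upload(page, server_hash)    # bytes differ: upload
```

With a scheme like this, any byte that differs between two builds, even one buried in a file header, forces a re-upload.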
But I noticed that my publish times were slowly approaching 10 minutes!
All those files that were regularly replaced were the Gzip-compressed copies of my pages, even if the HTML didn't change!
Deterministic Gzip compression
It turns out that Gzip by default stores the filename and a timestamp in the header of the compressed file.
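To see where that timestamp lives, here is a minimal sketch with Python's `gzip.compress` (its `mtime` keyword needs Python 3.8 or newer). Per the Gzip format (RFC 1952), bytes 4 to 7 of the 10-byte header hold the modification time as a little-endian 32-bit integer; `gzip.compress` stores no filename, so everything after the header is identical for identical input. The timestamps below are arbitrary example values.

```python
import gzip
import struct

data = b"<html><body>Hello</body></html>"

# Compress the same bytes twice, one minute apart (arbitrary example timestamps).
a = gzip.compress(data, mtime=1_700_000_000)
b = gzip.compress(data, mtime=1_700_000_060)


def header_mtime(blob: bytes) -> int:
    # RFC 1952: MTIME occupies header bytes 4..7, little-endian unsigned 32-bit.
    return struct.unpack("<I", blob[4:8])[0]


print(a == b)            # False: the files (and their hashes) differ
print(header_mtime(a))   # 1700000000
print(a[10:] == b[10:])  # True: compressed payload and trailer are identical
```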
If you use the gzip command, you can simply omit both using the -n (--no-name) flag.
But how do we make reproducible archives with Python?!
Nikola uses Python's gzip module with the GzipFile class, which doesn't have a "deterministic" option.
I looked into all kinds of tricks, but in the end the answer is in how Gzip itself is defined.
It basically says that a modification time (mtime) of 0 means no timestamp is stored, which gives you deterministic compression:
compression that always produces the same file, and therefore the same hash, as long as the input hasn't changed.
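Here is a minimal sketch of that property with `gzip.GzipFile` and `mtime=0`. The helper `gzip_copy` is hypothetical and not Nikola's actual code; it only mirrors the kind of `GzipFile(out_path, 'wb+')` call Nikola makes. It "builds" the same unchanged page twice and compares file hashes, then shows that a non-zero timestamp changes the hash.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def gzip_copy(in_path: Path, out_path: Path, mtime: int = 0) -> None:
    # Hypothetical helper: write a gzipped copy of in_path to out_path.
    # mtime goes into the Gzip header; 0 means "no timestamp stored".
    with gzip.GzipFile(out_path, "wb+", mtime=mtime) as outf:
        outf.write(in_path.read_bytes())


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "index.html"
    dst = Path(tmp) / "index.html.gz"
    src.write_bytes(b"<html><body>Unchanged page</body></html>")

    # Two "builds" of the same, unchanged page.
    gzip_copy(src, dst)
    build_1 = sha256_of(dst)
    gzip_copy(src, dst)
    build_2 = sha256_of(dst)

    # A real timestamp changes the header, and therefore the hash.
    gzip_copy(src, dst, mtime=1_700_000_000)
    build_with_timestamp = sha256_of(dst)

    # Either way, the file still decompresses to the original HTML.
    roundtrip = gzip.decompress(dst.read_bytes())

print(build_1 == build_2)               # True
print(build_1 == build_with_timestamp)  # False
```

The stored filename stays the same from build to build, so it doesn't break reproducibility here; the timestamp is the only field that changes.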
So I simply changed:
with gzip.GzipFile(out_path, 'wb+') as outf:
to
with gzip.GzipFile(out_path, 'wb+', mtime=0) as outf:
and it works. The publish time is down from almost 10 minutes to less than 3!
Such a small change.
Such a big impact!
And of course, since Nikola is open source, I could simply open a pull request!