As a classically trained physicist, my natural style of programming is functional.
But in production-level code, we usually want a mix of classes and functions.
However, this can get awkward.
The Chicken and Egg Attribute Dilemma
Oftentimes, we'll calculate some properties for a class and set them as attributes. Maybe it's the mean and standard deviation for our data class in machine learning, for example.
But that'd mean we have to initialise our class, load the data at initialisation and calculate those statistics.
That can be a bit silly in many cases.
Especially when we're building a larger class with some extra utility, this can quickly devolve into creating a monster at initialisation.
But those should be methods!
I see your criticism. Shouldn't this just be a function or method?
Maybe. But not always.
I don't want to re-run a heavy method every time I call it.
Specifically, I'd say methods, like functions, should represent some kind of information flow or process.
Innit an init?
So then, we would love to store function outputs in attributes to recall later.
Especially when we're saving and loading torch models, it's good to save certain states and information with the model to ensure proper processing at inference.
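For instance, a checkpoint can bundle those statistics next to the weights. Here's a minimal sketch in plain Python (the dictionary layout, the `mean`/`std` values, and the stand-in weights are all made up for illustration; with torch you'd save such a dict via `torch.save`):

```python
# Hypothetical checkpoint: model weights plus the preprocessing
# statistics the model was trained with.
checkpoint = {
    "state_dict": {"layer.weight": [0.1, 0.2]},  # stand-in for model.state_dict()
    "mean": 4.2,  # statistics computed once on the training data
    "std": 1.3,
}

# At inference time we restore the statistics together with the weights,
# so new inputs get normalised exactly as during training.
mean, std = checkpoint["mean"], checkpoint["std"]
normalised = [(x - mean) / std for x in [4.2, 5.5]]
print(normalised[0])  # 0.0
```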
But how do you correctly run the function once?
Maybe we want a `Data.mean` attribute, so we basically have to write something like:
```python
class Data:
    ...
    def _mean(self):
        if self.mean is None:
            self.mean = sum(self.data) / len(self.data)
```
And then we have to call `self._mean()` in the correct position.
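Here's a runnable version of that manual pattern (the sample numbers are made up). Note how the caller has to remember both to pre-set the attribute and to invoke `_mean()` before reading it:

```python
class Data:
    def __init__(self, data):
        self.data = data
        self.mean = None  # must be pre-set, or the check below raises AttributeError

    def _mean(self):
        # Only computes on the first call; later calls reuse the stored value.
        if self.mean is None:
            self.mean = sum(self.data) / len(self.data)

d = Data([1, 2, 3])
d._mean()      # forget this call and d.mean is still None
print(d.mean)  # 2.0
```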
Can we talk about our lord and saviour, caching?
But we have an alternative in the wonderful `functools` module: we can actually run a function and cache the result as an attribute!
So after a `from functools import cached_property`, we can simply decorate our function like this:
```python
class Data:
    ...
    @cached_property
    def mean(self):
        return sum(self.data) / len(self.data)
```
The mean then gets calculated the first time we request `Data.mean` and is served from the cache on every subsequent access.
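A quick sketch to verify that behaviour (the call counter is only there for illustration):

```python
from functools import cached_property

class Data:
    def __init__(self, data):
        self.data = data
        self.calls = 0  # counts how often the property body actually runs

    @cached_property
    def mean(self):
        self.calls += 1
        return sum(self.data) / len(self.data)

d = Data([1, 2, 3])
print(d.mean)   # 2.0 -- computed now
print(d.mean)   # 2.0 -- served from the cache
print(d.calls)  # 1
```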
Disentangling and decoupling code with cached_property
Recently, that beauty of a decorator actually meant I was able to loosen up some tightly coupled code!
When you use cached properties liberally in your class, you can worry a lot less about which function gets called when.
Imagine that before, you had to write:
```python
class Data:
    def __init__(self, filepath):
        self.filepath = filepath
        self._load_data()  # sets self.data
        self._mean()       # sets self.mean, so it must run after _load_data

    def _load_data(self):
        self.data = load(self.filepath)  # load() is some file-loading helper

    def _mean(self):
        self.mean = sum(self.data) / len(self.data)
```
That means our functions have to be run explicitly in the correct order, and it's easy to introduce bugs by simply swapping the calls around.
We also depend on everything being set correctly behind the scenes.
With cached properties, we can turn this into:
```python
from functools import cached_property

class Data:
    def __init__(self, filepath):
        self.filepath = filepath

    @cached_property
    def data(self):
        return load(self.filepath)  # load() is the same file-loading helper

    @cached_property
    def mean(self):
        return sum(self.data) / len(self.data)
```
Now we can simply request `Data.mean`, and it will fetch the data for us and then calculate the mean on top of it.
The first access to `self.data` executes the loading command; every subsequent access falls back to the cached data.
And when we request `Data.mean` again, it just returns the cached mean as well.
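Putting it all together, here's a self-contained sketch. The `load` function is a hypothetical stand-in for whatever heavy loading routine you use, with a counter to show it runs exactly once:

```python
from functools import cached_property

load_calls = 0

def load(filepath):
    # Stand-in for a heavy file-loading routine (hypothetical).
    global load_calls
    load_calls += 1
    return [1.0, 2.0, 3.0]

class Data:
    def __init__(self, filepath):
        self.filepath = filepath

    @cached_property
    def data(self):
        return load(self.filepath)

    @cached_property
    def mean(self):
        return sum(self.data) / len(self.data)

d = Data("numbers.txt")
print(d.mean)      # 2.0 -- triggers the load, then the mean
print(d.data)      # reuses the cached list
print(load_calls)  # 1 -- the file was "loaded" exactly once
```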
Of course, we have to remember when to use caching and when not to. But used appropriately, it makes for a very good experience.
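One such caveat: the cached value doesn't notice when the underlying data changes. Since `cached_property` stores its result in the instance's `__dict__`, you can invalidate it with a plain `del` (toy numbers for illustration):

```python
from functools import cached_property

class Data:
    def __init__(self, data):
        self.data = data

    @cached_property
    def mean(self):
        return sum(self.data) / len(self.data)

d = Data([1, 2, 3])
print(d.mean)      # 2.0
d.data.append(10)
print(d.mean)      # still 2.0 -- the stale cached value!
del d.mean         # clears the cache
print(d.mean)      # 4.0, recomputed from the new data
```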
Such a neat and clean way to resolve dependencies within your class!