Are you using Data Classes?
Introduction
I don’t know about you but I have a tendency to store results in a dictionary and pass that around to functions when I need to. I have typically avoided creating classes for storing data as it always seemed a bit of overkill for the job at hand. Lots of repetitive code with little actual reward.
Here’s a rather simple example of what I mean, where I’m gathering all the results of interest into a single return item for a function. I find this easier than having multiple returns from multiple functions.
import numpy as np
from dataclasses import dataclass
def some_complex_function(forces, scale):
mult = np.pi**scale
complex_calculated_forces = forces * mult
result = dict()
result["val_x"] = complex_calculated_forces[:, 0]
result["val_y"] = complex_calculated_forces[:, 1]
result["val_z"] = complex_calculated_forces[:, 2]
result["mult"] = mult
return result
forces = np.random.random([1000,3])
calculated_forces = some_complex_function(forces, 4)
calculated_forces.keys()
This isn’t beautiful code, but it returns a single dict
with all the related properties together, keeping the variable workspace a bit clearer in the process.
Much handier if you need to pass this into several other functions later on in your workflow.
However, it’s not particularly re-usable and not great for modifying in future. Maybe a class would be a better option? But there’s so much effort involved in created a class I hear you say.
All those __init__
and __repr__
methods that need to be defined, you may end up with a many lines of code for defining a very basic class.
Data Classes
And that’s why data classes were introduced in python 3.7, to remove all that unnecessary boilerplate code required and just let you use the classes quickly.
So here’s my rather silly contrived example again, but this time using a fancy new data class.
import numpy as np
from dataclasses import dataclass
@dataclass
class resultant_force:
x: np.ndarray
y: np.ndarray
z: np.ndarray
multiplier: np.float64
def some_complex_function_using_dataclass(forces, scale):
mult = np.pi**scale
complex_calculated_forces = forces * mult
return resultant_force(*complex_calculated_forces.T, mult)
forces = np.random.random([1000,3])
force_dataclass = some_complex_function_using_dataclass(forces, 4)
I can now run dir(force_dataclass)
on my result and see that it’s a fully fledged class :
['__annotations__',
'__class__',
'__dataclass_fields__',
'__dataclass_params__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__slotnames__',
'__str__',
'__subclasshook__',
'__weakref__',
'multiplier',
'x',
'y',
'z']
There’s even a repe
created for free! So I can quite easily query force_dataclass.multiplier
and get Out[4]: 97.40909103400242
. What’s nice about this is now tht it’s a data class instead of a dictionary most IDE’s will autocomplete the dataclass
fields for you, which is another bonus.
The other major benefit is now I have a nice reusable data container which I could make a little more generic and use in many places. I can do this because dataclasses also accept default values for fields. So I can change my previous class to something like this:
@dataclass
class resultant_force:
x: np.ndarray
y: np.ndarray
z: np.ndarray
multiplier: np.float64 = None
def some_complex_function_using_dataclass(forces, scale):
mult = np.pi**scale
complex_calculated_forces = forces * mult
return resultant_force(*complex_calculated_forces.T, mult)
def some_complex_function_using_generic_dataclass(forces):
complex_calculated_forces = forces * 2
return resultant_force(*complex_calculated_forces.T,)
force_dataclass = some_complex_function_using_dataclass(forces, 4)
generic_force_dataclass = some_complex_function_using_generic_dataclass(forces)
And now from one simple change I have a generic data structure that can be used in multiple places, passing in the additional variables when needed, otherwise they are set to None
.
And data classes have one more nice trick where you can “embed” some calculation into the class
.
@dataclass
class resultant_force:
x: np.ndarray
y: np.ndarray
z: np.ndarray
multiplier: np.float64 = None
def __post_init__(self):
self.custom_var = np.sum(self.x) / 3
def some_complex_function_using_generic_dataclass(forces):
complex_calculated_forces = forces * 2
return resultant_force(*complex_calculated_forces.T,)
force_dataclass = some_complex_function_using_dataclass(forces)
This now calculates whatever is in the __post_init__
method when the object is created. This is very handy if you always do some calculation with the data in the class
, just simply embed the calculation within the class and the result will be there for you when you need it!
Conclusion
These are just some very simple examples of how useful data classes can be for organising and improving your code. I love how the boilerplate of class creation is gone, and how they can make code more readable and easier to maintain.
There are many other features you would expect of a class and these are also included such as automatic __repr__
and object comparison. There’s also easy conversion to lists and dictionaries.
I suggest reading this post for a more detailed introduction to dataclasses and to see how they may help you.