Dataclass in Python
A dataclass is a special type of class in Python designed to store structured data with less boilerplate code.
A dataclass is created by placing the @dataclass decorator directly above the class definition.
from dataclasses import dataclass
💡 Q1: How are dataclasses different from normal classes?
A: They automatically generate common methods such as __init__(), __repr__(), and __eq__(), so we don't have to write them ourselves (though we can still override them if needed).
Let's give some examples.
1. __init__()
# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity = capacity
I learnt that a class definition works with or without () at the end, but it makes more sense to use the parentheses only when the class inherits from another class, e.g. class BoulderingGym(Gym).
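As a small illustration of the parentheses point, here is a sketch in which Gym is a hypothetical base class made up for this example (it is not defined anywhere else in this post):

```python
# Gym is a hypothetical base class used only for this illustration.
class Gym:
    pass


# These two definitions are equivalent: the empty parentheses add nothing.
class PlainGym:
    pass


class PlainGymWithParens():
    pass


# Parentheses are needed when inheriting from another class.
class BoulderingGym(Gym):
    pass


print(PlainGym.__bases__ == PlainGymWithParens.__bases__)  # True: both inherit only from object
print(issubclass(BoulderingGym, Gym))  # True
```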
With a dataclass:
@dataclass
class BoulderingGym:
    place: str
    capacity: int
If we try to create an instance without arguments:
> bouldering_gym = BoulderingGym()
Output
Traceback (most recent call last):
  File "/Users/ambernguyen/Documents/Playground/playground.py", line 8, in <module>
    bouldering_gym = BoulderingGym()
TypeError: BoulderingGym.__init__() missing 2 required positional arguments: 'place' and 'capacity'
Tadaaaa, notice how we never wrote __init__() for the dataclass, yet the error comes from BoulderingGym.__init__(). The @dataclass decorator generated it for us.
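For completeness, here is a minimal sketch of the generated __init__() being called successfully. The variable names gym_positional and gym_keyword are only for illustration:

```python
from dataclasses import dataclass


@dataclass
class BoulderingGym:
    place: str
    capacity: int


# The generated __init__ takes the fields in the order they are declared,
# and accepts them either positionally or by keyword.
gym_positional = BoulderingGym("Australia", 1000)
gym_keyword = BoulderingGym(capacity=1000, place="Australia")

print(gym_positional.place)   # Australia
print(gym_keyword.capacity)   # 1000
```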
2. __repr__()
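__repr__() returns the representation string of an object. print() falls back to it when the class does not define __str__(), so it decides how your object looks when printed.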
# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity = capacity

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)
Output
<__main__.BoulderingGym object at 0x10250dbe0>
If we then add __repr__() manually:
# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity = capacity

    def __repr__(self):  # added
        return "BoulderingGym(place={}, capacity={})".format(self.place, self.capacity)

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)
Output
BoulderingGym(place=Australia, capacity=1000)
BUT, if we use @dataclass:
from dataclasses import dataclass

@dataclass
class BoulderingGym:
    place: str
    capacity: int

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)
Output
BoulderingGym(place='Australia', capacity=1000)
Very nice: no need to write a __repr__() method, and printing the object still shows meaningful information about it.
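If the generated output is not what you want, you can still write your own. A minimal sketch, assuming a custom format invented for this example: when the class body already defines __repr__(), the decorator keeps that version instead of generating one.

```python
from dataclasses import dataclass


@dataclass
class BoulderingGym:
    place: str
    capacity: int

    # Our own __repr__: the decorator leaves it alone rather than
    # replacing it with the generated one.
    def __repr__(self):
        return f"<BoulderingGym in {self.place}, holds {self.capacity}>"


b_gym = BoulderingGym("Australia", 1000)
print(b_gym)  # <BoulderingGym in Australia, holds 1000>
```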
3. __eq__()
The __eq__() method checks equality between two objects.
# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity = capacity

    def __repr__(self):
        return "BoulderingGym(place={}, capacity={})".format(self.place, self.capacity)

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
print(b_gym_1 == b_gym_2)
Output
False
For a dataclass:
from dataclasses import dataclass

@dataclass
class BoulderingGym:
    place: str
    capacity: int

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
print(b_gym_1 == b_gym_2)
Output
True
If we want the normal class to behave like the dataclass, we need to define __eq__() manually to specify what is being compared. By default, == on a normal class compares object identity, and b_gym_1 and b_gym_2 are two different objects. Let's make some changes:
# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity = capacity

    def __repr__(self):
        return "BoulderingGym(place={}, capacity={})".format(self.place, self.capacity)

    def __eq__(self, other):  # added
        if isinstance(other, BoulderingGym):
            return (self.capacity, self.place) == (other.capacity, other.place)
        # Not a BoulderingGym: let Python handle it instead of returning None
        return NotImplemented

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
print(b_gym_1 == b_gym_2)
Output
True
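To see a bit more of what equality means for a dataclass, here is a short sketch. The third gym, b_gym_3, and the tuple comparison are additions for illustration only:

```python
from dataclasses import dataclass


@dataclass
class BoulderingGym:
    place: str
    capacity: int


b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
b_gym_3 = BoulderingGym("Australia", 500)

print(b_gym_1 == b_gym_2)  # True: same field values
print(b_gym_1 is b_gym_2)  # False: still two separate objects
print(b_gym_1 == b_gym_3)  # False: capacity differs
print(b_gym_1 == ("Australia", 1000))  # False: a tuple is not a BoulderingGym
```

So == asks "do these hold the same data?", while is asks "are these the very same object?".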
4. Other dataclass parameters
@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)
More information about what each parameter does can be found at dataclasses — Data Classes.
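As a taste of two of these parameters, here is a minimal sketch using order=True and frozen=True. Note one assumption: capacity is declared first here (unlike the earlier examples) so that ordering compares capacity before place.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class BoulderingGym:
    capacity: int  # declared first so it is compared first
    place: str


small = BoulderingGym(500, "Australia")
large = BoulderingGym(1000, "Australia")

# order=True generates <, <=, > and >=, comparing instances
# as tuples of their fields in declaration order.
print(small < large)  # True

# frozen=True makes assigning to a field raise FrozenInstanceError.
try:
    small.capacity = 2000
except FrozenInstanceError:
    print("cannot modify a frozen dataclass")
```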
5. Setting default values
Say we want to set a default value for place:
@dataclass
class BoulderingGym:
    capacity: int
    place: str = "Australia"  # moved below capacity and given a default

b_gym_1 = BoulderingGym(1000)
print(b_gym_1)
Output
BoulderingGym(capacity=1000, place='Australia')
Notice how I moved place to come after capacity. This is because ⚠️ fields without default values cannot appear after fields with default values. It's the same rule as for ordinary function parameters, sorry.
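To see the rule in action, here is a small sketch of what happens if place keeps its default but stays in front of capacity. The rejected flag is just a helper name for this example. The decorator raises a TypeError while the class is being defined, before any instance is created:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BoulderingGym:
        place: str = "Australia"  # has a default
        capacity: int             # no default, but declared after one

except TypeError as error:
    rejected = True
    print("TypeError:", error)
else:
    rejected = False

print(rejected)  # True
```

The reason is the generated __init__(): it would need a parameter without a default after one with a default, which Python does not allow in any function signature.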
And that's the quick crash course on @dataclass.