Amber’s Weblog ☕

Dataclass in Python

A dataclass is a special type of class in Python designed to store structured data with less boilerplate code.

They can be created via @dataclass decorator before the class definition.

from dataclasses import dataclass

💡 Q1: How are dataclasses different from normal classes?

A: They automatically generate common methods such as init(), repr(), and eq(), so we don’t have to write them ourselves (though we can still override them if needed).

Let's give some examples.

1. __init__()

# normal class

class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity  = capacity
💡 Tip

I learnt that while a class definition can start with or without () at the end, it would make more sense to only use () if the class you are defining is inheriting from another class, e.g. class BoulderingGym(Gym).

With a dataclass

@dataclass
class BoulderingGym:
    place: str
    capacity: int

If we try to create an instance without arguments:

> bouldering_gym = BoulderingGym()

Output

Traceback (most recent call last):
  File "/Users/ambernguyen/Documents/Playground/playground.py", line 8, in 
    bouldering_gym = BoulderingGym()
TypeError: BoulderingGym.__init__() missing 2 required
positional arguments: 'place' and 'capacity'

Tadaaaa, notice how no __init__() was written for @dataclass. The dataclass generated it for us.

2. __repr__()

This stands for representation string - how your object looks when printed.

#normal class

class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity  = capacity

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)

Output

<__main__.BoulderingGym object at 0x10250dbe0>

If we then add __repr__() manually:

#normal class

class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity  = capacity

+   def __repr__(self):
+     return ("BoulderingGym(place={}, capacity={})".format(self.place, self.capacity))

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)

Output

BoulderingGym(place=Australia, capacity=1000)

BUT, if we use @dataclass

from dataclasses import dataclass

@dataclass
class BoulderingGym:
    place: str
    capacity: int

b_gym = BoulderingGym("Australia", 1000)
print(b_gym)

Output

BoulderingGym(place='Australia', capacity=1000)

Very nice, no need to write __repr__() method but it still prints some meaningful information about the object.

3. __eq__()

The__eq__() checks equality between two objects.

# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity  = capacity

    def __repr__(self):
      return ("BoulderingGym(place={}, capacity={})".format(self.place, self.capacity))

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
print(b_gym_1 == b_gym_2)

Output

False

For a dataclass:

from dataclasses import dataclass

@dataclass
class BoulderingGym:
    place: str
    capacity: int

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)

print(b_gym_1 == b_gym_2)

Output

True

If we want the normal class to do the same as the dataclass, we need to define __eq__() manually to specify what being compared because currently, it is comparing the objects' id and they are different. Let's make some changes:

# normal class
class BoulderingGym:
    def __init__(self, place, capacity):
        self.place = place
        self.capacity  = capacity

    def __repr__(self):
      return ("BoulderingGym(place={}, capacity={})".format(self.place, self.capacity))

+   def __eq__(self, other):
+      if isinstance(other, BoulderingGym):
+         return(self.capacity, self.place) == (other.capacity, other.place)

b_gym_1 = BoulderingGym("Australia", 1000)
b_gym_2 = BoulderingGym("Australia", 1000)
print(b_gym_1 == b_gym_2)

Output

True

4. Other dataclass parameters

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

More information about what each parameter does can be found at dataclasses — Data Classes.

5. Setting default values

Say we want to set a default value for place:

@dataclass
class BoulderingGym:
-   place: str
    capacity: int
+   place: str = "Australia

b_gym_1 = BoulderingGym(1000)
print(b_gym_1)

Output

BoulderingGym(capacity=1000, place='Australia')

Notice how I shifted place to be after capacity. This is because ⚠️fields without default values cannot appear after fields with default values. Just the rule, sorry.

And that's the quick crash course on @dataclass.

6. More readings: