Classes and objects#

Learning objectives

  • class, type, objects, attribute, methods

  • special methods (“dunder”)

  • OOP and encapsulation

Object-oriented programming: encapsulation#

Python is also an object-oriented language. For some problems, Object-Oriented Programming (OOP) is a very efficient paradigm. Many libraries use it, so it is worth understanding what object-oriented programming is, when it is useful and how it can be used in Python.

In this notebook, we are just going to consider the OOP notion of encapsulation and won’t study the more complicated concept of inheritance.

Concepts#

Object

An object is an entity that has a state and a behaviour. Objects are the basic elements of an object-oriented system.

Class

Classes are “families” of objects. A class is a pattern that describes how objects will be built.

Introduction based on the complex type#

These concepts are so important for Python that we already used many objects and classes.

In particular, str, list and dict are “types”, or “classes”. In Python, these two names basically mean the same thing. We tend to use “types” for builtin types and “classes” for types defined in libraries or in user code.
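
As a minimal sketch of this equivalence, the snippet below checks that builtin types behave like ordinary classes. The variable names `text` and `numbers` are only chosen for this illustration.

```python
# Builtin "types" are classes: type() returns the class of an object.
text = "hello"
numbers = [1, 2, 3]

print(type(text))     # <class 'str'>
print(type(numbers))  # <class 'list'>

# isinstance() checks whether an object is an instance of a class.
assert isinstance(text, str)
assert isinstance(numbers, list)

# The classes themselves are objects too, and their type is `type`.
assert type(str) is type
assert type(dict) is type
```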

We have also used complex to do things like:

complex_number = complex("1j")

Here, we have just instantiated the builtin type complex, i.e. we have created an instance of this class.

We can use the dir function to get its attribute names. We filter out the names starting with __ since they are special methods.

[name for name in dir(complex_number) if not name.startswith("__")]
['conjugate', 'imag', 'real']

real and imag are simple attributes and conjugate is a method (which can be called):

complex_number.real
0.0
result = complex_number.conjugate()
result
-1j

We are now going to see how to define our own Complex class.

Attributes and __init__ special method#

You remember that it is better to first define a test function which defines what we want. Let us start with very simple requirements.

def test_complex_attributes(cls):
    number = cls("1j")
    assert number.imag == 1.0
    assert number.real == 0.0

    number = cls("1")
    assert number.imag == 0.0
    assert number.real == 1.0

    number = cls(1)
    assert number.imag == 0.0
    assert number.real == 1.0

We can check if it works with the builtin complex type:

test_complex_attributes(complex)

No AssertionError is raised, which indicates that our test is reasonable.

We can start with an implementation that is too simple:

class Complex:
    """Our own complex class"""
test_complex_attributes(Complex)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 test_complex_attributes(Complex)

Cell In[6], line 2, in test_complex_attributes(cls)
      1 def test_complex_attributes(cls):
----> 2     number = cls("1j")
      3     assert number.imag == 1.0
      4     assert number.real == 0.0

TypeError: Complex() takes no arguments

We need to improve our implementation, which can lead to something like:

class Complex:
    def __init__(self, obj):
        if isinstance(obj, str):
            obj = obj.strip()
            if obj.endswith("j"):
                self.real = 0.0
                self.imag = float(obj[:-1])
                # warning: early return
                return

        self.real = float(obj)
        self.imag = 0.0

We defined a class with a single method, __init__. Note that methods take as their first argument a variable named self. The name self is just a convention, but in practice it is nearly always used. This first argument is the object on which the method is called.

Note

We are going to understand that better in a few minutes, but the __init__ method is not well suited to explaining this mechanism. So we will first see how this works for a simpler method and then come back to the __init__ case.

Let us check if this implementation meets our requirements:

test_complex_attributes(Complex)

No AssertionError means that this implementation is sufficient.

Add the conjugate method#

We are now going to focus on the conjugate method with this test:

def test_complex_conjugate(cls):
    number = cls("1j").conjugate()
    assert number.imag == -1.0
    assert number.real == 0.0
test_complex_conjugate(complex)
test_complex_conjugate(Complex)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 1
----> 1 test_complex_conjugate(Complex)

Cell In[12], line 2, in test_complex_conjugate(cls)
      1 def test_complex_conjugate(cls):
----> 2     number = cls("1j").conjugate()
      3     assert number.imag == -1.0
      4     assert number.real == 0.0

AttributeError: 'Complex' object has no attribute 'conjugate'

As expected, we have an exception. Let us modify our Complex class to fix that.

class Complex:
    def __init__(self, real=0.0, imag=0.0):
        if isinstance(real, str):
            real = real.strip()
            if real.endswith("j"):
                if imag != 0.0:
                    raise TypeError(
                        "Complex() can't take second arg if first is a string"
                    )
                self.real = 0.0
                self.imag = float(real[:-1])
                return

        self.real = float(real)
        self.imag = imag

    def conjugate(self):
        """Return the complex conjugate of its argument."""
        return Complex(real=self.real, imag=-self.imag)

Let’s check if it is sufficient:

test_complex_conjugate(Complex)

Note

Numbers in Python are immutable. Complex.conjugate returns a new object and does not modify the object used for the call.
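
The following sketch illustrates this note with the builtin complex type. The variable names `z` and `w` are only for illustration.

```python
z = complex(0, 4)   # 4j
w = z.conjugate()   # a new object, -4j

assert w == complex(0, -4)
assert z == complex(0, 4)  # z itself is unchanged

# Attributes of the builtin complex type are read-only.
try:
    z.imag = 10
except AttributeError as error:
    print(f"cannot modify z: {error}")
```

Note that our own Complex class does not enforce this: its real and imag attributes can be reassigned, so for our class immutability is only a convention.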

We can now come back to this weird self argument and note that:

number = Complex(imag=4)
assert Complex.conjugate(number).imag == -number.imag
assert number.conjugate().imag == -number.imag

Important

We now understand the purpose of the first argument of a method (self). It is the object on which the method is called.

Special (“dunder”) methods#

Special methods (also known as “dunder methods”, for “double underscore”) are methods whose names start and end with __. They are used to define how objects behave in specific situations. Python objects have a lot of dunder methods:

complex_number = complex(imag=2)
[name for name in dir(complex_number) if name.startswith("__")]
['__abs__',
 '__add__',
 '__bool__',
 '__class__',
 '__complex__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__']
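
To make “behave in specific situations” more concrete, here is a small sketch showing that operators and builtin functions roughly translate into calls to some of the special methods listed above. The variable names `a` and `b` are only for illustration.

```python
a = complex(1, 2)
b = complex(3, 4)

# a + b calls the special method __add__
assert a + b == a.__add__(b)

# abs(b) calls b.__abs__()
assert abs(b) == b.__abs__() == 5.0

# a == b calls a.__eq__(b)
assert (a == b) == a.__eq__(b)

# bool(a) calls a.__bool__(), which is False only for zero
assert bool(a) is a.__bool__()
assert not bool(complex(0, 0))
```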

Let’s now print a complex object. In IPython, we can just use its name for the last instruction of a cell:

complex_number
2j

Note that we can get the same string by calling the builtin function str:

str(complex_number)
'2j'

Or by directly calling the special method __str__ of the object:

complex_number.__str__()
'2j'

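This is approximately what happens when we just write complex_number: IPython asks the object for a string representation, and the str function calls complex_number.__str__(). Strictly speaking, IPython displays the result of repr(complex_number), which calls complex_number.__repr__(); for the builtin complex type, both functions return the same string.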

Let us see what happens for our object:

number = Complex(imag=2)
number
<__main__.Complex at 0x7f45e052e570>

Hmm, not great. So we should write a test about this behaviour.

def test_complex_str(cls):
    number = cls(imag=2)
    assert str(number) == "2j"

Does it pass with the builtin complex type?

test_complex_str(complex)

Does it fail with our own type?

test_complex_str(Complex)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[25], line 1
----> 1 test_complex_str(Complex)

Cell In[23], line 3, in test_complex_str(cls)
      1 def test_complex_str(cls):
      2     number = cls(imag=2)
----> 3     assert str(number) == "2j"

AssertionError: 

Good, let’s work on this:

class Complex:
    def __init__(self, real=0.0, imag=0.0):
        if isinstance(real, str):
            real = real.strip()
            if real.endswith("j"):
                if imag != 0.0:
                    raise TypeError(
                        "Complex() can't take second arg if first is a string"
                    )
                self.real = 0.0
                self.imag = float(real[:-1])
                return

        self.real = float(real)
        self.imag = imag

    def conjugate(self):
        """Return the complex conjugate of its argument."""
        return Complex(real=self.real, imag=-self.imag)

    def __str__(self):
        if self.real == 0.0:
            return f"{self.imag}j"
        return f"{self.real} + {self.imag}j"

Does it work better now?

test_complex_str(Complex)

Note on test coverage problem

Note that the last line of the class definition (return f"{self.real} + {self.imag}j") is not tested. This is bad: it could be broken by a later modification and the tests would still pass. For real-life code, one should measure and try to maximize the test coverage, which is approximately defined as the percentage of lines executed by the tests.

Difference between __str__ and __repr__

We don’t care too much at this point, but these two different special methods exist. In a few words:

  • The goal of __str__ is to be readable.

  • __repr__ has to be unambiguous.
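
A short sketch of the difference, first with datetime.date from the standard library and then with a hypothetical Temperature class written only for this illustration:

```python
import datetime

day = datetime.date(2024, 1, 2)
print(str(day))   # 2024-01-02
print(repr(day))  # datetime.date(2024, 1, 2)


class Temperature:
    """Hypothetical class used only for this illustration."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __str__(self):
        # readable, meant for end users
        return f"{self.celsius}°C"

    def __repr__(self):
        # unambiguous, meant for developers
        return f"Temperature(celsius={self.celsius!r})"


t = Temperature(21.5)
assert str(t) == "21.5°C"
assert repr(t) == "Temperature(celsius=21.5)"
# Containers display their elements with repr()
assert str([t]) == "[Temperature(celsius=21.5)]"
```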

Back to the __init__ special method#

number = Complex("1j")

is actually equivalent to:

# create a non-initialized object
# (no need to study and understand this line)
number = Complex.__new__(Complex)
# initialization of the object
number.__init__("1j")

You should now understand that the last line is equivalent to:

Complex.__init__(number, "1j")
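
The equivalence can be checked on a small example. The Point class below is hypothetical and only serves this illustration.

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# the usual way
p1 = Point(1, 2)

# the two explicit steps
p2 = Point.__new__(Point)  # non-initialized object, no attributes yet
assert vars(p2) == {}
Point.__init__(p2, 1, 2)   # same as p2.__init__(1, 2)

# both objects end up with the same attributes
assert vars(p1) == vars(p2) == {"x": 1, "y": 2}
```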

Example: the weather stations#

Solution 0: a list of lists#

Let us suppose we have a set of weather stations that measure wind speed and temperature, and that we want to compute some statistics on these data. A basic representation of a station is a list of lists: wind values and temperature values.

paris = [[10, 0, 20, 30, 20, 0], [1, 5, 1, -1, -1, 3]]

# get wind when temperature is maximal
idx_max_temp = paris[1].index(max(paris[1]))
print(f"max temp is {paris[1][idx_max_temp]}°C at index {idx_max_temp} ")
print(f"wind speed at max temp = {paris[0][idx_max_temp]} km/h")
max temp is 5°C at index 1 
wind speed at max temp = 0 km/h

Comments on this solution#

Many problems:

  • If the number of measured quantities increases (e.g. adding rainfall, humidity, …), the previous indexing will no longer be valid (what will paris[5] represent? wind, temperature, …?)

  • Reading and analysing the code is not straightforward.

Solution 1: a dict of lists#

We can use a dictionary:

paris = {"wind": [10, 0, 20, 30, 20, 0], "temperature": [1, 5, 1, -1, -1, 3]}

# get wind when temperature is maximal
paris_temp = paris["temperature"]
idx_max_temp = paris_temp.index(max(paris_temp))

print(f"max temp is {paris_temp[idx_max_temp]}°C at index {idx_max_temp}")
print(f"wind speed at max temp = {paris['wind'][idx_max_temp]} km/h")
max temp is 5°C at index 1
wind speed at max temp = 0 km/h

Comments#

  • Pro

    • More readable code (reading paris["temperature"] is clearer than paris[1]).

    • Less error-prone code (i.e. using words as keys avoids index numbers, which are easily confused and lead to code that is hard to read and debug).

  • Con

    • The code to compute the final result is not very readable

Solution 2: add functions#

paris = {"wind": [10, 0, 20, 30, 20, 0], "temperature": [1, 5, 1, -1, -1, 3]}


def max_temp(station):
    """returns the maximum temperature available in the station"""
    return max(station["temperature"])


def arg_max_temp(station):
    """returns the index of maximum temperature available in the station"""
    max_temperature = max_temp(station)
    return station["temperature"].index(max_temperature)


idx_max_temp = arg_max_temp(paris)

print(f"max temp is {max_temp(paris)}°C at index {arg_max_temp(paris)}")
print(f"wind speed at max temp = {paris['wind'][idx_max_temp]} km/h")
max temp is 5°C at index 1
wind speed at max temp = 0 km/h

Comments#

  • Pro:

    • Adding functions leads to code that is easier to read, hence easier to debug.

    • Functions can be tested separately from the rest of the code.

    • The computation done in the second part depends only on the functions' interfaces (i.e. on their names, parameters and return values, not on their implementations).

    • Adding functions allows code reuse: computing the max temperature is something one could want to do in other places.

  • Con

    • We rely on the fact that the dictionaries have been built correctly (for example, that the wind and temperature lists have the same length).
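
The drawback listed under “Con” can be reproduced with a badly built dictionary. In this sketch, the station named `broken` is a hypothetical example with 2 wind values for 6 temperatures.

```python
def max_temp(station):
    """returns the maximum temperature available in the station"""
    return max(station["temperature"])


def arg_max_temp(station):
    """returns the index of maximum temperature available in the station"""
    return station["temperature"].index(max_temp(station))


# hypothetical badly built station: lengths do not match
broken = {"wind": [10, 0], "temperature": [1, 5, 1, -1, -1, 7]}

idx = arg_max_temp(broken)  # 5, the index of the temperature 7
try:
    print(broken["wind"][idx])
except IndexError as error:
    # the problem only shows up when the wind list is finally indexed
    print(f"inconsistent station: {error}")
```

Nothing in the functions detects the inconsistency. The error appears late, far from the place where the dictionary was built.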

Solution 3: init function#

Define a function that builds the station (delegate the generation of the station dictionary to a function).

def build_station(wind, temp):
    """Build a station given wind and temp
    :param wind: (list) floats of winds
    :param temp: (list) float of temperatures
    """
    if len(wind) != len(temp):
        raise ValueError("wind and temperature should have the same size")
    return {"wind": list(wind), "temperature": list(temp)}


def max_temp(station):
    """returns the maximum temperature available in the station"""
    return max(station["temperature"])


def arg_max_temp(station):
    """returns the index of maximum temperature available in the station"""
    max_temperature = max_temp(station)
    return station["temperature"].index(max_temperature)


paris = build_station([10, 0, 20, 30, 20, 0], [1, 5, 1, -1, -1, 3])
idx_max_temp = arg_max_temp(paris)

print(f"max temp is {max_temp(paris)}°C at index {arg_max_temp(paris)}")
print(f"wind speed at max temp = {paris['wind'][idx_max_temp]} km/h")
max temp is 5°C at index 1
wind speed at max temp = 0 km/h

Comments#

  • If the dedicated function build_station is used, the returned dictionary is well structured.

  • If one changes the structure returned by build_station, only max_temp and arg_max_temp have to be changed accordingly.

  • We convert the inputs with list(...) so that the parameters wind and temp can be provided by any ordered iterable that has a length (e.g. see test_build_station_with_iterable with range).

  • BUT if we have a new kind of station, e.g. one that holds only wind and humidity, we want to prevent max_temp from being used with it.
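
The test test_build_station_with_iterable mentioned above is not shown in this notebook. The sketch below is only an assumption about what such a test could look like, together with a second hypothetical test (test_build_station_size_mismatch) for the length check.

```python
def build_station(wind, temp):
    """Build a station given wind and temp (same code as above)."""
    if len(wind) != len(temp):
        raise ValueError("wind and temperature should have the same size")
    return {"wind": list(wind), "temperature": list(temp)}


def test_build_station_with_iterable():
    # assumed body: range objects have a length and are converted to lists
    station = build_station(range(3), range(10, 13))
    assert station == {"wind": [0, 1, 2], "temperature": [10, 11, 12]}


def test_build_station_size_mismatch():
    # hypothetical test: inconsistent lengths must be rejected
    try:
        build_station([1, 2], [3])
    except ValueError:
        pass
    else:
        raise AssertionError("ValueError expected")


test_build_station_with_iterable()
test_build_station_size_mismatch()
```

Since build_station calls len on its arguments, an iterable without a length (a generator, for instance) would raise a TypeError.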

Solution 4: using a class#

We would like to “embed” max_temp and arg_max_temp in the “station dictionary” in order to address the last point.

And here comes object-oriented programming!

A class defines a template used for building objects. In our example, the class (named WeatherStation) defines the specification of what a weather station is (i.e. a weather station should contain a list of wind speeds, named “wind”, and a list of temperatures, named “temp”). paris should now be an object that meets this specification. It is called an instance of the class WeatherStation.

When defining the class, we need to define how to initialize the object (special method __init__).

class WeatherStation(object):
    """A weather station that holds wind and temperature

    :param wind: any ordered iterable
    :param temperature: any ordered iterable

    wind and temperature must have the same length.

    """

    def __init__(self, wind, temperature):
        """initialize the weather station.
        Precondition: wind and temperature must have the same length.
                      ValueError is raised if this is not the case
        :param wind: any ordered iterable
        :param temperature: any ordered iterable"""
        self.wind = list(wind)
        self.temp = list(temperature)
        if len(self.wind) != len(self.temp):
            raise ValueError(
                "wind and temperature should have the same size"
                f" got len(wind)={len(self.wind)} vs "
                f" len(temp)={len(self.temp)}"
            )

    def max_temp(self):
        """returns the maximum temperature recorded in the station"""
        return max(self.temp)

    def arg_max_temp(self):
        """returns the index of (one of the) maximum temperature recorded in the station"""
        return self.temp.index(self.max_temp())


paris = WeatherStation([10, 0, 20, 30, 20, 0], [1, 5, 1, -1, -1, 3])
idx_max_temp = paris.arg_max_temp()

print(f"max temp is {paris.max_temp()}°C at index {paris.arg_max_temp()}")
print(f"wind speed at max temp = {paris.wind[idx_max_temp]} km/h")
max temp is 5°C at index 1
wind speed at max temp = 0 km/h

Comments#

  • max_temp and arg_max_temp are now part of the class WeatherStation. Functions attached to classes are called methods. Similarly, the wind and temp lists are now also part of this class. Variables attached to classes are called members or attributes.

  • If the max_temp method is called in many places, we can improve it by caching the result. This will not affect the code that uses the class.

  • The arg_max_temp method should be rewritten, as it implicitly checks the equality of floats.

An object (here paris) thus contains both attributes (holding data for example) and methods to access and/or process the data.
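
Regarding the remark on arg_max_temp, one possible rewrite (a sketch, not the only option) finds the index directly instead of searching for a value equal to the maximum. The helper name arg_max is hypothetical.

```python
def arg_max(values):
    """Index of the (first) largest value, without comparing floats with ==."""
    # max() iterates over the indices and compares the associated values
    return max(range(len(values)), key=values.__getitem__)


temperatures = [1.0, 5.0, 1.0, -1.0, 5.0, 3.0]
print(arg_max(temperatures))  # 1
```

Inside the class, arg_max_temp could then simply return arg_max(self.temp). When several values are maximal, max returns the first one encountered, so the behaviour matches list.index.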

Exercise 21 (Try to code with class)

  • Add a method (perceived_temp) that takes as input a temperature and a wind speed and returns the perceived temperature, i.e. taking into account the wind chill effect.

  • Modify max_temp and arg_max_temp so that they take an additional optional boolean parameter (e.g. perceived, defaulting to False). If perceived is False, the methods have the same behaviour as before. If perceived is True, the temperatures to process are the perceived temperatures.
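
One possible solution is sketched below; it is not necessarily the reference solution. It rests on several assumptions: the commonly quoted wind chill index formula (temperature in °C, wind speed in km/h), a hypothetical private helper _temperatures, and a wind list chosen here so that the wind speed is 50 km/h at the maximum air temperature and 0 at the last measurement.

```python
class WeatherStation:
    """A weather station that holds wind (km/h) and temperature (°C)."""

    def __init__(self, wind, temperature):
        self.wind = list(wind)
        self.temp = list(temperature)
        if len(self.wind) != len(self.temp):
            raise ValueError("wind and temperature should have the same size")

    def perceived_temp(self, temperature, wind):
        """Wind chill index (assumed formula).

        By choice, the air temperature is returned when there is no wind.
        """
        if wind == 0:
            return temperature
        factor = wind**0.16
        return 13.12 + 0.6215 * temperature + (0.3965 * temperature - 11.37) * factor

    def _temperatures(self, perceived=False):
        """Air or perceived temperatures (hypothetical helper)."""
        if perceived:
            return [self.perceived_temp(t, w) for t, w in zip(self.temp, self.wind)]
        return self.temp

    def max_temp(self, perceived=False):
        """returns the maximum (air or perceived) temperature"""
        return max(self._temperatures(perceived))

    def arg_max_temp(self, perceived=False):
        """returns the index of the maximum (air or perceived) temperature"""
        temperatures = self._temperatures(perceived)
        return temperatures.index(max(temperatures))


# wind values are an assumption made for this sketch
paris = WeatherStation([10, 50, 20, 30, 20, 0], [1, 5, 1, -1, -1, 3])
assert paris.max_temp() == 5 and paris.arg_max_temp() == 1
assert paris.max_temp(perceived=True) == 3
assert paris.arg_max_temp(perceived=True) == 5
```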

Comments#

  • The wind array was changed to have different maximum temperatures for the air and perceived temperatures: for air temperatures, the max is 5°C (with a wind speed of 50 km/h). For perceived temperatures, the max is 3°C (as the wind speed is 0).

  • It was a choice to set the apparent/perceived temperature to the air temperature if the wind speed is 0, so the tests were written with this in mind. Testing such choices makes the expected inputs/outputs explicit.

  • isinstance lets us test the type of an object (in this case, we test if apparent_temps is a list).

  • When testing a boolean in an if statement, use if perceived: rather than if perceived == True:. For a boolean it is equivalent, but clearer and shorter!

Coming next: inheritance#

What if we now have a weather station that also measures humidity?

Do we need to rewrite everything?

What if we rewrite everything and we find a bug?

Here comes inheritance.