Shallow vs. deep copying in Python

Shallow vs. deep copying in Python

If you’ve worked with Lists in Python before, you’ll quickly realise that they work differently from primitives like integers and strings. Consider the following:

a = "hello"
b = a
a = "world"
print(a) # Outputs world
print(b) # Outputs hello

Notice that changing the value of a does not change the value of b. This is called passing by value. In Python, Lists do not behave this way:

a = [2, 3, 4, 5]
b = a
a.append(6)
print(a) # Outputs [2, 3, 4, 5, 6]
print(b) # Outputs [2, 3, 4, 5, 6]

In the above example, notice that changing the value of List a also changes the value of List b. This is because both a and b are referring to the same List, and this is called passing by reference.

Creating a separate copy

To work with a separate List in a new variable, we have to call the copy() method:

a = [2, 3, 4, 5]
b = a.copy()
a.append(6)
print(a) # Outputs [2, 3, 4, 5, 6]
print(b) # Outputs [2, 3, 4, 5]

This does not work, however, if your List contains other Lists (or other kinds of Collections like Sets and Dictionaries).

marksheet = [
    ["P", 68],
    ["M", 76]
]
copied_marksheet = marksheet.copy()
copied_marksheet[0][0] = "PM"

print(marksheet) # Outputs [["PM", 68], ["M", 76]]
print(copied_marksheet) # Outputs [["PM", 68], ["M", 76]]

Here, modifying marksheet affects the copied_marksheet because only the external List is copied, and the nested (internal) Lists are not. To properly copy nested Lists like these, you need to do a deep copy:

marksheet = [
    ["P", 68],
    ["M", 76]
]
copied_marksheet = []
for row in marksheet:
    copied_marksheet.append(row.copy())
copied_marksheet[0][0] = "PM"

print(marksheet) # Outputs [["P", 68], ["M", 76]]
print(copied_marksheet) # Outputs [["PM", 68], ["M", 76]]

Article continues after the advertisement:


A more usable deep copy

So, do we have to devise a proper copy function for every list that we have? What if we have a List that has multiple levels of nested Lists?

marksheet = [
    ["P", [68, 69]]
    ["M", [76, 79]]
]

Fortunately for us, Python has a library for that. It’s called copy. To copy marksheet above, here’s all we need to do:

import copy
copied_marksheet = copy.deepcopy(marksheet)
copied_marksheet[0][0] = "PP"

print(marksheet) # Outputs [["P", [68, 69]], ["M", [76,79]]]
print(copied_marksheet) # Outputs [["PP", [68, 69]], ["M", [76,79]]]

This creates a completely new copy of marksheet onto copied_marksheet for our use.

Not exclusive to Lists

Note that these types of behaviours do not apply exclusively to Lists — other Collections in Python behave this way too:

a = { 'Peter': 67, 'Mary': 56 }
b = a
b['Peter'] = 88

print(a) # Outputs {'Peter': 88, 'Mary': 56}
print(b) # Outputs {'Peter': 88, 'Mary': 56}

As well as objects in Python:

class Orange:
    seeds = 0
    size = 0
    def __init__(self, seeds, size):
        self.seeds = seeds
        self.size = size

    def __str__(self):
        return "Orange { seeds: " + str(self.seeds) + ", size: " + str(self.size) + "}"

a = Orange(10, 5)
b = a
b.seeds = 22

print(a) # Outputs Orange { seeds: 22, size: 5 }
print(b) # Outputs Orange { seeds: 22, size: 5 }

For both Collections and objects, you can use copy.deepcopy() to ensure that nested objects and collections get copied too when you are making a duplicate.

Conclusion

Pass by reference behaviour in objects and Collections are not exclusive to Python. In many other languages (Javascript immediately comes to mind), objects and Collections behave in the same manner too, and similar functions or methods of deep copying objects and Collections exist.

Whenever you are working with Collections or objects, and find that changing the property of one changes another, the first question that should come to your mind is whether you have copied or duplicated these objects correctly.


Article continues after the advertisement:


Leave a Reply

Your email address will not be published.