If you’ve worked with Lists in Python before, you’ll quickly realise that they work differently from primitives like integers and strings. Consider the following:
a = "hello" b = a a = "world" print(a) # Outputs world print(b) # Outputs hello
Notice that changing the value of a
does not change the value of b
. This is called passing by value. In Python, Lists do not behave this way:
a = [2, 3, 4, 5] b = a a.append(6) print(a) # Outputs [2, 3, 4, 5, 6] print(b) # Outputs [2, 3, 4, 5, 6]
In the above example, notice that changing the value of List a
also changes the value of List b
. This is because both a
and b
are referring to the same List, and this is called passing by reference.
Creating a separate copy
To work with a separate List in a new variable, we have to call the copy()
method:
a = [2, 3, 4, 5] b = a.copy() a.append(6) print(a) # Outputs [2, 3, 4, 5, 6] print(b) # Outputs [2, 3, 4, 5]
This does not work, however, if your List contains other Lists (or other kinds of Collections like Sets and Dictionaries).
marksheet = [ ["P", 68], ["M", 76] ] copied_marksheet = marksheet.copy() copied_marksheet[0][0] = "PM" print(marksheet) # Outputs [["PM", 68], ["M", 76]] print(copied_marksheet) # Outputs [["PM", 68], ["M", 76]]
Here, modifying marksheet
affects the copied_marksheet
because only the external List is copied, and the nested (internal) Lists are not. To properly copy nested Lists like these, you need to do a deep copy:
marksheet = [ ["P", 68], ["M", 76] ] copied_marksheet = [] for row in marksheet: copied_marksheet.append(row.copy()) copied_marksheet[0][0] = "PM" print(marksheet) # Outputs [["P", 68], ["M", 76]] print(copied_marksheet) # Outputs [["PM", 68], ["M", 76]]
A more usable deep copy
So, do we have to devise a proper copy function for every list that we have? What if we have a List that has multiple levels of nested Lists?
marksheet = [ ["P", [68, 69]] ["M", [76, 79]] ]
Fortunately for us, Python has a library for that. It’s called copy
. To copy marksheet
above, here’s all we need to do:
import copy copied_marksheet = copy.deepcopy(marksheet) copied_marksheet[0][0] = "PP" print(marksheet) # Outputs [["P", [68, 69]], ["M", [76,79]]] print(copied_marksheet) # Outputs [["PP", [68, 69]], ["M", [76,79]]]
This creates a completely new copy of marksheet
onto copied_marksheet
for our use.
Not exclusive to Lists
Note that these types of behaviours do not apply exclusively to Lists — other Collections in Python behave this way too:
a = { 'Peter': 67, 'Mary': 56 } b = a b['Peter'] = 88 print(a) # Outputs {'Peter': 88, 'Mary': 56} print(b) # Outputs {'Peter': 88, 'Mary': 56}
As well as objects in Python:
class Orange: seeds = 0 size = 0 def __init__(self, seeds, size): self.seeds = seeds self.size = size def __str__(self): return "Orange { seeds: " + str(self.seeds) + ", size: " + str(self.size) + "}" a = Orange(10, 5) b = a b.seeds = 22 print(a) # Outputs Orange { seeds: 22, size: 5 } print(b) # Outputs Orange { seeds: 22, size: 5 }
For both Collections and objects, you can use copy.deepcopy()
to ensure that nested objects and collections get copied too when you are making a duplicate.
Conclusion
Pass by reference behaviour in objects and Collections are not exclusive to Python. In many other languages (Javascript immediately comes to mind), objects and Collections behave in the same manner too, and similar functions or methods of deep copying objects and Collections exist.
Whenever you are working with Collections or objects, and find that changing the property of one changes another, the first question that should come to your mind is whether you have copied or duplicated these objects correctly.