Understanding Copy in Python — Copy, Shallow Copy and Deep Copy
In this post, the different type of copy statement in python is demonstrated using the id
keyword. The id
keyword of Python is used to return the identity of the object i.e the address of the object in memory [1].
>>> a = 5
>>> id(a) # prints the address of `a` in memory
140641120530224
Assignment statement in Python do not copy objects, they only create a binding between the target and the original [2]. For example, in the following code snippet, we observe that both a
and b
point to the same location in memory because the assignment statement created only a binding between the target and the original.
>>> a = 5
>>> id(a) # address of `a` in memory
4310045424
>>> b = a
>>> id(b) # address of `b` in memory
4310045424
Shallow Copy
Consider the following code snippet:
>>> a = [1, 1, 1]
>>> b = a
>>> a[0] = 2
>>> a
[2, 1, 1]
>>> b # note that we have not performed b[0] = 2, only b = a
[2, 1, 1]
>>> id(a)
140641120890464
>>> id(b) # note that `a` and `b` point to same location in memory
140641120890464
When we are assigning a
to b
, the address of a
is assigned to b
. Thus, when we make a change to one of a
’s value, the change also get reflected in b
.
Shallow copy helps in preventing this issue by constructing a new compound object and then (to the extent possible), inserts references into it to the objects found in the original. Here is an example demonstrating it:
>>> import copy
>>> a = [1, 1, 1]
>>> b = copy.copy(a) # b is a shallow copy of a
>>> a[0] = 2
>>> a
[2, 1, 1]
>>> b # note that b's value is not changed
[1, 1, 1]
>>> id(a)
140641121848368
>>> id(b)
140641389116384
In the above example, even though we set a[0] = 2
, we observe that b[0]
remains the same. The reason here is that b
is a new object created by performing shallow copy on a
. This can be observed by seeing that id(a)
does not equal id(b)
. An another way to perform shallow copy in list
is via built-in copy
function(ex: b = a.copy()
performs a shallow copy of a
, where type(a) == list
)
Deep Copy
Consider the below code snippet:
>>> a0, a1 = [0, 0], [1, 1]
>>> a = [a0, a1]
>>> b = copy.copy(a)
>>> a[0][0] = 2
>>> b
[[2, 0], [1, 1]]
Even though we perform a shallow copy of a
into b
, we observe that b
’s value had changed which should have not been. This happens because in a shallow copy for a data type containing nested layers of objects, the copy creates a new compound object and for the objects inside the original variable, it inserts references of those objects inside the target variable. Let’s dig deeper into it by printing id’s of a
and b
.
>>> id(a)
140641121848448
>>> id(b) # b and a are in different location in memory (id(a) != id(b)
140641121387136
>>> id(a[0])
140641121848368
>>> id(b[0]) # b[0] and a[0] points to the same location in memory, thereby shallow copy has inserted reference of objects inside `a` into the target `b`
140641121848368
>>> id(a[1]) # also b[1] and a[1] are pointing to the same location in memory
140641121848048
>>> id(b[1])
140641121848048
From above, we see that even though a new copy of b
has been created, the values inside a
has not been copied. Inside b
, the references of a
has been inserted when shallow copy is used.
This problem can be alleviated by using copy.deepcopy
. deepcopy
creates a new compound object and then recursively, inserts copies into it of the objects found in the original.
>>> import copy
>>> a0, a1 = [0, 0], [1, 1]
>>> a = [a0, a1]
>>> b = copy.deepcopy(b)
>>> a0, a1 = [0, 0], [1, 1]
>>> a = [a0, a1]
>>> b = copy.deepcopy(a)
>>> a[0][0] = 2
>>> a
[[2, 0], [1, 1]]
>>> b
[[0, 0], [1, 1]]
Now, try checking the id’s of a[0]
and b[0]
. They should be different! Finally, there are two issues with deepcopy and why it should not be used in all scenarios:
- deepcopy recursively copies objects from the original into the target. This may cause a recursive loop.
- deepcopy consumes more memory than shallow copy.