|Ned Batchelder : Blog | Code | Text | Site|
Facts and myths about Python names and values
» Home : Text
Created 6 July 2013, last updated 26 January 2014
The behavior of names and values in Python can be confusing. Like many parts of Python, it has an underlying simplicity that can be hard to discern, especially if you are used to other programming languages. Here I'll explain how it all works, and present some facts and myths along the way.
BTW: I worked this up into a presentation for PyCon 2015: Python Names and Values.
Let's start simple:
Fact: Names refer to values.
As in many programming languages, a Python assignment statement associates a symbolic name on the left-hand side with a value on the right-hand side. In Python, we say that names refer to values, or a name is a reference to a value:
Now the name "x" refers to the value 23. The next time we use the name x, we'll get the value 23:
Exactly how the name refers to the value isn't really important. If you're experienced with the C language, you might like to think of it as a pointer, but if that means nothing to you then don't worry about it.
To help explain what's going on, I'll use diagrams. A gray rectangular tag-like shape is a name, with an arrow pointing to its value. Here's the name x referring to an integer 23:
I'll be using these diagrams to show how Python statements affect the names and values involved. (The diagrams are SVG, if they don't render, let me know.)
Another way to explore what's going on with these code snippets is to try them on pythontutor.com, which cleverly diagrams your code as it runs. I've included links there with some of the examples.
Fact: Many names can refer to one value.
There's no rule that says a value can only have one name. An assignment statement can make a second (or third, ...) name refer to the same value.
Now x and y both refer to the same value:
Neither x or y is the "real" name. They have equal status: each refers to the value in exactly the same way.
Fact: Names are reassigned independently of other names.
If two names refer to the same value, this doesn't magically link the two names. Reassigning one of them won't reassign the other also:
When we said "y = x", that doesn't mean that they will always be the same forever. Reassigning x leaves y alone. Imagine the chaos if it didn't!
Fact: Values live until nothing references them.
Python keeps track of how many references each value has, and automatically cleans up values that have none. This is called "garbage collection," and means that you don't have to get rid of values, they go away by themselves when they are no longer needed.
Exactly how Python keeps track is an implementation detail, but if you hear the term "reference counting," that's an important part of it. Sometimes cleaning up a value is called reclaiming it.
An important fact about assignment:
Fact: Assignment never copies data.
When values have more than one name, it's easy to get confused and think of it as two names and two values:
Assigning a value to a name never copies the data, it never makes a new value. Assignment just makes the name on the left refer to the value on the right. In this case, we have only one 23, and x and y both refer to it, just as we saw in the last diagrams.
Things get more interesting when we have more complicated values, like a list:
Now if we assign nums to another name, we'll have two names referring to the same list:
Remember: assignment never makes new values, and it never copies data. This assignment statement doesn't magically turn my list into two lists.
At this point, we have one list, referred to by two names, which can lead to a big surprise which is common enough I'm going to give it a catchy name: the Mutable Presto-Chango.
Fact: Changes in a value are visible through all of its names. (Mutable Presto-Chango)
Values fall into two categories based on their type: mutable or immutable. Immutable values include numbers, strings, and tuples. Almost everything else is mutable, including lists, dicts, and user-defined objects. Mutable means that the value has methods that can change the value in-place. Immutable means that the value can never change, instead when you think you are changing the value, you are really making new values from old ones.
Since numbers are immutable, you can't change one in-place, you can only make a new value and assign it to the same name:
Here, x+1 computes an entirely new value, which is then assigned to x.
With a mutable value, you can change the value directly, usually with a method on the value:
First we assign a list to a name:
Then we append another value onto the list:
Here we haven't changed which value nums refers to. At first, the name nums refers to a three-element list. Then we use the name nums to access the list, but we don't assign to nums, so the name continues to refer to the same list. The append method modifies that list by appending 4 to it, but it's the same list, and nums still refers to it. This distinction between assigning a name and changing a value is sometimes described as "rebinding the name vs. mutating the value."
Notice that informal English descriptions can be ambigious. We might say that "x = x+1" is changing x, and "nums.append(4)" is changing nums, but they are very different kinds of change. The first makes x refer to a new value (rebinding), the second is modifying the value x refers to (mutating).
Here's where people get surprised: if two names refer to the same value, and the value is mutated, then both names see the change:
Why did tri change!? The answer follows from what we've learned so far. Assignment never copies values, so after the assignment to tri, we have two names referring to the same list:
Then we mutate the list by calling .append(4), which modifies the list in place. Since tri refers to that list, when we look at tri we see the same list as nums, which has been changed, so tri now shows four numbers also:
This Mutable Presto-Chango is the biggest issue people have with Python's names and values. A value is shared by more than one name, and is modified, and all names see the change. To make the Presto-Chango happen, you need:
Keep in mind, this is not a bug in Python, however much you might wish that it worked differently. Many values have more than one name at certain points in your program, and it's perfectly fine to mutate values and have all the names see the change. The alternative would be for assignment to copy values, and that would make your programs unbearably slow.
Myth: Python assigns mutable and immutable values differently.
Because the Presto-Chango only happens with mutable values, some people believe that assignment works differently for mutable values than for immutable values. It doesn't.
All assignment works the same: it makes a name refer to a value. But with an immutable value, no matter how many names are referring to the same value, the value can't be changed in-place, so you can never get into a surprising Presto-Chango situation.
I said earlier that Python has an underlying simplicity. Its mechanisms are quite simple, but they manifest in a number of ways.
Fact: References can be more than just names.
All of the examples I've been using so far used names as references to values, but other things can be references. Python has a number of compound data structures each of which hold references to values: list elements, dictionary keys and values, object attributes, and so on. Each of those can be used on the left-hand side of an assignment, and all the details I've been talking about apply to them. Anything that can appear on the left-hand side of an assignment statement is a reference, and everywhere I say "name" you can substitute "reference".
In our diagrams of lists, I've shown numbers as the elements, but really, each element is a reference to a number, so it should be drawn like this:
But that gets complicated quickly, so I've used a visual shorthand:
If you have list elements referring to other mutable values, like sub-lists, it's important to remember that the list elements are just references to values.
Here are some other assignments. Each of these left-hand sides is a reference:
and so on. Lots of Python data structures hold values, and each of those is a reference. All of the rules here about names apply exactly the same to any of these references. For example, the garbage collector doesn't just count names, it counts any kind of reference to decide when a value can be reclaimed.
Note that "i = x" assigns to the name i, but "i = x" doesn't, it assigns to the first element of i's value. It's important to keep straight what exactly is being assigned to. Just because a name appears somewhere on the left-hand side of the assignment statement doesn't mean the name is being rebound.
Fact: Lots of things are assignment
Just as many things can serve as references, there are many operations in Python that are assignments. Each of these lines is an assignment to the name X:
I don't mean that these statements act kind of like assignments. I mean that these are assignments: they all make the name X refer to a value, and everything I've been saying about assignments applies to all of them uniformly.
For the most part, these statements define X in the same scope as the statement, but not all of them, especially the comprehensions, and the details differ slightly between Python 2 and Python 3. But they are all real assignments, and every fact about assignment applies to all of them.
Fact: Python passes function arguments by assigning to them.
Let's examine the most interesting of these alternate assignments: calling a function. When I define a function, I name its parameters:
Here x and y are the parameters of the function my_func. When I call my_func, I provide actual values to be used as the arguments of the function. These values are assigned to the parameter names just as if an assignment statement had been used:
When my_func is called, the name x has 8 assigned to it, and the name y has 9 assigned to it. That assignment works exactly the same as the simple assignment statements we've been talking about. The names x and y are local to the function, so when the function returns, those names go away. But if the values they refer to are still referenced by other names, the values live on.
Just like every other assignment, mutable values can be passed into functions, and changes to the value will be visible through all of its names:
This can produce surprising results, so let's take this step by step. When we call augment_twice, the names and values look like this:
The local names in the function are drawn in a new frame. Calling the function assigned the actual values to the parameter names, just like any other assignment statement. Remember that assignment never makes new values or copies any data, so the here the local name a_list refers to the same value that was passed in, nums.
Then we call a_list.append twice, which mutates the list:
When the function ends, the local names are destroyed. Values that are no longer referenced are reclaimed, but others remain:
You can try this example code yourself on pythontutor.com.
We passed the list into the function, which modified it. No values were copied. Although this behavior might be surprising, it's essential. Without it, we couldn't write methods that modify objects.
Here's another way to write the function, but it doesn't work. Let's see why.
At the moment we call augment_twice_bad, it looks the same as we saw earlier with augment_twice:
The next statement is an assignment. The expression on the right-hand side makes a new list, which is then assigned to a_list:
When the function ends, its local names are destroyed, and any values no longer referenced are reclaimed, leaving us just where we started:
(Try this code on pythontutor.com.)
It's really important to keep in mind the difference between mutating a value in place, and rebinding a name. augment_twice worked because it mutated the value passed in, so that mutation was available after the function returned. augment_twice_bad used an assignment to rebind a local name, so the changes weren't visible outside the function.
Another option for our function is to make a new value, and return it:
Here we make an entirely new value inside augment_twice_good, and return it from the function. The caller uses an assignment to hold onto that value, and we get the effect we want.
This last function is perhaps the best, since it creates the fewest surprises. It avoids the Presto-Chango by not mutating a value in-place, and only creating new values.
There's no right answer to choosing between mutating and rebinding: which you use depends on the effect you need. The important thing is to understand how each behaves, to know what tools you have at your disposal, and then to pick the one that works best for your particular problem.
Some details about Python names and values:
Fact: Any name can refer to any value at any time.
Python is dynamically typed, which means that names have no type. Any name can refer to any value at any time. A name can refer to an integer, and then to a string, and then to a function, and then to a module. Of course, this could be a very confusing program, and you shouldn't do it, but the Python language won't mind.
Fact: Names have no type, values have no scope.
Just as names have no type, values have no scope. When we say that a function has a local variable, we mean that the name is scoped to the function: you can't use the name outside the function, and when the function returns, the name is destroyed. But as we've seen, if the name's value has other references, it will live on beyond the function call. It is a local name, not a local value.
Fact: Values can't be deleted, only names can.
Python's memory management is so central to its behavior, not only do you not have to delete values, but there is no way to delete values. You may have seen the del statement:
This does not delete the value nums, it deletes the name nums. The name is removed from its scope, and then the usual reference counting kicks in: if nums' value had only that one reference, then the value will be reclaimed. But if it had other references, then it will not.
Myth: Python has no variables.
Some people like to say, "Python has no variables, it has names." This slogan is misleading. The truth is that Python has variables, they just work differently than variables in C.
Names are Python's variables: they refer to values, and those values can change (vary) over the course of your program. Just because another language (albeit an important one) behaves differently is no reason to describe Python as not having variables.
Myth? Python is confusing.
I hope this has helped clarify how names and values work in Python. It's a very simple mechanism that can be surprising, but is very powerful. Especially if you are used to languages like C, you'll have to think about your values differently.
There are lots of side trips that I skipped here:
If you are looking for more information about these topics, try: