Chapter 7. Dictionaries
This chapter discusses dictionaries, Python’s name for associative arrays or maps, which it implements by using hash tables. Dictionaries are amazingly useful, even in simple programs.
Because dictionaries are less familiar to many programmers than other basic data structures such as lists and strings, some of the examples illustrating dictionary use are slightly more complex than the corresponding examples for other built-in data structures. It may be necessary to read parts of chapter 8 to fully understand some of the examples in this chapter.
7.1. What is a dictionary?
If you’ve never used associative arrays or hash tables in other languages, a good way to start understanding the use of dictionaries is to compare them with lists:
Values in lists are accessed by means of integers called indices, which indicate where in the list a given value is found.
Dictionaries access values by means of integers, strings, or other Python objects called keys, which indicate where in the dictionary a given value is found. In other words, both lists and dictionaries provide indexed access to arbitrary values, but the set of items that can be used as dictionary indices is much larger than, and contains, the set of items that can be used as list indices. Also, the mechanism that dictionaries use to provide indexed access is quite different from that used by lists.
Both lists and dictionaries can store objects of any type.
Values stored in a list are implicitly ordered by their positions in the list, because the indices that access these values are consecutive integers. You may or may not care about this ordering, but you can use it if desired. Values stored in a dictionary aren’t ordered by position in this way, because dictionary keys aren’t just numbers. Since Python 3.7, however, a dictionary is guaranteed to remember the order in which its items were added; in older versions, you can get that behavior from an ordered dictionary, which is a dictionary subclass that can be imported from the collections module. If you need some other order, you can use another data structure (often a list) to store that ordering explicitly.
In spite of the differences between them, the use of dictionaries and lists often appears to be the same. As a start, an empty dictionary is created much like an empty list, but with curly braces instead of square brackets:
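A minimal sketch of the two assignments (the names x and y follow the surrounding text):

```python
x = []   # a new, empty list
y = {}   # a new, empty dictionary
```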
Here, the first line creates a new, empty list and assigns it to x. The second line creates a new, empty dictionary and assigns it to y.
After you create a dictionary, you may store values in it as though it were a list:
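For instance, assuming y is the empty dictionary from before, and with the stored strings chosen only for illustration:

```python
y = {}
y[0] = 'Hello'     # creates an entry under key 0
y[1] = 'Goodbye'   # creates an entry under key 1
```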
Even in these assignments, there’s already a significant operational difference between the dictionary and list usage. Trying to do the same thing with a list would result in an error, because in Python, it’s illegal to assign to a position in a list that doesn’t exist. For example, if you try to assign to the 0th element of the list x, you receive an error:
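A short sketch of that failure, with the exception caught so that the script keeps running:

```python
x = []
try:
    x[0] = 'Hello'            # position 0 doesn't exist in an empty list
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```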
This isn’t a problem with dictionaries; new positions in dictionaries are created as necessary.
Having stored some values in the dictionary, now you can access and use them:
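For example, assuming the two illustrative entries stored under keys 0 and 1:

```python
y = {0: 'Hello', 1: 'Goodbye'}
print(y[0])                       # Hello
farewell = y[1] + ", Friend."
print(farewell)                   # Goodbye, Friend.
```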
All in all, this makes a dictionary look pretty much like a list. Now for the big difference. Store (and use) some values under keys that aren’t integers:
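A sketch with string keys (the particular keys and values are illustrative):

```python
y = {}
y["two"] = 2
y["pi"] = 3.14
product = y["two"] * y["pi"]   # look up both values and multiply them
print(product)
```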
This is definitely something that can’t be done with lists! Whereas list indices must be integers, dictionary keys are much less restricted; they may be numbers, strings, or one of a wide range of other Python objects. This makes dictionaries a natural for jobs that lists can’t do. For example, it makes more sense to implement a telephone-directory application with dictionaries than with lists because the phone number for a person can be stored indexed by that person’s last name.
A dictionary is a way of mapping from one set of arbitrary objects to an associated but equally arbitrary set of objects. Actual dictionaries, thesauri, or translation books are good analogies in the real world. To see how natural this correspondence is, here’s the start of an English-to-French color translator:
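One possible start for such a translator is sketched below; the dictionary name english_to_french is an assumption, chosen to match the color examples later in the chapter:

```python
english_to_french = {}                 # start with an empty dictionary
english_to_french['red'] = 'rouge'     # store three translations
english_to_french['blue'] = 'bleu'
english_to_french['green'] = 'vert'
print("red is", english_to_french['red'])
```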
Write the code to ask the user for three names and three ages. After the names and ages are entered, ask the user for one of the names, and print the correct age.
7.2. Other dictionary operations
Besides basic element assignment and access, dictionaries support several operations. You can define a dictionary explicitly as a series of key-value pairs separated by commas:
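For example, a color translator could be written in one step (the name english_to_french is an illustrative choice):

```python
# Each pair is written key: value; pairs are separated by commas.
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}
```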
len returns the number of entries in a dictionary:
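A quick sketch, using an illustrative three-entry dictionary:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}
entry_count = len(english_to_french)
print(entry_count)   # 3
```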
You can obtain all the keys in the dictionary with the keys method. This method is often used to iterate over the contents of a dictionary using Python’s for loop, described in chapter 8:
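A sketch of both uses, with illustrative data; list is applied only to make the keys printable as a list:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

key_list = list(english_to_french.keys())
print(key_list)

# Iterate over the keys and look up each value.
for color in english_to_french.keys():
    print(color, "is", english_to_french[color])
```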
In Python 3.5 and earlier, the order of the keys returned by keys has no meaning; the keys aren’t necessarily sorted, and they don’t necessarily occur in the order in which they were created. Your Python code may print out the keys in a different order than my Python code did. If you need keys sorted, you can store them in a list variable and then sort that list. Starting with Python 3.6, however, dictionaries preserve the order in which the keys were created and return them in that order; this began as a CPython implementation detail and became a guaranteed part of the language in Python 3.7.
It’s also possible to obtain all the values stored in a dictionary by using values:
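For example, with illustrative data:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}
value_list = list(english_to_french.values())
print(value_list)
```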
This method isn’t used nearly as often as keys.
You can use the items method to return all keys and their associated values as a sequence of tuples:
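A sketch with illustrative data, showing the tuples and how a for loop can unpack them:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

item_list = list(english_to_french.items())
print(item_list)          # a list of (key, value) tuples

# Each tuple can be unpacked directly in the loop header.
for english, french in english_to_french.items():
    print(english, "->", french)
```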
Like keys, this method is often used in conjunction with a for loop to iterate over the contents of a dictionary.
The del statement can be used to remove an entry (key-value pair) from a dictionary:
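For example, removing one entry from an illustrative dictionary:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}
del english_to_french['green']          # removes the key and its value
print(list(english_to_french.items()))
```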
The keys, values, and items methods return not lists but views, which are dynamically updated whenever the dictionary changes. That’s why you need to use the list function to make them appear as a list in these examples. Otherwise, they behave much like sequences: code can iterate over them in a for loop, use in to check membership in them, and so on. Unlike a list, however, a view can’t be indexed by position.
The view returned by keys (and in some cases the view returned by items) also behaves like a set, with union, difference, and intersection operations.
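The following sketch illustrates both points, using two small dictionaries invented for the purpose: a view picks up later changes to its dictionary, and key views support set operators.

```python
d = {'a': 1, 'b': 2}
keys_view = d.keys()
d['c'] = 3                         # the existing view now includes 'c'
print('c' in keys_view)            # True

other = {'b': 20, 'c': 30, 'd': 40}
common = d.keys() & other.keys()       # intersection
only_in_d = d.keys() - other.keys()    # difference
either = d.keys() | other.keys()       # union
print(common, only_in_d, either)
```

The results of the set operations are ordinary sets, not views.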
Attempting to access a key that isn’t in a dictionary is an error in Python. To handle this error, you can test the dictionary for the presence of a key with the in keyword, which returns True if a dictionary has a value stored under the given key and False otherwise:
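A sketch of the membership test, with illustrative data:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

has_red = 'red' in english_to_french        # True
has_orange = 'orange' in english_to_french  # False
print(has_red, has_orange)

# Guard a lookup so that a missing key doesn't raise KeyError.
if 'orange' in english_to_french:
    print(english_to_french['orange'])
```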
Alternatively, you can use the get method. This method returns the value associated with a key if the dictionary contains that key, but returns its second argument if the dictionary doesn’t contain the key:
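For example, with illustrative data and 'No translation' as an arbitrary default:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

found = english_to_french.get('blue', 'No translation')
missing = english_to_french.get('chartreuse', 'No translation')
no_default = english_to_french.get('chartreuse')   # second argument omitted

print(found)        # bleu
print(missing)      # No translation
print(no_default)   # None
```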
The second argument is optional. If that argument isn’t included, get returns None if the dictionary doesn’t contain the key.
Similarly, if you want to safely get a key’s value and make sure that it’s set to a default in the dictionary, you can use the setdefault method:
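A sketch using the 'chartreuse' key and 'No translation' default that the text discusses; the other entries are illustrative:

```python
english_to_french = {'red': 'rouge', 'blue': 'bleu', 'green': 'vert'}

# The key is missing, so setdefault stores the default and returns it.
result = english_to_french.setdefault('chartreuse', 'No translation')
print(result)

# The key exists, so the stored value is returned and nothing changes.
existing = english_to_french.setdefault('red', 'No translation')
print(existing)
```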
The difference between get and setdefault is that after the setdefault call, there’s a key in the dictionary 'chartreuse' with the value 'No translation'.
You can obtain a copy of a dictionary by using the copy method:
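For example, with illustrative contents:

```python
x = {0: 'zero', 1: 'one'}
y = x.copy()
y[2] = 'two'      # adding to the copy leaves the original untouched
print(x)
print(y)
```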
This method makes a shallow copy of the dictionary, which is likely to be all you need in most situations. For dictionaries that contain any modifiable objects as values (for example, lists or other dictionaries), you may want to make a deep copy by using the copy.deepcopy function. See chapter 5 for an introduction to the concept of shallow and deep copies.
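The difference shows up only when a value is itself modifiable. In this sketch (data invented for illustration), the shallow copy shares the inner list with the original, whereas the deep copy gets its own:

```python
import copy

original = {'primes': [2, 3, 5]}
shallow = original.copy()           # new dictionary, same inner list
deep = copy.deepcopy(original)      # new dictionary, new inner list

original['primes'].append(7)

print(shallow['primes'])   # [2, 3, 5, 7]
print(deep['primes'])      # [2, 3, 5]
```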
The update method updates a first dictionary with all the key-value pairs of a second dictionary. For keys that are common to both dictionaries, the values from the second dictionary override those of the first:
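A sketch with two small illustrative dictionaries that share the key 1:

```python
x = {0: 'zero', 1: 'one'}
z = {1: 'One', 2: 'Two'}

x.update(z)    # z's value for the shared key 1 replaces x's value
print(x)       # {0: 'zero', 1: 'One', 2: 'Two'}
```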
Dictionary methods give you a full set of tools to manipulate and use dictionaries. For quick reference, table 7.1 lists some of the main dictionary operations.
Table 7.1. Dictionary operations
Dictionary operation | Explanation | Example
{} | Creates an empty dictionary | x = {}
len | Returns the number of entries in a dictionary |
keys | Returns a view of all keys in a dictionary |
values | Returns a view of all values in a dictionary |
items | Returns a view of all items in a dictionary |
del | Removes an entry from a dictionary |
in | Tests whether a key exists in a dictionary | 'y' in x
get | Returns the value of a key or a configurable default | x.get('y', None)
setdefault | Returns the value if the key is in the dictionary; otherwise, sets the value for the key to the default and returns the value | x.setdefault('y', None)
copy | Makes a shallow copy of a dictionary | y = x.copy()
update | Combines the entries of two dictionaries |
This table isn’t a complete list of all dictionary operations. For a complete list, refer to the Python standard library documentation.
Assume that you have a dictionary x = {'a':1, 'b':2, 'c':3, 'd':4} and a dictionary y = {'a':6, 'e':5, 'f':6}. What would be the contents of x after the following snippets of code have executed?
7.3. Word counting
Assume that you have a file that contains a list of words, one word per line. You want to know how many times each word occurs in the file. You can use dictionaries to perform this task easily:
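A minimal sketch of the idea follows. To keep it self-contained, an in-memory io.StringIO object stands in for the real file; the sample words and the name occurrences are assumptions.

```python
import io

# Stand-in for a file opened with open(...): one word per line.
sample_file = io.StringIO("the\ncat\nthe\ndog\nthe\ncat\n")

occurrences = {}
for line in sample_file:
    word = line.strip()
    # Start from 0 if the word hasn't been seen, then add 1.
    occurrences[word] = occurrences.get(word, 0) + 1

for word in occurrences:
    print("The word", word, "occurs", occurrences[word], "times")
```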
The code increments the occurrence count for each word as it is read. This is a good example of the power of dictionaries. The code is simple, but because dictionary operations are highly optimized in Python, it’s also quite fast. This pattern is so handy, in fact, that it’s been standardized as the Counter class in the collections module of the standard library.
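As a rough illustration of the Counter class, here is the same kind of count over an invented list of words:

```python
from collections import Counter

words = ['the', 'cat', 'the', 'dog', 'the', 'cat']
counts = Counter(words)             # maps each word to its count

print(counts['the'])                # 3
print(counts.most_common(1))        # [('the', 3)]
```

A Counter is a dictionary subclass, so the operations described in this chapter also work on it.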
7.4. What can be used as a key?
The previous examples use strings as keys, but Python permits more than just strings to be used in this manner. Any Python object that is immutable and hashable can be used as a key to a dictionary.
In Python, as discussed earlier, any object that can be modified is called mutable. Lists are mutable because list elements can be added, changed, or removed. Dictionaries are also mutable for the same reason. Numbers are immutable. If a variable x is referring to the number 3, and you assign 4 to x, you’ve made x refer to a different number (4), but you haven’t changed the number 3 itself; 3 still has to be 3. Strings are also immutable. list[n] returns the nth element of list, string[n] returns the nth character of string, and list[n] = value changes the nth element of list, but string[n] = character is illegal in Python and causes an error, because strings in Python are immutable.
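The contrast between a mutable list and an immutable string can be seen in a short sketch (the variable names are illustrative):

```python
letters = ['a', 'b', 'c']
letters[1] = 'z'              # lists are mutable, so this works
print(letters)                # ['a', 'z', 'c']

text = 'abc'
try:
    text[1] = 'z'             # strings are immutable, so this fails
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```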
Unfortunately, the requirement that keys be immutable and hashable means that lists can’t be used as dictionary keys, but in many instances, it would be convenient to have a listlike key. For example, it’s convenient to store information about a person under a key consisting of the person’s first and last names, which you could easily do if you could use a two-element list as a key.
Python solves this difficulty by providing tuples, which are basically immutable lists; they’re created and used similarly to lists, except that once created, they can’t be modified. There’s one further restriction: Keys must also be hashable, which takes things a step further than just immutable. To be hashable, a value must have a hash value (provided by a __hash__ method) that never changes throughout the life of the value. That means that tuples containing mutable values are not hashable, although the tuples themselves are technically immutable. Only tuples that don’t contain any mutable objects nested within them are hashable and valid to use as keys for dictionaries. Table 7.2 illustrates which of Python’s built-in types are immutable, hashable, and eligible to be dictionary keys.
Table 7.2. Python values eligible to be used as dictionary keys
Python type
Dictionary key?
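One way to check whether a particular value is eligible is to ask Python for its hash. The helper is_hashable below is a hypothetical convenience function, not part of the standard library, and the sample values are illustrative:

```python
def is_hashable(value):
    """Return True if value can be hashed (and so used as a dictionary key)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

samples = [42, 2.5, 'text', (1, 2), (1, [2]), [1, 2], {1: 2}, {1, 2},
           frozenset({1, 2})]
for sample in samples:
    print(repr(sample), is_hashable(sample))
```

Note that the tuple (1, [2]) is reported as unhashable because it contains a list, even though the tuple itself can't be modified.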
The next sections give examples illustrating how tuples and dictionaries can work together.
Decide which of the following expressions can be a dictionary key: 1; 'bob'; ('tom', [1, 2, 3]); ["filename"]; "filename"; ("filename", "extension")
7.5. Sparse matrices
In mathematical terms, a matrix is a two-dimensional grid of numbers, usually written in textbooks as a grid with square brackets on each side, as shown here.
A fairly standard way to represent such a matrix is by means of a list of lists. In Python, a matrix is presented like this:
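A sketch of the list-of-lists form; the 4 x 4 values below are illustrative:

```python
# Each inner list is one row of the matrix.
matrix = [[3, 0, -2, 11],
          [0, 9, 0, 0],
          [0, 7, 0, 0],
          [0, 0, 0, -5]]
```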
Elements in the matrix can be accessed by row and column number:
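For example, with an illustrative matrix and hypothetical variables rownum and colnum:

```python
matrix = [[3, 0, -2, 11],
          [0, 9, 0, 0],
          [0, 7, 0, 0],
          [0, 0, 0, -5]]

rownum, colnum = 0, 2
element = matrix[rownum][colnum]   # first index picks the row, second the column
print(element)                     # -2
```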
But in some applications, such as weather forecasting, it’s common for matrices to be very large—thousands of elements to a side, meaning millions of elements in total. It’s also common for such matrices to contain many zero elements. In some applications, all but a small percentage of the matrix elements may be set to zero. To conserve memory, it’s common for such matrices to be stored in a form in which only the nonzero elements are actually stored. Such representations are called sparse matrices.
It’s simple to implement sparse matrices by using dictionaries with tuple indices. For example, the previous sparse matrix can be represented as follows:
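A sketch of the dictionary form, with illustrative values; only the nonzero elements are stored, each under a (row, column) tuple:

```python
matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11,
          (1, 1): 9, (2, 1): 7, (3, 3): -5}
```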
Now you can access an individual matrix element at a given row and column number by this bit of code:
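One way to write that lookup, assuming an illustrative sparse matrix and hypothetical variables rownum and colnum:

```python
matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11,
          (1, 1): 9, (2, 1): 7, (3, 3): -5}

rownum, colnum = 1, 2
if (rownum, colnum) in matrix:
    element = matrix[(rownum, colnum)]
else:
    element = 0                    # anything not stored is zero
print(element)                     # 0
```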
A slightly less clear (but more efficient) way of doing this is to use the dictionary get method, which you can tell to return 0 if it can’t find a key in the dictionary and otherwise return the value associated with that key, avoiding one of the dictionary lookups:
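A sketch of the get version, again with illustrative values:

```python
matrix = {(0, 0): 3, (0, 2): -2, (0, 3): 11,
          (1, 1): 9, (2, 1): 7, (3, 3): -5}

stored = matrix.get((0, 3), 0)     # key present: returns the stored value
absent = matrix.get((1, 2), 0)     # key absent: returns the default, 0
print(stored, absent)              # 11 0
```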
If you’re considering doing extensive work with matrices, you may want to look into NumPy, the numeric computation package.
7.6. Dictionaries as caches
This section shows how dictionaries can be used as caches, data structures that store results to avoid recalculating those results over and over. Suppose that you need a function called sole, which takes three integers as arguments and returns a result. The function might look something like this:
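The text doesn't say what sole computes, so the arithmetic in this sketch is a placeholder standing in for the time-consuming calculation:

```python
def sole(m, n, t):
    # Placeholder only: imagine a slow calculation here.
    result = m * n + t
    return result

print(sole(12, 20, 6))   # 246
```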
But if this function is very time-consuming, and if it’s called tens of thousands of times, the program might run too slowly.
Now suppose that sole is called with about 200 different combinations of arguments during any program run. That is, you might call sole(12, 20, 6) 50 or more times during the execution of your program and similarly for many other combinations of arguments. By eliminating the recalculation of sole on identical arguments, you’d save a huge amount of time. You could use a dictionary with tuples as keys, like so:
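A sketch of such a cache follows. The names sole_cache and calculation_count are assumptions, the arithmetic is a placeholder for the real calculation, and the counter exists only to show that repeated calls skip the work:

```python
sole_cache = {}          # maps (m, n, t) tuples to computed results
calculation_count = 0    # illustrative: counts real calculations

def sole(m, n, t):
    global calculation_count
    if (m, n, t) in sole_cache:
        return sole_cache[(m, n, t)]      # reuse the stored result
    calculation_count += 1
    result = m * n + t                    # placeholder for the slow work
    sole_cache[(m, n, t)] = result
    return result

first = sole(12, 20, 6)
second = sole(12, 20, 6)                  # served from the cache
print(first, second, calculation_count)   # 246 246 1
```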
The rewritten sole function uses a global variable to store previous results. The global variable is a dictionary, and the keys of the dictionary are tuples corresponding to argument combinations that have been given to sole in the past. Then any time sole is passed an argument combination for which a result has already been calculated, it returns that stored result rather than recalculating it.
Suppose that you’re writing a program that works like a spreadsheet. How might you use a dictionary to store the contents of a sheet? Write some sample code to both store a value and retrieve a value in a particular cell. What might be some drawbacks to this approach?
7.7. Efficiency of dictionaries
If you come from a traditional compiled-language background, you may hesitate to use dictionaries, worrying that they’re less efficient than lists (arrays). The truth is that the Python dictionary implementation is quite fast. Many of the internal language features rely on dictionaries, and a lot of work has gone into making them efficient. Because all of Python’s data structures are heavily optimized, you shouldn’t spend much time worrying about which is faster or more efficient. If the problem can be solved more easily and cleanly by using a dictionary than by using a list, do it that way, and consider alternatives only if it’s clear that dictionaries are causing an unacceptable slowdown.
In the previous lab, you took the text of the first chapter of Moby Dick, normalized the case, removed punctuation, and wrote the separated words to a file. In this lab, you read that file, use a dictionary to count the number of times each word occurs, and then report the most common and least common words.
Dictionaries are powerful data structures, used for many purposes even within Python itself.
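Dictionary keys must be hashable: immutable objects such as numbers, strings, and tuples that contain only hashable objects can be dictionary keys, but lists, dictionaries, and tuples containing them can’t.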
Accessing collections of data by key is often more direct and takes less code than many other solutions.
