Appendix B. Exercise answers
B.1. Chapter 4
TRY THIS: VARIABLES AND EXPRESSIONS
In the Python shell, create some variables. What happens when you try to put spaces, dashes, or other nonalphanumeric characters in the variable name? Play around with a few complex expressions, such as x = 2 + 4 * 5 - 6 / 3. Use parentheses to group the numbers in different ways, and see how that changes the result compared with the original ungrouped expression.
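As a small illustration of how grouping changes the result, the sketch below evaluates the expression from the exercise with and without parentheses. The names grouped_a and grouped_b are placeholders chosen here, not names from the book.

```python
# Ungrouped: * and / bind tighter than + and -, so this is 2 + 20 - 2.0
x = 2 + 4 * 5 - 6 / 3

# Parentheses force the addition first: 6 * 5 - 2.0
grouped_a = (2 + 4) * 5 - 6 / 3

# Grouping the subtraction instead: 2 + (4 * -1) / 3
grouped_b = 2 + 4 * (5 - 6) / 3

print(x, grouped_a, grouped_b)
```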
TRY THIS: MANIPULATING STRINGS AND NUMBERS
In the Python shell, create some string and number variables (integers, floats, and complex numbers). Experiment a bit with what happens when you do operations with them, including across types. Can you multiply a string by an integer, for example, or by a float or complex number? Also, load the math module and try out a few of the functions; then load the cmath module and do the same. What happens if you try to use one of those functions on an integer or float after loading the cmath module? How might you get the math module functions back?
To reconnect the first sqrt to your current namespace, you can reimport it. Note that this code doesn’t reload the file:
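A minimal sketch of that sequence, assuming sqrt is imported by name from each module (the original listing may have used a different import form):

```python
from math import sqrt
real_root = sqrt(16)        # math.sqrt returns a float

from cmath import sqrt      # rebinds the name sqrt to cmath.sqrt
complex_root = sqrt(16)     # cmath.sqrt returns a complex number

from math import sqrt       # rebinds sqrt to math.sqrt again
restored_root = sqrt(16)

print(real_root, complex_root, restored_root)
```

The second import of math doesn't execute the module again; it only rebinds the name in the current namespace.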
TRY THIS: GETTING INPUT
Experiment with the input() function to get string and integer input. Using code similar to the code above, what is the effect of not using int() around the call to input() for integer input? Can you modify that code to accept a float, such as 28.5? What happens if you deliberately enter the “wrong” type of value, such as a float where an int is expected or a string where a number is expected, and vice versa?
QUICK CHECK: PYTHONIC STYLE
Which of the following variable and function names do you think are not good Pythonic style, and why?: bar(, varName, VERYLONGVARNAME, foobar, longvarname, foo_bar(), really_very_long_var_name
bar(: Not good, not legal, includes symbol
varName: Not good, mixed case
VERYLONGVARNAME: Not good, long, all caps, hard to read
foobar: Good
longvarname: Good, although underscores to separate words would be better
foo_bar(): Good
really_very_long_var_name: Long, but good if all of the words are needed, perhaps to distinguish among similar variables
B.2. Chapter 5
QUICK CHECK: LEN()
What would len() return for each of the following: [0]; []; [[1, 3, [4, 5], 6], 7]?
len([0]) - 1
len([]) - 0
len([[1, 3, [4, 5], 6], 7]) - 2
([1, 3, [4, 5], 6] is a list and counts as a single item in the outer list; the second item is 7.)
TRY THIS: LIST SLICES AND INDEXES
Using what you know about the len() function and list slices, how would you combine the two to get the second half of a list when you don’t know what size it is? Experiment in the Python shell to confirm that your solution works.
len(my_list) // 2 is the halfway point; slice from there to the end.
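The slice described above can be sketched as follows; my_list here holds made-up sample values.

```python
my_list = [1, 2, 3, 4, 5, 6, 7]

# Integer division finds the midpoint; the slice runs from there to the end
second_half = my_list[len(my_list) // 2:]

print(second_half)
```

With an odd number of items, the middle element lands in the second half.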
TRY THIS: MODIFYING LISTS
Suppose that you have a list 10 items long. How might you move the last three items from the end of the list to the beginning, keeping them in the same order?
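One possible approach, sketched with a sample list of the integers 0 through 9, is to concatenate two slices:

```python
my_list = list(range(10))

# Last three items first, followed by everything before them
my_list = my_list[-3:] + my_list[:-3]

print(my_list)
```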
TRY THIS: SORTING LISTS
Suppose that you have a list in which each element is in turn a list: [[1, 2, 3], [2, 1, 3], [4, 0, 1]]. If you want to sort this list by the second element in each list, so that the result is [[4, 0, 1], [2, 1, 3], [1, 2, 3]], what function would you write to pass as the key value to the sort() method?
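A minimal sketch of such a key function; the names second_element and the_list are chosen here for illustration.

```python
def second_element(a_list):
    """Return the item at index 1, used as the sort key."""
    return a_list[1]

the_list = [[1, 2, 3], [2, 1, 3], [4, 0, 1]]
the_list.sort(key=second_element)

print(the_list)
```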
or
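The same key can also be written inline as a lambda, as in this sketch (the_list is again a sample name):

```python
the_list = [[1, 2, 3], [2, 1, 3], [4, 0, 1]]

# The lambda returns the second element of each inner list
the_list.sort(key=lambda a_list: a_list[1])

print(the_list)
```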
QUICK CHECK: LIST OPERATIONS
What is the result of len([[1,2]] * 3)?
3
What are two differences between using the in operator and a list’s index() method?
index gives position; in gives a true/false answer.
index gives an error if an element isn’t in the list.
Which of the following raises an exception? min(["a", "b", "c"]); max([1, 2, "three"]); [1, 2, 3].count("one")
max([1, 2, "three"]): Strings and ints can’t be compared, so it’s impossible to get a max value.
TRY THIS: LIST OPERATIONS
If you have a list x, write the code to safely remove an item if and only if that value is in the list.
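A short sketch of the membership test, using sample values for x and element:

```python
x = [1, 2, 3, 2]
element = 2

# remove() raises ValueError for a missing value, so check first
if element in x:
    x.remove(element)

print(x)
```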
Modify that code to remove the element only if the item occurs in the list more than once.
Note: This code removes only the first occurrence of element.
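The variation can be sketched with count(), again using sample values:

```python
x = [1, 2, 3, 2]
element = 2

# Only remove when the value appears more than once
if x.count(element) > 1:
    x.remove(element)

# A value that appears once is left alone
if x.count(3) > 1:
    x.remove(3)

print(x)
```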
TRY THIS: LIST COPIES
Suppose that you have the following list: x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. What code could you use to get a copy y of that list in which you could change its elements without the side effect of changing the contents of x?
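Because x contains nested lists, a shallow copy would still share the inner lists. One way to get an independent copy, sketched here, is copy.deepcopy() from the standard library:

```python
import copy

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = copy.deepcopy(x)

# Changing a nested element of y leaves x untouched
y[0][0] = 100

print(x[0][0], y[0][0])
```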
QUICK CHECK: TUPLES
Explain why the following operations aren’t legal for the tuple x = (1, 2, 3, 4):
All of these operations change the object in place, and tuples can’t be changed.
If you had a tuple x = (3, 1, 4, 2), how might you end up with x sorted?
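One possible answer, sketched below, relies on sorted(), which returns a new list rather than sorting in place. Converting back with tuple() is an extra step added here for the case where a tuple is needed.

```python
x = (3, 1, 4, 2)

# sorted() builds a new list; tuple() turns it back into a tuple
x = tuple(sorted(x))

print(x)
```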
QUICK CHECK: SETS
If you were to construct a set from the following list, how many elements would it have?: [1, 2, 5, 1, 0, 2, 3, 1, 1, (1, 2, 3)]
Six unique elements: 1, 2, 5, 0, 3, and the tuple (1, 2, 3)
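The count is easy to confirm in the shell; the variable names in this sketch are placeholders.

```python
values = [1, 2, 5, 1, 0, 2, 3, 1, 1, (1, 2, 3)]

# Duplicates collapse; the tuple is hashable, so it's a valid member
unique_values = set(values)

print(len(unique_values))
```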
LAB 5: EXAMINING A LIST
In this lab, the task is to read a set of temperature data (in fact, the monthly high temperatures at Heathrow Airport for 1948–2016) from a file and then find some basic information: the highest and lowest temperatures, the mean (average) temperature, and the median temperature (the temperature in the middle if all of the temperatures are sorted).
The temperature data is in the file lab_05.txt in the source code directory for this chapter. Because I’ve not yet discussed reading files, the code to read the files into a list is here:
As mentioned, you should find the highest and lowest temperature, the average, and the median. You’ll probably want to use min(), max(), sum(), len(), and sort().
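A rough sketch of those calculations follows. It uses a short made-up list instead of the contents of lab_05.txt, and it takes the middle item of the sorted list as the median, which is exact only for an odd number of values.

```python
# Made-up sample values standing in for the file's data
temperatures = [8.9, 7.9, 14.2, 15.4, 18.1, 19.1, 21.7]

highest = max(temperatures)
lowest = min(temperatures)
mean = sum(temperatures) / len(temperatures)

# Sort in place, then pick the middle element
temperatures.sort()
median = temperatures[len(temperatures) // 2]

print(highest, lowest, round(mean, 2), median)
```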
Bonus: Determine how many unique temperatures are in the list.
B.3. Chapter 6
QUICK CHECK: SPLIT AND JOIN
How could you use split and join to change all of the whitespace in string x to dashes (such as "this is a test" to "this-is-a-test")?
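A minimal sketch: split() with no arguments breaks on any run of whitespace, and join() glues the pieces back together with dashes.

```python
x = "this is a test"

# Split on whitespace, then rejoin with "-"
dashed = "-".join(x.split())

print(dashed)
```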
QUICK CHECK: STRINGS TO NUMBERS
Which of the following will not be converted to numbers, and why?
int('a1')
int('12G', 16)
float("12345678901234567890")
int("12*2")
Only #3 float("12345678901234567890") converts; all the others have a character that wouldn’t be allowed for conversion to an int.
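To check this in the shell, each conversion can be attempted inside a try block. The sketch below is one way to do that; the attempts and results dictionaries are names chosen here.

```python
attempts = {
    "int('a1')": lambda: int('a1'),
    "int('12G', 16)": lambda: int('12G', 16),   # G isn't a hexadecimal digit
    'float("12345678901234567890")': lambda: float("12345678901234567890"),
    'int("12*2")': lambda: int("12*2"),         # expressions aren't evaluated
}

results = {}
for label, convert in attempts.items():
    try:
        results[label] = convert()
    except ValueError as error:
        results[label] = error

for label, outcome in results.items():
    print(label, "->", repr(outcome))
```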
QUICK CHECK: STRIP
If the string x equals "(name, date),\n", which of the following returns a string containing "name, date"?
x.rstrip("),")
x.strip("),\n")
x.strip("\n)(,")
x.strip("\n)(,") will remove the newline as well as the comma and parentheses.
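The three candidates can be compared side by side, as in this sketch (the result names are placeholders):

```python
x = "(name, date),\n"

# The string ends with a newline, which isn't in the set, so nothing is stripped
first = x.rstrip("),")

# Strips the trailing characters, but the leading "(" isn't in the set
second = x.strip("),\n")

# Both ends are cleaned because "(" is included
third = x.strip("\n)(,")

print(repr(first), repr(second), repr(third))
```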
QUICK CHECK: STRING SEARCHING
If you want to see whether a line ends with the string "rejected", what string method would you use? Are there any other ways you could get the same result?
You could also do line[-8:] == "rejected", but that wouldn’t be as clear or Pythonic.
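Both checks are sketched below on a made-up sample line; endswith() is the method the question is looking for.

```python
line = "Application 42 was rejected"

# Preferred: the intent is obvious
ends_with_method = line.endswith("rejected")

# Equivalent slice: the last 8 characters compared with the target
ends_with_slice = line[-8:] == "rejected"

print(ends_with_method, ends_with_slice)
```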
QUICK CHECK: MODIFYING STRINGS
What would be a quick way to change all punctuation in a string to spaces?
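One quick way, sketched here, is a translation table built with str.maketrans() and string.punctuation. Note that string.punctuation covers ASCII punctuation only, and the sample text is made up.

```python
import string

text = "Hello, world!"

# Map every ASCII punctuation character to a space
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
cleaned = text.translate(table)

print(repr(cleaned))
```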
TRY THIS: STRING OPERATIONS
Suppose that you have a list of strings in which some (but not necessarily all) of the strings begin and end with the double quote character:
What code would you use on each element to remove just the double quotes?
What code could you use to find the position of the last p in Mississippi? When you’ve found its position, what code would you use to remove just that letter?
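A possible sketch uses rfind(), which searches from the right, and then rebuilds the string around that position. The variable names are chosen here.

```python
state = "Mississippi"

# rfind() returns the index of the last occurrence
pos = state.rfind("p")

# Strings are immutable, so build a new one without that character
without_last_p = state[:pos] + state[pos + 1:]

print(pos, without_last_p)
```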
QUICK CHECK: THE FORMAT() METHOD
What will be in x when the following snippets of code are executed?
QUICK CHECK: FORMATTING STRINGS WITH %
What would be in the variable x after the following snippets of code have executed?
QUICK CHECK: BYTES
For which of the following kinds of data would you want to use a string? For which could you use bytes?
(1) Data file storing binary data
Bytes. Because the data is binary, you’re more concerned with the contents as numbers rather than text. Therefore, it would make sense to use bytes.
(2) Text in a language with accented characters
String. Python 3 strings are Unicode, so they can handle accented characters.
(3) Text with only uppercase and lowercase roman characters
String. Strings should be used for all text in Python 3.
(4) A series of integers no larger than 255
Bytes. A byte is an integer no larger than 255, so the bytes type is perfect for storing integers like this.
LAB 6: PREPROCESSING TEXT
In processing raw text, it’s quite often necessary to clean and normalize the text before doing anything else. If you want to find the frequency of words in text, for example, you can make the job easier if, before you start counting, you make sure that everything is lowercase (or uppercase, if you prefer) and that all punctuation has been removed. It can also make things easier to break the text into a series of words.
In this lab, the task is to read an excerpt of the first chapter of Moby Dick, make sure that everything is one case, remove all punctuation, and write the words one per line to a second file. Again, because I haven’t yet covered reading and writing files, the code for those operations is supplied below.
Your task is to come up with the code to replace the commented lines in the sample below:
B.4. Chapter 7
TRY THIS: CREATE A DICTIONARY
Write the code to ask the user for three names and three ages. After the names and ages are entered, ask the user for one of the names, and print the correct age.
QUICK CHECK: DICTIONARY OPERATIONS
Assume that you have a dictionary x = {'a':1, 'b':2, 'c':3, 'd':4} and a dictionary y = {'a':6, 'e':5, 'f':6}. What would be the contents of x after the following snippets of code have executed?
QUICK CHECK: WHAT CAN BE A KEY?
Decide which of the following expressions can be a dictionary key: 1; 'bob'; ('tom', [1, 2, 3]); ["filename"]; "filename"; ("filename", "extension")
1: Yes.
'bob': Yes.
('tom', [1, 2, 3]): No; it contains a list, which isn’t hashable.
["filename"]: No; it’s a list, which isn’t hashable.
"filename": Yes.
("filename", "extension"): Yes; it’s a tuple.
TRY THIS: USING DICTIONARIES
Suppose that you’re writing a program that works like a spreadsheet. How might you use a dictionary to store the contents of a sheet? Write some sample code to both store a value and retrieve a value in a particular cell. What might be some drawbacks to this approach?
You could use tuples of row, column values as keys to store the values in a dictionary. One drawback would be that the keys wouldn’t be sorted, so you’d have to manage that situation as you grabbed the keys/values to render as a spreadsheet.
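A minimal sketch of that idea, with made-up cell contents and (row, column) tuples as keys:

```python
sheet = {}

# Store values by (row, column)
sheet[(1, 1)] = "Item"
sheet[(1, 2)] = "Price"
sheet[(2, 1)] = "Widget"
sheet[(2, 2)] = 9.99

# Retrieve a cell; get() returns None for a cell that was never set
price = sheet[(2, 2)]
empty = sheet.get((3, 1))

# The dictionary doesn't keep cells in grid order, so sort the keys to render
ordered_cells = sorted(sheet)

print(price, empty, ordered_cells)
```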
LAB 7: WORD COUNTING
In Lab 6, you took the text of the first chapter of Moby Dick, normalized the case, removed punctuation, and wrote the separated words to a file. In this lab, you read that file, use a dictionary to count the number of times each word occurs, and report the most common and least common words.
Use this code to read the words from the file into a list called moby_words:
B.5. Chapter 8
TRY THIS: LOOPING AND IF STATEMENTS
Suppose that you have a list x = [1, 3, 5, 0, -1, 3, -2], and you need to remove all negative numbers from that list. Write the code to do this.
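One way to do this with a loop is sketched below. It iterates over a copy of the list, an extra precaution added here, because removing items from a list while looping over it directly can skip elements.

```python
x = [1, 3, 5, 0, -1, 3, -2]

# Loop over a copy (x[:]) so removals don't disturb the iteration
for value in x[:]:
    if value < 0:
        x.remove(value)

print(x)
```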
How would you count the total number of negative numbers in a list y = [[1, -1, 0], [2, 5, -9], [-2, -3, 0]]?
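A nested loop is one straightforward approach, sketched here with placeholder names:

```python
y = [[1, -1, 0], [2, 5, -9], [-2, -3, 0]]

count = 0
for row in y:
    for value in row:
        if value < 0:
            count += 1

print(count)
```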
What code would you use to print "very low" if the value of x is below -5, "low" if it’s from -4 up to 0, "neutral" if it’s equal to 0, "high" if it’s greater than 0 up to 4, and "very high" if it’s greater than 5?
TRY THIS: COMPREHENSIONS
What list comprehension would you use to process the list x so that all negative values are removed?
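A sketch of the comprehension, assuming x is the same sample list used earlier in this section:

```python
x = [1, 3, 5, 0, -1, 3, -2]

# Keep only values that are zero or greater
x = [value for value in x if value >= 0]

print(x)
```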
Create a generator that returns only odd numbers from 1 to 100. (Hint: A number is odd if there’s a remainder when it’s divided by 2; use % 2 to do this.)
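A generator expression along these lines would work; the name odd_numbers is a placeholder.

```python
# value % 2 is 1 (truthy) for odd numbers and 0 (falsy) for even ones
odd_numbers = (value for value in range(1, 101) if value % 2)

# A generator can be consumed only once; list() is used here to show the result
odds = list(odd_numbers)

print(odds[:5], odds[-1])
```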
Write the code to create a dictionary of the numbers and their cubes from 11 through 15.
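A dictionary comprehension handles this in one line, as sketched here:

```python
# range(11, 16) covers 11 through 15 inclusive
cubes = {number: number ** 3 for number in range(11, 16)}

print(cubes)
```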
QUICK CHECK: BOOLEANS AND TRUTHINESS
Decide whether the following statements are true or false: 1, 0, -1, [0], 1 and 0, 1 > 0 or []
1: True.
0: False.
-1: True.
[0]: True; it’s a list containing one item.
1 and 0: False.
1 > 0 or []: True.
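These answers can be confirmed with bool(), as in this sketch (the dictionary is just a convenient way to label each expression):

```python
expressions = {
    "1": 1,
    "0": 0,
    "-1": -1,
    "[0]": [0],
    "1 and 0": 1 and 0,          # and returns the first falsy operand, 0
    "1 > 0 or []": 1 > 0 or [],  # or returns the first truthy operand, True
}

truth = {label: bool(value) for label, value in expressions.items()}

for label, result in truth.items():
    print(label, "->", result)
```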
LAB: REFACTOR WORD_COUNT
Rewrite the word-count program in section 8.7 to make it shorter. You may want to look at the string and list operations already discussed, as well as think about different ways to organize the code. You may also want to make the program smarter so that only alphabetic strings (not symbols or punctuation) count as words.
Listing B.1. File: word_count_refactored.py
B.6. Chapter 9
QUICK CHECK: FUNCTIONS AND PARAMETERS
How would you write a function that could take any number of unnamed arguments and print their values in reverse order?
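A sketch of such a function; the name print_reversed is chosen here. The *args parameter collects the positional arguments into a tuple.

```python
def print_reversed(*args):
    """Print each positional argument, last one first."""
    for arg in reversed(args):
        print(arg)

print_reversed(1, "two", 3.0)
```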
What do you need to do to create a procedure or void function—that is, a function with no return value?
Either don’t return a value (use a bare return) or don’t use a return statement at all.
What happens if you capture the return value of a function with a variable?
The only result is that you can use that value, whatever it might be.
QUICK CHECK: MUTABLE FUNCTION PARAMETERS
What would be the result of changing a list or dictionary that was passed into a function as a parameter value? Which operations would be likely to create changes that would be visible outside the function? What steps might you take to minimize that risk?
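The change is made to the caller’s object, so it remains visible after the function returns; if the object is a mutable default parameter value, the change also persists for future calls that use the default. Operations such as adding and deleting elements, as well as changing the value of an element, are particularly likely to be problems. To minimize the risk, work on a copy inside the function, and don’t use mutable types as default parameters.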
TRY THIS: GLOBAL VS LOCAL VARIABLES
Assuming that x = 5, what will be the value of x after funct_1() below executes? After funct_2()?
After calling funct_1(), x will be unchanged; after funct_2(), the value in the global x will be 2.
QUICK CHECK: GENERATOR FUNCTIONS
What would you need to modify in the code for the function four() above to make it work for any number? What would you need to add to allow the starting point to also be set?
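Since the original four() listing isn't reproduced here, the sketch below assumes it counted up from 0 and yielded each value below 4. Under that assumption, the limit and the starting point become parameters; the name count_up is hypothetical.

```python
def count_up(limit, start=0):
    """Yield integers from start up to, but not including, limit."""
    x = start
    while x < limit:
        yield x
        x += 1

print(list(count_up(4)))
print(list(count_up(6, start=3)))
```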
TRY THIS: DECORATORS
How would you modify the code for the decorator function above to remove unneeded messages and enclose the return value of the wrapped function in "<html>" and "</html>" so that myfunction("hello") would return "<html>hello</html>"?
This exercise is a hard one, because to define a function that changes the return value, you need to add an inner wrapper function to call the original function and add to the return value.
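A minimal sketch of that structure follows. The decorator name html_wrap and the body of myfunction are assumptions made for illustration, because the chapter's original decorator isn't shown here.

```python
def html_wrap(func):
    """Decorator that encloses func's return value in <html> tags."""
    def wrapper(*args, **kwargs):
        # Call the original function, then add to its return value
        return "<html>" + func(*args, **kwargs) + "</html>"
    return wrapper

@html_wrap
def myfunction(parameter):
    return parameter

print(myfunction("hello"))
```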
LAB 9: USEFUL FUNCTIONS
Looking back at chapters 6 and 7, refactor the code into functions for cleaning and processing the data. The goal should be that most of the logic is moved into functions. Use your own judgment as to the types of functions and parameters, but keep in mind that functions should do just one thing and that they shouldn’t have any side effects that carry over outside the function.
B.7. Chapter 10
QUICK CHECK: MODULES
Suppose that you have a module called new_math that contains a function called new_divide. What are the ways that you might import and then use that function? What are the pros and cons of each way?
This solution is often preferred because there won’t be a clash between any identifiers in new_math and the importing namespace. This solution is less convenient to type, however.
This version is more convenient to use but increases the chance of name clashes between identifiers in the module and the importing namespace.
Suppose that the new_math module contains a function called _helper_math(). How will the underscore character affect the way that _helper_math() is imported?
It won’t be imported if you use from new_math import *
QUICK CHECK: NAMESPACES AND SCOPE
Consider a variable width that’s in the module make_window.py. In which of the following contexts is width in scope?
(A) Within the module itself
(B) Inside the resize() function in the module
(C) Within the script that imported the make_window.py module
A and B but not C
LAB 10: CREATE A MODULE
Package the functions that you created at the end of chapter 9 as a standalone module. Although you can include code to run the module as the main program, the goal should be for the functions to be completely usable from another script.
(no answer)
B.8. Chapter 11
TRY THIS: MAKING A SCRIPT EXECUTABLE
Experiment with executing scripts on your platform. Also try to redirect input and output into and out of your scripts.
(no answer)
QUICK CHECK: PROGRAMS AND MODULES
What issue is the use of if __name__ == "__main__": meant to prevent, and how does it do that? Can you think of any other way to prevent this issue?
When Python loads a module, all of its code is executed. By using the pattern above, you can have certain code run only if it’s being executed as the main script file.
LAB 11: CREATING A PROGRAM
In chapter 8, you created a version of the UNIX wc utility to count the lines, words, and characters in a file. Now that you have more tools at your disposal, refactor that program to make it work more like the original. In particular, it should have options to show only lines (-l), only words (-w), and only characters (-c). If none of those options is given, all three stats are displayed, but if any of them is present, only the specified stats are shown.
For an extra challenge, look at the man page for wc on a Linux/UNIX system, and add the -L to show the longest line length. Feel free to try to implement the complete behavior as listed in the man page, and test it against your system’s wc utility.
B.9. Chapter 12
QUICK CHECK: MANIPULATING PATHS
How would you use the os module’s functions to take a path to a file called test.log and create a new file path in the same directory for a file called test.log.old? How would you do the same thing by using the pathlib module?
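One way to approach both versions is sketched below. The logs directory is a hypothetical location, and only the path is built; nothing is created on disk.

```python
import os.path
import pathlib

log_path = os.path.join("logs", "test.log")   # hypothetical location

# os.path version: split into directory and name, then rebuild
directory, name = os.path.split(log_path)
old_path = os.path.join(directory, name + ".old")

# pathlib version: with_name() replaces the final component, keeping the parent
log_path_obj = pathlib.Path(log_path)
old_path_obj = log_path_obj.with_name(log_path_obj.name + ".old")

print(old_path, old_path_obj)
```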
What path would you get if you created a pathlib Path object from os.pardir? Try it to find out.
LAB 12: MORE FILE OPERATIONS
How might you calculate the total size of all files ending with .txt that aren’t symlinks in a directory? If your first answer was using os.path, also try it with pathlib, and vice versa.
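A pathlib-based sketch is shown below. The function name total_txt_size is chosen here, and the search is limited to the top level of the directory.

```python
from pathlib import Path

def total_txt_size(directory):
    """Return the combined size in bytes of .txt files, skipping symlinks."""
    total = 0
    for path in Path(directory).glob("*.txt"):
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total

print(total_txt_size("."))
```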
Write some code that builds off your solution above to move the same .txt files in the question above to a new directory called backup in the same directory.
B.10. Chapter 13
QUICK CHECK
What is the significance of adding a "b" to the file open mode string?
It makes the file open in binary mode, reading and writing bytes, not characters.
Suppose that you want to open a file named myfile.txt and write some additional data at the end of it. What command would you use to open myfile.txt? What command would you use to reopen the file to read from the beginning?
TRY THIS: REDIRECTING INPUT AND OUTPUT
Write some code to use the mio.py module above to capture all of the print output of a script to a file named myfile.txt, reset the standard output to the screen, and print that file to screen.
QUICK CHECK: STRUCT
What use cases can you think of in which the struct module would be useful for either reading or writing binary data?
You’re trying to read/write from a binary-format application file or image file.
You’re reading from some external interface, such as a thermometer or accelerometer, and you want to save the raw data exactly as it was transmitted.
QUICK CHECK: PICKLES
Think about why a pickle would or wouldn’t be a good solution for the following use cases:
(A) Saving some state variables from one run to the next
(B) Keeping a high-score list for a game
(C) Storing usernames and passwords
(D) Storing a large dictionary of English terms
A and B would be reasonable, although pickles aren’t secure.
C and D wouldn’t be good; the lack of security would be a big problem for C, and for D, there’d be a need to load the entire pickle into memory.
QUICK CHECK: SHELVE
Using a shelf object looks very much like using a dictionary. In what ways is using a shelf object different? What disadvantages would you expect there to be in using a shelf object?
The key difference is that the objects are stored on disk, not in memory. With very large amounts of data, particularly with lots of inserts and/or deletes, you’d expect disk access to make things slow.
LAB: FINAL FIXES TO WC
If you look at the man page for the wc utility, you see that two command-line options do very similar things. -c makes the utility count the bytes in the file, and -m makes it count characters (which in the case of some Unicode characters can be two or more bytes long). In addition, if a file is given, it should read from and process that file, but if no file is given, it should read from and process stdin.
Rewrite your version of the wc utility to implement both the distinction between bytes and characters and the ability to read from files and standard input.
B.11. Chapter 14
TRY THIS: CATCHING EXCEPTIONS
Write some code that gets two numbers from the user and divides the first number by the second. Check for and catch the exception that occurs if the second number is zero (ZeroDivisionError).
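The exception handling itself can be sketched as a small function. The input() calls are left out so the example runs without prompts, and the name divide is hypothetical; in the exercise, the two arguments would come from int(input(...)).

```python
def divide(first, second):
    """Return first / second, or None if second is zero."""
    try:
        return first / second
    except ZeroDivisionError:
        print("Can't divide by zero")
        return None

print(divide(6, 3))
print(divide(1, 0))
```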
QUICK CHECK: EXCEPTIONS AS CLASSES
If MyError inherits from Exception, what will be the difference between except Exception as e and except MyError as e?
The first catches any exception that inherits from Exception (most of them), whereas the second catches only MyError exceptions.
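The difference can be seen in a short sketch. The helper caught_by is a hypothetical function written for this illustration; it reports whether a handler for a given class would catch a given exception.

```python
class MyError(Exception):
    """Custom exception inheriting from Exception."""

def caught_by(exception, handler_class):
    """Return True if 'except handler_class' would catch the exception."""
    try:
        raise exception
    except handler_class:
        return True
    except Exception:
        return False

print(caught_by(MyError(), Exception))
print(caught_by(ValueError(), MyError))
```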
TRY THIS: THE ASSERT STATEMENT
Write a simple program that gets a number from the user and then uses the assert statement to raise an exception if the number is zero. Test to make sure that the assert fires and then turn it off, using one of the methods mentioned above.
QUICK CHECK: EXCEPTIONS
Do Python exceptions force a program to halt?
No. If exceptions are caught and handled correctly, the program won’t need to halt.
Suppose that you want accessing a dictionary x to always return None if a key doesn’t exist in the dictionary (that is, if a KeyError exception is raised). What code would you use to achieve that goal?
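One way to do this with exception handling is sketched below; the function name lookup and the sample dictionary are placeholders. The built-in get() method gives the same result without a try block.

```python
x = {"a": 1}

def lookup(mapping, key):
    """Return mapping[key], or None if the key is missing."""
    try:
        return mapping[key]
    except KeyError:
        return None

found = lookup(x, "a")
missing = lookup(x, "z")
same_result = x.get("z")

print(found, missing, same_result)
```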
TRY THIS: EXCEPTIONS
What code would you use to create a custom ValueTooLarge exception and raise that exception if the variable x is over 1000?
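A sketch of the exception and the check follows. Wrapping the check in a function named check_value is a choice made here so the example is easy to run.

```python
class ValueTooLarge(Exception):
    """Raised when a value is over the allowed maximum."""

def check_value(x):
    if x > 1000:
        raise ValueTooLarge(f"{x} is over 1000")
    return x

try:
    check_value(1001)
except ValueTooLarge as error:
    print("Caught:", error)
```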
QUICK CHECK: CONTEXT MANAGERS
Assume that you’re using a context manager in a script that reads and/or writes several files. Which of the following approaches do you think would be best?
(A) Put the entire script in a block managed by a with statement.
(B) Use one with statement for all file reads and another for all file writes.
(C) Use a with statement each time you read a file or write a file (that is, for each line).
(D) Use a with statement for each file that you read or write.
LAB 14: CUSTOM EXCEPTIONS
Think about the module you wrote in chapter 9 to count word frequencies. What errors might reasonably occur in those functions? Rewrite the code to handle those exception conditions appropriately.
B.12. Chapter 15
TRY THIS: INSTANCE VARIABLES
What code would you use to create a Rectangle class?
TRY THIS: INSTANCE VARIABLES AND METHODS
Update the code for a Rectangle class so that you can set the dimensions when an instance is created, just as for the Circle class above. Also add an area() method.
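A minimal sketch of such a class, assuming the dimensions are stored as x and y and default to 1:

```python
class Rectangle:
    def __init__(self, x=1, y=1):
        # Dimensions are set when the instance is created
        self.x = x
        self.y = y

    def area(self):
        return self.x * self.y

rect = Rectangle(3, 4)
print(rect.area())
```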
TRY THIS: CLASS METHODS
Write a class method that’s similar to total_area() but returns the total circumference of all circles.
TRY THIS: INHERITANCE
Rewrite the code for a Rectangle class to inherit from Shape. Because squares and rectangles are related, would it make sense to inherit one from the other? If so, which would be the base class, and which would inherit?
It probably would make sense to inherit. Because squares are special kinds of rectangles, Square should inherit from the Rectangle class.
If Square was specialized so that it had only one dimension x, you would write
How would you write the code to add an area() method for the Square class? Should the area() method be moved into the base Shape class and inherited by Circle, Square, and Rectangle? What issues would that change cause?
It makes sense to put the area() method in a Rectangle class that Square inherits from, but putting it in Shape wouldn’t be very helpful, because different types of shapes have their own rules for calculating area. Every shape would be overriding the base area() method anyway.
TRY THIS: PRIVATE INSTANCE VARIABLES
Modify the Rectangle class’s code to make the dimension variables private. What restriction will this change impose on using the class?
The dimension variables will no longer be accessible outside the class via .x and .y.
TRY THIS: PROPERTIES
Update the dimensions of the Rectangle class to be properties with getters and setters that don’t allow negative sizes.
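One possible shape for this is sketched below. Raising ValueError for a negative size is a design choice made here; silently ignoring the assignment would be another option.

```python
class Rectangle:
    def __init__(self, x, y):
        # These assignments go through the property setters below
        self.x = x
        self.y = y

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if value < 0:
            raise ValueError("x can't be negative")
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        if value < 0:
            raise ValueError("y can't be negative")
        self._y = value

rect = Rectangle(2, 3)
rect.x = 5
print(rect.x, rect.y)
```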
LAB 15: HTML CLASSES
In this lab, you create classes to represent an HTML document. To keep things simple, assume that each element can contain only text and one subelement. So the <html> element contains only a <body> element, and the <body> element contains (optional) text and a <p> element, which contains only text.
The key feature to implement is the __str__() method, which in turn calls its subelement’s __str__() method so that the entire document is returned when the str() function is called on an <html> element. You can assume that any text comes before the subelement.
Following is example output from using the classes:
Answer:
B.13. Chapter 16
QUICK CHECK: SPECIAL CHARACTERS IN REGULAR EXPRESSIONS
What regular expression would you use to match strings that represent the numbers -5 through 5?
`r"-{0,1}[0-5]"` matches strings that represent the numbers -5 through 5.
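The pattern can be tried out with the re module, as in this sketch. It uses fullmatch() so that longer strings such as "55" aren't accepted on a partial match; note that the pattern also accepts "-0".

```python
import re

pattern = re.compile(r"-{0,1}[0-5]")

candidates = ["-5", "-1", "0", "3", "5", "6", "-7", "55"]
matches = [text for text in candidates if pattern.fullmatch(text)]

print(matches)
```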
What regular expression would you use to match a hexadecimal digit? Assume that the allowed hexadecimal digits are 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, A, a, B, b, C, c, D, d, E, e, F, and f.
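A character class with three ranges covers the allowed digits; this sketch checks it against a few sample characters.

```python
import re

hex_digit = re.compile(r"[0-9A-Fa-f]")

candidates = ["0", "9", "A", "f", "G", "z"]
matches = [char for char in candidates if hex_digit.fullmatch(char)]

print(matches)
```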
TRY THIS: EXTRACTING MATCHED TEXT
Making international calls usually requires a plus sign (+) and the country code. Assuming that the country code is two digits, how would you modify the code above to extract the plus sign and the country code as part of the number? (Again, not all numbers have a country code.) How would you make the code handle country codes of one to three digits?
or
For one- to three-digit country codes:
TRY THIS: REPLACING TEXT
In the checkpoint above, you extended a phone-number regular expression to also recognize a country code. How would you use a function to make any numbers that didn’t have a country code now have +1 (the country code for the United States and Canada)?
LAB 16: PHONE NUMBER NORMALIZER
In the United States and Canada, phone numbers consist of 10 digits, usually separated into a 3-digit area code, a 3-digit exchange code, and a 4-digit station code. As mentioned above, phone numbers may or may not be preceded by +1, the country code. In practice, there are many ways of formatting a phone number, such as (NNN) NNN-NNNN, NNN-NNN-NNNN, NNN NNN-NNNN, NNN.NNN.NNNN, and NNN NNN NNNN. Also, the country code may not be present, may not have a plus sign, and is usually (not always) separated from the number by a space or dash. Whew!
In this lab, the task is to create a phone number normalizer that takes any of the formats mentioned above and returns a normalized phone number 1-NNN-NNN-NNNN.
The following are all possible phone numbers:
+1 223-456-7890
1-223-456-7890
+1 223 456-7890
(223) 456-7890
1 223 456 7890
223.456.7890
Bonus: The first digit of the area code and the exchange code can be only 2–9, and the second digit of an area code can’t be 9. Use this information to validate the input and return the message "invalid phone number" if the number is invalid.
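One possible sketch of a normalizer is shown below. It isn't the book's solution: instead of matching each format with its own pattern, it strips every nondigit and works with what remains. The function name normalize_phone is hypothetical.

```python
import re

def normalize_phone(number):
    """Return the number as 1-NNN-NNN-NNNN, or "invalid phone number"."""
    digits = re.sub(r"\D", "", number)      # drop everything but digits

    # An 11-digit number must start with the country code 1
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]
    if len(digits) != 10:
        return "invalid phone number"

    area, exchange, station = digits[:3], digits[3:6], digits[6:]

    # Bonus rules: first digits must be 2-9; second area-code digit can't be 9
    if area[0] in "01" or area[1] == "9" or exchange[0] in "01":
        return "invalid phone number"

    return f"1-{area}-{exchange}-{station}"

print(normalize_phone("(223) 456-7890"))
```

Stripping nondigits is more permissive than the formats listed above, so a stricter solution would check the separators as well.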
B.14. Chapter 17
QUICK CHECK: TYPES
Suppose that you want to make sure that object x is a list before you try appending to it. What code would you use? What would be the difference between using type() and isinstance()? Would this be the LBYL (look before you leap) or EAFP (easier to ask forgiveness than permission) style of programming? What other options might you have besides checking the type explicitly?
Using type would get only lists, not anything that subclasses lists. Either way, it’s LBYL programming.
You might also wrap the append in a try...except block and catch AttributeError exceptions (raised when the object has no append method), which would be more EAFP.
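The three approaches can be compared in a short sketch. NumberList and try_append are hypothetical names introduced for illustration.

```python
class NumberList(list):
    """A list subclass, used to show how the two checks differ."""

x = NumberList()

# LBYL: isinstance() accepts subclasses; an exact type() comparison doesn't
passes_isinstance = isinstance(x, list)
passes_type = type(x) == list

# EAFP: just try the append and handle the failure
def try_append(target, item):
    try:
        target.append(item)
    except AttributeError:
        return False
    return True

print(passes_isinstance, passes_type, try_append(x, 1), try_append(5, 1))
```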
QUICK CHECK: __GETITEM__
The example use of __getitem__ above is very limited and won’t work correctly in many situations. What are some cases in which the implementation above will fail or work incorrectly?
This implementation will not work if you try to access an item directly by index; neither can you move backward.
TRY THIS: IMPLEMENTING LIST SPECIAL METHODS
Try implementing the __len__ and __delitem__ special methods listed earlier, as well as an append method. The implementation is in bold in the code.
QUICK CHECK: SPECIAL METHOD ATTRIBUTES AND SUBCLASSING EXISTING TYPES
Suppose that you want a dictionary-like type that allows only strings as keys (maybe to make it work like a shelf object, as described in chapter 13). What options would you have for creating such a class? What would be the advantages and disadvantages of each option?
You could use the same approach as you did for TypedList and inherit from the UserDict class. You could also inherit directly from dict, or you could implement all of the dict functionality yourself.
Implementing everything yourself provides the most control but is the most work and most prone to bugs. If the changes you need to make are small (in this case, just checking the type before adding a key), it might make the most sense to inherit directly from dict. On the other hand, inheriting from UserDict is probably safest, because the internal dict object will continue to be a regular dict, which is a highly optimized and mature implementation.
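As a rough sketch of the UserDict option, the class below checks the key type in __setitem__. The name StringKeyDict is hypothetical.

```python
from collections import UserDict

class StringKeyDict(UserDict):
    """Dictionary-like class that accepts only string keys."""

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        # The data itself lives in a regular dict, self.data
        super().__setitem__(key, value)

d = StringKeyDict()
d["name"] = "value"
print(d)
```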
B.15. Chapter 18
QUICK CHECK: PACKAGES
Suppose that you’re writing a package that takes a URL, retrieves all images on the page pointed to by that URL, resizes them to a standard size, and stores them. Leaving aside the exact details of how each of these functions will be coded, how would you organize those features into a package?
The package will be performing three types of actions: fetching a page and parsing the HTML for image URLs, fetching the images, and resizing the images. For this reason, you might consider having three modules to keep the actions separate:
LAB 18: CREATE A PACKAGE
In chapter 14, you added error handling to the text cleaning and word frequency counting module you created in chapter 11. Refactor that code into a package containing one module for the cleaning functions, one for the processing functions, and one for the custom exceptions. Then write a simple main function that uses all three modules.
B.16. Chapter 20
QUICK CHECK: CONSIDER THE CHOICES
Take a moment to consider your options for handling the tasks identified above. What modules in the standard library can you think of that will do the job? If you want to, you can even stop right now, work out the code to do it, and compare your solution with the one you’ll develop in the next section.
From the standard library, use datetime for managing the dates/times of the files, and either os.path and os or pathlib for renaming and archiving the files.
QUICK CHECK: POTENTIAL PROBLEMS
Because the previous solution is very simple, there are likely to be many situations that it won’t handle well. What are some potential issues or problems that might arise with the script above? How might you remedy these problems?
Multiple files during the same day would be a problem, for one thing. If you have lots of files, navigating the archive directory will become increasingly difficult.
Consider the naming convention used for the files, which is based on the year, month and name, in that order. What advantages do you see in that convention? What might be the disadvantages? Can you make any arguments for putting the date string somewhere else in the filename, such as the beginning or the end?
Using year-month-day date formats makes a text-based sort of the files sort by date as well. Putting the date at the end of the filename but before the extension makes it more difficult to parse the date element visually.
TRY THIS: IMPLEMENTATION OF MULTIPLE DIRECTORIES
Using the code you developed in the section above as a starting point, how would you modify it to implement archiving each set of files in subdirectories named according to the date received? Feel free to take the time to implement the code and test it.
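One possible approach, sketched here, uses pathlib to create a subdirectory named for the date and then moves the matching files into it. The function name archive_by_date, the *.txt pattern, and the year-month-day directory name are assumptions made for illustration; they are not the book's own listing.

```python
import datetime
import pathlib


def archive_by_date(source_dir, archive_root, pattern="*.txt", date_string=None):
    """Move files matching pattern into archive_root/<date_string>/."""
    source_dir = pathlib.Path(source_dir)
    if date_string is None:
        # Assumed naming scheme: year-month-day, so names sort by date.
        date_string = datetime.date.today().strftime("%Y-%m-%d")
    target_dir = pathlib.Path(archive_root) / date_string
    # parents=True creates the archive root if needed; exist_ok=True
    # lets the script run more than once on the same day.
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source_dir.glob(pattern)):
        if path.is_file():
            new_path = target_dir / path.name
            path.rename(new_path)
            moved.append(new_path)
    return moved
```

Calling archive_by_date(".", "archive") would move today's *.txt files into a directory such as archive/2024-01-15, creating it first if necessary.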
QUICK CHECK: ALTERNATE SOLUTIONS
How might you create a script that does the same thing without using pathlib? What libraries and functions would you use?
You’d use the os.path and os libraries—specifically, os.path.join(), os.mkdir(), and os.rename().
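As a rough sketch of that alternative, the following function does the same job with os.path.join(), os.mkdir(), and os.rename(). The function name archive_with_os and the .txt extension are assumptions for illustration; os.listdir() and os.path.isfile() are used here only to find the files to move.

```python
import os


def archive_with_os(source_dir, archive_root, date_string, extension=".txt"):
    """Move files ending in extension into archive_root/date_string/."""
    target_dir = os.path.join(archive_root, date_string)
    # os.mkdir() creates a single level, so create each missing level in turn.
    for directory in (archive_root, target_dir):
        if not os.path.isdir(directory):
            os.mkdir(directory)
    moved = []
    for name in sorted(os.listdir(source_dir)):
        old_path = os.path.join(source_dir, name)
        if name.endswith(extension) and os.path.isfile(old_path):
            new_path = os.path.join(target_dir, name)
            os.rename(old_path, new_path)
            moved.append(new_path)
    return moved
```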
TRY THIS: ARCHIVING TO ZIP FILES PSEUDOCODE
Take a moment to write the pseudocode for a solution that stores data files in zip files as shown above. What modules and functions or methods do you intend to use? Try coding your solution to make sure that it works.
Pseudocode:
(See the next section for sample code that does this.)
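If you want something to compare your own attempt against, here is a minimal sketch using pathlib and zipfile. The function name archive_to_zip and the *.txt pattern are hypothetical, and the caller is assumed to pick a zip filename based on the date; this is not the listing from the next section.

```python
import pathlib
import zipfile


def archive_to_zip(source_dir, zip_path, pattern="*.txt"):
    """Add matching files to a zip archive, then delete the originals."""
    source_dir = pathlib.Path(source_dir)
    archived = []
    # Mode "a" appends to an existing archive or creates a new one.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.glob(pattern)):
            # arcname stores just the filename, not the full path.
            archive.write(path, arcname=path.name)
            archived.append(path)
    # Remove the originals only after the archive has been written and closed.
    for path in archived:
        path.unlink()
    return [path.name for path in archived]
```

Deleting the originals only after the with block exits means that a failure while writing the archive leaves the data files in place.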
QUICK CHECK: CONSIDER DIFFERENT PARAMETERS
Take some time to consider different grooming options. How would you modify the code in the previous Try This to keep only one file a month? How would you change the code so that files from the previous month and older are groomed to save one a week? (Note: This is not the same as older than 30 days!)
You could use something similar to the code above but also check the month of the file against the current month.
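To make the difference between "previous month and older" and "older than 30 days" concrete, here is a small sketch that works on file dates only. The function name select_files_to_keep and the keep_weekday parameter are hypothetical, and keeping the files dated on one chosen weekday is just one way to save one file a week.

```python
import datetime


def select_files_to_keep(file_dates, today, keep_weekday=0):
    """Keep every date in the current month; for earlier months,
    keep only the dates that fall on keep_weekday (0 = Monday)."""
    first_of_month = today.replace(day=1)
    keep = []
    for file_date in file_dates:
        if file_date >= first_of_month:
            # Current month: compare against the first of the month,
            # not against today minus 30 days.
            keep.append(file_date)
        elif file_date.weekday() == keep_weekday:
            keep.append(file_date)
    return keep
```

To keep one file a month instead, you could replace the weekday test with a check such as file_date.day == 1.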
B.17. Chapter 21
QUICK CHECK: NORMALIZATION
Look closely at the list of words generated above. Do you see any issues with the normalization so far? What other issues do you think you might encounter with a longer section of text? How do you think you might deal with those issues?
Double hyphens for em dashes, hyphenation for line breaks and otherwise, and any other punctuation marks would all be potential problems.
Enhancing the word cleaning module you created in chapter 18 would be a good way to cover most of the issues.
TRY THIS: READ A FILE
Write the code to read a text file (assume that it’s the file temp_data_00a.txt as shown in the example above), split each line of the file into a list of values, and add that list to a single list of records.
(no answer)
What issues or problems did you encounter in implementing this solution? How might you go about converting the last three fields to the correct date, real, and int types?
You could use a list comprehension to explicitly convert those fields.
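A minimal sketch of that conversion follows. It assumes that each record is already a list of strings and that the last three fields look like 1979/01/01, 17.48, and 994; the sample rows and the helper name convert_record are illustrative, so adjust the date format to match your file.

```python
import datetime


def convert_record(fields):
    """Return a copy of fields with the last three converted to
    a date, a float, and an int."""
    date_text, real_text, int_text = fields[-3:]
    return fields[:-3] + [
        # Assumed date layout: year/month/day.
        datetime.datetime.strptime(date_text, "%Y/%m/%d").date(),
        float(real_text),
        int(int_text),
    ]


# Illustrative sample rows, not taken from the real data file.
raw_records = [
    ["Illinois", "1979", "1979/01/01", "17.48", "994"],
    ["Illinois", "1979", "1979/01/02", "4.64", "994"],
]
records = [convert_record(fields) for fields in raw_records]
```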
QUICK CHECK: HANDLING QUOTING
Consider how you’d approach the problems of handling quoted fields and embedded delimiter characters if you didn’t have the csv library. Which is easier to handle: the quoting or the embedded delimiters?
Without using the csv module, you’d have to check whether a field began and ended with the quote characters and then strip() them off.
To handle embedded delimiters without using the csv library, you’d have to isolate the quoted fields and treat them differently; then you’d split the rest of the fields by using the delimiter.
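To see how much work the csv module saves, here is a rough sketch of a splitter that tracks whether it is inside a quoted field. The function name split_quoted is hypothetical, and the sketch deliberately ignores escaped or doubled quote characters, which the csv module handles for you.

```python
def split_quoted(line, delimiter=",", quote='"'):
    """Split line on delimiter, ignoring delimiters inside quoted fields."""
    fields = []
    current = []
    in_quotes = False
    for char in line:
        if char == quote:
            # Toggle the quoted state and drop the quote character itself.
            in_quotes = not in_quotes
        elif char == delimiter and not in_quotes:
            # An unquoted delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
    fields.append("".join(current))
    return fields
```

With this function, split_quoted('"Washington, D.C.",1,"x"') keeps the comma inside the first field instead of splitting on it.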
TRY THIS: CLEANING DATA
How would you handle the fields with 'Missing' as a possible value for math calculations? Can you write a snippet of code that averages one of those columns?
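One simple way to handle this, sketched below, is to leave the 'Missing' entries out of both the sum and the count. The function name average_skipping_missing is hypothetical, and returning None when a column has no usable values is a design choice, not the only option.

```python
def average_skipping_missing(values, missing="Missing"):
    """Average the numeric strings in values, ignoring missing markers."""
    numbers = [float(value) for value in values if value != missing]
    if not numbers:
        # Nothing to average; avoid dividing by zero.
        return None
    return sum(numbers) / len(numbers)
```

For example, average_skipping_missing(["10.0", "Missing", "20.0"]) averages only the two numeric entries. Treating 'Missing' as zero instead would pull the average down, so it is worth deciding which behavior you want.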
What would you do with the average column at the end so that you could also report the average coverage? In your opinion, would the solution to this problem be at all linked to the way that the 'Missing' entries were handled?
It may not be done at the same time as the 'Missing' values are handled.
LAB: WEATHER OBSERVATIONS
The file of weather observations provided here is by month and then by county for the state of Illinois from 1979 to 2011. Write the code to process this file and extract the data for Chicago (Cook County) into a single CSV or spreadsheet file. This code includes replacing the 'Missing' strings with empty strings and translating the percentage to a decimal. You may also consider what fields are repetitive and can be omitted or stored elsewhere. The proof that you’ve got it right occurs when you load the file into a spreadsheet. You can download a solution with the book’s source code.
B.18. Chapter 22
TRY THIS: RETRIEVING A FILE
If you’re working with the data file above and want to break each line into separate fields, how might you do that? What other processing would you expect to do? Try writing some code to retrieve this file and calculate the average annual rainfall or, for more of a challenge, the average maximum and minimum temperature for each year.
TRY THIS: ACCESSING AN API
Write some code to fetch some data from the city of Chicago site used above. Look at the fields mentioned in the results, and see whether you can select records based on another field in combination with the date range.
TRY THIS: SAVING SOME JSON CRIME DATA
Modify the code you wrote to fetch Chicago crime data in section 22.2 to convert the fetched data from a JSON-formatted string to a Python object. See whether you can save the crime events both as a series of separate JSON objects in one file and as one JSON object in another file. Then see what code is needed to load each file.
TRY THIS: FETCHING AND PARSING XML
Write the code to pull the Chicago XML weather forecast from http://mng.bz/103V. Then use xmltodict to parse the XML into a Python dictionary and extract tomorrow’s forecast maximum temperature. Hint: To match up time layouts and values, compare the layout-key value of the first time-layout section and the time-layout attribute of the temperature element of the parameters element.
TRY THIS: PARSING HTML
Given the file forecast.html (which you can find with the code on this book’s website), write a script using Beautiful Soup that extracts the data and saves it as a CSV file.
LAB 22: TRACK CURIOSITY’S WEATHER
Use the application programming interface (API) described in section 22.2 of chapter 22 to gather a weather history of Curiosity’s stay on Mars for a month. Hint: You can specify Martian days (sols) by adding ?sol=sol_number to the end of the archive query like this:
http://marsweather.ingenology.com/v1/archive/?sol=155
Transform the data so that you can load it into a spreadsheet and graph it. For a version of this project, see the book’s source code.
B.19. Chapter 23
TRY THIS: CREATING AND MODIFYING TABLES
Using sqlite3, write the code that creates a database table for the Illinois weather data you loaded from a flat file in section 21.2 of chapter 21. Suppose that you have similar data for more states and want to store more information about the states themselves. How could you modify your database to use a related table to store the state information?
You could add a state table and store only each state’s ID field in the weather database.
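A minimal sketch of that two-table layout with sqlite3 follows. The table and column names (state, weather, state_id, obs_date, max_temp) and the sample row are assumptions for illustration; your weather table would carry the columns from the flat file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is turned on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE state (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        abbreviation TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE weather (
        id INTEGER PRIMARY KEY,
        state_id INTEGER NOT NULL REFERENCES state(id),
        obs_date TEXT,
        max_temp REAL
    )""")
# Illustrative sample data, not taken from the real file.
conn.execute(
    "INSERT INTO state (id, name, abbreviation) VALUES (?, ?, ?)",
    (1, "Illinois", "IL"),
)
conn.execute(
    "INSERT INTO weather (state_id, obs_date, max_temp) VALUES (?, ?, ?)",
    (1, "1979-01-01", 17.48),
)
# Join the two tables to report the state details with each observation.
rows = conn.execute("""
    SELECT state.abbreviation, weather.obs_date, weather.max_temp
    FROM weather JOIN state ON weather.state_id = state.id""").fetchall()
conn.close()
```

Storing only state_id in the weather table means that each state's name and abbreviation live in one place, so a correction there applies to every observation.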
TRY THIS: USING AN ORM
Using the database from section 22.3, write a SQLAlchemy class to map to the data table and use it to read the records from the table.
TRY THIS: MODIFYING A DATABASE WITH ALEMBIC
Experiment with creating an Alembic upgrade that adds a state table to your database, with columns for ID, state name, and abbreviation. Upgrade and downgrade. What other changes would be necessary if you were going to use the state table along with the existing data table?
(no answer)
QUICK CHECK: USES OF KEY:VALUE STORES
What sorts of data and applications would benefit most from a key:value store like Redis?
Quick lookup of data
Caching
QUICK CHECK: USES OF MONGODB
Thinking back over the various data samples you’ve seen so far and other types of data in your experience, can you come up with any data that you think would be well suited to being stored in a database such as MongoDB? Would others clearly not be suited, and if so, why not?
Data that comes in large and/or more loosely organized chunks is suited to MongoDB, such as the contents of a web page or document.
Data with a specific, consistent structure is better suited to a relational database. The weather data you’ve seen is a good example.
LAB 23: CREATE A DATABASE
Choose one of the datasets discussed in the past few chapters, and decide which type of database would be best to store that data. Create that database, and write the code to load the data into it. Then choose the two most common and/or likely types of search criteria, and write the code to retrieve both single and multiple matching records.
(no answer)
B.20. Chapter 24
TRY THIS: USING JUPYTER NOTEBOOK
Enter some code in the notebook, and experiment with running it. Check out the Edit, Cell, and Kernel menus to see what options are there. When you have a little code running, use the Kernel menu to restart the kernel, repeat your steps, and then use the Cell menu to rerun the code in all of the cells.
(no answer)
TRY THIS: CLEANING DATA WITH AND WITHOUT PANDAS
Experiment with the operations mentioned above. When the final column has been converted to a fraction, can you think of a way to convert it back to a string with the trailing percentage sign?
By contrast, load the same data into a plain Python list by using the csv module, and apply the same changes by using plain Python.
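For the percentage question, one plain-Python approach is a pair of small conversion functions, sketched below. The function names are hypothetical, and the sketch assumes that the values look like "96%". In pandas, you could apply the same kind of function to a column with a method such as Series.map().

```python
def percent_to_fraction(text):
    """Convert a string such as '96%' to the fraction 0.96."""
    return float(text.rstrip("%")) / 100


def fraction_to_percent(value):
    """Convert a fraction such as 0.96 back to the string '96%'."""
    # The % format type multiplies by 100 and appends the percent sign.
    return f"{value:.0%}"
```

Use a format such as .1% if you need to preserve a decimal place in the percentage.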
QUICK CHECK: MERGING DATA SETS
How would you go about actually merging two data sets like the above in Python?
If you’re sure that you have exactly the same number of items in each set and that the items are in the right order, you could use the zip() function. Otherwise, you could create a dictionary, with the keys being something common between the two data sets, and then append the data by key from both sets.
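Both approaches can be sketched in a few lines. The function names and the "id" key used in the usage example are hypothetical, and each record is assumed to be a dictionary.

```python
def merge_by_position(first, second):
    """Merge records pairwise; assumes equal length and matching order."""
    return [{**a, **b} for a, b in zip(first, second)]


def merge_by_key(first, second, key):
    """Merge records that share the same value for key, in any order."""
    merged = {}
    for record in list(first) + list(second):
        # Create an entry the first time a key value is seen,
        # then fold each record's fields into it.
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())
```

The keyed version is slower to write but safer, because zip() silently pairs the wrong records if either list is out of order and stops at the end of the shorter list.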
QUICK CHECK: SELECTING IN PYTHON
What Python code structure would you use to select only rows that meet certain conditions?
You’d probably use a list comprehension:
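As a small illustration, the following list comprehension keeps only the rows that meet two conditions. The rows, field names, and thresholds are made up for the example.

```python
# Hypothetical sample rows; substitute your own data.
rows = [
    {"team_member": "Jorge", "calls": 12, "amount": 250.0},
    {"team_member": "Ana", "calls": 4, "amount": 900.0},
    {"team_member": "Ali", "calls": 20, "amount": 50.0},
]

# Keep rows with more than 10 calls and an amount of at least 100.
selected = [
    row for row in rows
    if row["calls"] > 10 and row["amount"] >= 100
]
```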
TRY THIS: GROUPING AND AGGREGATING
Experiment with pandas and the data above. Can you get the calls and amounts by both team member and month?
TRY THIS: PLOTTING
Plot a line graph of the monthly average amount per call.