Appendix B. Exercise answers

B.1. Chapter 4

TRY THIS: VARIABLES AND EXPRESSIONS

In the Python shell, create some variables. What happens when you try to put spaces, dashes, or other nonalphanumeric characters in the variable name? Play around with a few complex expressions, such as x = 2 + 4 * 5 - 6 / 3. Use parentheses to group the numbers in different ways, and see how that changes the result compared with the original ungrouped expression.

>>> x = 3
>>> y = 3.14
>>> y
3.14
>>> x
3
>>> big var = 12
  File "<stdin>", line 1
    big var = 12
          ^
SyntaxError: invalid syntax
>>> big-var
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'big' is not defined
>>> big&var
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'big' is not defined
>>> x = 2 + 4 * 5 - 6 /3
>>> x
20.0
>>> x = (2 + 4) * 5 - 6 /3
>>> x
28.0
>>> x = (2 + 4) * (5 - 6) /3
>>> x
-2.0

TRY THIS: MANIPULATING STRINGS AND NUMBERS

In the Python shell, create some string and number variables (integers, floats, and complex numbers). Experiment a bit with what happens when you do operations with them, including across types. Can you multiply a string by an integer, for example, or by a float or complex number? Also, load the math module and try out a few of the functions; then load the cmath module and do the same. What happens if you try to use one of those functions on an integer or float after loading the cmath module? How might you get the math module functions back?

>>> i = 3
>>> f = 3.14
>>> c = 3j2
  File "<stdin>", line 1
    c = 3j2
          ^
SyntaxError: invalid syntax
>>> c = 3J2
  File "<stdin>", line 1
    c = 3J2
          ^
SyntaxError: invalid syntax
>>> c = 3 + 2j
>>> c
(3+2j)
>>> s = 'hello'
>>> s * f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'float'
>>> s * i
'hellohellohello'
>>> s * c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'complex'
>>> c * i
(9+6j)
>>> c * f
(9.42+6.28j)
>>> from math import sqrt
>>> sqrt(16)
4.0
>>> from cmath import sqrt
>>> sqrt(16)
(4+0j)

To reconnect the first sqrt to your current namespace, you can reimport it. Note that this code doesn’t reload the module; it only rebinds the name sqrt:

>>> from math import sqrt
>>> sqrt(4)
2.0

TRY THIS: GETTING INPUT

Experiment with the input() function to get string and integer input. Using code similar to the code above, what is the effect of not using int() around the call to input() for integer input? Can you modify that code to accept a float, such as 28.5? What happens if you deliberately enter the “wrong” type of value, such as a float where an int is expected or a string where a number is expected, and vice versa?

>>> x = input("int?")
int?3
>>> x
'3'
>>> y = float(input("float?"))
float?3.5
>>> y
3.5
>>> z = int(input("int?"))
int?3.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '3.5'

QUICK CHECK: PYTHONIC STYLE

Which of the following variable and function names do you think are not good Pythonic style, and why? bar(, varName, VERYLONGVARNAME, foobar, longvarname, foo_bar(), really_very_long_var_name

bar(: Not good, not legal, includes symbol

varName: Not good, mixed case

VERYLONGVARNAME: Not good, long, all caps, hard to read

foobar: Good

longvarname: Good, although underscores to separate words would be better

foo_bar(): Good

really_very_long_var_name: Long, but good if all of the words are needed, perhaps to distinguish among similar variables

B.2. Chapter 5

QUICK CHECK: LEN()

What would len() return for each of the following: [0]; []; [[1, 3, [4, 5], 6], 7]?

len([0]): 1

len([]): 0

len([[1, 3, [4, 5], 6], 7]): 2

([1, 3, [4, 5], 6] is a list, so it counts as a single item; the second item is 7.)

TRY THIS: LIST SLICES AND INDEXES

Using what you know about the len() function and list slices, how would you combine the two to get the second half of a list when you don’t know what size it is? Experiment in the Python shell to confirm that your solution works.

>>> my_list = [1, 2, 3, 4, 5, 6]
>>> last_half = my_list[len(my_list)//2:]
>>> last_half
[4, 5, 6]

len(my_list) // 2 is the halfway point; slice from there to the end.
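
One way to check the slice against lists of different sizes is to wrap it in a small helper. This is only a sketch: the function name second_half is an illustrative assumption, not something from the chapter. Note that for a list with an odd number of items, the middle item ends up in the second half.

```python
def second_half(items):
    """Return the second half of a list (hypothetical helper name)."""
    # len(items) // 2 is the halfway index; slice from there to the end
    return items[len(items) // 2:]


even = second_half([1, 2, 3, 4, 5, 6])
odd = second_half([1, 2, 3, 4, 5])
print(even)
print(odd)
```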

TRY THIS: MODIFYING LISTS

Suppose that you have a list 10 items long. How might you move the last three items from the end of the list to the beginning, keeping them in the same order?

>>> my_list = my_list[-3:] + my_list[:-3]
>>> my_list
[4, 5, 6, 1, 2, 3]

TRY THIS: SORTING LISTS

Suppose that you have a list in which each element is in turn a list: [[1, 2, 3], [2, 1, 3], [4, 0, 1]]. If you want to sort this list by the second element in each list, so that the result is [[4, 0, 1], [2, 1, 3], [1, 2, 3]], what function would you write to pass as the key value to the sort() method?

>>> the_list = [[1, 2, 3], [2, 1, 3], [4, 0, 1]]
>>> the_list.sort(key=lambda x: x[1])
>>> the_list
[[4, 0, 1], [2, 1, 3], [1, 2, 3]]

QUICK CHECK: LIST OPERATIONS

What is the result of len([[1,2]] * 3)?

3

What are two differences between using the in operator and a list’s index() method?

  • index gives position; in gives a true/false answer.

  • index gives an error if an element isn’t in the list.

Which of the following raises an exception? min(["a", "b", "c"]); max([1, 2, "three"]); [1, 2, 3].count("one")

max([1, 2, "three"]): Strings and ints can’t be compared, so it’s impossible to get a max value.
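
If you want to confirm this in code, the three calls can be tried one at a time inside a try/except. The following is a rough sketch; the names candidates and results are made up for illustration.

```python
# Each call is wrapped in a lambda so it runs only inside the try block
candidates = {
    'min(["a", "b", "c"])': lambda: min(["a", "b", "c"]),
    'max([1, 2, "three"])': lambda: max([1, 2, "three"]),
    '[1, 2, 3].count("one")': lambda: [1, 2, 3].count("one"),
}

results = {}
for label, call in candidates.items():
    try:
        results[label] = call()
    except TypeError:
        # raised when Python can't compare an int with a str
        results[label] = "TypeError"
    print(label, "->", results[label])
```

Only the max() call fails; count() simply reports zero matches, because testing for equality between an int and a string is allowed.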

TRY THIS: LIST OPERATIONS

If you have a list x, write the code to safely remove an item if and only if that value is in the list.

if element in x:
    x.remove(element)

Modify that code to remove the element only if the item occurs in the list more than once.

if x.count(element) > 1:
    x.remove(element)

Note: This code removes only the first occurrence of element.
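
If the goal were instead to remove every occurrence, two possible approaches are sketched below. The sample values are made up for illustration.

```python
x = [1, 2, 3, 2, 4, 2]
element = 2

# remove() deletes only the first match, so repeat until none are left
while element in x:
    x.remove(element)
print(x)

# alternative: build a new list that leaves the element out
y = [1, 2, 3, 2, 4, 2]
y = [item for item in y if item != element]
print(y)
```

The comprehension makes a single pass over the list, whereas the loop rescans the list on every removal, so the comprehension is usually the better choice for long lists.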

TRY THIS: LIST COPIES

Suppose that you have the following list: x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. What code could you use to get a copy y of that list in which you could change its elements without the side effect of changing the contents of x?

import copy
y = copy.deepcopy(x)

QUICK CHECK: TUPLES

Explain why the following operations aren’t legal for the tuple x = (1, 2, 3, 4):

x.append(1)
x[1] = "hello"
del x[2]

All of these operations change the object in place, and tuples can’t be changed.
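
As a quick sketch, each of the three operations can be attempted inside a try/except to see which exception it raises. The errors list is only there to collect the results.

```python
x = (1, 2, 3, 4)
errors = []

try:
    x.append(1)           # tuples have no append() method
except AttributeError as err:
    errors.append(type(err).__name__)

try:
    x[1] = "hello"        # item assignment isn't supported
except TypeError as err:
    errors.append(type(err).__name__)

try:
    del x[2]              # item deletion isn't supported either
except TypeError as err:
    errors.append(type(err).__name__)

print(errors)
print(x)                  # the tuple is unchanged
```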

If you had a tuple x = (3, 1, 4, 2), how might you end up with x sorted?

x = sorted(x)           # note: sorted() returns a list
x = tuple(x)            # convert back if the result must still be a tuple

QUICK CHECK: SETS

If you were to construct a set from the following list, how many elements would it have? [1, 2, 5, 1, 0, 2, 3, 1, 1, (1, 2, 3)]

Six unique elements: 1, 2, 5, 0, 3, and the tuple (1, 2, 3)
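
A short check of this count (the variable names are illustrative):

```python
values = [1, 2, 5, 1, 0, 2, 3, 1, 1, (1, 2, 3)]

# duplicates collapse; the tuple is hashable, so it counts as one element
unique = set(values)
print(len(unique))
```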

LAB 5: EXAMINING A LIST

In this lab, the task is to read a set of temperature data (in fact, the monthly high temperatures at Heathrow Airport for 1948–2016) from a file and then find some basic information: the highest and lowest temperatures, the mean (average) temperature, and the median temperature (the temperature in the middle if all of the temperatures are sorted).

The temperature data is in the file lab_05.txt in the source code directory for this chapter. Because I’ve not yet discussed reading files, the code to read the files into a list is here:

temperatures = []
with open('lab_05.txt') as infile:
    for row in infile:
        temperatures.append(float(row.strip()))

As mentioned, you should find the highest and lowest temperature, the average, and the median. You’ll probably want to use min(), max(), sum(), len(), and sort().

max_temp = max(temperatures)
min_temp = min(temperatures)
mean_temp = sum(temperatures)/len(temperatures)
# we'll need to sort to get the median temp
temperatures.sort()
median_temp = temperatures[len(temperatures)//2]
print("max = {}".format(max_temp))
print("min = {}".format(min_temp))
print("mean = {}".format(mean_temp))
print("median = {}".format(median_temp))

max = 28.2
min = 0.8
mean = 14.848309178743966
median = 14.7
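
One caveat about the median: indexing the middle of the sorted list picks the upper of the two middle values when the list has an even number of items. As a hedged alternative, the standard library's statistics.median() averages the two middle values in that case. The sample list below is made up for illustration and isn't taken from lab_05.txt.

```python
import statistics

sample = [14.2, 9.5, 21.0, 17.3]   # made-up values, even count
sample.sort()

middle_item = sample[len(sample) // 2]      # upper of the two middle values
true_median = statistics.median(sample)     # mean of the two middle values
print(middle_item, true_median)
```

For a long list of closely spaced temperatures, the two approaches usually give nearly the same answer, which is why the simpler indexing version is a reasonable first solution.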

Bonus: Determine how many unique temperatures are in the list.

unique_temps = len(set(temperatures))

print("number of temps - {}".format(len(temperatures)))
print("number of unique temps - {}".format(unique_temps))

number of temps - 828
number of unique temps - 217

B.3. Chapter 6

QUICK CHECK: SPLIT AND JOIN

How could you use split and join to change all of the whitespace in string x to dashes (such as "this is a test" to "this-is-a-test")?

>>> x = "this is a test"
>>> "-".join(x.split())
'this-is-a-test'

QUICK CHECK: STRINGS TO NUMBERS

Which of the following will not be converted to numbers, and why?

  1. int('a1')

  2. int('12G', 16)

  3. float("12345678901234567890")

  4. int("12*2")

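Only #3, float("12345678901234567890"), converts. Each of the others contains a character that isn’t valid for the conversion: 'a' isn’t a base-10 digit, 'G' isn’t a base-16 digit (those stop at F), and int() doesn’t evaluate expressions such as 12*2.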
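
The four conversions can be tried in a loop to see which ones raise ValueError. This is only a sketch; the names attempts and outcomes are illustrative.

```python
# Each conversion is wrapped in a lambda so it runs inside the try block
attempts = [
    ("int('a1')", lambda: int('a1')),
    ("int('12G', 16)", lambda: int('12G', 16)),
    ('float("12345678901234567890")', lambda: float("12345678901234567890")),
    ('int("12*2")', lambda: int("12*2")),
]

outcomes = []
for label, convert in attempts:
    try:
        outcomes.append(convert())
    except ValueError:
        outcomes.append("ValueError")
    print(label, "->", outcomes[-1])
```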

QUICK CHECK: STRIP

If the string x equals "(name, date),\n", which of the following returns a string containing "name, date"?

  1. x.rstrip("),")

  2. x.strip("),\n")

  3. x.strip("\n)(,")

Option 3, x.strip("\n)(,"), returns "name, date": it removes the newline as well as the comma and parentheses from both ends of the string.
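
To see why the other two options fall short, all three can be compared side by side. The option_1 to option_3 names below are just labels for this sketch.

```python
x = "(name, date),\n"

option_1 = x.rstrip("),")      # stops at once: the last character is "\n"
option_2 = x.strip("),\n")     # cleans the right end but leaves "(" on the left
option_3 = x.strip("\n)(,")    # strips all four characters from both ends

for result in (option_1, option_2, option_3):
    print(repr(result))
```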

QUICK CHECK: STRING SEARCHING

If you want to see whether a line ends with the string "rejected", what string method would you use? Are there any other ways you could get the same result?

line.endswith('rejected')

You could also do line[-8:] == "rejected", but that wouldn’t be as clear or Pythonic.
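
A brief sketch comparing the two checks on a made-up sample line:

```python
line = "request 42 was rejected"   # sample text, made up for illustration

with_method = line.endswith("rejected")    # clear and Pythonic
with_slice = line[-8:] == "rejected"       # compares the last 8 characters
print(with_method, with_slice)
```

The slice version depends on counting the characters in "rejected" correctly, which is one reason endswith() is the safer choice.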

QUICK CHECK: MODIFYING STRINGS

What would be a quick way to change all punctuation in a string to spaces?

>>> punct = str.maketrans("!.,:;-?", "       ")
>>> x = "This is text, with: punctuation! Right?"
>>> x.translate(punct)
'This is text  with  punctuation  Right '

TRY THIS: STRING OPERATIONS

Suppose that you have a list of strings in which some (but not necessarily all) of the strings begin and end with the double quote character:

x = ['"abc"', 'def', '"ghi"', '"klm"', 'nop']

What code would you use on each element to remove just the double quotes?

>>> for item in x:
...     print(item.strip('"'))
...
abc
def
ghi
klm
nop

What code could you use to find the position of the last p in Mississippi? When you’ve found its position, what code would you use to remove just that letter?

>>> state = "Mississippi"
>>> pos = state.rfind("p")
>>> state = state[:pos] + state[pos+1:]
>>> print(state)
Mississipi

QUICK CHECK: THE FORMAT() METHOD

What will be in x when the following snippets of code are executed?

x = "{1:{0}}".format(3, 4)
# x will contain '  4'

x = "{0:$>5}".format(3)
# x will contain '$$$$3'

x = "{a:{b}}".format(a=1, b=5)
# x will contain '    1'

x = "{a:{b}}:{0:$>5}".format(3, 4, a=1, b=5, c=10)
# x will contain '    1:$$$$3'

QUICK CHECK: FORMATTING STRINGS WITH %

What would be in the variable x after the following snippets of code have executed?

x = "%.2f" % 1.1111
# x will contain '1.11'
x = "%(a).2f" % {'a':1.1111}
# x will contain '1.11'
x = "%(a).08f" % {'a':1.1111}
# x will contain '1.11110000'

QUICK CHECK: BYTES

For which of the following kinds of data would you want to use a string? For which could you use bytes?

(1) Data file storing binary data

Bytes. Because the data is binary, you’re more concerned with the contents as numbers rather than text. Therefore, it would make sense to use bytes.

(2) Text in a language with accented characters

String. Python 3 strings are Unicode, so they can handle accented characters.

(3) Text with only uppercase and lowercase roman characters

String. Strings should be used for all text in Python 3.

(4) A series of integers no larger than 255

Bytes. A byte is an integer no larger than 255, so the bytes type is perfect for storing integers like this.
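
A small sketch of the distinction follows. The readings list and the sample word are made up for illustration.

```python
readings = [0, 17, 128, 255]

packed = bytes(readings)        # each integer becomes one byte
print(packed)
print(list(packed))             # iterating over bytes yields integers again

try:
    bytes([256])                # values above 255 don't fit in a byte
except ValueError as err:
    print("ValueError:", err)

text = "caf\u00e9"              # four characters, one of them accented
encoded = text.encode("utf-8")  # str -> bytes requires an encoding
print(len(text), len(encoded))
```

The accented character takes two bytes in UTF-8, so the string and its encoded form have different lengths; this is the practical reason to keep text as str and reserve bytes for raw data.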

LAB 6: PREPROCESSING TEXT

In processing raw text, it’s quite often necessary to clean and normalize the text before doing anything else. If you want to find the frequency of words in text, for example, you can make the job easier if, before you start counting, you make sure that everything is lowercase (or uppercase, if you prefer) and that all punctuation has been removed. It can also make things easier to break the text into a series of words.

In this lab, the task is to read an excerpt of the first chapter of Moby Dick, make sure that everything is one case, remove all punctuation, and write the words one per line to a second file. Again, because I haven’t yet covered reading and writing files, the code for those operations is supplied below.

Your task is to come up with the code to replace the commented lines in the sample below:

with open("moby_01.txt") as infile, open("moby_01_clean.txt", "w") as outfile:
    for line in infile:
        # make all one case
        # remove punctuation
        # split into words
        # write all words for line
        outfile.write(cleaned_words)

One possible solution:

punct = str.maketrans("", "", "!.,:;-?")

with open("moby_01.txt") as infile, open("moby_01_clean.txt", "w") as outfile:
    for line in infile:
        # make all one case
        cleaned_line = line.lower()

        # remove punctuation
        cleaned_line = cleaned_line.translate(punct)

        # split into words
        words = cleaned_line.split()
        # end with a newline so words from adjacent lines don't run together
        cleaned_words = "\n".join(words) + "\n"

        # write all words for line
        outfile.write(cleaned_words)

B.4. Chapter 7

TRY THIS: CREATE A DICTIONARY

Write the code to ask the user for three names and three ages. After the names and ages are entered, ask the user for one of the names, and print the correct age.

>>> name_age = {}
>>> for i in range(3):
...     name = input("Name? ")
...     age = int(input("Age? "))
...     name_age[name] = age
...
>>> name_choice = input("Name to find? ")
>>> print(name_age[name_choice])

Name? Tom
Age? 33
Name? Talita
Age? 28
Name? Rania
Age? 35
Name to find? Talita
28

QUICK CHECK: DICTIONARY OPERATIONS

Assume that you have a dictionary x = {'a':1, 'b':2, 'c':3, 'd':4} and a dictionary y = {'a':6, 'e':5, 'f':6}. What would be the contents of x after the following snippets of code have executed?

del x['d']
z = x.setdefault('g', 7)
x.update(y)

>>> x = {'a':1, 'b':2, 'c':3, 'd':4}
>>> y = {'a':6, 'e':5, 'f':6}
>>> del x['d']
>>> print(x)
{'a': 1, 'b': 2, 'c': 3}
>>> z = x.setdefault('g', 7)
>>> print(x)
{'a': 1, 'b': 2, 'c': 3, 'g': 7}
>>> x.update(y)
>>> print(x)
{'a': 6, 'b': 2, 'c': 3, 'g': 7, 'e': 5, 'f': 6}

QUICK CHECK: WHAT CAN BE A KEY?

Decide which of the following expressions can be a dictionary key: 1; 'bob'; ('tom', [1, 2, 3]); ["filename"]; "filename"; ("filename", "extension")

1: Yes.

'bob': Yes.

('tom', [1, 2, 3]): No; it contains a list, which isn’t hashable.

["filename"]: No; it’s a list, which isn’t hashable.

"filename": Yes.

("filename", "extension"): Yes; it’s a tuple.
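
These answers can be checked by trying each candidate as a key. The sketch below assumes nothing beyond the built-in dict; the names candidates and usable are illustrative.

```python
candidates = [1, 'bob', ('tom', [1, 2, 3]), ["filename"],
              "filename", ("filename", "extension")]

usable = []
for candidate in candidates:
    try:
        {candidate: None}           # building a dict hashes the key
        usable.append(True)
    except TypeError:               # raised for unhashable keys
        usable.append(False)
    print(repr(candidate), "->", usable[-1])
```

Note that a tuple is only hashable if everything inside it is hashable, which is why ('tom', [1, 2, 3]) fails even though it is a tuple.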

TRY THIS: USING DICTIONARIES

Suppose that you’re writing a program that works like a spreadsheet. How might you use a dictionary to store the contents of a sheet? Write some sample code to both store a value and retrieve a value in a particular cell. What might be some drawbacks to this approach?

You could use tuples of row, column values as keys to store the values in a dictionary. One drawback would be that the keys wouldn’t be sorted, so you’d have to manage that situation as you grabbed the keys/values to render as a spreadsheet.

>>> sheet = {}
>>> sheet[('A', 1)] = 100
>>> sheet[('B', 1)] = 1000

>>> print(sheet[('A', 1)])
100

LAB 7: WORD COUNTING

In Lab 6, you took the text of the first chapter of Moby Dick, normalized the case, removed punctuation, and wrote the separated words to a file. In this lab, you read that file, use a dictionary to count the number of times each word occurs, and report the most common and least common words.

Use this code to read the words from the file into a list called moby_words:

moby_words = []
with open('moby_01_clean.txt') as infile:
    for word in infile:
        if word.strip():
            moby_words.append(word.strip())

word_count = {}
for word in moby_words:
    # start each new word at 0, then count this occurrence
    word_count.setdefault(word, 0)
    word_count[word] += 1

word_list = list(word_count.items())
word_list.sort(key=lambda x: x[1])
print("Most common words:")
for word in reversed(word_list[-5:]):
    print(word)
print("\nLeast common words:")
for word in word_list[:5]:
    print(word)

Most common words:
('the', 14)
('and', 9)
('i', 9)
('of', 8)
('is', 7)

Least common words:
('see', 1)
('growing', 1)
('soul', 1)
('having', 1)
('regulating', 1)

B.5. Chapter 8

TRY THIS: LOOPING AND IF STATEMENTS

Suppose that you have a list x = [1, 3, 5, 0, -1, 3, -2], and you need to remove all negative numbers from that list. Write the code to do this.

x = [1, 3, 5, 0, -1, 3, -2]
for i in x[:]:    # iterate over a copy so removals don't skip items
    if i < 0:
        x.remove(i)
print(x)

[1, 3, 5, 0, 3]

How would you count the total number of negative numbers in a list y = [[1, -1, 0], [2, 5, -9], [-2, -3, 0]]?

count = 0
y = [[1, -1, 0], [2, 5, -9], [-2, -3, 0]]
for row in y:
    for col in row:
        if col < 0:
            count += 1
print(count)

4

What code would you use to print "very low" if the value of x is below -5, "low" if it’s from -4 up to 0, "neutral" if it’s equal to 0, "high" if it’s greater than 0 up to 4, and "very high" if it’s greater than 5?

if x < -5:
    print("very low")
elif x < 0:
    print("low")
elif x == 0:
    print("neutral")
elif x <= 5:
    print("high")
else:
    print("very high")

TRY THIS: COMPREHENSIONS

What list comprehension would you use to process the list x so that all negative values are removed?

x = [1, 3, 5, 0, -1, 3, -2]
new_x = [i for i in x if i >= 0]
print(new_x)
[1, 3, 5, 0, 3]

Create a generator that returns only odd numbers from 1 to 100. (Hint: A number is odd if there’s a remainder when it’s divided by 2; use % 2 to do this.)

odd_100 = (x for x in range(100) if x % 2)
for i in odd_100:
    print(i)

Write the code to create a dictionary of the numbers and their cubes from 11 through 15.

cubes = {x: x**3 for x in range(11, 16)}
print(cubes)
{11: 1331, 12: 1728, 13: 2197, 14: 2744, 15: 3375}

QUICK CHECK: BOOLEANS AND TRUTHINESS

Decide whether the following statements are true or false: 1, 0, -1, [0], 1 and 0, 1 > 0 or []

1: True.

0: False.

-1: True.

[0]: True; it’s a list containing one item.

1 and 0: False.

1 > 0 or []: True.
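
The built-in bool() gives a quick way to check each of these answers. The checks list below is just a convenient way to pair each expression's text with its value.

```python
checks = [
    ("1", 1),
    ("0", 0),
    ("-1", -1),
    ("[0]", [0]),
    ("1 and 0", 1 and 0),            # evaluates to 0, which is false
    ("1 > 0 or []", 1 > 0 or []),    # evaluates to True; [] is never reached
]

truth = {label: bool(value) for label, value in checks}
for label, result in truth.items():
    print(label, "->", result)
```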

LAB: REFACTOR WORD_COUNT

Rewrite the word-count program in section 8.7 to make it shorter. You may want to look at the string and list operations already discussed, as well as think about different ways to organize the code. You may also want to make the program smarter so that only alphabetic strings (not symbols or punctuation) count as words.

Listing B.1. File: word_count_refactored.py

# File: word_count_refactored.py
""" Reads a file and returns the number of lines, words,
    and characters - similar to the UNIX wc utility
"""

# initialize counts
line_count = 0
word_count = 0
char_count = 0

# open the file
with open('word_count.tst') as infile:
    for line in infile:
        line_count += 1
        char_count += len(line)
        words = line.split()
        word_count += len(words)

# print the answers using the format() method
print("File has {0} lines, {1} words, {2} characters".format(line_count,
                                                    word_count, char_count))

B.6. Chapter 9

QUICK CHECK: FUNCTIONS AND PARAMETERS

How would you write a function that could take any number of unnamed arguments and print their values in reverse order?

def my_funct(*params):
    for i in reversed(params):
        print(i)

my_funct(1,2,3,4)

What do you need to do to create a procedure or void function—that is, a function with no return value?

Either don’t return a value (use a bare return) or don’t use a return statement at all.

What happens if you capture the return value of a function with a variable?

The only result is that you can use that value, whatever it might be.

QUICK CHECK: MUTABLE FUNCTION PARAMETERS

What would be the result of changing a list or dictionary that was passed into a function as a parameter value? Which operations would be likely to create changes that would be visible outside the function? What steps might you take to minimize that risk?

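Changes made in place would be visible to the caller, because the function works on the same object rather than on a copy. (If the object is a mutable default parameter value, the changes would also persist for future uses of that default.) Operations such as adding and deleting elements, as well as changing the value of an element, are particularly likely to be problems. To minimize the risk, work on a copy inside the function, and don’t use mutable types as default parameter values.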
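
The difference between changing a parameter in place and working on a copy can be sketched as follows. Both function names are hypothetical and exist only for this illustration.

```python
import copy


def add_total_in_place(row):
    """Appends to the caller's list (a visible side effect)."""
    row.append(sum(row))
    return row


def add_total_safely(row):
    """Works on a copy, so the caller's list is untouched."""
    new_row = copy.copy(row)
    new_row.append(sum(new_row))
    return new_row


original = [1, 2, 3]

safe_result = add_total_safely(original)
after_safe = list(original)          # snapshot: still [1, 2, 3]

add_total_in_place(original)
after_in_place = list(original)      # snapshot: the total was appended

print(after_safe, safe_result, after_in_place)
```

A shallow copy is enough here because the list holds only numbers; for nested structures, copy.deepcopy() would be the safer choice.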

TRY THIS: GLOBAL VS LOCAL VARIABLES

Assuming that x = 5, what will be the value of x after funct_1() below executes? After funct_2()?

def funct_1():
    x = 3
def funct_2():
    global x
    x = 2

After calling funct_1(), x will be unchanged; after funct_2(), the value in the global x will be 2.

QUICK CHECK: GENERATOR FUNCTIONS

What would you need to modify in the code for the function four() above to make it work for any number? What would you need to add to allow the starting point to also be set?

>>> def four(limit):
...     x = 0
...     while x < limit:
...         print("in generator, x =", x)
...         yield x
...         x += 1
...
>>> for i in four(4):
...     print(i)

To specify the start:

>>> def four(start, limit):
...     x = start
...     while x < limit:
...         print("in generator, x =", x)
...         yield x
...         x += 1
...
>>> for i in four(1, 4):
...     print(i)

TRY THIS: DECORATORS

How would you modify the code for the decorator function above to remove unneeded messages and enclose the return value of the wrapped function in "<html>" and "</html>" so that myfunction("hello") would return "<html>hello</html>"?

This exercise is a hard one, because to define a function that changes the return value, you need to add an inner wrapper function to call the original function and add to the return value.

def decorate(func):
    def wrapper_func(*args):
        def inner_wrapper(*args):
            return_value = func(*args)
            return "<html>{}</html>".format(return_value)

        return inner_wrapper(*args)
    return wrapper_func

@decorate
def myfunction(parameter):
    return parameter

print(myfunction("Test"))

<html>Test</html>

LAB 9: USEFUL FUNCTIONS

Looking back at chapters 6 and 7, refactor the code into functions for cleaning and processing the data. The goal should be that most of the logic is moved into functions. Use your own judgment as to the types of functions and parameters, but keep in mind that functions should do just one thing and that they shouldn’t have any side effects that carry over outside the function.

punct = str.maketrans("", "", "!.,:;-?")

def clean_line(line):
    """changes case and removes punctuation"""
    # make all one case
    cleaned_line = line.lower()

    # remove punctuation
    cleaned_line = cleaned_line.translate(punct)
    return cleaned_line


def get_words(line):
    """splits line into words, and rejoins with newlines"""
    words = line.split()
    return "\n".join(words) + "\n"


with open("moby_01.txt") as infile, open("moby_01_clean.txt", "w") as outfile:
    for line in infile:
        cleaned_line = clean_line(line)

        cleaned_words = get_words(cleaned_line)

        # write all words for line
        outfile.write(cleaned_words)

def count_words(words):
    """takes list of cleaned words, returns count dictionary"""
    word_count = {}
    for word in words:
        # start each new word at 0, then count this occurrence
        word_count.setdefault(word, 0)
        word_count[word] += 1
    return word_count


def word_stats(word_count):
    """Takes word count dictionary and returns top and bottom five entries"""
    word_list = list(word_count.items())
    word_list.sort(key=lambda x: x[1])
    least_common = word_list[:5]
    most_common = word_list[-1:-6:-1]
    return most_common, least_common

moby_words = []
with open('moby_01_clean.txt') as infile:
    for word in infile:
        if word.strip():
            moby_words.append(word.strip())

word_count = count_words(moby_words)

most, least = word_stats(word_count)
print("Most common words:")
for word in most:
    print(word)
print("\nLeast common words:")
for word in least:
    print(word)

B.7. Chapter 10

QUICK CHECK: MODULES

Suppose that you have a module called new_math that contains a function called new_divide. What are the ways that you might import and then use that function? What are the pros and cons of each way?

import new_math
new_math.new_divide(...)

This solution is often preferred because there won’t be a clash between any identifiers in new_math and the importing namespace. This solution is less convenient to type, however.

from new_math import new_divide
new_divide(...)

This version is more convenient to use but increases the chance of name clashes between identifiers in the module and the importing namespace.

Suppose that the new_math module contains a function called _helper_math(). How will the underscore character affect the way that _helper_math() is imported?

It won’t be imported if you use from new_math import *.

QUICK CHECK: NAMESPACES AND SCOPE

Consider a variable width that’s in the module make_window.py. In which of the following contexts is width in scope?

(A) Within the module itself

(B) Inside the resize() function in the module

(C) Within the script that imported the make_window.py module

A and B but not C

LAB 10: CREATE A MODULE

Package the functions that you created at the end of chapter 9 as a standalone module. Although you can include code to run the module as the main program, the goal should be for the functions to be completely usable from another script.

(no answer)

B.8. Chapter 11

TRY THIS: MAKING A SCRIPT EXECUTABLE

Experiment with executing scripts on your platform. Also try to redirect input and output into and out of your scripts.

(no answer)

QUICK CHECK: PROGRAMS AND MODULES

What issue is the use of if __name__ == "__main__": meant to prevent, and how does it do that? Can you think of any other way to prevent this issue?

When Python loads a module, all of its code is executed. By using the pattern above, you can have certain code run only if it’s being executed as the main script file.

LAB 11: CREATING A PROGRAM

In chapter 8, you created a version of the UNIX wc utility to count the lines, words, and characters in a file. Now that you have more tools at your disposal, refactor that program to make it work more like the original. In particular, it should have options to show only lines (-l), only words (-w), and only characters (-c). If none of those options is given, all three stats are displayed, but if any of them is present, only the specified stats are shown.

For an extra challenge, look at the man page for wc on a Linux/UNIX system, and add the -L to show the longest line length. Feel free to try to implement the complete behavior as listed in the man page, and test it against your system’s wc utility.

# File: word_count_program.py
""" Reads a file and returns the number of lines, words,
    and characters - similar to the UNIX wc utility
"""
import sys


def main():
    # initialize counts
    line_count = 0
    word_count = 0
    char_count = 0

    option = None
    params = sys.argv[1:]
    if len(params) > 1:
        # if more than one param, pop the first one as the option
        option = params.pop(0).lower().strip()
    filename = params[0]
    # open the file
    with open(filename) as infile:
        for line in infile:
            line_count += 1
            char_count += len(line)
            words = line.split()
            word_count += len(words)

    if option == "-c":
        print("File has {} characters".format(char_count))
    elif option == "-w":
        print("File has {} words".format(word_count))
    elif option == "-l":
        print("File has {} lines".format(line_count))
    else:
        # print the answers using the format() method
        print("File has {0} lines, {1} words, {2} characters".format(
            line_count, word_count, char_count))

if __name__ == '__main__':
    main()

B.9. Chapter 12

QUICK CHECK: MANIPULATING PATHS

How would you use the os module’s functions to take a path to a file called test.log and create a new file path in the same directory for a file called test.log.old? How would you do the same thing by using the pathlib module?

import os.path
old_path = os.path.abspath('test.log')
print(old_path)
new_path = '{}.{}'.format(old_path, "old")
print(new_path)

import pathlib
path = pathlib.Path('test.log')
abs_path = path.resolve()
print(abs_path)
new_path = str(abs_path) + ".old"
print(new_path)

What path would you get if you created a pathlib Path object from os.pardir? Try it to find out.

test_path = pathlib.Path(os.pardir)
print(test_path)
test_path.resolve()

..
PosixPath('/home/naomi/Documents/QPB3E/qpbe3e')

LAB 12: MORE FILE OPERATIONS

How might you calculate the total size of all files ending with .txt that aren’t symlinks in a directory? If your first answer was using os.path, also try it with pathlib, and vice versa.

import pathlib
cur_path = pathlib.Path(".")

size = 0
for text_path in cur_path.glob("*.txt"):
    if not text_path.is_symlink():
        size += text_path.stat().st_size

print(size)

Write some code that builds on your solution above to move those same .txt files to a new directory called backup in the same directory.

import pathlib
cur_path = pathlib.Path(".")
new_path = cur_path.joinpath("backup")
new_path.mkdir(exist_ok=True)    # make sure the target directory exists

size = 0
for text_path in cur_path.glob("*.txt"):
    if not text_path.is_symlink():
        size += text_path.stat().st_size
        text_path.rename(new_path.joinpath(text_path.name))

print(size)

B.10. Chapter 13

QUICK CHECK

What is the significance of adding a "b" to the file open mode string?

It makes the file open in binary mode, reading and writing bytes, not characters.
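
A small sketch of the difference, using a throwaway file in a temporary directory (the file name sample.txt and the sample text are made up for illustration):

```python
import os
import tempfile

text = "caf\u00e9"   # four characters, one of them accented

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")   # hypothetical file name
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write(text)

    with open(path, "r", encoding="utf-8") as infile:
        as_text = infile.read()       # str: decoded characters
    with open(path, "rb") as infile:
        as_bytes = infile.read()      # bytes: raw contents of the file

print(type(as_text).__name__, len(as_text))
print(type(as_bytes).__name__, len(as_bytes))
```

The same file yields four characters in text mode but five bytes in binary mode, because the accented character is stored as two bytes in UTF-8.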

Suppose that you want to open a file named myfile.txt and write some additional data at the end of it. What command would you use to open myfile.txt? What command would you use to reopen the file to read from the beginning?

open("myfile.txt", "a")
open("myfile.txt")

TRY THIS: REDIRECTING INPUT AND OUTPUT

Write some code to use the mio.py module above to capture all of the print output of a script to a file named myfile.txt, reset the standard output to the screen, and print that file to screen.

# mio_test.py

import mio

def main():
    mio.capture_output("myfile.txt")
    print("hello")
    print(1 + 3)
    mio.restore_output()

    mio.print_file("myfile.txt")


if __name__ == '__main__':
    main()

output will be sent to file: myfile.txt
restore to normal by calling 'mio.restore_output()'
standard output has been restored back to normal
hello
4

QUICK CHECK: STRUCT

What use cases can you think of in which the struct module would be useful for either reading or writing binary data?

  • You’re trying to read/write from a binary-format application file or image file.

  • You’re reading from some external interface, such as a thermometer or accelerometer, and you want to save the raw data exactly as it was transmitted.
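As a rough illustration of the second case, the following sketch packs and unpacks one sensor reading. The record layout (a 32-bit timestamp followed by three 32-bit floats) is a made-up assumption, not the format of any real device.

```python
import struct

# hypothetical record layout: little-endian unsigned 32-bit timestamp
# followed by three 32-bit floats (x, y, z acceleration)
RECORD_FORMAT = "<Ifff"

# pack the values into the exact bytes that would be stored or transmitted
packed = struct.pack(RECORD_FORMAT, 1700000000, 0.5, -1.25, 9.75)
print(len(packed), struct.calcsize(RECORD_FORMAT))

# unpack the same bytes back into Python values
timestamp, x, y, z = struct.unpack(RECORD_FORMAT, packed)
print(timestamp, x, y, z)
```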

QUICK CHECK: PICKLES

Think about why a pickle would or wouldn’t be a good solution for the following use cases:

(A) Saving some state variables from one run to the next

(B) Keeping a high-score list for a game

(C) Storing usernames and passwords

(D) Storing a large dictionary of English terms

A and B would be reasonable, although pickles aren’t secure.

C and D wouldn’t be good; the lack of security would be a big problem for C, and for D, there’d be a need to load the entire pickle into memory.
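For case B, a sketch along the following lines would probably be enough. The file name and the sample scores are assumptions for illustration. Note that loading a pickle can run arbitrary code, so you should load only pickles that you trust.

```python
import os
import pickle
import tempfile

high_scores = [("ann", 9200), ("bob", 8700), ("cy", 5100)]  # sample data

path = os.path.join(tempfile.mkdtemp(), "scores.pickle")  # hypothetical file

# save the whole list in one step
with open(path, "wb") as outfile:
    pickle.dump(high_scores, outfile)

# loading reads the entire object back into memory at once
with open(path, "rb") as infile:
    restored = pickle.load(infile)

print(restored == high_scores)
```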

QUICK CHECK: SHELVE

Using a shelf object looks very much like using a dictionary. In what ways is using a shelf object different? What disadvantages would you expect there to be in using a shelf object?

The key difference is that the objects are stored on disk, not in memory. With very large amounts of data, particularly with lots of inserts and/or deletes, you’d expect disk access to make things slow.
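Another difference shows up when you change a stored value in place. The sketch below, which uses a hypothetical shelf name in a temporary directory, suggests that (without writeback=True) you need to reassign the whole value for the change to reach the disk.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")  # hypothetical name

# keys must be strings; values are pickled and written to disk
with shelve.open(path) as shelf:
    shelf["alice"] = {"visits": 1}

    # this changes only a temporary copy, so the stored value stays the same
    shelf["alice"]["visits"] = 2
    after_mutation = shelf["alice"]["visits"]

    # reassigning the whole value writes the change to the shelf
    record = shelf["alice"]
    record["visits"] = 2
    shelf["alice"] = record

# reopening the shelf reads the data back from disk
with shelve.open(path) as shelf:
    visits = shelf["alice"]["visits"]

print(after_mutation, visits)
```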

LAB: FINAL FIXES TO WC

If you look at the man page for the wc utility, you see that two command-line options do very similar things. -c makes the utility count the bytes in the file, and -m makes it count characters (which in the case of some Unicode characters can be two or more bytes long). In addition, if a file is given, it should read from and process that file, but if no file is given, it should read from and process stdin.

Rewrite your version of the wc utility to implement both the distinction between bytes and characters and the ability to read from files and standard input.

# File: word_count_program_stdin.py
""" Reads a file and returns the number of lines, words,
    and characters - similar to the UNIX wc utility
"""
import sys


def main():
    # initialize counts
    line_count = 0
    word_count = 0
    char_count = 0
    filename = None

    option = None
    if len(sys.argv) > 1:
        params = sys.argv[1:]
        if params[0].startswith("-"):
            # if the first param is an option, pop it off the list
            option = params.pop(0).lower().strip()
        if params:
            filename = params[0]    # the file to read
    # -c counts bytes, so read in binary mode; otherwise read as text
    file_mode = "r"
    if option == "-c":
        file_mode = "rb"
    if filename:
        infile = open(filename, file_mode)
    elif option == "-c":
        # sys.stdin.buffer is the binary stream underneath sys.stdin
        infile = sys.stdin.buffer
    else:
        infile = sys.stdin
    with infile:
        for line in infile:
            line_count += 1
            char_count += len(line)
            words = line.split()
            word_count += len(words)

    if option == "-c":
        print("File has {} bytes".format(char_count))
    elif option == "-m":
        print("File has {} characters".format(char_count))
    elif option == "-w":
        print("File has {} words".format(word_count))
    elif option == "-l":
        print("File has {} lines".format(line_count))
    else:
        # print the answers using the format() method
        print("File has {0} lines, {1} words, {2} characters".format(
            line_count, word_count, char_count))

if __name__ == '__main__':
    main()

B.11. Chapter 14

TRY THIS: CATCHING EXCEPTIONS

Write some code that gets two numbers from the user and divides the first number by the second. Check for and catch the exception that occurs if the second number is zero (ZeroDivisionError).

# the code of your program should do the following
x = int(input("Please enter an integer: "))
y = int(input("Please enter another integer: "))

try:
    z = x / y
except ZeroDivisionError as e:
    print("Can't divide by zero.")

Please enter an integer: 1
Please enter another integer: 0
Can't divide by zero.


QUICK CHECK: EXCEPTIONS AS CLASSES

If MyError inherits from Exception, what will be the difference between except Exception as e and except MyError as e?

The first catches any exception that inherits from Exception (most of them), whereas the second catches only MyError exceptions.
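A small sketch can make the difference concrete. The classify() helper here is a hypothetical name used only for this illustration; note that the more specific handler has to come first, or except Exception would catch everything.

```python
class MyError(Exception):
    pass

def classify(exc):
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except MyError:
        # reached only for MyError and its subclasses
        return "MyError handler"
    except Exception:
        # reached for any other exception that inherits from Exception
        return "Exception handler"

print(classify(MyError("custom")))
print(classify(ValueError("built-in")))
```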

TRY THIS: THE ASSERT STATEMENT

Write a simple program that gets a number from the user and then uses the assert statement to raise an exception if the number is zero. Test to make sure that the assert fires and then turn it off, using one of the methods mentioned above.

x = int(input("Please enter a non-zero integer: "))

assert x != 0, "Integer can not be zero."

Please enter a non-zero integer: 0
----------------------------------------------------------------------
AssertionError                       Traceback (most recent call last)
<ipython-input-222-9f7a09820a1c> in <module>()
      2 x = int(input("Please enter a non-zero integer: "))
      3
----> 4 assert x != 0, "Integer can not be zero."

AssertionError: Integer can not be zero.


QUICK CHECK: EXCEPTIONS

Do Python exceptions force a program to halt?

No. If exceptions are caught and handled correctly, the program won’t need to halt.

Suppose that you want accessing a dictionary x to always return None if a key doesn’t exist in the dictionary (that is, if a KeyError exception is raised). What code would you use to achieve that goal?

try:
    x = my_dict[some_key]
except KeyError as e:
    x = None


TRY THIS: EXCEPTIONS

What code would you use to create a custom ValueTooLarge exception and raise that exception if the variable x is over 1000?

class ValueTooLarge(Exception):
    pass

x = 1001
if x > 1000:
    raise ValueTooLarge()


QUICK CHECK: CONTEXT MANAGERS

Assume that you’re using a context manager in a script that reads and/or writes several files. Which of the following approaches do you think would be best?

(A) Put the entire script in a block managed by a with statement.

(B) Use one with statement for all file reads and another for all file writes.

(C) Use a with statement each time you read a file or write a file (that is, for each line).

(D) Use a with statement for each file that you read or write.
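Option D is usually the most practical of these choices: one with statement per file keeps each file open only as long as it's needed and makes sure that it's closed even if an error occurs. The following is a minimal sketch of that approach; the file names and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
in_name = os.path.join(workdir, "input.txt")    # hypothetical file names
out_name = os.path.join(workdir, "output.txt")

# one with statement for the file that holds the sample input
with open(in_name, "w") as setup_file:
    setup_file.write("first line\nsecond line\n")

# a single with statement can manage one input file and one output file
with open(in_name) as infile, open(out_name, "w") as outfile:
    for line in infile:
        outfile.write(line.upper())

# each file is closed as soon as its with block ends
print(setup_file.closed, infile.closed, outfile.closed)
```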

LAB 14: CUSTOM EXCEPTIONS

Think about the module you wrote in chapter 9 to count word frequencies. What errors might reasonably occur in those functions? Rewrite the code to handle those exception conditions appropriately.

import string

# assumed definition: translation table that strips punctuation
# (the chapter 9 module is expected to define punct)
punct = str.maketrans("", "", string.punctuation)

class EmptyStringError(Exception):
    pass

def clean_line(line):
    """changes case and removes punctuation"""

    # raise exception if line is empty
    if not line.strip():
        raise EmptyStringError()
    # make all one case
    cleaned_line = line.lower()

    # remove punctuation
    cleaned_line = cleaned_line.translate(punct)
    return cleaned_line

def count_words(words):
    """takes list of cleaned words, returns count dictionary"""
    word_count = {}
    for word in words:
        try:
            count = word_count.setdefault(word, 0)
        except TypeError:
            # if 'word' is not hashable, skip to next word.
            continue
        word_count[word] += 1
    return word_count

def word_stats(word_count):
    """Takes word count dictionary and returns top and bottom five entries"""
    word_list = list(word_count.items())
    word_list.sort(key=lambda x: x[1])
    try:
        least_common = word_list[:5]
        most_common = word_list[-1:-6:-1]
    except IndexError as e:
        # if list is empty or too short, just return list
        least_common = word_list
        most_common = list(reversed(word_list))

    return most_common, least_common

B.12. Chapter 15

TRY THIS: INSTANCE VARIABLES

What code would you use to create a Rectangle class?

class Rectangle:
    def __init__(self):
        self.height = 1
        self.width = 2


TRY THIS: INSTANCE VARIABLES AND METHODS

Update the code for a Rectangle class so that you can set the dimensions when an instance is created, just as for the Circle class above. Also add an area() method.

class Rectangle:
    def __init__(self, width, height):
        self.height = height
        self.width = width

    def area(self):
        return self.height * self.width


TRY THIS: CLASS METHODS

Write a class method that’s similar to total_area() but returns the total circumference of all circles.

class Circle:
    pi = 3.14159
    all_circles = []
    def __init__(self, radius):
        self.radius = radius
        self.__class__.all_circles.append(self)

    def area(self):
        return self.radius * self.radius * Circle.pi

    def circumference(self):
        return 2 * self.radius * Circle.pi

    @classmethod
    def total_circumference(cls):
        """class method to total the circumference of all Circles """
        total = 0
        for c in cls.all_circles:
            total = total + c.circumference()
        return total


TRY THIS: INHERITANCE

Rewrite the code for a Rectangle class to inherit from Shape. Because squares and rectangles are related, would it make sense to inherit one from the other? If so, which would be the base class, and which would inherit?

class Shape:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Rectangle(Shape):
    def __init__(self, x, y):
        super().__init__(x, y)


It probably would make sense to inherit. Because squares are special kinds of rectangles, Square should inherit from the Rectangle class.

If Square was specialized so that it had only one dimension x, you would write

def area(self):
    return self.x * self.x

How would you write the code to add an area() method for the Square class? Should the area() method be moved into the base Shape class and inherited by Circle, Square, and Rectangle? What issues would that change cause?

It makes sense to put the area() method in a Rectangle class that Square inherits from, but putting it in Shape wouldn’t be very helpful, because different types of shapes have their own rules for calculating area. Every shape would be overriding the base area() method anyway.
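The idea of putting area() in Rectangle and letting Square inherit it might look something like the following sketch. It assumes the width and height version of Rectangle from the earlier exercise and leaves out the Shape base class to stay short.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Square(Rectangle):
    def __init__(self, side):
        # a square is a rectangle whose width and height are equal
        super().__init__(side, side)

print(Rectangle(2, 3).area())
print(Square(4).area())   # area() is inherited from Rectangle
```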

TRY THIS: PRIVATE INSTANCE VARIABLES

Modify the Rectangle class’s code to make the dimension variables private. What restriction will this change impose on using the class?

The dimension variables will no longer be accessible outside the class via .x and .y.

class Rectangle():
    def __init__(self, x, y):
        self.__x = x
        self.__y = y


TRY THIS: PROPERTIES

Update the dimensions of the Rectangle class to be properties with getters and setters that don’t allow negative sizes.

class Rectangle():
    def __init__(self, x, y):
        self.__x = x
        self.__y = y

    @property
    def x(self):
        return self.__x

    @x.setter
    def x(self, new_x):
        if new_x >= 0:
            self.__x = new_x

    @property
    def y(self):
        return self.__y

    @y.setter
    def y(self, new_y):
        if new_y >= 0:
            self.__y = new_y

my_rect = Rectangle(1,2)
print(my_rect.x, my_rect.y)
my_rect.x = 4
my_rect.y = 5
print(my_rect.x, my_rect.y)

1 2
4 5


LAB 15: HTML CLASSES

In this lab, you create classes to represent an HTML document. To keep things simple, assume that each element can contain only text and one subelement. So the <html> element contains only a <body> element, and the <body> element contains (optional) text and a <p> element, which contains only text.

The key feature to implement is the __str__() method, which in turn calls its subelement’s __str__() method so that the entire document is returned when the str() function is called on an <html> element. You can assume that any text comes before the subelement.

Following is example output from using the classes:

para = p(text="this is some body text")
doc_body = body(text="This is the body", subelement=para)
doc = html(subelement=doc_body)
print(doc)

<html>
<body>
This is the body
<p>
this is some body text
</p>
</body>
</html>


Answer:

class element:
    def __init__(self, text=None, subelement=None):
        self.subelement = subelement
        self.text = text

    def __str__(self):
        value = "<{}>\n".format(self.__class__.__name__)
        if self.text:
            value += "{}\n".format(self.text)
        if self.subelement:
            value += str(self.subelement)
        value += "</{}>\n".format(self.__class__.__name__)
        return value

class html(element):
    def __init__ (self, text=None, subelement=None):
        super().__init__(text, subelement)
    def __str__(self):
        return super().__str__()

class body(element):
    def __init__ (self, text=None, subelement=None):
        super().__init__(text, subelement)
    def __str__(self):
        return super().__str__()

class p(element):
    def __init__(self, text=None, subelement=None):
        super().__init__(text, subelement)
    def __str__(self):
        return super().__str__()


para = p(text="this is some body text")
doc_body = body(text="This is the body", subelement=para)
doc = html(subelement=doc_body)
print(doc)


B.13. Chapter 16

QUICK CHECK: SPECIAL CHARACTERS IN REGULAR EXPRESSIONS

What regular expression would you use to match strings that represent the numbers -5 through 5?

`r"-{0,1}[0-5]"` matches strings that represent the numbers -5 through 5.
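To check a pattern like this one, you can try it against a few candidate strings. The sketch below uses re.fullmatch so that the whole string has to match; the list of candidates is made up for illustration. Note that the pattern also accepts "-0".

```python
import re

pattern = re.compile(r"-{0,1}[0-5]")

candidates = ["-5", "0", "5", "-6", "6", "12", "-0"]

# fullmatch requires the entire string to match, so "12" is rejected
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)
```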

What regular expression would you use to match a hexadecimal digit? Assume that the allowed hexadecimal digits are 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, A, a, B, b, C, c, D, d, E, e, F, and f.

r"[0-9A-Fa-f]"

TRY THIS: EXTRACTING MATCHED TEXT

Making international calls usually requires a plus sign (+) and the country code. Assuming that the country code is two digits, how would you modify the code above to extract the plus sign and the country code as part of the number? (Again, not all numbers have a country code.) How would you make the code handle country codes of one to three digits?

re.match(r": (?P<phone>(\+\d{2}-)?(\d\d\d-)?\d\d\d-\d\d\d\d)",
         ": +01-111-222-3333")

or

re.match(r": (?P<phone>(\+\d{2}-)?(\d{3}-)?\d{3}-\d{4})",
         ": +01-111-222-3333")

For one- to three-digit country codes:

re.match(r": (?P<phone>(\+\d{1,3}-)?(\d{3}-)?\d{3}-\d{4})",
         ": +011-111-222-3333")

TRY THIS: REPLACING TEXT

In the checkpoint above, you extended a phone-number regular expression to also recognize a country code. How would you use a function to make any numbers that didn’t have a country code now have +1 (the country code for the United States and Canada)?

def add_code(match_obj):
    return("+1 "+match_obj.group('phone'))

re.sub(r"(?P<phone>(\d{3}-)?\d{3}-\d{4})", add_code, "111-222-3333")


LAB 16: PHONE NUMBER NORMALIZER

In the United States and Canada, phone numbers consist of 10 digits, usually separated into a 3-digit area code, a 3-digit exchange code, and a 4-digit station code. As mentioned above, phone numbers may or may not be preceded by +1, the country code. In practice, there are many ways of formatting a phone number, such as (NNN) NNN-NNNN, NNN-NNN-NNNN, NNN NNN-NNNN, NNN.NNN.NNNN, and NNN NNN NNNN. Also, the country code may not be present, may not have a plus sign, and is usually (not always) separated from the number by a space or dash. Whew!

In this lab, the task is to create a phone number normalizer that takes any of the formats mentioned above and returns a normalized phone number 1-NNN-NNN-NNNN.

The following are all possible phone numbers:

+1 223-456-7890

1-223-456-7890

+1 223 456-7890

(223) 456-7890

1 223 456 7890

223.456.7890

Bonus: The first digit of the area code and the exchange code can be only 2–9, and the second digit of an area code can’t be 9. Use this information to validate the input and return the message "invalid phone number" if the number is invalid.

import re

test_numbers = ["+1 223-456-7890",
                "1-223-456-7890",
                "+1 223 456-7890",
                "(223) 456-7890",
                "1 223 456 7890",
                "223.456.7890",
                "1-989-111-2222"]

def return_number(match_obj):

    # validate number raise ValueError if not valid
    if not re.match(r"[2-9][0-8]\d", match_obj.group("area")):
        raise ValueError("invalid phone number area code {}".format(
            match_obj.group("area")))
    if not re.match(r"[2-9]\d\d", match_obj.group("exch")):
        raise ValueError("invalid phone number exchange {}".format(
            match_obj.group("exch")))

    # default to country code 1 if none was given
    country = match_obj.group("country")
    if not country:
        country = "1"

    return "{}-{}-{}-{}".format(country, match_obj.group('area'),
                                match_obj.group('exch'),
                                match_obj.group('number'))

regexp = re.compile(r"\+?(?P<country>\d{1,3})?[- .]?\(?(?P<area>\d{3})\)?"
                    r"[- .]?(?P<exch>(\d{3}))[- .](?P<number>\d{4})")
for number in test_numbers:
    try:
        print(regexp.sub(return_number, number))
    except ValueError:
        # the last test number has an invalid exchange code
        print("invalid phone number")

B.14. Chapter 17

QUICK CHECK: TYPES

Suppose that you want to make sure that object x is a list before you try appending to it. What code would you use? What would be the difference between using type() and isinstance()? Would this be the LBYL (look before you leap) or EAFP (easier to ask forgiveness than permission) style of programming? What other options might you have besides checking the type explicitly?

x = []
if isinstance(x, list):
    print("is list")

Using type() would match only actual lists, not instances of anything that subclasses list. Either way, it’s LBYL programming.

You might also wrap the append in a try ... except block and catch the AttributeError that’s raised when the object has no append method, which would be more EAFP.
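An EAFP version might look like the following sketch. The safe_append() helper is a hypothetical name; it relies on the fact that calling append on an object without that method (a tuple, for example) raises AttributeError.

```python
def safe_append(container, item):
    """Try to append; report failure instead of checking the type first."""
    try:
        container.append(item)
    except AttributeError:
        # raised when the object has no append method (a tuple, for example)
        return False
    return True

print(safe_append([1, 2], 3))
print(safe_append((1, 2), 3))
```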

QUICK CHECK: __GETITEM__

The example use of __getitem__ above is very limited and won’t work correctly in many situations. What are some cases in which the implementation above will fail or work incorrectly?

This implementation will not work if you try to access an item directly by index; neither can you move backward.

TRY THIS: IMPLEMENTING LIST SPECIAL METHODS

Try implementing the __len__ and __delitem__ special methods listed earlier, as well as an append method. The added methods are marked with a comment in the code.

class TypedList:
    def __init__(self, example_element, initial_list=[]):
        self.type = type(example_element)
        if not isinstance(initial_list, list):
            raise TypeError("Second argument of TypedList must "
                            "be a list.")
        for element in initial_list:
            self.__check(element)
        self.elements = initial_list[:]
    def __check(self, element):
        if type(element) != self.type:
            raise TypeError("Attempted to add an element of "
                            "incorrect type to a typed list.")
    def __setitem__(self, i, element):
        self.__check(element)
        self.elements[i] = element
    def __getitem__(self, i):
        return self.elements[i]

    # added methods
    def __delitem__(self, i):
        del self.elements[i]
    def __len__(self):
        return len(self.elements)
    def append(self, element):
        self.__check(element)
        self.elements.append(element)

x = TypedList(1, [1,2,3])
print(len(x))
x.append(1)
del x[2]


QUICK CHECK: SPECIAL METHOD ATTRIBUTES AND SUBCLASSING EXISTING TYPES

Suppose that you want a dictionary-like type that allows only strings as keys (maybe to make it work like a shelf object, as described in chapter 13). What options would you have for creating such a class? What would be the advantages and disadvantages of each option?

You could use the same approach as you did for TypedList and inherit from the UserDict class. You could also inherit directly from dict, or you could implement all of the dict functionality yourself.

Implementing everything yourself provides the most control but is the most work and most prone to bugs. If the changes you need to make are small (in this case, just checking the type before adding a key), it might make the most sense to inherit directly from dict. On the other hand, inheriting from UserDict is probably safest, because the internal dict object will continue to be a regular dict, which is a highly optimized and mature implementation.
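As a rough sketch of the UserDict option, you'd need to override only __setitem__. The class name StringKeyDict is an assumption made for this example.

```python
from collections import UserDict

class StringKeyDict(UserDict):   # hypothetical class name
    """Dictionary-like class that accepts only string keys."""
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        super().__setitem__(key, value)

d = StringKeyDict()
d["a"] = 1
d.update({"b": 2})       # update() is routed through __setitem__
try:
    d[3] = "three"
except TypeError as e:
    print("rejected:", e)
print(dict(d))
```

Because UserDict routes update() and its initializer through __setitem__, the check applies there as well, which isn't guaranteed when you inherit directly from dict.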

B.15. Chapter 18

QUICK CHECK: PACKAGES

Suppose that you’re writing a package that takes a URL, retrieves all images on the page pointed to by that URL, resizes them to a standard size, and stores them. Leaving aside the exact details of how each of these functions will be coded, how would you organize those features into a package?

The package will be performing three types of actions: fetching a page and parsing the HTML for image URLs, fetching the images, and resizing the images. For this reason, you might consider having three modules to keep the actions separate:

picture_fetch/
    __init__.py
    find.py
    fetch.py
    resize.py


LAB 18: CREATE A PACKAGE

In chapter 14, you added error handling to the text cleaning and word frequency counting module you created in chapter 11. Refactor that code into a package containing one module for the cleaning functions, one for the processing functions, and one for the custom exceptions. Then write a simple main function that uses all three modules.

word_count/
    __init__.py
    exceptions.py
    cleaning.py
    counter.py


B.16. Chapter 20

QUICK CHECK: CONSIDER THE CHOICES

Take a moment to consider your options for handling the tasks identified above. What modules in the standard library can you think of that will do the job? If you want to, you can even stop right now, work out the code to do it, and compare your solution with the one you’ll develop in the next section.

From the standard library, use datetime for managing the dates/times of the files, and either os.path and os or pathlib for renaming and archiving the files.

QUICK CHECK: POTENTIAL PROBLEMS

Because the previous solution is very simple, there are likely to be many situations that it won’t handle well. What are some potential issues or problems that might arise with the script above? How might you remedy these problems?

Multiple files during the same day would be a problem, for one thing. If you have lots of files, navigating the archive directory will become increasingly difficult.

Consider the naming convention used for the files, which is based on the year, month and name, in that order. What advantages do you see in that convention? What might be the disadvantages? Can you make any arguments for putting the date string somewhere else in the filename, such as the beginning or the end?

Using year-month-day date formats makes a text-based sort of the files sort by date as well. Putting the date at the end of the filename but before the extension makes it more difficult to parse the date element visually.
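The sorting advantage is easy to check with a few made-up filenames. In this sketch, the names are invented for illustration.

```python
# sample filenames, invented for this example
names = ["2017-10-05_data.txt", "2016-12-31_data.txt", "2017-02-14_data.txt"]

# plain string sorting puts year-month-day names in chronological order
iso_sorted = sorted(names)
print(iso_sorted)

# a month-day-year prefix sorts by month first, mixing the years together
us_style = ["10-05-2017_data.txt", "12-31-2016_data.txt", "02-14-2017_data.txt"]
us_sorted = sorted(us_style)
print(us_sorted)
```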

TRY THIS: IMPLEMENTATION OF MULTIPLE DIRECTORIES

Using the code you developed in the section above as a starting point, how would you modify it to implement archiving each set of files in subdirectories named according to the date received? Feel free to take the time to implement the code and test it.

import datetime
import pathlib

FILE_PATTERN = "*.txt"
ARCHIVE = "archive"

if __name__ == '__main__':

    date_string = datetime.date.today().strftime("%Y-%m-%d")

    cur_path = pathlib.Path(".")

    new_path = cur_path.joinpath(ARCHIVE, date_string)
    # also create the archive directory if needed; don't fail on a rerun
    new_path.mkdir(parents=True, exist_ok=True)

    paths = cur_path.glob(FILE_PATTERN)

    for path in paths:
        path.rename(new_path.joinpath(path.name))


QUICK CHECK: ALTERNATE SOLUTIONS

How might you create a script that does the same thing without using pathlib? What libraries and functions would you use?

You’d use the os.path and os libraries—specifically, os.path.join(), os.mkdir(), and os.rename().
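A version without pathlib might look something like the following sketch. The archive_files() function name is hypothetical, the glob module is brought in here to do the pattern matching, and os.makedirs() stands in for os.mkdir() so that the parent archive directory is created when it's missing.

```python
import datetime
import glob
import os

FILE_PATTERN = "*.txt"
ARCHIVE = "archive"

def archive_files(directory="."):
    """Move matching files in directory into archive/<today's date>/."""
    date_string = datetime.date.today().strftime("%Y-%m-%d")
    new_dir = os.path.join(directory, ARCHIVE, date_string)
    # os.makedirs also creates the parent archive directory if needed;
    # os.mkdir would work only when that parent already exists
    os.makedirs(new_dir, exist_ok=True)

    for path in glob.glob(os.path.join(directory, FILE_PATTERN)):
        os.rename(path, os.path.join(new_dir, os.path.basename(path)))
    return new_dir

if __name__ == '__main__':
    archive_files()
```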

TRY THIS: ARCHIVING TO ZIP FILES PSEUDOCODE

Take a moment to write the pseudocode for a solution that stores data files in zip files as shown above. What modules and functions or methods do you intend to use? Try coding your solution to make sure that it works.

Pseudocode:

create path for zip file
create empty zipfile
for each file
    write into zipfile
    remove original file


(See the next section for sample code that does this.)

QUICK CHECK: CONSIDER DIFFERENT PARAMETERS

Take some time to consider different grooming options. How would you modify the code in the previous Try This to keep only one file a month? How would you change the code so that files from the previous month and older are groomed to save one a week? (Note: This is not the same as older than 30 days!)

You could use something similar to the code above but also check the month of the file against the current month.

B.17. Chapter 21

QUICK CHECK: NORMALIZATION

Look closely at the list of words generated above. Do you see any issues with the normalization so far? What other issues do you think you might encounter with a longer section of text? How do you think you might deal with those issues?

Double hyphens for em dashes, hyphenation for line breaks and otherwise, and any other punctuation marks would all be potential problems.

Enhancing the word cleaning module you created in chapter 18 would be a good way to cover most of the issues.

TRY THIS: READ A FILE

Write the code to read a text file (assume that it’s the file temp_data_00a.txt as shown in the example above), split each line of the file into a list of values, and add that list to a single list of records.

(no answer)

What issues or problems did you encounter in implementing this solution? How might you go about converting the last three fields to the correct date, real, and int types?

You could use a list comprehension to explicitly convert those fields.
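The following sketch shows one way that the reading and converting could fit together. The pipe delimiter, the five-field layout, the date format, and the sample lines are all assumptions made for illustration; adjust them to match the actual file.

```python
import datetime

def parse_date(text):
    # assumed date format: YYYY/MM/DD
    return datetime.datetime.strptime(text, "%Y/%m/%d").date()

# one converter per field; the layout (state, year, date, real, int) is assumed
CONVERTERS = [str, str, parse_date, float, int]

def read_records(lines, delimiter="|"):
    """Split each line on the delimiter and convert each field."""
    records = []
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) != len(CONVERTERS):
            continue    # skip blank or malformed lines
        # the list comprehension applies the matching converter to each field
        records.append([convert(value)
                        for convert, value in zip(CONVERTERS, fields)])
    return records

# made-up sample lines; an open file object could be passed in the same way
sample = ["Illinois|1979|1979/01/01|17.48|994\n",
          "Illinois|1979|1979/01/02|4.64|994\n"]
for record in read_records(sample):
    print(record)
```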

QUICK CHECK: HANDLING QUOTING

Consider how you’d approach the problems of handling quoted fields and embedded delimiter characters if you didn’t have the csv library. Which is easier to handle: the quoting or the embedded delimiters?

Without using the csv module, you’d have to check whether a field began and ended with the quote characters and then strip() them off.

To handle embedded delimiters without using the csv library, you’d have to isolate the quoted fields and treat them differently; then you’d split the rest of the fields by using the delimiter.
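To see why the embedded delimiters are the harder part, you can compare a naive split with what the csv module returns. The sample line in this sketch is made up.

```python
import csv
import io

line = 'Chicago,"Cook County, IL","72"\n'   # made-up sample line

# a naive split breaks the quoted field at its embedded comma
naive = line.strip().split(",")

# csv.reader handles both the quotes and the embedded delimiter
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```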

TRY THIS: CLEANING DATA

How would you handle the fields with 'Missing' as a possible value for math calculations? Can you write a snippet of code that averages one of those columns?

clean_field = [float(x[13]) for x in data_rows if x[13] != 'Missing']
average = sum(clean_field)/len(clean_field)

What would you do with the average column at the end so that you could also report the average coverage? In your opinion, would the solution to this problem be at all linked to the way that the 'Missing' entries were handled?

coverage_values = [float(x[-1].strip("%"))/100 for x in data_rows]

It may not be done at the same time as the 'Missing' values are handled.

LAB: WEATHER OBSERVATIONS

The file of weather observations provided here is by month and then by county for the state of Illinois from 1979 to 2011. Write the code to process this file and extract the data for Chicago (Cook County) into a single CSV or spreadsheet file. This code includes replacing the 'Missing' strings with empty strings and translating the percentage to a decimal. You may also consider what fields are repetitive and can be omitted or stored elsewhere. The proof that you’ve got it right occurs when you load the file into a spreadsheet. You can download a solution with the book’s source code.

B.18. Chapter 22

TRY THIS: RETRIEVING A FILE

If you’re working with the data file above and want to break each line into separate fields, how might you do that? What other processing would you expect to do? Try writing some code to retrieve this file and calculate the average annual rainfall or, for more of a challenge, the average maximum and minimum temperature for each year.

import requests
response = requests.get("http://www.metoffice.gov.uk/pub/data/weather"
                        "/uk/climate/stationdata/heathrowdata.txt")

data = response.text
data_rows = []
rainfall = []
for row in data.split("\r\n")[7:]:
    fields = [x for x in row.split(" ") if x]
    data_rows.append(fields)
    rainfall.append(float(fields[5]))

print("Average rainfall = {} mm".format(sum(rainfall)/len(rainfall)))

Average rainfall = 50.43794749403351 mm


TRY THIS: ACCESSING AN API

Write some code to fetch some data from the city of Chicago site used above. Look at the fields mentioned in the results, and see whether you can select records based on another field in combination with the date range.

import requests
response = requests.get("https://data.cityofchicago.org/resource/"
                        "6zsd-86xi.json?$where=date between "
                        "'2015-01-10T12:00:00' and "
                        "'2015-01-10T13:00:00'&arrest=true")

print(response.text)


TRY THIS: SAVING SOME JSON CRIME DATA

Modify the code you wrote to fetch Chicago crime data in section 22.2 to convert the fetched data from a JSON-formatted string to a Python object. See whether you can save the crime events both as a series of separate JSON objects in one file and as one JSON object in another file. Then see what code is needed to load each file.

import json
import requests

response = requests.get("https://data.cityofchicago.org/resource/"
                        "6zsd-86xi.json?$where=date between "
                        "'2015-01-10T12:00:00' and "
                        "'2015-01-10T13:00:00'&arrest=true")

crime_data = json.loads(response.text)

with open("crime_all.json", "w") as outfile:
    json.dump(crime_data, outfile)

with open("crime_series.json", "w") as outfile:
    for record in crime_data:
        json.dump(record, outfile)
        outfile.write("\n")

with open("crime_all.json") as infile:
    crime_data_2 = json.load(infile)

crime_data_3 = []
with open("crime_series.json") as infile:
    for line in infile:
        # collect every record instead of overwriting the list each time
        crime_data_3.append(json.loads(line))


TRY THIS: FETCHING AND PARSING XML

Write the code to pull the Chicago XML weather forecast from http://mng.bz/103V. Then use xmltodict to parse the XML into a Python dictionary and extract tomorrow’s forecast maximum temperature. Hint: To match up time layouts and values, compare the layout-key value of the first time-layout section and the time-layout attribute of the temperature element of the parameters element.

import requests
import xmltodict

response = requests.get("https://graphical.weather.gov/xml/SOAP_server/"
                        "ndfdXMLclient.php?whichClient=NDFDgen&lat=41.87"
                        "&lon=+-87.65&product=glance")

parsed_dict = xmltodict.parse(response.text)
layout_key = parsed_dict['dwml']['data']['time-layout'][0]['layout-key']
forecast_temp = (
    parsed_dict['dwml']['data']['parameters']['temperature'][0]['value'][0])
print(layout_key)
print(forecast_temp)


TRY THIS: PARSING HTML

Given the file forecast.html (which you can find with the code on this book’s website), write a script using Beautiful Soup that extracts the data and saves it as a CSV file.

import csv
import bs4

def read_html(filename):
    with open(filename) as html_file:
        html = html_file.read()
        return html


def parse_html(html):
    bs = bs4.BeautifulSoup(html, "html.parser")
    labels = [x.text for x in bs.select(".forecast-label")]
    forecasts = [x.text for x in bs.select(".forecast-text")]

    return list(zip(labels, forecasts))

def write_to_csv(data, outfilename):
    # newline="" lets the csv module control the line endings
    with open(outfilename, "w", newline="") as outfile:
        csv.writer(outfile).writerows(data)

if __name__ == '__main__':
    html = read_html("forecast.html")
    values = parse_html(html)
    write_to_csv(values, "forecast.csv")
    print(values)


LAB 22: TRACK CURIOSITY’S WEATHER

Use the application programming interface (API) described in section 22.2 of chapter 22 to gather a weather history of Curiosity’s stay on Mars for a month. Hint: You can specify Martian days (sols) by adding ?sol=sol_number to the end of the archive query like this:

http://marsweather.ingenology.com/v1/archive/?sol=155

Transform the data so that you can load it into a spreadsheet and graph it. For a version of this project, see the book’s source code.

import json
import csv
import requests

for sol in range(1830, 1863):
    response = requests.get(
        "http://marsweather.ingenology.com/v1/"
        "archive/?sol={}&format=json".format(sol))
    result = json.loads(response.text)
    if not result['count']:
        continue
    weather = result['results'][0]
    print(weather)
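    # Append one row per sol; the context manager closes the file each time.
    with open("mars_weather.csv", "a", newline="") as outfile:
        csv.DictWriter(outfile, list(weather.keys())).writerow(weather)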
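
A spreadsheet is easier to graph from when the file starts with a header row, which the loop above doesn't write. The following is a minimal sketch of one way to do that, assuming the records have already been collected into a list. The records and their field names are made up for illustration, and records_to_csv is a hypothetical helper.

```python
import csv
import io

# Hypothetical records shaped like result['results'][0]; the field names
# and values are invented for illustration.
records = [
    {"sol": 1830, "min_temp": -77.0, "max_temp": -10.0},
    {"sol": 1831, "min_temp": -78.0, "max_temp": -12.0},
]


def records_to_csv(records, outfile):
    """Write a list of dictionaries as CSV with a single header row."""
    writer = csv.DictWriter(outfile, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()          # stands in for an open file
records_to_csv(records, buffer)
print(buffer.getvalue())
```

In a real script, the StringIO buffer would be replaced by a file opened with newline="".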


B.19. Chapter 23

TRY THIS: CREATING AND MODIFYING TABLES

Using sqlite3, write the code that creates a database table for the Illinois weather data you loaded from a flat file in section 21.2 of chapter 21. Suppose that you have similar data for more states and want to store more information about the states themselves. How could you modify your database to use a related table to store the state information?

import sqlite3
conn = sqlite3.connect("datafile.db")

cursor = conn.cursor()

cursor.execute("""create table weather (id integer primary key,
              state text, state_code text,
              year_text text, year_code text,
              avg_max_temp real, max_temp_count integer,
              max_temp_low real, max_temp_high real,
              avg_min_temp real, min_temp_count integer,
              min_temp_low real, min_temp_high real,
              heat_index real, heat_index_count integer,
              heat_index_low real, heat_index_high real,
              heat_index_coverage text)
              """)
conn.commit()


You could add a state table and store only each state’s ID in the weather table.
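
As a rough sketch of that idea, the code below uses an in-memory database and a trimmed weather table. The state table's column names, the state_id column, and the sample temperature are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database for the sketch
cursor = conn.cursor()

# Hypothetical state table holding one row per state.
cursor.execute("""create table state (id integer primary key,
                  name text, abbreviation text)""")

# Trimmed weather table: state_id replaces the state and state_code columns.
cursor.execute("""create table weather (id integer primary key,
                  state_id integer references state(id),
                  year_text text, avg_max_temp real)""")

cursor.execute("insert into state (name, abbreviation) values (?, ?)",
               ("Illinois", "IL"))
state_id = cursor.lastrowid
# The temperature here is a made-up value.
cursor.execute("""insert into weather (state_id, year_text, avg_max_temp)
                  values (?, ?, ?)""", (state_id, "1979", 58.5))
conn.commit()

# Join the two tables to get the state name back alongside the weather data.
cursor.execute("""select state.name, weather.year_text, weather.avg_max_temp
                  from weather join state on weather.state_id = state.id""")
rows = cursor.fetchall()
print(rows)
```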

TRY THIS: USING AN ORM

Using the database from section 22.3, write a SQLAlchemy class to map to the data table and use it to read the records from the table.

from sqlalchemy import (create_engine, select, MetaData, Table, Column,
                        Integer, String, Float)
from sqlalchemy.orm import sessionmaker
dbPath = 'datafile.db'
engine = create_engine('sqlite:///%s' % dbPath)
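# MetaData no longer accepts an engine in SQLAlchemy 2.0; the session below
# is bound to the engine instead.
metadata = MetaData()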
weather  = Table('weather', metadata,
                Column('id', Integer, primary_key=True),
                Column("state", String),
                Column("state_code", String),
                Column("year_text", String ),
                Column("year_code", String),
                Column("avg_max_temp", Float),
                Column("max_temp_count", Integer),
                Column("max_temp_low", Float),
                Column("max_temp_high", Float),
                Column("avg_min_temp", Float),
                Column("min_temp_count", Integer),
                Column("min_temp_low", Float),
                Column("min_temp_high", Float),
                Column("heat_index", Float),
                Column("heat_index_count", Integer),
                Column("heat_index_low", Float),
                Column("heat_index_high", Float),
                Column("heat_index_coverage", String)
                )
Session = sessionmaker(bind=engine)
session = Session()
# SQLAlchemy 1.4 and later; older versions use select([weather])
result = session.execute(select(weather))
for row in result:
    print(row)

TRY THIS: MODIFYING A DATABASE WITH ALEMBIC

Experiment with creating an Alembic upgrade that adds a state table to your database, with columns for ID, state name, and abbreviation. Upgrade and downgrade. What other changes would be necessary if you were going to use the state table along with the existing data table?

(no answer)

QUICK CHECK: USES OF KEY:VALUE STORES

What sorts of data and applications would benefit most from a key:value store like Redis?

  • Quick lookup of data

  • Caching

QUICK CHECK: USES OF MONGODB

Thinking back over the various data samples you’ve seen so far and other types of data in your experience, can you come up with any data that you think would be well suited to being stored in a database such as MongoDB? Would others clearly not be suited, and if so, why not?

Data that comes in large and/or more loosely organized chunks is suited to MongoDB, such as the contents of a web page or document.

Data with a specific structure is better suited to a relational database. The weather data you’ve seen is a good example.

LAB 23: CREATE A DATABASE

Choose one of the datasets discussed in the past few chapters, and decide which type of database would be best to store that data. Create that database, and write the code to load the data into it. Then choose the two most common and/or likely types of search criteria, and write the code to retrieve both single and multiple matching records.

(no answer)

B.20. Chapter 24

TRY THIS: USING JUPYTER NOTEBOOK

Enter some code in the notebook, and experiment with running it. Check out the Edit, Cell, and Kernel menus to see what options are there. When you have a little code running, use the Kernel menu to restart the kernel, repeat your steps, and then use the Cell menu to rerun the code in all of the cells.

(no answer)

TRY THIS: CLEANING DATA WITH AND WITHOUT PANDAS

Experiment with the operations mentioned above. When the final column has been converted to a fraction, can you think of a way to convert it back to a string with the trailing percentage sign?

By contrast, load the same data into a plain Python list by using the csv module, and apply the same changes by using plain Python.
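
One possible approach to the plain-Python half, as a minimal sketch: the sample data, its column names, and the one-decimal output format are all assumptions, since the real file is described in the chapter text.

```python
import csv
import io

# Made-up sample; the column names and values are assumptions.
raw = io.StringIO("name,score\nalpha,50%\nbeta,12.5%\n")

rows = list(csv.reader(raw))
header, body = rows[0], rows[1:]

# Strip the percent sign and convert to a fraction.
cleaned = [[name, float(pct.rstrip("%")) / 100] for name, pct in body]

# Convert the fraction back to a string with a trailing percent sign.
restored = [[name, "{:.1%}".format(frac)] for name, frac in cleaned]
print(cleaned)
print(restored)
```

The % format type multiplies by 100 and appends the sign, so the same format string should also work on a pandas column through its apply method.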

QUICK CHECK: MERGING DATA SETS

How would you go about actually merging two data sets like the above in Python?

If you’re sure that you have exactly the same number of items in each set and that the items are in the right order, you could use the zip() function. Otherwise, you could create a dictionary, with the keys being something common between the two data sets, and then append the data by key from both sets.
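
Both approaches can be sketched briefly. The sample values and the choice of a (team member, month) tuple as the common key are made up for illustration.

```python
# Made-up sample data for illustration.
names = ["Ann", "Bob"]
totals = [10, 20]

# Same length, same order: zip() pairs items positionally.
merged_by_position = list(zip(names, totals))

# Otherwise, merge on a shared key, here (team member, month).
calls = [("Ann", "Jan", 5), ("Bob", "Jan", 7)]
revenue = [("Bob", "Jan", 700.0), ("Ann", "Jan", 500.0)]

merged_by_key = {}
for member, month, count in calls:
    merged_by_key.setdefault((member, month), {})["calls"] = count
for member, month, amount in revenue:
    merged_by_key.setdefault((member, month), {})["amount"] = amount

print(merged_by_position)
print(merged_by_key)
```

The dictionary version doesn't depend on the order of either list, and a key that appears in only one data set simply ends up with one field missing.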

QUICK CHECK: SELECTING IN PYTHON

What Python code structure would you use to select only rows that meet certain conditions?

You’d probably use a list comprehension:

selected = [x for x in old_list if <x meets selection criteria>]
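
As a concrete instance of that pattern, here is a small sketch in which the rows and the selection criterion are made up for illustration:

```python
# Made-up rows for illustration.
old_list = [
    {"month": "Jan", "calls": 5},
    {"month": "Feb", "calls": 12},
    {"month": "Mar", "calls": 9},
]

# Keep only the rows with more than 8 calls.
selected = [row for row in old_list if row["calls"] > 8]
print(selected)
```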


TRY THIS: GROUPING AND AGGREGATING

Experiment with pandas and the data above. Can you get the calls and amounts by both team member and month?

(calls_revenue[['Team member', 'Month', 'Calls', 'Amount']]
    .groupby(['Team member', 'Month']).sum())


TRY THIS: PLOTTING

Plot a line graph of the monthly average amount per call.

%matplotlib inline
import pandas as pd
import numpy as np

# see text for these
calls = pd.read_csv("sales_calls.csv")
revenue = pd.read_csv("sales_revenue.csv")
calls_revenue = pd.merge(calls, revenue, on=['Territory', 'Month'])
calls_revenue['Call_Amount'] = calls_revenue.Amount/calls_revenue.Calls

# plot
calls_revenue[['Month', 'Call_Amount']].groupby(['Month']).mean().plot()
