CS 111
Foundations of Computing for Scientific Discovery

Denison

Exercises

Wednesday, April 13

  1. Design a Python class Point that represents a point in two dimensions. Your class must include variables to hold the x and y coordinates of the point, a constructor to initialize the x and y coordinates, two accessor methods to get the values of the x and y coordinates, two mutator methods to set the x and y coordinates, and a method distance that returns the distance between this point and another Point object that is passed in as a parameter.

  2. Write a complete Python program that creates two new Point objects with coordinates (5, 10) and (0, 2), and uses the distance method of Point to compute the distance between them. Assume that the Point class is contained in a separate module called point.py.

    Solutions: point.py | point_test.py

  3. Design a Python class Tree that represents a tree specimen in an arboretum. Your class must include:
    • 3 instance variables that store:
      • the specimen number of the tree
      • the species of the tree
      • a list of (date, height in meters) pairs that record tree height observations
    • a constructor that initializes the specimen number and species to parameter values and the list to be empty
    • 3 accessor methods to get the value of each instance variable
    • a __str__ method that returns a string representation of the tree that contains its species and specimen number
    • a mutator method that adds a new height observation

  4. Given the class definition you wrote above, show how to create a new Tree object named newTree representing a tree of species Acer palmatum with specimen number 508. Also show how to add to the object a new height observation of 2.3 meters taken on 5/8/2009. Assume that the Tree class is contained in a separate module called tree.py.

    Solutions: tree.py | tree_test.py

Thursday, March 24

  1. Write a recursive function
       factorial(n)
    
    that returns the value of n! = 1 * 2 * 3 * ... * n. (This one is in your book.)

  2. Write a recursive function
       sumList(L)
    
    that returns the sum of the items in the list of numbers named L.

  3. Write a recursive function
       minList(L)
    
    that returns the minimum of the items in the list of numbers named L.

  4. Write a recursive function
       reverse(s)
    
    that returns the reverse of the string named s.

  5. Write a recursive function
       palindrome(s)
    
    that returns True if the string named s is a palindrome and False otherwise.

  6. Modify the recursive function
       tree(trunkLength, t)
    
    in your book so that it incorporates the two suggestions in Exercises 9.11 and 9.12.

  7. Write a recursive function
      koch(length, depth, t)
    
    that draws a Koch curve, as described on page 57 in your textbook. The parameters are the length of a line in the curve, the depth of the recursion (i.e., detail in the curve), and a turtle, respectively.


    Once you have this working, write a (non-recursive) function
      kochSnowFlake(length, depth, sides, t)
    
    that calls koch repeatedly to draw a Koch snowflake like this with a specified number of "sides". (This one has six.)

Solutions

Monday, March 7

  1. Write a function
       printOdd(n)
    
    that uses a while loop to print the odd integers from 1 to n.

  2. Write a function
       sum(L)
    
    that uses a while loop to compute and return the sum of the values in a list of numbers named L.

  3. Write a function
       guessingGame(secret)
    
    that uses a while loop to compute and return the number of guesses Python's pseudorandom number generator takes to guess a secret number (called secret) between 1 and 100. In each iteration of the while loop, the function should call random.randrange(1, 101). If the resulting number does not equal secret, the function will increment the number of guesses and guess again. Otherwise, the loop should end and then return the number of guesses.

  4. We previously wrote a function that decided whether a given string is a palindrome. The function looked like this:
       def palindrome(s):
          for i in range(len(s) // 2):
             if s[i] != s[-i-1]:
                return False
          return True
    
    Rewrite this function using a while loop. The function should not contain a return statement (or a break statement) inside the loop.

  5. Write a function
       halflife(amount, dt)
    
    that uses a while loop to derive the half-life of Carbon-14 using the difference equation formulation for decay that we looked at earlier in the semester:
       A(t) = A(t - Δt) - 0.000120968 A(t - Δt) Δt
    
    To do this, the function should repeatedly decay the original amount over increments of length dt until the amount falls below half the original amount. Then return the amount of time required for this to happen. (The answer should be 5,730 years.)

Solutions

Wednesday, March 2

  1. Write a function
       frequencies(L)
    
    that returns a dictionary containing the frequencies of the numbers in a list of numbers L. (This is like Listing 4.8, but just return the dictionary.)

  2. Write a function
       mode(L)
    
    that returns a list containing the mode of a list of numbers L. Unlike the version in your book, this function should call the function you wrote above.

Solutions

Wednesday, February 23

  1. Write a function
       mean(L)
    
    that returns the mean (average) of a list of numbers L.

  2. Write a function
       median(L)
    
    that returns the median number in a list of numbers L. Use the built-in sort method for lists.

  3. Solutions

  4. The Luhn algorithm is the standard algorithm used to validate credit card numbers to protect against accidental errors. Read about the algorithm using the link above and then write a function
       validateLuhn(number)
    
    that returns True if the number if valid and False otherwise. The number parameter will be a list of digits. For example, to determine is the credit card number 4563 9601 2200 1999 is valid, one would call the function with the parameter [4,5,6,3,9,6,0,1,2,2,0,0,1,9,9,9].

  5. Solution

  6. Write a function
       getCodons(dna)
    
    that returns a list containing the codons in dna. (This is similar to an exercise below, but return the codons in a list instead of printing them.)

  7. Write a function
       countSerine(dna)
    
    that returns the number of codons in dna that code for the amino acid serine. As you may know, the genetic code (the mapping between codons and amino acids) contains a good deal of redundancy. All of the following codons code for serine: TCT, TCC, TCA, TCG, AGT, and AGC. Call (not copy) the previous getCodons function in your implementation.

  8. Solutions

Wednesday, October 6

Solutions to exercises 4.1 - 4.16

Monday, February 16

  1. Write a function
       def dnaUpper(dna):
    
    that returns the string dna in all upper case. (Hint: there is a built-in function that does this.)

  2. Write a function
       transcribe(dna)
    
    that returns the mRNA sequence that would result from the transcription of the parameter dna. The transciption process turns A into U, C into G, G into C, and T into A.

  3. Write a function
       def findATG(dna):
    
    that returns the index of the start of the first occurrence of 'ATG' in the string dna. Do not use the built-in index or find methods.

  4. Write a function
       def palindrome(dna):
    
    that returns the value True if dna is a palindrome and False otherwise. A palindrome is a string that reads the same forwards and backwards.

  5. Write a function
       def printCodons(dna):
    
    that prints the codons, starting at the left end, in the string dna. A codon is 3 consecutive bases of DNA. For example, printCodons('GGTACACTGT') would print
       GGT
       ACA
       CTG
    

  6. Most vertebrates have much lower density of CG pairs (called CG dinucleotides) than would be expected by chance. However, they often have relatively high concentrations upstream from genes (coding regions). For this reason, finding these so-called "CpG islands" is often a good way to find putative sites of genes. (The "p" between C and G denotes the phosphodiester bond between the C and G, versus a hydrogen bond across two complementary strands.)

    First, without using the built-in count() function, write a function

      CpG(dna)
    
    that returns the fraction of dinucleotides that are CG. Use slicing in this function (and some of the ones below). Then write a version of the function
      CpG2(dna)
    
    that does the same thing but does use the count() function.

  7. The Hamming distance between two equal length strings is the number of places in which the two strings differ. Hamming distance was originally applied to bit strings sent over networks as a measure of the number of errors that occur. The same idea is now applied to DNA as a simple measure of the degree of similarity between two sequences. Write a function
      hamming(dna1, dna2)
    
    that returns the Hamming distance between two DNA sequences with equal length. If the sequences do not have equal length, the function should return -1. For example hamming('GGAAT', 'AGCAT') should return 2.

    Also, write a function

      search(dna, motif)
    
    that returns the index of the beginning of the substring in dna with the same length as motif that has the smallest Hamming distance with motif. (Biologists use the term motif to refer to any sequence of DNA that may be of special interest.) You may assume that motif is no longer than dna. For example, search('GGCCATTA', 'CGAT') should return the value 2, the index in 'GGCCATTA' where the substring 'CCAT' begins. The Hamming distance between the motif and this substring is 1, the minimum among all substrings of length 4 in dna. Your search function will need to call (not cut and paste) the hamming function for each tested substring in dna.

  8. A microsatellite or simple sequence repeat (SSR) is a repeat of a short sequence of DNA. The repeated sequence is typically 2-4 bases in length and can be repeated 10-100 times or more. For example CACACACACA is a short SSR of CA, a very common repeat in humans. Microsatellites are very common in the DNA of most organisms. They are highly variable within populations because of replication errors resulting from polymerase "slippage." Comparing the distribution of length variants within and between populations can be used to determine genetic relationships and learn about evolutionary processes.

    Write a function

      ssr(dna, repeat)
    
    that returns the length (number of repeats) of the first SSR in dna that repeats the parameter repeat. If repeat is not found, return 0. You should use (call) the search function that you wrote above to find the first instance of repeat in dna.

    For extra credit, write another version of the function that finds the length of the longest SSR in dna that repeats the parameter repeat. You will probably want to look ahead to the material on while loops to accomplish this.

    For even more extra credit, write a third version of the function that finds the longest SSR of any dinucleotide (sequence with length 2) in dna. Your function should return a tuple containing the repeated dinucleotide and the length of the SSR.

Solutions

Wednesday, February 9

Solutions to exercises 3.1 - 3.16

Monday, February 7

Solutions to exercises 2.38 - 2.39

Friday, February 4

Solutions to exercises 2.28, 2.29, 2.31, 2.33 - 2.35

Wednesday, January 26

Solutions to exercises 2.11 - 2.14

In each of these exercies, use variables to make your program more readable and to eliminate the need to compute the same value multiple times.

  1. Every cell in the human body contains about 6 billion base pairs of DNA (3 billion in each set of 23 chromosomes). The distance between each base pair is about 3.4 angstroms (3.4 * 10^-10 meters). Uncoiled and stretched, how long is the DNA in a single human cell? There are about 50 trillion cells in the human body. If you stretched out all of the DNA in the human body end to end, how long would it be? How many round trips to the sun is this? The distance to the sun is about 149,598,000 kilometers. Write a program that computes and then prints the answer to each of these three questions.

    Solution

  2. When you rip songs from a CD, the digital file is created by sampling the sound at some rate. Common rates are 128 kbps (128 * 2^10 bits per second), 160 kbps, and 256 kbps. Write a function that computes the number of 4 minute songs someone can fit on his or her iPod. The function should take two paratmers: the capacity of the iPod in gigabytes (GB) and the sampling rate in kbps. A gigabyte is 2^30 bytes and a byte contains 8 bits.

    Solution

Friday, January 21

Solutions to exercises 1.25 - 1.40

Wednesday/Thursday, January 19-20

Solutions to exercises 1.1 - 1.23

Solutions to exercises 1.17 and 1.19