Carolyn Holthaus, Zaven A. Karian, and Kenneth P. Klatt*
Abstract: DNASEQ is a simulation that creates a "question/answer/match/response" dialog with biology and genetics students who wish to learn the Sanger method of DNA sequencing. The simulation is written as a set of Maple procedures. It generates a random 30-nucleotide strand of DNA and then a graph that represents a laboratory-produced DNA Sequencing Gel. Interpreting this graph is the question the simulation poses to the student. After interpreting the sequencing gel, the student answers with a list of letters representing the nucleotide sequence of the DNA generated by the simulation. The simulation then matches the student's answer against the correct nucleotide sequence and responds with error messages or congratulations. The paper includes information on implementing the simulation, a sample run, and information on obtaining the Maple code.
The technique of DNA sequencing (gene sequencing) is an important tool for solving many biological problems. The gene sequencing method employed by most molecular biologists is the Sanger Dideoxynucleotide Method [1,2]. The Sanger Method is very time-consuming and requires either radiolabeled or fluorescence-tagged compounds, so it is not a practical exercise for undergraduate molecular biology or genetics lab courses. We are not aware of any computer simulation of the Sanger method designed for teaching undergraduates, so we have developed the following simulation, which relies on Maple's graphing capabilities.
DNASEQ carries on a "question/answer/match/response" dialog with the student. The simulation generates a graph called the DNA Sequencing Gel [the question], an image similar to the sequencing gels produced in the laboratory. The student interprets this graph by writing a list of letters that represents the nucleotide sequence of the DNA generated by the simulation. After the student gives the answer [the list of letters], the simulation matches the student's list against the correct nucleotide sequence and responds with either error messages or congratulations. We hope this simulation can serve as one of many possible models for this type of dialog in Maple.
The Sanger Method employs DNA polymerase, the enzyme that copies chromosomal DNA in cells during cell growth. To copy a DNA molecule, DNA polymerase "reads" each existing strand, called the template, and constructs a new strand complementary to it by joining the four DNA nucleotides. Each nucleotide is a combination of deoxyribose with a 5' phosphate and either a purine or a pyrimidine base: A-Adenine, G-Guanine, C-Cytosine, T-Thymine. Adjacent nucleotides are joined by linking the 5' phosphate of one nucleotide to the 3' hydroxyl group of its neighbor. Accordingly, each strand of DNA has a 3' hydroxyl group at one end and a 5' phosphate group at the other.
Furthermore, the 3' end of each of the two DNA strands lies opposite the 5' end of the other strand. A sample DNA molecule is represented below:
3' A-A-T-G-C 5'
5' T-T-A-C-G 3'
DNA polymerase reads the template strand starting at its 3' end and finishing at its 5' end. It builds the new strand by first establishing the new strand's 5' end. It then adds nucleotides to the growing polymer, moving toward what will become the 3' end of the new strand (the last nucleotide added is the 3' end). Each nucleotide in the template strand calls for the inclusion of a specific nucleotide in the newly synthesized strand: an A in the template specifies a T in the new strand, a T specifies an A, a G specifies a C, and a C specifies a G. So if a template strand has the nucleotide sequence 3' A-A-T-G-C 5', DNA polymerase will make the new strand 5' T-T-A-C-G 3'.
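The pairing rule is simple enough to check mechanically. The following Python sketch is an illustration only, not part of DNASEQ, and the function name new_strand is hypothetical. It takes the template written 3' to 5' and returns the new strand written 5' to 3', so the two line up position by position as in the sample molecule above.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3_to_5: str) -> str:
    """Return the strand DNA polymerase would synthesize.

    The template is given in the order the polymerase reads it (3' -> 5');
    the result is written 5' -> 3', so position i of the result pairs with
    position i of the template.
    """
    return "".join(PAIR[base] for base in template_3_to_5.upper())


print(new_strand("AATGC"))  # TTACG
```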
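The Sanger Method works by interrupting the action of DNA polymerase with defective nucleotides, the 2',3'-dideoxyribonucleotides, which lack both the 2' and 3' hydroxyl groups. A dideoxynucleotide has no 3' hydroxyl group, so once it is incorporated no further nucleotide can be attached, and synthesis of that strand stops. For example, suppose the dideoxynucleotide ddA is added to the template DNA shown above, together with the four regular DNA nucleotides and DNA polymerase. The oligonucleotide products will be 5' T-T-ddA 3' and 5' T-T-A-C-G 3'. The first comes from copies that incorporated ddA opposite the template's T. The second comes from copies that incorporated the regular A nucleotide there instead.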
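A minimal Python sketch of one such reaction follows. It assumes, as the example does, that no primer is present. It also assumes that wherever the new strand calls for the dideoxy base, some copies terminate while the rest continue. The function dd_products is hypothetical and only illustrates the chain-termination idea.

```python
def dd_products(new_strand: str, dd_base: str) -> list:
    """Oligonucleotides formed in a reaction containing one dideoxynucleotide.

    Assumes (as in the text's example) no primer, and that wherever the new
    strand calls for dd_base, some copies incorporate the dideoxy form and
    stop, while the remaining copies continue to full length.
    """
    products = []
    for i, base in enumerate(new_strand):
        if base == dd_base:
            # The ddN has no 3' hydroxyl, so this copy cannot be extended.
            products.append("-".join(list(new_strand[:i]) + ["dd" + base]))
    products.append("-".join(new_strand))  # copies that never took up ddN
    return products


print(dd_products("TTACG", "A"))  # ['T-T-ddA', 'T-T-A-C-G']
```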
In the Sanger Method four reactions are set up. In each, the template DNA strand is reacted with DNA polymerase, a mixture of the four DNA nucleotides, and one of the four inhibitor dideoxynucleotides (ddA, ddT, ddC, or ddG). Each reaction therefore forms a family of oligonucleotide products, and the 3' end of each oligonucleotide is the dideoxynucleotide included in that reaction mixture. These mixtures of oligomers are loaded onto the top of a slab of polyacrylamide gel, and a direct electric current is applied to the slab. The constituent oligomers are then separated by size: small molecules migrate toward the bottom of the gel, while larger oligomers remain near the top. Since the oligomers are radioactively or chemically labeled, they appear on the gel as horizontal lines (bands). If all four mixtures are separated side by side on the same gel, the observer sees a ladder-like display of horizontal bands. The fastest-migrating band is the shortest product, whose dideoxynucleotide occupies the first (5'-most) position of the newly synthesized strand. This band therefore identifies the first nucleotide of the strand complementary to the template. To interpret the sequencing gel, the observer lays a straightedge at the bottom of the gel, parallel to its horizontal axis, and moves it upward to find the next higher band (line segment) in the display. That band identifies the next nucleotide in the sequence. By repeatedly locating the next higher band, the observer can determine the nucleotide sequence of the entire DNA strand complementary to the template.
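The straightedge procedure amounts to sorting all bands by length across the four lanes. In the hypothetical Python sketch below, each lane is represented by the lengths of its terminated products, and shorter is treated as lower on the gel. Full-length, unterminated copies are ignored. The bands are then read from the bottom up. The names run_gel and read_gel are illustrative only.

```python
def run_gel(new_strand: str) -> dict:
    """Lengths of the terminated products in each lane (ddA, ddC, ddG, ddT).

    A product that stops at position i (counting from the 5' end) has
    length i; shorter products migrate farther and so sit lower on the gel.
    Full-length, unterminated copies are ignored in this sketch.
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(i)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the gel from the bottom up, like sliding a straightedge upward."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    bands.sort()  # lowest (shortest, fastest-migrating) band first
    return "".join(base for _, base in bands)


gel = run_gel("TTACG")
print(gel)            # {'A': [3], 'C': [4], 'G': [5], 'T': [1, 2]}
print(read_gel(gel))  # TTACG
```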
The Maple simulation contains a routine, GenSeq(), that uses the Maple function rand() to generate a list of 30 random integers, each between 0 and 3. Each integer represents one of the four DNA nucleotides, so the list represents a random 30-nucleotide DNA strand. This is the strand made by DNA polymerase reading the template in the presence of dideoxynucleotides. The simulation's graphical output is a gel slab showing all of the oligonucleotides, called the Sequencing Gel. The student reads the gel, chooses a variable name for the guessed nucleotide sequence, and enters the sequence as a list. The simulation then checks the student's guess.
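For readers without Maple, a rough Python analogue of GenSeq() follows. It is a sketch, not the authors' code. The integer-to-nucleotide mapping in BASES is an assumption, because the article does not say which integer stands for which base. Python's random.Random stands in for Maple's rand(0..3) generator.

```python
import random

# Hypothetical mapping: the article does not say which integer encodes which base.
BASES = ["A", "G", "C", "T"]


def gen_seq(rng: random.Random, n: int = 30) -> list:
    """Rough analogue of GenSeq(): n random integers, each between 0 and 3."""
    return [rng.randint(0, 3) for _ in range(n)]  # randint includes both ends


def to_letters(nums: list) -> str:
    """Translate the integer list into nucleotide letters."""
    return "".join(BASES[k] for k in nums)


rng = random.Random(1997)  # fixed seed here only so the sketch is repeatable
seq = gen_seq(rng)
print(len(seq), to_letters(seq))
```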
This simulation is written as a series of Maple procedures. To start the simulation, the student calls the procedure GenPlot(). GenPlot() in turn calls GenSeq(), which randomly generates the list of 30 integers (the DNA sequence). GenPlot() then draws 30 horizontal line segments, one for each nucleotide, to form the sequencing gel. Left unchanged, this would produce the same gel on successive executions, because rand() starts from the same seed at the beginning of each Maple session. To prevent this, the seed for rand() is read from a file whose contents GenPlot() modifies during each execution, so later attempts produce different sequences. The _seed value is kept in a text file called seed.txt, so anyone implementing this simulation must create seed.txt for the program to use. All four procedures (GenSeq, GenPlot, ConvertToNum, and CheckGuess) and the file seed.txt should be located on the default path of the computer being used. If they are not, path specifications should be added to the programs on the lines containing readdata, writedata, and close, all in the GenPlot procedure. The read command in the illustration that follows should also be modified to specify that path. The Maple code for all of the procedures needed for the simulation is given in the Appendix. The code and the file seed.txt can be downloaded from http://www.denison.edu/fipse/bio/klatt.html.
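The seed-file mechanism can be sketched in Python as follows. The article says only that GenPlot() modifies the file on each execution. The update rule here, which draws the next seed from the generator itself, is an assumption, and so is the function name gen_plot_sequence. The temporary directory merely keeps the example self-contained.

```python
import random
import tempfile
from pathlib import Path


def gen_plot_sequence(seed_file: Path, n: int = 30) -> list:
    """Read a seed, generate a sequence, then overwrite the seed file.

    The next run starts from a different seed and so produces a different
    gel. The way the new seed is chosen is a hypothetical update rule.
    """
    seed = int(seed_file.read_text().strip())
    rng = random.Random(seed)
    seq = [rng.randint(0, 3) for _ in range(n)]
    seed_file.write_text(str(rng.randrange(10**12)))  # seed for the next run
    return seq


with tempfile.TemporaryDirectory() as d:
    seed_path = Path(d) / "seed.txt"
    seed_path.write_text("19971998")  # plays the role of the user-created seed.txt
    first = gen_plot_sequence(seed_path)
    second = gen_plot_sequence(seed_path)
    print(first != second)
```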
After reading in the file of Maple procedures and calling GenPlot() to create the Sequencing Gel, the student reads the gel. In the session illustrated below, the student named the guess Myguess. The first version of Myguess contained an invalid character (p, in position 3) instead of a nucleotide letter. The second had the correct number of nucleotides but contained a T at position 8 instead of the correct G. Finally, the student entered the correct sequence as the third version of Myguess.
> read dnaseq;
> GenPlot();
> Myguess:=[t,g,p,g,c,t,g,t,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t];
Myguess := [t,g,p,g,c,t,g,t,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t]
> Check(Myguess);
Your guess has the correct number of entries.
Error, (in ConvertToNum) Your responses should consist of the characters A, G, C, and T
> Myguess:=[t,g,a,g,c,t,g,t,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t];
Myguess := [t,g,a,g,c,t,g,t,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t]
> Check(Myguess);
Your guess has the correct number of entries.
Error, (in CheckGuess) The number, 8, entry in your guess is incorrect. Please change it and try again.
> Myguess:=[t,g,a,g,c,t,g,g,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t];
Myguess := [t,g,a,g,c,t,g,g,c,c,c,c,c,a,c,g,a,t,a,g,c,c,c,t,t,a,g,t,g,t]
> Check(Myguess);
Your guess has the correct number of entries.
You've got it!
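The checking dialog in this session can be sketched in Python. The sketch mirrors the three kinds of message in the transcript: the entry count, invalid characters (reported by ConvertToNum), and a wrong position (reported by CheckGuess). It is not the authors' Check procedure. The function name check, the wording of the length-mismatch message, and the choice to report only the first mismatch are all assumptions.

```python
def check(guess: list, answer: str) -> str:
    """Sketch of the Check(...) dialog: count, validate, then compare."""
    if len(guess) != len(answer):
        # Hypothetical wording; the session never shows this case.
        raise ValueError(f"Your guess has {len(guess)} entries; {len(answer)} are expected.")
    print("Your guess has the correct number of entries.")
    letters = [str(g).upper() for g in guess]
    # Role played by ConvertToNum in the session: reject anything but A, G, C, T.
    if any(c not in ("A", "G", "C", "T") for c in letters):
        raise ValueError("Your responses should consist of the characters A, G, C, and T")
    # Role played by CheckGuess: report the first position that differs (assumed).
    for i, (g, a) in enumerate(zip(letters, answer.upper()), start=1):
        if g != a:
            raise ValueError(f"The number, {i}, entry in your guess is incorrect. "
                             "Please change it and try again.")
    return "You've got it!"


answer = "TGAGCTGGCCCCCACGATAGCCCTTAGTGT"
for attempt in ("tgpgctgtcccccacgatagcccttagtgt",   # invalid letter p
                "tgagctgtcccccacgatagcccttagtgt",   # T at position 8
                "tgagctggcccccacgatagcccttagtgt"):  # correct
    try:
        print(check(list(attempt), answer))
    except ValueError as err:
        print("Error:", err)
```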
We have used this simulation at Denison University in our Introductory Cellular and Molecular Biology course, a freshman/sophomore-level course. In a single two-hour class we can introduce the principles of the Sanger Method and acclimate the students to this Maple procedure-based simulation. Each student can then read three DNASEQ-generated nucleotide sequences.
We have found that students in this course quickly become proficient at interpreting Sequencing Gels. They also come away from the lesson with a good understanding of the mechanism of DNA polymerase. About seven weeks after using this simulation, we gave the course final exam. The students performed significantly better on questions about DNA polymerase than on questions about any other topic presented in the course. Based on these results, we think this simulation is an important pedagogical tool for the type of college-level molecular biology and genetics courses offered at most North American universities.
This work was supported by the Fund for the Improvement of Postsecondary Education (grant P116B30079) and a grant from the W. M. Keck Foundation. The authors thank the Editor of MapleTech for several suggestions that improved the original manuscript.
[1] F. Sanger, S. Nicklen, and A. R. Coulson: DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. USA 74, pp. 5463-5467 (1977).
[2] F. Sanger: Determination of nucleotide sequences in DNA, Science 214, pp. 1205-1210 (1981).
Carolyn Holthaus graduated cum laude from Denison University in 1997 with a degree in Computer Science. She currently attends the University of Kentucky, where she will receive a Master of Science in Library Science in May 1998.
Zaven A. Karian has taught at Denison University since 1964 and currently holds the Benjamin Barney Chair of Mathematics. In addition to a number of articles, he has published Probability and Statistics Explorations with Maple (with Elliot Tanis). He also edited a volume on symbolic computation for the Mathematical Association of America (Symbolic Computation in Undergraduate Mathematics Instruction, MAA Notes No. 24, 1991).
Kenneth P. Klatt is Professor of Biology at Denison, where he has taught since 1969. His interests are in cellular and developmental biology, and he is currently interested in using Maple to teach cellular biology.
*Departments of Mathematics and Computer Science, and Biology, Denison University, Granville, OH 43023. Email: klatt@denison.edu