UVM Theses and Dissertations
Format: Print
Author: Zhang, Wei
Dept./Program: Microbiology and Molecular Genetics
Year: 2004
Degree: PhD
Abstract:
Unlike the evolution of non-coding regions, which is mainly determined by DNA mutation bias, protein evolution results from two very different kinds of processes: processes involving DNA (for example, errant replication, oxidative damage, failure of base excision repair, and lesion bypass by polymerases) and selection based on changes in protein function. First, we estimated DNA mutation rates from primate genomic sequences using maximum likelihood calculations and a maximum parsimony method. Since DNA and amino acid substitution rates are known to be highly sequence context-dependent, we quantified the context-dependence of nucleotide substitution rates for the 96 classes of mutations of the form 5' αβγ 3' → 5' αδγ 3', where α, β, γ, and δ are nucleotides and β ≠ δ. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures the qualitative features of the mutation spectrum. We find that, while mutation rates are qualitatively similar among sequences from different genomic regions, there are significant differences between regions of the chromosome. The probability that a particular nucleotide substitution occurs in a protein-coding gene can be expressed as the probability that DNA-level processes result in a properly paired sequence change in a daughter cell (referred to here as a mutation) times the probability that, given such a sequence change, it is observed (referred to here as a substitution).
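The 96 context classes and the four-class model described above can be illustrated with a minimal sketch. It assumes, since the abstract does not say so, that the 96 classes arise from pooling each of the 192 possible triplet changes with its reverse complement, and it represents each pooled class by the strand whose mutated base is a pyrimidine. The names `mutation_classes` and `four_class_label` are hypothetical and are not taken from the thesis.

```python
from itertools import product

BASES = "ACGT"
PURINES = {"A", "G"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_classes():
    """Enumerate triplet changes 5' abg 3' -> 5' adg 3' with b != d.

    Assumption: a change and its reverse complement are pooled, and the
    pooled class is written on the strand whose mutated base is C or T.
    """
    classes = set()
    for a, b, g, d in product(BASES, repeat=4):
        if b == d:
            continue  # not a mutation
        before, after = a + b + g, a + d + g
        if b in PURINES:
            # Switch to the opposite strand so the mutated base is a pyrimidine.
            before, after = revcomp(before), revcomp(after)
        classes.add((before, after))
    return sorted(classes)


def four_class_label(before, after):
    """Assign a pyrimidine-centered class to one of the four model classes."""
    ref, alt = before[1], after[1]
    # A transition keeps the base type (purine or pyrimidine) unchanged.
    is_transition = (ref in PURINES) == (alt in PURINES)
    # On the pyrimidine strand, a CpG mutation is a C followed by G.
    is_cpg = ref == "C" and before[2] == "G"
    site = "CpG" if is_cpg else "non-CpG"
    kind = "transition" if is_transition else "transversion"
    return f"{site} {kind}"
```

Under this pooling assumption, `len(mutation_classes())` is 96, and every class falls into exactly one of the four model classes.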
Here we use the DNA mutation rates estimated above to estimate the contribution to protein evolution of selection based only on changes in protein function. We applied the estimated mutation rates to somatic mutations of a human tumor suppressor gene, TP53, using the IARC/WHO somatic mutation database. We identified sets of TP53 mutations that are expected to exhibit equivalent selection bias (for example, mutation of any of the four amino acids that ligate Zn). According to our hypothesis, the selection bias should be similar for each mutation in such a set. We found that the coefficient of variation of our estimates of selection bias is substantially lower (by a factor of two) than the coefficient of variation before removal of DNA mutation bias. Our results validate our method for separating DNA mutation bias from selection bias in amino acid substitution, which we expect to improve our ability to quantify the functional effects of amino acid substitution and thus to evaluate the consequences of human genetic variation, understand protein function, and infer protein phylogenies.
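The validation logic above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's actual procedure: it assumes that a relative selection bias can be obtained by dividing each observed substitution count by the estimated mutation rate of the underlying nucleotide change, and it uses the sample standard deviation in the coefficient of variation. The function names are hypothetical.

```python
from statistics import mean, stdev


def selection_bias(observed_counts, mutation_rates):
    """Relative selection bias: observed count divided by mutation rate.

    Assumption: substitution frequency is proportional to
    (mutation rate) x (selection bias), so dividing out the mutation
    rate leaves a quantity proportional to the selection bias.
    """
    if len(observed_counts) != len(mutation_rates):
        raise ValueError("counts and rates must have the same length")
    return [n / r for n, r in zip(observed_counts, mutation_rates)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def cv_before_and_after(observed_counts, mutation_rates):
    """Return the CV of raw counts and the CV after removing mutation bias."""
    raw = coefficient_of_variation(observed_counts)
    corrected = coefficient_of_variation(
        selection_bias(observed_counts, mutation_rates)
    )
    return raw, corrected
```

For a set of mutations expected to share the same selection bias, the second value returned by `cv_before_and_after` should be smaller than the first if the mutation rate estimates account for part of the variation in the observed counts.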