Jun Inoue: Amino acid substitution matrices
Note on amino acid R matrix

Revised 3 November 2007
Jun Inoue

These are my personal notes on amino acid substitution matrices.

Reviews
Lio, P., and N. Goldman. 2006. Models of Molecular Evolution and Phylogeny. Genome Research 8: 1233-1244.

Yang, Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford.

Dimmic, M. W. 2005. Markov models of protein sequence evolution. Pages 259-287 in Statistical Methods in Molecular Evolution (R. Nielsen, ed.) Springer, New York.


Mitochondrial gene models
mtREV24
Adachi, J., and M. Hasegawa. 1996. Model of amino acid substitution in proteins encoded by mitochondrial DNA. J Mol Evol 42: 459-468.

Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs of Institute of Statistical Mathematics 28: 1-150.

12 protein-coding genes in the mitochondrial genomes of 24 vertebrate species.



MTMAM
Cao, Y., A. Janke, P. J. Waddell, M. Westerman, O. Takenaka, S. Murata, N. Okada, S. Pääbo, and M. Hasegawa. 1998. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J Mol Evol 47: 307-322.

Yang, Z. H., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol 15: 1600-1611.

12 protein-coding genes in the mitochondrial genomes of 20 mammalian species.


Nuclear gene models

DAYHOFF

Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pages 345-352 in Atlas of Protein Sequence and Structure. Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.

Dayhoff's matrix represents data from 1572 counted substitutions (Dimmic, 2005).
Derived mainly from globular proteins in aqueous solution.
Too crude for phylogenetic analysis in general (Yang, 2006).




JTT
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275-282.

Jones et al. (1992) updated the matrix parameters using 59,190 substitutions from 16,130 protein sequences (Dimmic, 2005).
Derived mainly from globular proteins in aqueous solution.




tmREV
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1994. A mutation data matrix for transmembrane proteins. FEBS Lett 339: 269-275.

A model for transmembrane proteins.




WAG
Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18: 691-699.

To create this matrix, 3905 protein sequences were divided into 182 protein families. A neighbor-joining tree was inferred for each family, and the combined likelihood over all families was then maximized by adjusting the values of the R matrix. Likelihood ratio tests showed that the increase in likelihood over earlier models was statistically significant for every family in the analyzed dataset. In fact, the gain in likelihood from JTT to WAG is even greater than the gain from Dayhoff to JTT, although WAG was optimized using fewer protein sequences than JTT, an indication of the power of ML estimation. Because ML estimation can be computationally expensive, approximate methods have been developed as a compromise between accuracy and speed.

Dimmic (2005)
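Whatever the estimation method, an R matrix is used by combining it with amino acid frequencies π into a rate matrix Q and transition probabilities P(t) = exp(Qt), from which the likelihood is computed. The following is a minimal Python/NumPy sketch of that machinery, under loudly stated assumptions: the R and π values are random placeholders (not the WAG, JTT or any published values), the helper names `rate_matrix`, `transition_matrix`, `encode` and `pair_log_likelihood` are hypothetical, and the two sequences are made up. Unlike the WAG procedure, which maximizes the combined likelihood over the entries of R across many families and trees, this sketch only estimates a single pairwise distance t by a crude grid search.

```python
import numpy as np

# Amino acid order; any fixed order works as long as R and pi follow it.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def rate_matrix(R, pi):
    """Reversible rate matrix Q from a symmetric exchangeability matrix R and frequencies pi.

    Off-diagonal entries are Q_ij = R_ij * pi_j; each diagonal entry makes its row
    sum to zero; Q is then scaled so that the mean rate -sum_i pi_i Q_ii equals 1,
    i.e. t is measured in expected substitutions per site.
    """
    R = np.asarray(R, dtype=float)
    pi = np.asarray(pi, dtype=float)
    Q = R * pi[np.newaxis, :]
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    mean_rate = -np.dot(pi, np.diag(Q))
    return Q / mean_rate


def transition_matrix(Q, pi, t):
    """P(t) = exp(Qt), computed through the symmetric matrix Pi^(1/2) Q Pi^(-1/2)."""
    s = np.sqrt(np.asarray(pi, dtype=float))
    B = s[:, None] * Q / s[None, :]          # symmetric because Q is reversible
    lam, U = np.linalg.eigh(B)
    expBt = (U * np.exp(lam * t)) @ U.T      # U diag(exp(lam * t)) U^T
    return expBt * s[None, :] / s[:, None]   # back-transform to exp(Qt)


def encode(seq):
    """Map a one-letter amino acid string to integer state indices."""
    return np.array([AMINO_ACIDS.index(a) for a in seq])


def pair_log_likelihood(seq1, seq2, Q, pi, t):
    """log L(t) = sum over sites of log(pi_x * P_xy(t)) for two aligned, gap-free sequences."""
    pi = np.asarray(pi, dtype=float)
    P = transition_matrix(Q, pi, t)
    x, y = encode(seq1), encode(seq2)
    return float(np.sum(np.log(pi[x] * P[x, y])))


# Usage with PLACEHOLDER parameters (random, not WAG/JTT/Dayhoff values).
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 2.0, size=(20, 20))
R_demo = (A + A.T) / 2.0                    # symmetric exchangeabilities
np.fill_diagonal(R_demo, 0.0)
pi_demo = rng.dirichlet(np.full(20, 5.0))   # stationary frequencies summing to 1

Q = rate_matrix(R_demo, pi_demo)
P = transition_matrix(Q, pi_demo, 0.5)

# Crude ML estimate of the distance between two hypothetical sequences.
seq1, seq2 = "MKVLAAGIVG", "MKVLSAGLVG"
ts = np.linspace(0.01, 2.0, 200)
lls = [pair_log_likelihood(seq1, seq2, Q, pi_demo, t) for t in ts]
t_hat = ts[int(np.argmax(lls))]
print(f"ML distance estimate: {t_hat:.2f} substitutions per site")
```

Symmetrizing a reversible Q before the eigendecomposition is a common way to compute exp(Qt) stably with `numpy.linalg.eigh`; a general-purpose routine such as a matrix exponential would give the same P(t).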



RTREV
Dimmic, M. W., J. S. Rest, D. P. Mindell, and R. A. Goldstein. 2002. rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J Mol Evol 55: 65-73.

A model for retroviral polymerase proteins.



VT
Müller, T., and M. Vingron. 2000. Modeling amino acid replacement. J Comput Biol 7: 761-776.

Müller, T., R. Spang, and M. Vingron. 2002. Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Mol Biol Evol 19: 8-13.



BLOSUM62
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89: 10915-10919.

Henikoff and Henikoff (1992) used local, ungapped alignments of distantly related sequences to derive the BLOSUM series of matrices. Matrices in this series are identified by a number after the name (e.g., BLOSUM50), which refers to the percentage-identity threshold used in constructing the matrix: sequences within the blocks of aligned amino acids that share at least that percentage identity are clustered together. It is noteworthy that these matrices are calculated directly, without extrapolation, and are analogous to transition probability matrices P(T) for different values of T, estimated without reference to any rate matrix Q. The BLOSUM matrices often perform better than PAM matrices for local similarity searches but have not been widely used in phylogenetics (Lio and Goldman, 2006).

Too crude for phylogenetic analysis in general (Yang, 2006).
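The log-odds construction behind BLOSUM-type matrices can be sketched in Python as follows, assuming a hypothetical toy block of four aligned sequences given column by column. Residue pairs are counted within each column to give observed pair frequencies q_ij, background frequencies p_i, and expected pair frequencies e_ij (p_i² for i = j, 2 p_i p_j otherwise); scores are round(2 log2(q_ij / e_ij)), i.e. half-bit units, the scale used for BLOSUM62. The clustering of sequences above the identity threshold and the sequence weighting of Henikoff and Henikoff (1992) are deliberately omitted, so this is a simplified sketch rather than their exact procedure, and the function names are hypothetical. The conditional probabilities P(b | a) illustrate the point about P(T): a transition-probability-like matrix obtained directly from counts, with no rate matrix Q involved.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(columns):
    """Count unordered residue pairs within each alignment column.

    Simplification: no clustering of similar sequences and no sequence weighting.
    """
    counts = Counter()
    for column in columns:
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds(counts, scale=2.0):
    """Return observed pair frequencies q, background frequencies p, and rounded log-odds scores."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    p = Counter()
    for (a, b), f in q.items():  # p_i = q_ii + sum_{j != i} q_ij / 2
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]   # expected pair frequency
        scores[(a, b)] = round(scale * math.log2(f / e))  # scale=2 gives half-bit units
    return q, p, scores


def conditional_probabilities(q, p):
    """P(b | a): how often b is aligned against a, estimated directly from counts (no Q)."""
    P = {}
    for (a, b), f in q.items():
        if a == b:
            P[(a, a)] = f / p[a]
        else:
            P[(a, b)] = (f / 2) / p[a]
            P[(b, a)] = (f / 2) / p[b]
    return P


# Hypothetical block of four aligned sequences, given column by column.
columns = ["AAAS", "LLLI", "AASS", "VVIL"]
q, p, scores = log_odds(pair_counts(columns))
P = conditional_probabilities(q, p)
for pair in sorted(scores):
    print(pair, scores[pair])
```

Pairs never observed in the toy block get no score, since log2(0) is undefined; a real construction needs enough blocks for every pair of amino acids to be observed.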



DCMUT
Kosiol, C., and N. Goldman. 2005. Different versions of the Dayhoff rate matrix. Mol Biol Evol 22: 193-199.



Chloroplast gene models
CPREV
Adachi, J., P. J. Waddell, W. Martin, and M. Hasegawa. 2000. Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J Mol Evol 50: 348-358.