dndsml

Estimate synonymous and nonsynonymous substitution rates using maximum likelihood method

Syntax

[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2)

[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'GeneticCode', GeneticCodeValue, ...)
[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'Verbose', VerboseValue, ...)

Input Arguments

SeqNT1, SeqNT2Nucleotide sequences. Enter either a string or a structure with the field Sequence.
GeneticCodeValueProperty to specify a genetic code. Enter a Code Number or a string with a Code Name from the table Genetic Code. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard.
VerboseValueProperty to control the display of the codons considered in the computations and their amino acid translations. Choices are true or false (default).

    Tip   Specify true to use this display to manually verify the codon alignment of the two input sequences. The presence of stop codons (*) in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned.

Output Arguments

Dn Nonsynonymous substitution rate(s).
DsSynonymous substitution rate(s).
LikeLikelihood of estimate of substitution rates.

Description

[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2) estimates the synonymous and nonsynonymous substitution rates between the two homologous sequences, SeqNT1 and SeqNT2, using the Goldman-Yang method (1994). This maximum likelihood method estimates an explicit model for codon substitution that accounts for transition/transversion rate bias and base/codon frequency bias. Then it uses the model to correct synonymous and nonsynonymous counts to account for multiple substitutions at the same site. The maximum likelihood method is best suited when the sample size is significant (larger than 100 bases) and when the sequences being compared can have transition/transversion rate biases and base/codon frequency biases.

dndsml returns:

  • Dn — Nonsynonymous substitution rate(s).

  • Ds — Synonymous substitution rate(s).

  • Like — Likelihood of this estimate.

This analysis:

  • Assumes that the nucleotide sequences, SeqNT1 and SeqNT2, are codon-aligned, that is, do not have frame shifts.

  • Excludes any ambiguous nucleotide characters or codons that include gaps.

  • Considers the number of codons in the shorter of the two nucleotide sequences.

    Caution   If SeqNT1 and SeqNT2 are too short or too divergent, saturation can be reached, and dndsml returns NaNs and a warning message.

[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'PropertyName', PropertyValue, ...) calls dnds with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'GeneticCode', GeneticCodeValue, ...)
calculates synonymous and nonsynonymous substitution rates using the specified genetic code. Enter a Code Number or a string with a Code Name from the table Genetic Code. If you use a Code Name, you can truncate it to the first two characters. Default is 1 or Standard.

[Dn, Ds, Like] = dndsml(SeqNT1, SeqNT2, ...'Verbose', VerboseValue, ...) controls the display of the codons considered in the computations and their amino acid translations. Choices are true or false (default).

    Tip   Specify true to use this display to manually verify the codon alignment of the two input sequences, SeqNT1 and SeqNT2. The presence of stop codons (*) in the amino acid translation can indicate that SeqNT1 and SeqNT2 are not codon-aligned.

Examples

expand all

Estimate synonymous and nonsynonymous substitution rates between two nucleotide sequences using maximum likelihood method

This example shows how to estimate synonymous and nonsynonymous substitution rates between two nucleotide sequences that are not codon-aligned using maximum likelihood method.

This example uses two nucleotide sequences representing the human HEXA gene (accession number: NM_000520) and mouse HEXA gene (accession number: AK080777).

If you have live internet connection, you can use getgenbank function to retrieve the sequence information from the NCBI data repository and load the data into MATLAB®.

humanHEXA = getgenbank('NM_000520');
mouseHEXA = getgenbank('AK080777');

For your convenience, MATLAB provides these two sequences in the following mat file. Note that data in public databases are frequently updated and curated, and the results in this example may slightly differ if you use the latest data.

load hexosaminidase.mat

Extract the coding regions from the two nucleotide sequences.

humanHEXA_cds = featuresparse(humanHEXA,'feature','CDS','Sequence',true);
mouseHEXA_cds = featuresparse(mouseHEXA,'feature','CDS','Sequence',true);

Align the amino acid sequences converted from the nucleotide sequences.

[sc,al] = nwalign(nt2aa(humanHEXA_cds),nt2aa(mouseHEXA_cds),'extendgap',1);

Use the seqinsertgaps function to copy the gaps from the aligned amino acid sequences to their corresponding nucleotide sequences, thus codon-aligning them.

humanHEXA_aligned = seqinsertgaps(humanHEXA_cds,al(1,:))
mouseHEXA_aligned = seqinsertgaps(mouseHEXA_cds,al(3,:))
humanHEXA_aligned =

atgacaagctccaggctttggttttcgctgctgctggcggcagcgttcgcaggacgggcgacggccctctggccctggcctcagaacttccaaacctccgaccagcgctacgtcctttacccgaacaactttcaattccagtacgatgtcagctcggccgcgcagcccggctgctcagtcctcgacgaggccttccagcgctatcgtgacctgcttttcggttccgggtcttggccccgtccttacctcacagggaaacggcatacactggagaagaatgtgttggttgtctctgtagtcacacctggatgtaaccagcttcctactttggagtcagtggagaattataccctgaccataaatgatgaccagtgtttactcctctctgagactgtctggggagctctccgaggtctggagacttttagccagcttgtttggaaatctgctgagggcacattctttatcaacaagactgagattgaggactttccccgctttcctcaccggggcttgctgttggatacatctcgccattacctgccactctctagcatcctggacactctggatgtcatggcgtacaataaattgaacgtgttccactggcatctggtagatgatccttccttcccatatgagagcttcacttttccagagctcatgagaaaggggtcctacaaccctgtcacccacatctacacagcacaggatgtgaaggaggtcattgaatacgcacggctccggggtatccgtgtgcttgcagagtttgacactcctggccacactttgtcctggggaccaggtatccctggattactgactccttgctactctgggtctgagccctctggcacctttggaccagtgaatcccagtctcaataatacctatgagttcatgagcacattcttcttagaagtcagctctgtcttcccagatttttatcttcatcttggaggagatgaggttgatttcacctgctggaagtccaacccagagatccaggactttatgaggaagaaaggcttcggtgaggacttcaagcagctggagtccttctacatccagacgctgctggacatcgtctcttcttatggcaagggctatgtggtgtggcaggaggtgtttgataataaagtaaagattcagccagacacaatcatacaggtgtggcgagaggatattccagtgaactatatgaaggagctggaactggtcaccaaggccggcttccgggcccttctctctgccccctggtacctgaaccgtatatcctatggccctgactggaaggatttctacatagtggaacccctggcatttgaaggtacccctgagcagaaggctctggtgattggtggagaggcttgtatgtggggagaatatgtggacaacacaaacctggtccccaggctctggcccagagcaggggctgttgccgaaaggctgtggagcaacaagttgacatctgacctgacatttgcctatgaacgtttgtcacacttccgctgtgaattgctgaggcgaggtgtccaggcccaacccctcaatgtaggcttctgtgagcaggagtttgaacagacctga


mouseHEXA_aligned =

atggccggctgcaggctctgggtttcgctgctgctggcggcggcgttggcttgcttggccacggcactgtggccgtggccccagtacatccaaacctaccaccggcgctacaccctgtaccccaacaacttccagttccggtaccatgtcagttcggccgcgcaggcgggctgcgtcgtcctcgacgaggcctttcgacgctaccgtaacctgctcttcggttccggctcttggccccgacccagcttctcaaataaacagcaaacgttggggaagaacattctggtggtctccgtcgtcacagctgaatgtaatgaatttcctaatttggagtcggtagaaaattacaccctaaccattaatgatgaccagtgtttactcgcctctgagactgtctggggcgctctccgaggtctggagactttcagtcagcttgtttggaaatcagctgagggcacgttctttatcaacaagacaaagattaaagactttcctcgattccctcaccggggcgtactgctggatacatctcgccattacctgccattgtctagcatcctggatacactggatgtcatggcatacaataaattcaacgtgttccactggcacttggtggacgactcttccttcccatatgagagcttcactttcccagagctcaccagaaaggggtccttcaaccctgtcactcacatctacacagcacaggatgtgaaggaggtcattgaatacgcaaggcttcggggtatccgtgtgctggcagaatttgacactcctggccacactttgtcctgggggccaggtgcccctgggttattaacaccttgctactctgggtctcatctctctggcacatttggaccggtgaaccccagtctcaacagcacctatgacttcatgagcacactcttcctggagatcagctcagtcttcccggacttttatctccacctgggaggggatgaagtcgacttcacctgctggaagtccaaccccaacatccaggccttcatgaagaaaaagggcttt---actgacttcaagcagctggagtccttctacatccagacgctgctggacatcgtctctgattatgacaagggctatgtggtgtggcaggaggtatttgataataaagtgaaggttcggccagatacaatcatacaggtgtggcgggaagaaatgccagtagagtacatgttggagatgcaagatatcaccagggctggcttccgggccctgctgtctgctccctggtacctgaaccgtgtaaagtatggccctgactggaaggacatgtacaaagtggagcccctggcgtttcatggtacgcctgaacagaaggctctggtcattggaggggaggcctgtatgtggggagagtatgtggacagcaccaacctggtccccagactctggcccagagcgggtgccgtcgctgagagactgtggagcagtaacctgacaactaatatagactttgcctttaaacgtttgtcgcatttccgttgtgagctggtgaggagaggaatccaggcccagcccatcagtgtaggctgctgtgagcaggagtttgagcagacttga

Estimate the synonymous and nonsynonymous substitutions rates of the codon-aligned nucleotide sequences and also display the codons considered in the computations and their amino acid translations.

[nonsynSubRate,synSubRate] = dndsml(humanHEXA_aligned,mouseHEXA_aligned,'verbose',true)
DNDSML: 
Codons considered in the computations:
ATGACAAGCTCCAGGCTTTGGTTTTCGCTGCTGCTGGCGGCAGCGTTCGCAGGACGGGCGACGGCCCTCTGGCCCTGGCCTCAGAACTTCCAAACCTCCGACCAGCGCTACGTCCTTTACCCGAACAACTTTCAATTCCAGTACGATGTCAGCTCGGCCGCGCAGCCCGGCTGCTCAGTCCTCGACGAGGCCTTCCAGCGCTATCGTGACCTGCTTTTCGGTTCCGGGTCTTGGCCCCGTCCTTACCTCACAGGGAAACGGCATACACTGGAGAAGAATGTGTTGGTTGTCTCTGTAGTCACACCTGGATGTAACCAGCTTCCTACTTTGGAGTCAGTGGAGAATTATACCCTGACCATAAATGATGACCAGTGTTTACTCCTCTCTGAGACTGTCTGGGGAGCTCTCCGAGGTCTGGAGACTTTTAGCCAGCTTGTTTGGAAATCTGCTGAGGGCACATTCTTTATCAACAAGACTGAGATTGAGGACTTTCCCCGCTTTCCTCACCGGGGCTTGCTGTTGGATACATCTCGCCATTACCTGCCACTCTCTAGCATCCTGGACACTCTGGATGTCATGGCGTACAATAAATTGAACGTGTTCCACTGGCATCTGGTAGATGATCCTTCCTTCCCATATGAGAGCTTCACTTTTCCAGAGCTCATGAGAAAGGGGTCCTACAACCCTGTCACCCACATCTACACAGCACAGGATGTGAAGGAGGTCATTGAATACGCACGGCTCCGGGGTATCCGTGTGCTTGCAGAGTTTGACACTCCTGGCCACACTTTGTCCTGGGGACCAGGTATCCCTGGATTACTGACTCCTTGCTACTCTGGGTCTGAGCCCTCTGGCACCTTTGGACCAGTGAATCCCAGTCTCAATAATACCTATGAGTTCATGAGCACATTCTTCTTAGAAGTCAGCTCTGTCTTCCCAGATTTTTATCTTCATCTTGGAGGAGATGAGGTTGATTTCACCTGCTGGAAGTCCAACCCAGAGATCCAGGACTTTATGAGGAAGAAAGGCTTCGAGGACTTCAAGCAGCTGGAGTCCTTCTACATCCAGACGCTGCTGGACATCGTCTCTTCTTATGGCAAGGGCTATGTGGTGTGGCAGGAGGTGTTTGATAATAAAGTAAAGATTCAGCCAGACACAATCATACAGGTGTGGCGAGAGGATATTCCAGTGAACTATATGAAGGAGCTGGAACTGGTCACCAAGGCCGGCTTCCGGGCCCTTCTCTCTGCCCCCTGGTACCTGAACCGTATATCCTATGGCCCTGACTGGAAGGATTTCTACATAGTGGAACCCCTGGCATTTGAAGGTACCCCTGAGCAGAAGGCTCTGGTGATTGGTGGAGAGGCTTGTATGTGGGGAGAATATGTGGACAACACAAACCTGGTCCCCAGGCTCTGGCCCAGAGCAGGGGCTGTTGCCGAAAGGCTGTGGAGCAACAAGTTGACATCTGACCTGACATTTGCCTATGAACGTTTGTCACACTTCCGCTGTGAATTGCTGAGGCGAGGTGTCCAGGCCCAACCCCTCAATGTAGGCTTCTGTGAGCAGGAGTTTGAACAGACC
ATGGCCGGCTGCAGGCTCTGGGTTTCGCTGCTGCTGGCGGCGGCGTTGGCTTGCTTGGCCACGGCACTGTGGCCGTGGCCCCAGTACATCCAAACCTACCACCGGCGCTACACCCTGTACCCCAACAACTTCCAGTTCCGGTACCATGTCAGTTCGGCCGCGCAGGCGGGCTGCGTCGTCCTCGACGAGGCCTTTCGACGCTACCGTAACCTGCTCTTCGGTTCCGGCTCTTGGCCCCGACCCAGCTTCTCAAATAAACAGCAAACGTTGGGGAAGAACATTCTGGTGGTCTCCGTCGTCACAGCTGAATGTAATGAATTTCCTAATTTGGAGTCGGTAGAAAATTACACCCTAACCATTAATGATGACCAGTGTTTACTCGCCTCTGAGACTGTCTGGGGCGCTCTCCGAGGTCTGGAGACTTTCAGTCAGCTTGTTTGGAAATCAGCTGAGGGCACGTTCTTTATCAACAAGACAAAGATTAAAGACTTTCCTCGATTCCCTCACCGGGGCGTACTGCTGGATACATCTCGCCATTACCTGCCATTGTCTAGCATCCTGGATACACTGGATGTCATGGCATACAATAAATTCAACGTGTTCCACTGGCACTTGGTGGACGACTCTTCCTTCCCATATGAGAGCTTCACTTTCCCAGAGCTCACCAGAAAGGGGTCCTTCAACCCTGTCACTCACATCTACACAGCACAGGATGTGAAGGAGGTCATTGAATACGCAAGGCTTCGGGGTATCCGTGTGCTGGCAGAATTTGACACTCCTGGCCACACTTTGTCCTGGGGGCCAGGTGCCCCTGGGTTATTAACACCTTGCTACTCTGGGTCTCATCTCTCTGGCACATTTGGACCGGTGAACCCCAGTCTCAACAGCACCTATGACTTCATGAGCACACTCTTCCTGGAGATCAGCTCAGTCTTCCCGGACTTTTATCTCCACCTGGGAGGGGATGAAGTCGACTTCACCTGCTGGAAGTCCAACCCCAACATCCAGGCCTTCATGAAGAAAAAGGGCTTTACTGACTTCAAGCAGCTGGAGTCCTTCTACATCCAGACGCTGCTGGACATCGTCTCTGATTATGACAAGGGCTATGTGGTGTGGCAGGAGGTATTTGATAATAAAGTGAAGGTTCGGCCAGATACAATCATACAGGTGTGGCGGGAAGAAATGCCAGTAGAGTACATGTTGGAGATGCAAGATATCACCAGGGCTGGCTTCCGGGCCCTGCTGTCTGCTCCCTGGTACCTGAACCGTGTAAAGTATGGCCCTGACTGGAAGGACATGTACAAAGTGGAGCCCCTGGCGTTTCATGGTACGCCTGAACAGAAGGCTCTGGTCATTGGAGGGGAGGCCTGTATGTGGGGAGAGTATGTGGACAGCACCAACCTGGTCCCCAGACTCTGGCCCAGAGCGGGTGCCGTCGCTGAGAGACTGTGGAGCAGTAACCTGACAACTAATATAGACTTTGCCTTTAAACGTTTGTCGCATTTCCGTTGTGAGCTGGTGAGGAGAGGAATCCAGGCCCAGCCCATCAGTGTAGGCTGCTGTGAGCAGGAGTTTGAGCAGACT
Translations:
M  T  S  S  R  L  W  F  S  L  L  L  A  A  A  F  A  G  R  A  T  A  L  W  P  W  P  Q  N  F  Q  T  S  D  Q  R  Y  V  L  Y  P  N  N  F  Q  F  Q  Y  D  V  S  S  A  A  Q  P  G  C  S  V  L  D  E  A  F  Q  R  Y  R  D  L  L  F  G  S  G  S  W  P  R  P  Y  L  T  G  K  R  H  T  L  E  K  N  V  L  V  V  S  V  V  T  P  G  C  N  Q  L  P  T  L  E  S  V  E  N  Y  T  L  T  I  N  D  D  Q  C  L  L  L  S  E  T  V  W  G  A  L  R  G  L  E  T  F  S  Q  L  V  W  K  S  A  E  G  T  F  F  I  N  K  T  E  I  E  D  F  P  R  F  P  H  R  G  L  L  L  D  T  S  R  H  Y  L  P  L  S  S  I  L  D  T  L  D  V  M  A  Y  N  K  L  N  V  F  H  W  H  L  V  D  D  P  S  F  P  Y  E  S  F  T  F  P  E  L  M  R  K  G  S  Y  N  P  V  T  H  I  Y  T  A  Q  D  V  K  E  V  I  E  Y  A  R  L  R  G  I  R  V  L  A  E  F  D  T  P  G  H  T  L  S  W  G  P  G  I  P  G  L  L  T  P  C  Y  S  G  S  E  P  S  G  T  F  G  P  V  N  P  S  L  N  N  T  Y  E  F  M  S  T  F  F  L  E  V  S  S  V  F  P  D  F  Y  L  H  L  G  G  D  E  V  D  F  T  C  W  K  S  N  P  E  I  Q  D  F  M  R  K  K  G  F  E  D  F  K  Q  L  E  S  F  Y  I  Q  T  L  L  D  I  V  S  S  Y  G  K  G  Y  V  V  W  Q  E  V  F  D  N  K  V  K  I  Q  P  D  T  I  I  Q  V  W  R  E  D  I  P  V  N  Y  M  K  E  L  E  L  V  T  K  A  G  F  R  A  L  L  S  A  P  W  Y  L  N  R  I  S  Y  G  P  D  W  K  D  F  Y  I  V  E  P  L  A  F  E  G  T  P  E  Q  K  A  L  V  I  G  G  E  A  C  M  W  G  E  Y  V  D  N  T  N  L  V  P  R  L  W  P  R  A  G  A  V  A  E  R  L  W  S  N  K  L  T  S  D  L  T  F  A  Y  E  R  L  S  H  F  R  C  E  L  L  R  R  G  V  Q  A  Q  P  L  N  V  G  F  C  E  Q  E  F  E  Q  T  
M  A  G  C  R  L  W  V  S  L  L  L  A  A  A  L  A  C  L  A  T  A  L  W  P  W  P  Q  Y  I  Q  T  Y  H  R  R  Y  T  L  Y  P  N  N  F  Q  F  R  Y  H  V  S  S  A  A  Q  A  G  C  V  V  L  D  E  A  F  R  R  Y  R  N  L  L  F  G  S  G  S  W  P  R  P  S  F  S  N  K  Q  Q  T  L  G  K  N  I  L  V  V  S  V  V  T  A  E  C  N  E  F  P  N  L  E  S  V  E  N  Y  T  L  T  I  N  D  D  Q  C  L  L  A  S  E  T  V  W  G  A  L  R  G  L  E  T  F  S  Q  L  V  W  K  S  A  E  G  T  F  F  I  N  K  T  K  I  K  D  F  P  R  F  P  H  R  G  V  L  L  D  T  S  R  H  Y  L  P  L  S  S  I  L  D  T  L  D  V  M  A  Y  N  K  F  N  V  F  H  W  H  L  V  D  D  S  S  F  P  Y  E  S  F  T  F  P  E  L  T  R  K  G  S  F  N  P  V  T  H  I  Y  T  A  Q  D  V  K  E  V  I  E  Y  A  R  L  R  G  I  R  V  L  A  E  F  D  T  P  G  H  T  L  S  W  G  P  G  A  P  G  L  L  T  P  C  Y  S  G  S  H  L  S  G  T  F  G  P  V  N  P  S  L  N  S  T  Y  D  F  M  S  T  L  F  L  E  I  S  S  V  F  P  D  F  Y  L  H  L  G  G  D  E  V  D  F  T  C  W  K  S  N  P  N  I  Q  A  F  M  K  K  K  G  F  T  D  F  K  Q  L  E  S  F  Y  I  Q  T  L  L  D  I  V  S  D  Y  D  K  G  Y  V  V  W  Q  E  V  F  D  N  K  V  K  V  R  P  D  T  I  I  Q  V  W  R  E  E  M  P  V  E  Y  M  L  E  M  Q  D  I  T  R  A  G  F  R  A  L  L  S  A  P  W  Y  L  N  R  V  K  Y  G  P  D  W  K  D  M  Y  K  V  E  P  L  A  F  H  G  T  P  E  Q  K  A  L  V  I  G  G  E  A  C  M  W  G  E  Y  V  D  S  T  N  L  V  P  R  L  W  P  R  A  G  A  V  A  E  R  L  W  S  S  N  L  T  T  N  I  D  F  A  F  K  R  L  S  H  F  R  C  E  L  V  R  R  G  I  Q  A  Q  P  I  S  V  G  C  C  E  Q  E  F  E  Q  T  

Initial estimates: Kappa=3.301203, dn=0.093274, ds=0.518095, t=0.353716
ML estimates: Kappa=2.498253, omega(dn/ds)=0.185577, t=0.602465

nonsynSubRate =

    0.0943


synSubRate =

    0.5080

References

[1] Tamura, K., and Mei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512–526.

[2] Yang, Z., and Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17, 32–43.

[3] Goldman, N., and Yang, Z. (1994). A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences. Mol. Biol. Evol. 11(5), 725–736.

Was this topic helpful?