Strings 2008 2 4

From MDWiki

Visual comparison of strings

Humans are very good at identifying patterns visually. One of the first ways to compare biological sequences was to generate identity matrices and visualise them in a so-called dot plot.

Computing a dot plot

The similarity between two sequences is represented by a comparison matrix. Each element of the matrix compares one character of the first sequence with one character of the second sequence. Identical character pairings are visualised by a black pixel, while pairings of different characters are left white.
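The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool used in class: the function names `dot_matrix` and `render` are made up for this example, and `X` and `.` stand in for black and white pixels.

```python
def dot_matrix(seq_rows, seq_cols):
    """Return a matrix of booleans: True where the two characters are identical."""
    return [[r == c for c in seq_cols] for r in seq_rows]


def render(matrix, seq_rows, seq_cols, match="X", mismatch="."):
    """Format the matrix as text, one sequence along the top and one down the side."""
    lines = ["  " + " ".join(seq_cols)]
    for label, row in zip(seq_rows, matrix):
        cells = " ".join(match if cell else mismatch for cell in row)
        lines.append(label + " " + cells)
    return "\n".join(lines)


# Toy sequences chosen only for illustration.
print(render(dot_matrix("GATTA", "GATCA"), "GATTA", "GATCA"))
```

Every element is computed independently, so the cost grows with the product of the two sequence lengths.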

The table below illustrates an example of a dot plot comparing DNA fragments of hemoglobin from a person with sickle-cell disease and from a healthy person. (This example can be generated interactively in class. It is also a nice example for showing the effects of different mutations on the transcriptional outcome.)


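Dot plot (columns: healthy sequence; rows: sickle-cell sequence; X = identical characters, . = different characters)
× C T G A C T C C T G A G G A G A A G T C T G C C
C X . . . X . X X . . . . . . . . . . . X . . X X
T . X . . . X . . X . . . . . . . . . X . X . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
A . . . X . . . . . . X . . X . X X . . . . . . .
C X . . . X . X X . . . . . . . . . . . X . . X X
T . X . . . X . . X . . . . . . . . . X . X . . .
C X . . . X . X X . . . . . . . . . . . X . . X X
C X . . . X . X X . . . . . . . . . . . X . . X X
T . X . . . X . . X . . . . . . . . . X . X . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
T . X . . . X . . X . . . . . . . . . X . X . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
G . . X . . . . . . X . X X . X . . X . . . X . .
A . . . X . . . . . . X . . X . X X . . . . . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
A . . . X . . . . . . X . . X . X X . . . . . . .
A . . . X . . . . . . X . . X . X X . . . . . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
T . X . . . X . . X . . . . . . . . . X . X . . .
C X . . . X . X X . . . . . . . . . . . X . . X X
T . X . . . X . . X . . . . . . . . . X . X . . .
G . . X . . . . . . X . X X . X . . X . . . X . .
C X . . . X . X X . . . . . . . . . . . X . . X X
C X . . . X . X X . . . . . . . . . . . X . . X X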
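For two sequences of equal length, a point mutation shows up as a gap in the main diagonal of the dot plot. The sketch below locates such gaps for the two fragments in the table above. The sequences are copied from the table's labels (header row and first column); the variable names and the helper `diagonal_breaks` are assumptions made for this illustration.

```python
# Sequences copied from the table labels: header row and first column.
healthy = "CTGACTCCTGAGGAGAAGTCTGCC"
sickle = "CTGACTCCTGTGGAGAAGTCTGCC"


def diagonal_breaks(seq_a, seq_b):
    """Return 1-based positions where the main diagonal of the dot plot has no dot."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


print(diagonal_breaks(healthy, sickle))  # prints [11]
```

Position 11 is the single A/T difference between the two fragments.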


Discovery questions:
  • Suppose that two sequences are identical except that a segment is inverted in one sequence relative to the other. Explain what such an inversion would look like in a dot plot.
  • What would be the value of using a dot plot to compare a sequence to a second sequence, as well as to the reverse complement of the second sequence?
  • Sketch a dot plot of two sequences which are identical except that a segment is deleted from the middle of one sequence and not the other.
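To experiment with these questions, the following sketch provides a few small helpers for building test sequences and printing their dot plots. All function names are hypothetical and chosen for this example only; the inversion helper simply reverses the segment by default, and using the reverse complement instead is offered as an option, since the question does not say which is meant.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end, complement=False):
    """Return seq with seq[start:end] reversed (or reverse-complemented)."""
    segment = seq[start:end]
    flipped = reverse_complement(segment) if complement else segment[::-1]
    return seq[:start] + flipped + seq[end:]


def delete_segment(seq, start, end):
    """Return seq with seq[start:end] removed."""
    return seq[:start] + seq[end:]


def dot_plot(seq_rows, seq_cols):
    """Return the dot plot as text: X for identical characters, . otherwise."""
    return "\n".join(
        "".join("X" if r == c else "." for c in seq_cols) for r in seq_rows
    )


# Example: compare a fragment with a copy that has a segment deleted.
fragment = "CTGACTCCTGAGGAGAAGTCTGCC"
print(dot_plot(delete_segment(fragment, 9, 15), fragment))
```

Swapping `delete_segment` for `invert_segment`, or passing `reverse_complement(fragment)` as the second sequence, produces the plots asked about in the other two questions.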


Further (very advanced) reading for the Mathematics and Physics inclined

Recurrence plot of phase space trajectories



goto Similarity of strings


--ThomasHuber 14:02, 10 January 2008 (EST)