BLAST Background
BLAST (Basic Local Alignment Search Tool) is a set of sequence comparison algorithms, introduced in 1990, that are used to search sequence databases for optimal local alignments to a query.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J. Mol. Biol. 215:403-410.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Finds best local alignments
Provides statistical significance
Available as WWW, standalone, and network clients
BLAST programs
Program   Description
blastp    Compares an amino acid query sequence against a protein sequence database.
blastn    Compares a nucleotide query sequence against a nucleotide sequence database.
blastx    Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn   Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
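The choice of program follows from the type of the query and the type of the database. A minimal sketch of that lookup, covering only the four programs listed above; the names `PROGRAM_FOR` and `choose_program` are hypothetical helpers for illustration, not part of any BLAST distribution:

```python
# Hypothetical lookup table: (query type, database type) -> BLAST program.
# Only the four programs described in the table above are covered.
PROGRAM_FOR = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_program(query_type, database_type):
    """Return the program name for a query/database pair."""
    return PROGRAM_FOR[(query_type, database_type)]


print(choose_program("nucleotide", "protein"))  # blastx
```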
nucleotide only
protein only
BLAST Algorithm
Scoring of matches is done using scoring matrices.
Discrimination between real and artifactual matches is done using an estimate of the probability that the match might occur by chance.
Where does the score (S) come from?
The quality of each pair-wise alignment is represented as a score and the scores are ranked.
Scoring matrices are used to calculate the score of the alignment base by base (DNA) or amino acid by amino acid (protein).
The alignment score is the sum of the scores for each position.
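As a rough illustration of position-by-position scoring, the sketch below sums per-column scores for two aligned nucleotide sequences of equal length with no gaps. The match and mismatch values are assumed for the example and are not claimed to be BLAST defaults; a protein search would look up each residue pair in a substitution matrix instead.

```python
# Illustrative nucleotide scores (assumed values, not BLAST defaults).
MATCH, MISMATCH = 1, -2


def pair_score(a, b):
    """Score one aligned column: identical bases score MATCH, others MISMATCH."""
    return MATCH if a == b else MISMATCH


def alignment_score(seq1, seq2):
    """Sum the per-position scores of two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# Five matches and one mismatch: 5 * 1 + (-2) = 3
print(alignment_score("ACGTAC", "ACGAAC"))  # 3
```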
Substitution matrices
BLOSUM vs PAM
BLOSUM 45 and PAM 250: more divergent sequences
Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
Notes on E-values
Low E-values suggest that a match is unlikely to be due to chance
The E-value depends on both the size of the alignments and the size of the sequence database
Important consideration for comparing results across different searches
E-value increases as the database gets bigger
E-value decreases as alignments get longer
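The slides do not give a formula, but the dependence on database size and score can be sketched with the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. The values of `k` and `lam` below are illustrative assumptions; real values depend on the scoring matrix and gap penalties in use.

```python
import math


def expect_value(score, query_len, db_len, k=0.041, lam=0.267):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholder values (assumptions).
    """
    return k * query_len * db_len * math.exp(-lam * score)


small_db = expect_value(score=100, query_len=300, db_len=1e8)
large_db = expect_value(score=100, query_len=300, db_len=1e9)

# Same score, ten times more database: the E-value is ten times larger.
print(large_db / small_db)

# A longer (higher-scoring) alignment against the same database
# gives a smaller E-value.
print(expect_value(150, 300, 1e8) < small_db)
```

This is why E-values from searches against databases of different sizes are not directly comparable, while the underlying scores are.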
Take-home message: always look at your alignments
For nucleotide-based searches, one should look for hits with E-values of 10^-6 or less and sequence identity of 70% or more
For protein-based searches, one should look for hits with E-values of 10^-3 or less and sequence identity of 25% or more
Source: Chapter 11, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
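These rules of thumb can be applied as a simple filter over a hit list. A minimal sketch, assuming each hit is a hypothetical `(name, evalue, percent_identity)` tuple; the hit names and values are made up for illustration, and passing the filter does not replace inspecting the alignment itself.

```python
# Thresholds from the guideline: (maximum E-value, minimum % identity).
THRESHOLDS = {
    "nucleotide": (1e-6, 70.0),
    "protein": (1e-3, 25.0),
}


def passes_rule_of_thumb(search_type, evalue, percent_identity):
    """Return True if a hit meets both thresholds for its search type."""
    max_evalue, min_identity = THRESHOLDS[search_type]
    return evalue <= max_evalue and percent_identity >= min_identity


# Made-up example hits: (name, E-value, percent identity).
hits = [("hitA", 1e-20, 88.0), ("hitB", 1e-4, 95.0), ("hitC", 1e-8, 60.0)]
kept = [name for name, e, ident in hits
        if passes_rule_of_thumb("nucleotide", e, ident)]
print(kept)  # ['hitA']
```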
BLAST Algorithm
Scoring of matches done using scoring matrices
Sequences are split into words (default n=3)
BLAST algorithm extends the initial seed hit into an HSP
How Does BLAST Really Work?
The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words"), and initially seeking matches between fragments. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".
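The word-and-extend idea can be sketched in a few lines. This is a deliberately simplified illustration, not the actual BLAST implementation: it uses exact word matches only, extends without gaps, and stops extending when the running score falls more than `x_drop` below the best score seen. The scores and the `x_drop` value are assumptions chosen for the example.

```python
def words(seq, n=3):
    """Return every overlapping word of length n with its start position."""
    return [(seq[i:i + n], i) for i in range(len(seq) - n + 1)]


def find_seeds(query, subject, n=3):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = {}
    for word, pos in words(subject, n):
        index.setdefault(word, []).append(pos)
    return [(qpos, spos)
            for word, qpos in words(query, n)
            for spos in index.get(word, [])]


def _extend_one_way(query, subject, q, s, step, score, match, mismatch, x_drop):
    """Walk from (q, s) in direction step; return (best score, columns kept)."""
    best, kept, walked = score, 0, 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        walked += 1
        if score > best:
            best, kept = score, walked
        elif best - score > x_drop:
            break  # score dropped too far below the best: stop extending
        q += step
        s += step
    return best, kept


def extend_seed(query, subject, qpos, spos, n=3, match=1, mismatch=-2, x_drop=3):
    """Extend a seed in both directions; return (score, query segment)."""
    score = n * match  # the seed itself is n exact matches
    score, right = _extend_one_way(query, subject, qpos + n, spos + n, +1,
                                   score, match, mismatch, x_drop)
    score, left = _extend_one_way(query, subject, qpos - 1, spos - 1, -1,
                                  score, match, mismatch, x_drop)
    return score, query[qpos - left:qpos + n + right]


query, subject = "TTACGTACGGA", "CCACGTACGTT"
best = max(extend_seed(query, subject, q, s) for q, s in find_seeds(query, subject))
print(best)  # (7, 'ACGTACG')
```

In a fuller version, only extended segments whose score exceeds the threshold S would be reported as HSPs.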
Credits
Materials for this presentation have been adapted from the following sources:
NCBI HelpDesk - Field Guide Course Materials
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
Questions?
Please contact: