
BLAST Background

The document provides an overview of the BLAST algorithm and how it works. BLAST (Basic Local Alignment Search Tool) breaks query and database sequences into fragments called words and first looks for matches between these fragments. These word hits are then extended in both directions to generate high-scoring segment pairs (HSPs), which are locally optimal alignments. BLAST uses scoring matrices to assign scores to matches and introduces gaps to optimize alignments. It reports expectation values (E-values), estimated from the alignment score and the database size, to discriminate between real matches and those occurring by chance.
Copyright
© Attribution Non-Commercial (BY-NC)

BLAST

Finding Function By Sequence Similarity

Concepts of Sequence Similarity Searching

The premise: one sequence by itself is not informative. It must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning relatives and function.

The BLAST algorithm

The BLAST programs (Basic Local Alignment Search Tools) are a set of sequence comparison algorithms, introduced in 1990, that search sequence databases for optimal local alignments to a query.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J. Mol. Biol. 215:403-410.
Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

What BLAST tells you ...

BLAST reports surprising alignments, that is, alignments that are different than chance.

Assumptions:
- Random sequences
- Constant composition

Conclusions:
- Surprising similarities imply evolutionary homology
- Evolutionary homology: descent from a common ancestor
- Homology does not always imply similar function

Basic Local Alignment Search Tool

- Widely used similarity search tool
- Heuristic approach based on the Smith-Waterman algorithm
- Finds the best local alignments
- Provides statistical significance
- Available as web (www), standalone, and network clients
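BLAST is a heuristic shortcut to the exhaustive local alignment that Smith-Waterman computes. The sketch below fills the full dynamic-programming matrix and returns the best local score. It assumes the DNA unitary scores used later in these notes (+1 match, -2 mismatch) and a simple linear gap penalty of -2 per gapped position. That gap value is an illustrative assumption; BLAST itself uses affine gap costs.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -2, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 means "start a new local alignment here"
            best = max(best, h[i][j])
    return best


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # 4: the shared local segment ACGT
```

The cost is proportional to the product of the two sequence lengths. Against a large database that is what BLAST's word-based seeding avoids.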

BLAST programs

Program: Description
blastp: Compares an amino acid query sequence against a protein sequence database.
blastn: Compares a nucleotide query sequence against a nucleotide sequence database.
blastx: Compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
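blastx and tblastn both rely on translating nucleotide sequence "in all reading frames": three forward frames and three frames on the reverse complement. The sketch below does this with the standard genetic code. It is an illustration only; the function names are hypothetical, and BLAST's own translation also supports alternative genetic codes.

```python
from itertools import product

# Standard genetic code. Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks a stop codon, 'X' an unrecognised codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate the three forward and the three reverse-complement reading frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


print(six_frame_translation("ATGAAATAG"))
```

For example, frame +1 of `ATGAAATAG` reads `MK*`, and frame -1 (from the reverse complement `CTATTTCAT`) reads `LFH`.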

More BLAST programs

Program: Notes
Megablast (contiguous): Nearly identical sequences (nucleotide only)
Discontiguous megablast: Cross-species comparison (nucleotide only)
Position-Specific Iterated BLAST (PSI-BLAST): Automatically generates a position-specific score matrix (PSSM) (protein only)
RPS-BLAST: Searches a database of PSI-BLAST PSSMs (protein only)
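A position-specific score matrix gives each column of an alignment its own residue scores. The toy sketch below builds a log-odds (bits) matrix from a few aligned sequences. It assumes add-one pseudocounts and a uniform background, and it uses a four-letter alphabet for brevity. This is a simplified illustration of the PSSM idea, not PSI-BLAST's actual weighting and pseudocount scheme, and the function names are hypothetical.

```python
import math
from collections import Counter


def build_pssm(block: list[str], alphabet: str, pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) per residue per column, assuming a uniform background.

    All sequences in block must be aligned and of equal length (zip truncates otherwise).
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*block):  # walk the alignment column by column
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({r: math.log2((counts[r] + pseudocount) / total / background)
                     for r in alphabet})
    return pssm


def score_with_pssm(pssm: list[dict[str, float]], seq: str) -> float:
    """Sum the position-specific scores of seq against the matrix."""
    return sum(column[residue] for column, residue in zip(pssm, seq))


pssm = build_pssm(["ACGT", "ACGT", "ACGA"], "ACGT")
print(score_with_pssm(pssm, "ACGT"), score_with_pssm(pssm, "TTTT"))
```

Residues that are frequent in a column score positively there, and rare ones score negatively. This is why a PSSM is more sensitive to family members than a single fixed matrix.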

BLAST Algorithm

- Scoring of matches is done using scoring matrices
- Sequences are split into words (default n=3 for protein searches) for speed and computational efficiency
- The BLAST algorithm extends the initial seed hit into an HSP
- HSP = high-scoring segment pair = locally optimal alignment
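Splitting sequences into words and finding seed hits can be sketched with a hash index of the query's words. This simplified version only seeds on identical words of length n. As the "Neighborhood Score Threshold (T)" slide notes, protein BLAST also seeds on similar words. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterator


def words(seq: str, n: int = 3) -> Iterator[tuple[int, str]]:
    """Yield (offset, word) for every overlapping word of length n."""
    for i in range(len(seq) - n + 1):
        yield i, seq[i:i + n]


def find_seeds(query: str, subject: str, n: int = 3) -> list[tuple[int, int]]:
    """Return (query_offset, subject_offset) pairs where an identical word occurs."""
    index = defaultdict(list)  # word -> list of offsets in the query
    for i, w in words(query, n):
        index[w].append(i)
    hits = []
    for j, w in words(subject, n):  # scan the database sequence once
        for i in index.get(w, []):
            hits.append((i, j))
    return hits


print(find_seeds("MKVLA", "AAMKVQ"))  # [(0, 2)]: the word MKV
```

Building the index once per query turns the subject scan into a dictionary lookup per word. That is where the speed comes from.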

Sequence Similarity Searching: The Statistics Are Important

- Discriminating between real and artifactual matches is done using an estimate of the probability that the match might occur by chance.
- We'll talk more about the meaning of the scores (S) and E-values (E) that are associated with BLAST hits.

Where does the score (S) come from?

- The quality of each pair-wise alignment is represented as a score, and the scores are ranked.
- Scoring matrices are used to calculate the score of the alignment base by base (DNA) or amino acid by amino acid (protein).
- The alignment score will be the sum of the scores for each position.

What's a scoring matrix?

- Substitution matrices are used for amino acid alignments: each possible residue substitution is given a score.
- A simpler unitary matrix is used for DNA pairs (+1 for a match, -2 for a mismatch).
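The score of an alignment is the sum of per-position scores. The sketch below scores an already-aligned DNA pair with the unitary matrix above (+1 match, -2 mismatch) and subtracts an affine gap cost (a one-time opening cost plus a per-position extension cost). The gap values `gap_open=5` and `gap_extend=2` are illustrative assumptions, not BLAST defaults for any particular program.

```python
def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = -2,
                    gap_open: int = 5, gap_extend: int = 2) -> int:
    """Score two aligned sequences ('-' marks a gap) column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_side = None  # which row the current gap is in (0 = top, 1 = bottom), or None
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            side = 0 if a == "-" else 1
            # Affine cost: pay gap_open once when a gap starts, gap_extend for every gapped column
            score -= gap_extend + (gap_open if side != gap_side else 0)
            gap_side = side
        else:
            score += match if a == b else mismatch
            gap_side = None
    return score


print(alignment_score("ACGGT", "AC--T"))  # 1 + 1 - (5 + 2) - 2 + 1 = -6
```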

BLOSUM vs PAM

Roughly equivalent matrices, ordered from more divergent (top) to less divergent (bottom):
BLOSUM 45 / PAM 250
BLOSUM 62 / PAM 160
BLOSUM 90 / PAM 100

BLOSUM 62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.

What do the Score and the E-value really mean?

The quality of the alignment is represented by the Score (S).
The score of an alignment is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (PAM, BLOSUM), whereas gap scores are assigned empirically.

The significance of each alignment is computed as an E-value (E).
Expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
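For ungapped local alignments, the E-value comes from Karlin-Altschul statistics. With m and n the (effective) lengths of the query and of the database, and K and λ statistical parameters that depend on the scoring system, the expected number of chance hits scoring at least S is given below. The second and third formulas give the same result via the normalized bit score S'.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\qquad
E = m\,n\,2^{-S'}
```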

Notes on E-values

- Low E-values suggest that sequences are homologous
- A high E-value can't show non-homology
- Statistical significance depends on both the size of the alignments and the size of the sequence database
- This is an important consideration when comparing results across different searches:
  - the E-value increases as the database gets bigger
  - the E-value decreases as alignments get longer (and therefore score higher)
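The database-size and alignment-length effects follow directly from E = K·m·n·e^(-λS). The sketch below evaluates that formula. The values `LAMBDA = 0.3` and `K = 0.1` are hypothetical placeholders chosen for illustration, not the parameters of any real scoring matrix.

```python
import math


def evalue(score: float, query_len: float, db_len: float, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance alignments scoring >= score."""
    return k * query_len * db_len * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


# Hypothetical parameters for illustration only
LAMBDA, K = 0.3, 0.1

small_db = evalue(50, 300, 1e6, LAMBDA, K)
big_db = evalue(50, 300, 2e6, LAMBDA, K)     # same hit, database twice as large
longer = evalue(60, 300, 1e6, LAMBDA, K)     # longer alignment, higher score
print(big_db / small_db)                     # E-value doubles with database size
print(longer < small_db)                     # higher score gives a lower E-value
```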

Homology: Some Guidelines

- Similarity can be indicative of homology
- Generally, if two sequences are significantly similar over their entire length, they are likely homologous
- Low-complexity regions can be highly similar without being homologous
- Homologous sequences are not always highly similar
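Low-complexity regions, such as long runs of a single base or short repeats, are one reason similarity can be misleading. The crude sketch below flags windows whose Shannon entropy falls below a threshold. The window size of 12 and the 1.5-bit threshold are arbitrary assumptions. BLAST's own filters (SEG for proteins, DUST for nucleotides) use different, more refined algorithms.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of window."""
    counts = Counter(window)
    total = len(window)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 1.5) -> list[int]:
    """Start offsets of windows whose entropy is below threshold (candidates for masking)."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < threshold]


print(low_complexity_windows("AAAAAAAAAAAAACGTTGCAGTCA")[:3])  # [0, 1, 2]
```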

Suggested BLAST Cutoffs

- For nucleotide-based searches, one should look for hits with E-values of 10^-6 or less and sequence identity of 70% or more.
- For protein-based searches, one should look for hits with E-values of 10^-3 or less and sequence identity of 25% or more.

Source: Chapter 11, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins

Take-home message: always look at your alignments!
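The suggested cutoffs can be applied mechanically to a list of hits. The sketch below does that. The `Hit` record and its field names are hypothetical stand-ins for however your BLAST results are parsed. Passing the filter is a first screen only, so the alignments still need to be inspected.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str          # hypothetical field names, not a BLAST output format
    evalue: float
    percent_identity: float


# (maximum E-value, minimum % identity) from the suggested cutoffs above
CUTOFFS = {"nucleotide": (1e-6, 70.0), "protein": (1e-3, 25.0)}


def passes_cutoff(hit: Hit, search_type: str) -> bool:
    """True if the hit meets both the E-value and the identity cutoff for this search type."""
    max_evalue, min_identity = CUTOFFS[search_type]
    return hit.evalue <= max_evalue and hit.percent_identity >= min_identity


hits = [Hit("seq1", 1e-10, 82.0), Hit("seq2", 1e-4, 91.0)]
print([h.subject_id for h in hits if passes_cutoff(h, "nucleotide")])  # ['seq1']
```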


How Does BLAST Really Work?

The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words") and initially seeking matches between fragments. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".
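Extending a word hit "in either direction" can be sketched as an ungapped X-drop extension, the style of extension used by the original 1990 BLAST. Extension stops once the running score falls more than `x_drop` below the best score seen. The resulting HSP would then be kept only if its score reaches S. The sketch assumes the +1/-2 DNA scores and an arbitrary `x_drop=5`. The gapped extension stage of Gapped BLAST is not shown.

```python
def extend_seed(query: str, subject: str, q_pos: int, s_pos: int, word_len: int,
                match: int = 1, mismatch: int = -2, x_drop: int = 5) -> tuple[int, int, int, int]:
    """Ungapped X-drop extension of a word hit; returns (q_start, s_start, length, score)."""

    def pair_score(i: int, j: int) -> int:
        return match if query[i] == subject[j] else mismatch

    seed_score = sum(pair_score(q_pos + k, s_pos + k) for k in range(word_len))

    def extend(step: int, q: int, s: int) -> tuple[int, int]:
        """Walk from (q, s) in direction step (+1 or -1); return (best gain, columns used)."""
        best, best_len, running, k = 0, 0, 0, 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += pair_score(q, s)
            k += 1
            if running > best:
                best, best_len = running, k
            elif best - running > x_drop:
                break  # score dropped too far below the best seen: stop extending
            q, s = q + step, s + step
        return best, best_len

    right_gain, right_len = extend(+1, q_pos + word_len, s_pos + word_len)
    left_gain, left_len = extend(-1, q_pos - 1, s_pos - 1)
    length = left_len + word_len + right_len
    return q_pos - left_len, s_pos - left_len, length, seed_score + left_gain + right_gain


# Seed: the word ACG at offset 2 in both sequences
print(extend_seed("GGACGTACGTCC", "TTACGTACGTAA", 2, 2, 3))  # (2, 2, 8, 8): HSP ACGTACGT
```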

[Figure: BLAST Algorithm]


Extending the High Scoring Segment Pair (HSP)

- Minimum Score (S)
- Neighborhood Score Threshold (T)
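The neighborhood score threshold T controls which words count as seeds. Every word scoring at least T against a query word is a hit, not just identical words. The sketch below enumerates a word's neighborhood. It uses the +1/-2 DNA scores only to keep the example small. In practice, neighborhood words matter mainly for protein searches scored with a matrix such as BLOSUM 62. The function names are hypothetical.

```python
from itertools import product


def word_score(w1: str, w2: str, match: int = 1, mismatch: int = -2) -> int:
    """Ungapped score of two equal-length words."""
    return sum(match if a == b else mismatch for a, b in zip(w1, w2))


def neighborhood(word: str, threshold: int, alphabet: str = "ACGT") -> list[str]:
    """All words of the same length whose score against word is >= threshold."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return sorted(w for w in candidates if word_score(word, w) >= threshold)


print(neighborhood("ACG", 3))       # ['ACG']: only the exact word
print(len(neighborhood("ACG", 0)))  # 10: the exact word plus every single mismatch
```

Lowering T enlarges the neighborhood. This produces more seeds and more sensitivity, at the cost of more extension work.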


Credits

Materials for this presentation have been adapted from the following sources:
- NCBI HelpDesk: Field Guide Course Materials
- Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins

Questions?

Please contact:
Dr. Joanne Fox, Michael Smith Laboratories, joanne@msl.ubc.ca
