04B. Bioinformatics-Lecture 4 (Alternative) - Blast
Ly Le, PhD
School of Biotechnology
Email: ly.le@hcmiu.edu.vn
Office: Rm705, HCM International University
BLAST and SIMILARITY SEARCH
Why is similarity important?
BLAST TYPES
• BLAST at NCBI
http://www.ncbi.nlm.nih.gov/BLAST
• BLAST at EMBnet
http://www.ch.embnet.org/software/aBLAST.html
• Assumptions
- Random sequences
- Constant composition
• Conclusions
- Surprising similarities imply evolutionary homology
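How "surprising" a similarity is can be put into numbers. The sketch below uses the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S) for the number of alignments with score S or better expected by chance between a query of length m and a database of length n, which rests on the random-sequence and constant-composition assumptions listed above. The function names and the default values of lam and k_const are assumptions chosen for illustration; real values depend on the scoring system in use.

```python
import math


def expected_chance_hits(score, query_len, db_len, lam=0.318, k_const=0.134):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    lam and k_const are placeholder values (assumption); they depend on the
    scoring matrix and on the residue composition.
    """
    return k_const * query_len * db_len * math.exp(-lam * score)


def chance_probability(e_value):
    """Probability of at least one chance hit this good: 1 - exp(-E)."""
    return -math.expm1(-e_value)


# A higher score is less likely to arise by chance, so it is more "surprising".
for s in (30, 50, 70):
    e = expected_chance_hits(s, query_len=250, db_len=1_000_000)
    print(s, e, chance_probability(e))
```

The smaller E is, the harder the match is to explain by chance alone, which is the basis for inferring homology from a surprising similarity.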
BLAST Algorithm
4. Organize the remaining high-scoring words into an
efficient search tree. This allows the program to rapidly
compare the high-scoring words to the database sequences.
5. Repeat steps 3 to 4 for each k-letter word in the query
sequence.
6. Scan the database sequences for exact matches with the
remaining high-scoring words. BLAST checks every position
of each database sequence for one of these words, such as PEG.
If an exact match is found, it is used to seed a possible
ungapped alignment between the query and database sequences.
7. Extend the exact matches to
high-scoring segment pairs
(HSPs).
BLAST stretches a longer alignment between
the query and the database sequence in the left
and right directions, from the position where
the exact match occurred. The extension does
not stop until the accumulated total score of
the HSP begins to decrease.
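The seeding and extension steps can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: a dictionary stands in for the search tree, only the query's own k-letter words are indexed (no neighborhood words), and scoring is a toy +1/-1 instead of a substitution matrix. All function names and the max_drop parameter are hypothetical. With max_drop=0 the extension stops as soon as the score begins to decrease; a larger value tolerates a bounded dip below the best score seen so far.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # toy scores (assumption); protein BLAST uses a substitution matrix


def index_query_words(query, k=3):
    """Map every k-letter word of the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(index, subject, k=3):
    """Scan a database sequence; return (query_pos, subject_pos) per exact word match."""
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            seeds.append((i, j))
    return seeds


def pair_score(a, b):
    return MATCH if a == b else MISMATCH


def extend_one_way(query, subject, i, j, step, max_drop=0):
    """Walk from (i, j) in one direction; return (best score gain, residues kept)."""
    best = running = 0
    best_len = length = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += pair_score(query[i], subject[j])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > max_drop:
            break  # score has fallen too far below its best value
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, i, j, k=3, max_drop=0):
    """Grow a seed into an HSP: (score, query_start, subject_start, length)."""
    seed = sum(pair_score(query[i + d], subject[j + d]) for d in range(k))
    left, left_len = extend_one_way(query, subject, i - 1, j - 1, -1, max_drop)
    right, right_len = extend_one_way(query, subject, i + k, j + k, +1, max_drop)
    return (seed + left + right, i - left_len, j - left_len, left_len + k + right_len)


def find_hsps(query, subject, k=3, max_drop=0):
    """Seed and extend; seeds on the same diagonal often collapse into one HSP."""
    index = index_query_words(query, k)
    return sorted({extend_seed(query, subject, i, j, k, max_drop)
                   for i, j in find_seeds(index, subject, k)}, reverse=True)


# Toy sequences (invented for illustration) sharing the word PEG.
query, subject = "MKPEGLLA", "TTKPEGLLQ"
for score, q_start, s_start, length in find_hsps(query, subject):
    print(score, query[q_start:q_start + length], subject[s_start:s_start + length])
```

In this toy example the four overlapping seeds (KPE, PEG, EGL, GLL) all extend to the same segment pair, KPEGLL against KPEGLL, so only one HSP is reported.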
[Figure: Extending the High-Scoring Segment Pair (HSP), showing the Minimum Score (S) and the Neighborhood Score Threshold (T)]
8. List all of the HSPs in the database whose
score is high enough to be considered.
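As a minimal sketch of this reporting step, the snippet below keeps only the HSPs that reach a cutoff score and lists them best first. The HSP record, the report_hsps function, the example data, and the choice of "at least min_score" as the cutoff rule are all assumptions made for illustration.

```python
from collections import namedtuple

HSP = namedtuple("HSP", "subject_id score")


def report_hsps(hsps, min_score):
    """Keep HSPs scoring at least min_score, highest score first."""
    kept = [h for h in hsps if h.score >= min_score]
    return sorted(kept, key=lambda h: h.score, reverse=True)


# Hypothetical HSPs, invented purely for illustration.
candidates = [HSP("seq_A", 42), HSP("seq_B", 17), HSP("seq_C", 65)]
for hsp in report_hsps(candidates, min_score=40):
    print(hsp.subject_id, hsp.score)
```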
• For protein database searches (blastp and blastx), the default option is the
nonredundant (nr) database (GenBank, the Protein Data Bank (PDB),
SwissProt, PIR, and PRF). Another option is to search only
RefSeq proteins.
• For DNA database searches (blastn, tblastn, tblastx), the default option is to
search the human (or mouse) genomic plus transcript database. Other
commonly used options include the nucleotide nr database (GenBank, EMBL,
DDBJ, and PDB).
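The grouping of programs above follows from what each one compares. The lookup table below is a small sketch of that relationship; the PROGRAMS name and the helper function are hypothetical, and "translated nucleotide" means the nucleotide sequence is translated in all six reading frames before comparison.

```python
# (query type, database type) for each BLAST program.
PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def searches_protein_database(program):
    """True if the program compares the query against a protein database."""
    return PROGRAMS[program][1] == "protein"


for name, (query_type, db_type) in PROGRAMS.items():
    print(f"{name}: {query_type} query vs {db_type} database")
```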
Step 4: Selecting Optional Search Parameters
BLAST result