Fundamentals of Bioinformatics - L5
Lecture 5
Dr. Marwa N.M.E. Sanad
• Alignment
• Ancestor
• Identity
• Similarity
• Homology
• Analogous
• Ortholog
• Paralog
Homology (Common ancestor)
http://evolution.berkeley.edu/evolibrary/article/0_0_0/similarity_ms_06
Homology (Common ancestor)
http://www.ncbi.nlm.nih.gov/books/NBK62051/
Analogy (Convergent evolution, no common ancestor)
Fish Mammals
Sequence alignment
Dot plot
Sliding
Dotlet
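A dot plot with a sliding window can be sketched in a few lines. This is a minimal illustration, not how Dotlet itself is implemented; the function names and the `window` and `min_matches` parameters are assumptions chosen for the example.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return the set of (i, j) start positions whose windows match well enough.

    A dot is placed at (i, j) when the window starting at seq_a[i] and the
    window starting at seq_b[j] share at least `min_matches` identical positions.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots, window=1):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        rows.append(
            "".join(
                "*" if (i, j) in dots else "."
                for j in range(len(seq_b) - window + 1)
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    dots = dot_plot("ACGT", "ACGT", window=2, min_matches=2)
    print(render("ACGT", "ACGT", dots, window=2))
```

Comparing a sequence with itself gives an unbroken main diagonal; a larger window with a stricter `min_matches` removes isolated chance dots.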
Sequence Alignment
Global alignment
• Pairwise alignment
• Multiple alignment
• Sliding alignment
Local alignment
• Pairwise alignment
• Smith-Waterman algorithm
• Sliding alignment
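The difference between global and local alignment can be sketched with one dynamic-programming routine. In this sketch the local mode follows the Smith-Waterman idea (scores never drop below zero and the best cell anywhere is reported), while the global mode uses the Needleman-Wunsch scheme, which the slides do not name. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration only.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score with a linear gap penalty.

    local=False: global alignment (end to end, Needleman-Wunsch style).
    local=True:  local alignment (best-scoring segment, Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            up = table[i - 1][j] + gap      # gap in b
            left = table[i][j - 1] + gap    # gap in a
            cell = max(diagonal, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart
                best = max(best, cell)
            table[i][j] = cell

    return best if local else table[-1][-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"))                       # global
    print(alignment_score("TTACGTT", "GGACGGG", local=True))    # local
```

Only the score is returned here; recovering the aligned strings would need a traceback step, which is left out to keep the sketch short.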
cgggta--tccaa
ccc-taggtccca
Each "-" marks a gap, i.e. an indel (insertion or deletion).
A scoring scheme:
Used to discriminate between good and bad alignments.
Score of alignment =
Ʃ (scores of identities and mismatches) - Ʃ (gap penalties)
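The scoring formula can be applied directly to a finished alignment. The sketch below scores the example alignment from the gap/indel slide (with the spacing removed); the values match = +1, mismatch = -1 and a penalty of 2 per gap position are assumptions, not values given in the lecture.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap_penalty=2):
    """Score = sum(identity and mismatch scores) - sum(gap penalties)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    substitution_total = 0
    gap_total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_total += gap_penalty          # one penalty per gap position
        elif x == y:
            substitution_total += match       # identity
        else:
            substitution_total += mismatch    # mismatch
    return substitution_total - gap_total


if __name__ == "__main__":
    print(score_alignment("cgggta--tccaa", "ccc-taggtccca"))
```

With these assumed values the example has 7 identities, 3 mismatches and 3 gap positions, so the score is 7 - 3 - 6 = -2.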
Substitution Matrices
Mismatches
alignment .
output data.
Basic Local Alignment Search Tool (BLAST)
Program   Query     Database   Reading-frame comparisons
blastp    protein   protein    1
blastx    DNA       protein    6
tblastn   protein   DNA        6
tblastx   DNA       DNA        36
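The counts 6 and 36 come from reading a DNA sequence in six frames: three offsets on the forward strand and three on the reverse complement. When both query and database are DNA, every query frame is compared with every database frame (6 x 6 = 36). A minimal sketch, with hypothetical function names, that only lists the frames and does not translate them:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return the six reading frames, trimmed to whole codons."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            usable = len(seq) - offset
            usable -= usable % 3              # drop an incomplete final codon
            frames[f"{strand}{offset + 1}"] = seq[offset:offset + usable]
    return frames


if __name__ == "__main__":
    for name, frame in six_frames("ATGGCCTAA").items():
        print(name, frame)
```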
Nucleotide
• Megablast: highly similar sequences
Peptide
• Blastp: protein-protein BLAST
• DELTA-BLAST: Domain Enhanced Lookup Time Accelerated BLAST
Step 5: The algorithm parameters
[a] General parameters: word size, threshold
1- Expect (E) value:
The number of matches expected to occur by chance in a database search.
2- Expect threshold:
Lower Expect thresholds are more stringent, leading to
fewer chance matches being reported.
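The link between score and E-value can be made concrete. For a bit score S', the expected number of chance hits is E = m x n x 2^(-S'), where m is the query length and n the database length. The sketch below uses raw lengths as a simplifying assumption (BLAST itself uses corrected "effective" lengths), so its numbers will not match BLAST output exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_length, database_length, threshold=10.0):
    """A hit is reported only if its E-value is at or below the Expect threshold."""
    return expect_value(bit_score, query_length, database_length) <= threshold


if __name__ == "__main__":
    # Same bit score, larger database: more chance matches are expected.
    print(expect_value(40, 300, 1_000_000))
    print(expect_value(40, 300, 1_000_000_000))
```

The same bit score gives a larger E-value in a larger database, which is why a lower Expect threshold is the more stringent setting.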
Cost to extend gap [Integer]: default = 2 for nucleotides/ 1 for proteins
2. Gap cost
• Existence: extension
• Increasing the gap costs will decrease the number of gaps
introduced.
• Cost to create and extend a gap in an alignment. Linear costs
are available only with megablast and are determined by the
match/mismatch scores.
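The existence/extension pair can be read as an affine gap cost: a gap of length k costs existence + k x extension. The sketch below assumes existence 11 and extension 1, the usual blastp defaults with BLOSUM62; treat these as illustrative values. The helper names are hypothetical.

```python
import re


def gap_cost(length, existence=11, extension=1):
    """Affine cost of one gap of `length` positions."""
    if length < 1:
        return 0
    return existence + extension * length


def gap_run_lengths(aligned_row):
    """Lengths of each run of '-' characters in one row of an alignment."""
    return [len(run.group()) for run in re.finditer(r"-+", aligned_row)]


def total_gap_cost(aligned_row, existence=11, extension=1):
    """Sum the affine cost over every gap run in the row."""
    return sum(
        gap_cost(n, existence, extension) for n in gap_run_lengths(aligned_row)
    )


if __name__ == "__main__":
    print(gap_cost(3))        # one gap of length 3
    print(3 * gap_cost(1))    # three separate gaps of length 1
```

One gap of length 3 costs 14, while three separate single-position gaps cost 36, so a high existence cost favours a few long gaps over many short ones.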
[b] Scoring parameters
1. Matrix:
Substitution matrices: PAM and BLOSUM
2. Gap cost
• Existence: extension
• Cost to create and extend a gap in an alignment
3. Mask
➢ The lower the E-value, the more significant the hit. If you want to
be certain of homology, your E-value must be lower than 10^-4 to
10^-6.
• The % identity:
o A rough substitute for the E-value.
o The fraction of aligned residues that are identical. The
Positives value (+) also counts residues that are similar.
• Length:
o This is the length of the alignment, which indicates how
long the two segments of your sequences that BLAST
has aligned are.
o Note: very short alignments can come up with high E-values
and not be very meaningful.
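Percent identity and alignment length can be computed from the two aligned rows. This is a minimal sketch with a hypothetical function name; it counts identity over the full alignment length, gap columns included, which is an assumption about the denominator.

```python
def percent_identity(row_a, row_b):
    """Return (identical columns, alignment length, percent identity)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")

    length = len(row_a)
    identical = sum(
        1 for x, y in zip(row_a, row_b) if x == y and x != "-"
    )
    percent = 100.0 * identical / length if length else 0.0
    return identical, length, percent


if __name__ == "__main__":
    identical, length, percent = percent_identity(
        "cgggta--tccaa", "ccc-taggtccca"
    )
    print(f"{identical}/{length} identical ({percent:.1f}%)")
```

A short alignment can show a high percent identity by chance alone, so identity is best read together with the alignment length and the E-value.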
Interpreting Results
• Generally:
o Bit scores below 50 are unreliable
o E-values greater than 0.0001 are often close to the
twilight zone
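These rules of thumb can be applied automatically to a results table. The sketch assumes the 12-column tab-separated format written by the command-line BLAST+ programs with `-outfmt 6` (query, subject, % identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the lecture itself uses the web interface, and the sample lines below are made up for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_hit(line):
    """Parse one tab-separated line in the default 12-column tabular layout."""
    fields = line.rstrip("\n").split("\t")
    return Hit(
        subject=fields[1],
        pident=float(fields[2]),
        length=int(fields[3]),
        evalue=float(fields[10]),
        bitscore=float(fields[11]),
    )


def is_reliable(hit, max_evalue=1e-4, min_bitscore=50.0):
    """Apply the two rules of thumb: bit score >= 50 and E-value <= 0.0001."""
    return hit.evalue <= max_evalue and hit.bitscore >= min_bitscore


if __name__ == "__main__":
    # Hypothetical example lines, not real search results.
    lines = [
        "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370",
        "q1\ts2\t35.00\t40\t26\t0\t1\t40\t9\t48\t2.5\t28.1",
    ]
    for line in lines:
        hit = parse_hit(line)
        print(hit.subject, "reliable" if is_reliable(hit) else "twilight zone")
```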
Possible problem 1: Short query sequences: Short alignments may have Expect
values above the default threshold, which is 10 on most pages,
and, therefore, are not displayed.
Solution: Try increasing the Expect threshold (under 'Algorithm
parameters').
Possible problem 2: Low-complexity regions are not allowed to initiate
alignments, so if your query is largely low complexity, the
filter may prevent all hits to the database. On the Basic BLAST
pages,
Solution: Adjust the filter settings in the section 'Filters and Masking',
under 'Algorithm parameters'. For a description of low
complexity filters,
Troubleshooting
ERROR 2: An error has occurred on the server, Too many HSPs to save all
Possible problem 1: The total number of high-scoring segment pairs (HSPs) is
too large for the BLAST servers to return the results. This is
rare, as the results have to amount to several hundred megabytes of
information for this to happen. However, certain
searches can generate a huge amount of data. Most
typically, this error occurs when the default filters are turned
off or when the query sequences contain repeat elements.
Solution 1: Break up large queries into smaller pieces; submit each piece in a
separate search. A common cause of errors in BLAST is searching with
a huge sequence, like a complete chromosome, against a large
database like nr. This is better accomplished in portions rather than
as one large, continuous sequence.
Solution 2: For megablast and blastn searches, try increasing the word size and/or
decreasing the Expect threshold.
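Breaking a large query into pieces is easy to script. The sketch below is one possible approach with a hypothetical function name; the optional overlap between neighbouring pieces is an added assumption, so that a match spanning a cut point is not lost, and the start offset is kept so hit coordinates can be mapped back to the full sequence.

```python
def split_query(sequence, chunk_size, overlap=0):
    """Split a sequence into (start_offset, piece) tuples.

    Neighbouring pieces share `overlap` residues; the final piece may be
    shorter than `chunk_size`.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    pieces = []
    start = 0
    while start < len(sequence):
        pieces.append((start, sequence[start:start + chunk_size]))
        if start + chunk_size >= len(sequence):
            break                      # this piece already reaches the end
        start += step
    return pieces


if __name__ == "__main__":
    for offset, piece in split_query("ABCDEFGHIJ", chunk_size=4, overlap=1):
        print(offset, piece)
```

Each piece would then be submitted as a separate search, as the solution above suggests.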