Bioinfo MCQS
6. Motifs that can form an α/β horseshoe conformation are rich in which amino acid residue?
a) Proline
b) Arginine
c) Valine
d) Leucine
8. The helix-loop-helix protein structural motif is contained in all of the following except
________
a) Scleraxis
b) Neurogenins
c) Transcription Factor 4
d) Leucine zipper
3. In regular expressions, which of the following patterns is wrongly matched with its
significance?
a) [ ] – Or
b) { } – Not
c) ( ) – Repeats
d) Z – Any
4. In the terminology of regular expressions, which of the following is false about terms and
operators?
a) Terms are strings or substrings
b) Operators combine terms and expressions
c) Operators do not have precedence
d) Operators have precedence like arithmetic operators
5. In regular expressions, which of the following patterns is wrongly matched with its
significance?
a) ‘-’ – separator
b) < – N-terminal
c) > – C-terminal
d) ‘>>’ – end
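The symbols asked about in questions 3 and 5 follow the PROSITE pattern syntax. As a minimal sketch, assuming that syntax ('-' separates positions, x matches any residue, [ ] lists allowed residues, { } lists forbidden residues, ( ) gives repeat counts, and < and > anchor the N- and C-termini), a pattern can be translated into a Python regular expression. The function name prosite_to_regex is hypothetical, and the sketch ignores rarer cases such as '>' inside square brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".")                          # patterns end with a period
    regex = regex.replace("-", "")                       # '-' only separates positions
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}: any residue except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4): 2 to 4 repeats
    regex = regex.replace("x", ".")                      # x: any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex

# Example: a zinc-finger-like pattern and a short N-terminal pattern
zinc = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
nterm = prosite_to_regex("<A-{P}-x(2)-[ST].")
print(zinc)
print(bool(re.search(nterm, "AGKLT")), bool(re.search(nterm, "APKLT")))
```

The braces must be converted before the parentheses, because the repeat counts themselves become braces in the regular expression.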
7. While analysing motif sequences, what is the major disadvantageous feature of PROSITE?
a) The database constructs profiles to complement some of the sequence patterns
b) The functional information of these patterns is primarily based on published literature
c) Some of the sequence patterns are too short to be specific
d) Lack of specificity about probability and variation and relation between them
1. Which of the following is not an advantage of statistical models in analyzing protein
motifs?
a) Sequence information from a multiple sequence alignment is preserved and expressed with
probabilistic models
b) Statistical models allow partial matches and compensate for unobserved sequence patterns using
pseudo-counts
c) Statistical models have stronger predictive power than the regular expression based approach,
even when they are derived from a limited set of sequences
d) These methods are comparatively less flexible than regular expression
methods
2. For motif scanning, which of the following programs or databases is for regulatory sites curated from
scientific literature?
a) ENSEMBL
b) ORegAnno
c) MAST
d) Clover
3. Which of the following is not an advantageous feature or algorithm of the database PRINTS?
a) This program breaks down a motif into even smaller non-overlapping units called ‘fingerprints’,
which are represented by unweighted PSSMs
b) To define a motif, at least a majority of fingerprints are required to match with a query sequence
c) A query that has simultaneous high-scoring matches to a majority of fingerprints belonging to a
motif is a good indication of containing the functional motif
d) The difficulty to recognize short motifs when they reach the size of single fingerprints
6. Which of the following is false in case of the database Pfam and its algorithm?
a) Each motif or domain is represented by an HMM profile generated from the seed alignment of a
number of conserved homologous proteins
b) Since the probability scoring mechanism is more complex in an HMM than in a profile-based approach,
the use of HMMs yields further increases in the sensitivity of database matches
c) Pfam-B only contains sequence families not covered in Pfam
d) The functional annotation of motifs in Pfam-A is often related to that in UNIPROT
7. Which of the following is false in case of the database SMART and its algorithm?
a) Contains HMM profiles constructed from manually refined protein domain alignments
b) Alignments in the database are built based on tertiary structures whenever available or based on
PSI-BLAST profiles
c) Alignments are further checked but not refined by human annotators before HMM profile
construction
d) SMART stands for Simple Modular Architecture Research Tool
8. Which of the following is false in case of the database InterPro and its algorithm?
a) InterPro is an integrated pattern database designed to unify multiple databases for protein
domains and functional sites
b) This database integrates information from PROSITE, Pfam, PRINTS, ProDom, and SMART databases
c) Only overlapping motifs and domains in a protein sequence derived by all five databases are
included
d) All the motifs and domains in a protein sequence derived by all five databases are included
9. Which of the following is false in the case of CDART and its algorithm?
a) CDART is a domain search program that combines the results from RPS-BLAST, SMART, and Pfam
b) The program is now an integral part of the regular BLAST search function
c) CDART is a substitute for individual database searches
d) It stands for Conserved Domain Architecture
10. Point out the wrong or irrelevant mathematical method in motif analysis.
a) Enumeration
b) Probabilistic Optimization
c) Deterministic Optimization
d) Literature mining
Protein Family Databases
1. Which of the following statements about COG is incorrect regarding its features?
a) Currently, there are 4,873 clusters in the COG databases derived from unicellular organisms
b) It is constructed by comparing protein sequences encoded in forty-three completely sequenced
genomes, which are mainly from prokaryotes, representing thirty major phylogenetic lineages
c) The interface for sequence searching in the COG database is the COGnitor program, which is based
on gapped BLAST
d) It is a protein family database based on structural classification
2. Which of the following statements about InterPro is incorrect regarding its features?
a) Protein relatedness is defined by the P-values from the BLAST alignments
b) The most closely related sequences are grouped into the lowest level clusters
c) More distant protein groups are merged into higher levels of clusters
d) The outcome of this cluster merging is a tree-like structure of functional categories
3. Pfam is available at four locations around the world. Which of the following is not one of them?
a) UK
b) Sweden
c) US
d) Japan
5. Which of the following statements about SCOP is incorrect regarding its features?
a) Proteins with the same shapes but having little sequence or functional similarity are placed in
different super families, and are assumed to have only a very distant common ancestor
b) Proteins having the same shape and some similarity of sequence and/or function are placed in
‘families’, and are assumed to have a closer common ancestor
c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London
d) It aims to determine the evolutionary relationship between proteins
7. Which of the following statements about SUPERFAMILY database is incorrect regarding its
features?
a) Sequences can be submitted raw or FASTA format
b) Sequences must be submitted in FASTA format only
c) It searches the database using a superfamily, family, or species name plus a sequence, SCOP, PDB
or HMM ID’s
d) It has generated GO annotations for evolutionarily close domains and distant domains
8. Which of the following statements about PRINTS and ProDom databases is incorrect regarding its
features?
a) PRINTS is a compendium of protein fingerprints
b) Usually the motifs do not overlap, but are separated along a sequence, though they may be
contiguous in 3D-space
c) Current versions of ProDom are built using a novel procedure based on recursive BLAST searches
d) ProDom domain database consists of an automatic compilation of homologous domains
9. Which of the following statements about CATH-Gene3D and HAMAP databases is incorrect
regarding its features?
a) CATH-Gene3D describes protein families and domain architectures in complete genomes
b) In CATH-Gene3D the functional annotation is provided to proteins from single resource
c) HAMAP profiles are manually created by expert curators; they identify proteins that are part of
well-conserved bacterial, archaeal and plastid-encoded protein families or subfamilies
d) HAMAP stands for High-quality Automated and Manual Annotation of microbial Proteomes
10. Which of the following statements about PANTHER and TIGRFAMs databases is incorrect
regarding its features?
a) TIGRFAMs provides a tool for identifying functionally related proteins based on sequence homology
b) TIGRFAMs is a collection of protein families, featuring curated multiple sequence alignments,
hidden Markov models (HMMs) and annotation
c) Hidden Markov models (HMMs) are not used in PANTHER
d) PANTHER is a large collection of protein families that have been subdivided into functionally
related subfamilies, using human expertise.
Global Sequence Alignment
1. When did Needleman-Wunsch first describe the algorithm for global alignment?
a) 1899
b) 1970
c) 1930
d) 1950
7. Which of the following is untrue regarding the scoring system used in dynamic programming?
a) If the residues are same in both the sequences the match score is assumed as +5 which is added to
the diagonally positioned cell of the current cell
b) If the residues are not same, the mismatch score is assumed as -3
c) If the residues are not same, the mismatch score is assumed as 3
d) The score should be added to the diagonally positioned cell of the current cell
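The scoring scheme in question 7 can be tried out in a small global alignment routine. The sketch below fills a Needleman-Wunsch score matrix using the match score of +5 and the mismatch score of -3 from the options; the linear gap penalty of -4 and the function name are assumptions made for illustration.

```python
def needleman_wunsch_score(a, b, match=5, mismatch=-3, gap=-4):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # The match/mismatch score is added to the diagonal neighbour.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]

print(needleman_wunsch_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself would require a traceback through the filled matrix.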
1. When did Smith–Waterman first describe the algorithm for local alignment?
a) 1950
b) 1970
c) 1981
d) 1925
7. Among the following which one is not the approach to the local alignment?
a) Smith-Waterman algorithm
b) K-tuple method
c) Words method
d) Needleman-Wunsch algorithm
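For comparison with the global method, here is a minimal sketch of Smith-Waterman local alignment scoring. It differs from the global recurrence in that a cell score is never allowed to drop below zero and the result is the highest cell anywhere in the matrix. The scoring values and the function name are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=5, mismatch=-3, gap=-4):
    """Return the best local alignment score between sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)            # previous matrix row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)        # current matrix row
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Zero lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("AAAGATTACA", "GATTACACCC"))
```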
8. The overall height of a logo position reflects how conserved the position is, and the _____ of each
letter in a position reflects the _______ of the residue in the alignment.
a) height, relative frequency
b) width, relative frequency
c) height, amplitude
d) width, amplitude
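One common way to compute the quantities in question 8 for a single DNA alignment column is sketched below. It assumes the column's information content is log2(4) minus its Shannon entropy, without the small-sample correction that logo programs usually apply; the function name is hypothetical.

```python
import math
from collections import Counter

def logo_heights(column, alphabet_size=4):
    """Return the letter heights (in bits) for one alignment column."""
    counts = Counter(column)
    total = len(column)
    freqs = {res: n / total for res, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy    # overall height of the position
    # Each letter takes a share of the stack equal to its relative frequency.
    return {res: f * info for res, f in freqs.items()}

print(logo_heights("AAAA"))   # fully conserved column
print(logo_heights("AACG"))   # partly conserved column
```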
2. Software for dot plot analysis performs several tasks. Which one of the following is not performed by
it?
a) Gap open penalty
b) Gap extend penalty
c) Expectation threshold
d) Change or mutate residues
4. For significantly aligning sequences what is the resulting structure on the plot?
a) Intercrossing lines
b) Crosses everywhere
c) Vertical lines
d) A diagonal and lines parallel to diagonal
8. Isolated dots that are not on the diagonal represent exact matches.
a) True
b) False
9. Vertical frame shifts show ______ while the horizontal ones show _______
a) insertion, insertion
b) insertion, deletion
c) deletion, deletion
d) deletion, insertion
1. Use of the dynamic programming method requires a scoring system for the comparison of symbol
pairs, and a scheme for GAP penalties.
a) True
b) False
2. After the derivation, the outputs of dynamic programming are ratios called even scores.
a) True
b) False
7. Which of the following is not a site on internet for alignment of sequence pairs?
a) BLASTX
b) BLASTN
c) SIM
d) BCM Search Launcher
8. Dayhoff PAM matrices are based on an evolutionary model of protein change, whereas BLOSUM
matrices are designed to identify members of the same family.
a) True
b) False
9. A feature of the dynamic programming algorithm is that the alignments obtained depend on the
choice of a scoring system for comparing character pairs and penalty scores for gaps.
a) True
b) False
1. In scoring matrices, for convenience, odds scores are converted to log odds scores.
a) True
b) False
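A short sketch of why the conversion is convenient: multiplying odds scores across aligned positions becomes adding their logarithms. The odds values and the function name below are made up for illustration, and base 2 is an assumption (matrices are also published in other units).

```python
import math

def log_odds_total(odds_scores):
    """Sum of log2 odds scores; equals log2 of the product of the odds."""
    return sum(math.log2(o) for o in odds_scores)

odds = [2.0, 0.5, 4.0]                 # hypothetical per-position odds scores
product = 1.0
for o in odds:
    product *= o                       # odds of the whole alignment
print(product, log_odds_total(odds))   # the product and its base-2 logarithm
```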
3. The assumption in this evolutionary model is that the amino acid substitutions observed over short
periods of evolutionary history can be extrapolated to longer distances.
a) True
b) False
7. The more conserved amino acids in similar proteins from different species are ones that play an
essential role in structure and function and the less conserved are in sites that can vary without
having a significant effect on function.
a) True
b) False
8. A gap opening penalty for any gap (g) and a gap extension penalty for each element in the gap (r)
are most often used, to give a total gap score wx, according to the equation ______
a) wx – rx = -g
b) wx = g – rx
c) wx = g + rx
d) wx + g + rx = 0
9. In the GCG and FASTA program suites, the scoring matrix itself is formatted in a way that includes
default ______
a) gap additions
b) alignment scores
c) score penalties
d) gap penalties
10. In case of the varying alignment, penalizing gaps heavily might occur. Then the best scoring local
alignment between the sequences will be one that optimizes the score between matches and
mismatches, without any gaps.
a) True
b) False
Assessing the Significance of Sequence Alignments
1. Analysis of the alignment scores of random sequences reveals that the scores follow a
distribution different from the normal distribution, called the _________
a) Gumbel equal value distribution
b) Gumbel extreme value distribution
c) Gumbel end value distribution
d) Gumbel distribution
2. The statistical analysis of alignment scores is much better understood for ________ than for
_______
a) global alignments, local alignments
b) local alignments, global alignments
c) global alignments, any other alignment method
d) Needleman-Wunsch alignment, Smith-Waterman alignment
3. When random or unrelated sequences are compared using a global alignment method, they can
have ____________ reflecting the tendency of the global algorithm to match as many characters as
possible.
a) very low scores
b) very high scores
c) moderate scores
d) low scores
5. Waterman, in 1989, provided a set of means and standard deviations of global alignment scores
between random DNA sequences, using mismatch and gap penalties that produce a linear increase in
score with _______ a distinguishing feature of global alignments.
a) alignment score
b) sequence score
c) sequence length
d) scoring system
6. Who suggested that the global alignment scores between unrelated protein sequences followed
the extreme value distribution, similar to local alignment scores? And when?
a) Abagyan and Batalov, in 1981
b) Chvátal and Lipman, in 1984
c) Abagyan and Batalov, in 1997
d) Chvátal and Sankoff, in 1995
7. _______ analyzed the distribution of scores among 100 vertebrate nucleic acid sequences and
compared these scores with randomized sequences prepared in different ways.
a) Lipman, in 1984
b) Batalov, in 1964
c) Waterman, in 1987
d) Lipman, in 1967
8. If the random sequences were prepared in a way that maintained the local base composition by
producing them from overlapping fragments of sequence, the distribution of scores has a _______
standard deviation that is closer to the distribution of the natural sequences.
a) lowest
b) higher
c) lower
d) moderate
9. The GCG alignment programs have a RANDOMIZATION option, which shuffles the second sequence
and calculates similarity scores between the unshuffled sequence and each of the shuffled copies.
a) True
b) False
10. Dayhoff (1978-1983) devised a second method for testing the relatedness of two protein
sequences that can accommodate some local variation. Where is this method useful?
a) For finding repeated regions within a sequence
b) For finding similar regions that are in a different order in two sequences
c) For finding small conserved region such as an active site
d) For finding huge regions within sequences
Bayesian Statistics
2. With the application of Bayesian methods, the most probable repeat length and evolutionary time
since the repeat was formed may be derived.
a) True
b) False
3. If the purpose is to calculate the probability of one event AND a second event, the odds scores for
the events are _________
a) added
b) multiplied
c) multiplied and added
d) subtracted
4. Another type of probability analysis is to calculate the odds score for one event OR a second event, or
for a series of events. In this case, the odds scores are _______
a) multiplied
b) subtracted
c) added and multiplied
d) added
5. In Bayesian methods, difficulty with making estimations is that the estimate depends on the
following Assumption. (Assumption – The mutation rate in sequences has been constant with time
and that the rate of mutation of all nucleotides is the same.)
a) True
b) False
6. Another difficulty in Bayesian methods is deciding on the length of sequence that was duplicated.
a) True
b) False
7. A length and distance that gives the highest overall probability may then be determined. Such
alignments are initially found using ________
a) a particular scoring matrix only
b) an alignment algorithm only
c) an alignment algorithm and a particular scoring matrix
d) dot method
9. Zhu et al. (1998) devised a computer program called the Bayes block aligner, which in effect slides
____ sequences along each other to find the ______ ungapped regions or blocks.
a) two, least scoring
b) two, highest scoring
c) multiple, highest scoring
d) multiple, least scoring
10. Unlike the commonly used methods for aligning a pair of sequences, the Bayesian method
_______ using a particular scoring matrix or designated gap penalties.
a) does not depend on
b) depends on
c) is based on
d) involves
Sequence Homology Versus Sequence Similarity and Identity
3. The presence of evolutionary traces is because some of the residues that perform key functional
and structural roles tend to be preserved by natural selection; other residues that may be less crucial
for structure and function tend to mutate more frequently.
a) True
b) False
4. The degree of sequence variation in the alignment reveals evolutionary relatedness of different
sequences, whereas the conservation between sequences reflects the changes that have occurred
during evolution in the form of substitutions, insertions, and deletions.
a) True
b) False
5. If the two sequences share significant similarity, it is extremely ______ that the extensive similarity
between the two sequences has been acquired randomly, meaning that the two sequences must have
derived from a common evolutionary origin.
a) unlikely
b) possible
c) likely
d) relevant
6. Sometimes, it is also possible that two sequences have derived from a common ancestor, but may
have diverged to such an extent that the common ancestral relationships are not recognizable at the
sequence level.
a) True
b) False
9. Shorter sequences require higher cutoffs for inferring homologous relationships than longer
sequences.
a) True
b) False
10. Sequence similarity and sequence identity are synonymous for nucleotide sequences and protein
sequences as well.
a) True
b) False
Methods
1. The overall goal of pair wise sequence alignment is to find the best pairing of two sequences, such
that there is maximum correspondence among residues.
a) True
b) False
2. In local alignment, the two sequences to be aligned cannot be of unequal lengths.
a) True
b) False
3. Alignment algorithms, both global and local, are fundamentally similar and only differ in the
optimization strategy used in aligning similar residues.
a) True
b) False
4. In a dot matrix, two sequences to be compared are written in the _____________ of the matrix.
a) horizontal and vertical axes
b) 2 parallel horizontal axes
c) 2 parallel vertical axes
d) horizontal axis (one preceding another)
5. When the two sequences have substantial regions of similarity, many dots line up to form
contiguous _______ lines.
a) crossings on
b) horizontal
c) diagonal
d) vertical
6. A problem exists when comparing _____ sequences using the dot matrix method, namely, the
_______
a) small, amplification
b) large, amplification
c) small, high noise level
d) large, high noise level
7. If the selected window size is too long, sensitivity of the alignment is lost.
a) True
b) False
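The effect of window size can be explored with a small dot matrix routine. This is a sketch under assumed parameter names (window, stringency): a dot is recorded at (i, j) when at least stringency of the window positions starting there are identical.

```python
def dot_plot(seq_a, seq_b, window=3, stringency=3):
    """Return (i, j) start positions where the two windows match well enough."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues between the two windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i, j))
    return dots

noisy = dot_plot("GATTACA", "GATTACA", window=1, stringency=1)
clean = dot_plot("GATTACA", "GATTACA", window=3, stringency=3)
print(len(noisy), len(clean))
```

With window=1 every identical residue pair produces a dot, which is the source of the high noise level; a longer window keeps only runs of matches but can miss short similar regions.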
10. Which of the following is untrue about dot plot method and its applications?
a) This method gives a direct visual statement of the relationship between two sequences
b) One of its advantages is the identification of sequence repeat regions based on the presence of
parallel diagonals of the same size vertically or horizontally in the matrix
c) It is not useful in identifying chromosomal repeats
d) The method can be used in identifying nucleic acid secondary structures through detecting self-
complementarity of a sequence.
Statistical Significance of Sequence Alignment
1. The truly statistically significant sequence alignment will be able to provide evidence of homology
between the sequences involved.
a) True
b) False
2. By calculating alignment scores of a large number of ______ sequence pairs, a distribution model of
the ______ sequence scores can be derived.
a) related, randomized
b) unrelated, randomized
c) unrelated, unrandomized
d) related, unrandomized
3. Many studies have demonstrated that the distribution of similarity scores assumes a peculiar shape
that resembles a highly skewed normal distribution with a long tail on one side. The distribution
matches the _______
a) Gumbel elective value distribution
b) Gumbel extreme void distribution
c) Gumbel end value distribution
d) Gumbel extreme value distribution
5. The statistical test uses a randomization process in which one of the two given sequences is randomly
shuffled.
a) True
b) False
7. If the score is located in the extreme margin of the distribution, that means that the alignment
between the two sequences is ______ due to random chance and is thus considered ______
a) unlikely, significant
b) unlikely, insignificant
c) unlikely, insignificant
d) very likely, significant
8. It is not known whether the Gumbel distribution applies equally well to gapped alignments.
a) True
b) False
9. Which of the following is untrue about the PRSS program?
a) It stands for Probability of Random Shuffles
b) It is a web-based program that can be used to evaluate the statistical significance of DNA or protein
sequence alignment
c) It first aligns two sequences using the Needleman-Wunsch algorithm and calculates the score
d) It holds one sequence in its original form and randomizes the order of residues in the other
sequence.
10. The major disadvantage of the PRSS program is that it doesn’t allow partial shuffling.
a) True
b) False
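The shuffling idea behind this kind of significance test can be sketched as follows. This is not the PRSS implementation: it assumes a simple local alignment score with made-up parameters, shuffles the second sequence a fixed number of times, and reports an empirical p-value instead of fitting an extreme value distribution. All function and parameter names are hypothetical.

```python
import random

def local_score(a, b, match=5, mismatch=-3, gap=-4):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps)."""
    best, prev = 0, [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def shuffle_test(a, b, n_shuffles=200, seed=0):
    """Return (observed score, empirical p-value) from shuffling sequence b."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    observed = local_score(a, b)
    residues = list(b)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)              # keeps composition, destroys order
        if local_score(a, "".join(residues)) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (exceed + 1) / (n_shuffles + 1)

seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(shuffle_test(seq, seq, n_shuffles=50))
```

Shuffling preserves the residue composition of the second sequence, so the null scores reflect what unrelated sequences of the same length and composition would produce.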
Exhaustive Algorithms
1. Related sequences are identified through the database similarity searching and as the process
generates multiple matching sequence pairs, it is often necessary to convert the numerous pair wise
alignments into a single alignment.
a) True
b) False
2. There is a unique advantage of multiple sequence alignment because it reveals more biological
information than many pair wise alignments can.
a) True
b) False
4. The scoring function for multiple sequence alignment is based on the concept of sum of pairs (SP).
a) True
b) False
5. Which of the following scores are not considered while calculating the SP scores?
a) All possible pair wise matches
b) All possible mismatches
c) All possible gap costs
d) Number of gap penalties
6. Given a multiple alignment of three sequences, the sum of scores is calculated as the sum of the
dissimilarity scores of every pair of sequences at each position.
a) True
b) False
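A sum-of-pairs calculation over a small multiple alignment can be sketched as below. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) and the function name are assumptions chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum the pairwise scores of every pair of rows at every column."""
    total = 0
    for column in zip(*alignment):             # iterate over alignment columns
        for x, y in combinations(column, 2):   # every pair of sequences
            if x == "-" and y == "-":
                continue                       # gap against gap scores zero
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total

print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

For three sequences each column contributes three pairwise comparisons; the number of pairs grows quadratically with the number of sequences.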
7. There are two approaches viz. exhaustive and heuristic approaches used in multiple sequence
alignment.
a) True
b) False
9. As the amount of computational time and memory space required increases exponentially with the
number of sequences, it makes the multidimensional search matrix method computationally
prohibitive to use for a large data set.
a) True
b) False
Heuristic Algorithms
1. Which of the following is untrue regarding the Progressive Alignment Method?
a) Progressive alignment depends on the stepwise assembly of multiple alignments and is heuristic in
nature
b) It speeds up the alignment of multiple sequences through a multistep process
c) It first conducts pair wise alignments for each possible pair of sequences using the Needleman–
Wunsch global alignment method and records these similarity scores from the pair wise comparisons
d) Its drawback is it slows down the alignment of multiple sequences through a single step process
8. The major drawback of the progressive and iterative alignment strategies is that they are largely
global alignment based and may therefore fail to recognize conserved domains and motifs among
highly divergent sequences of varying lengths.
a) True
b) False
10. Match-Box compares segments of some of the nine residues of possible pairwise alignments.
a) True
b) False
Needleman-Wunsch Algorithm
3. The global sequence alignment is suitable when the two sequences are of dissimilar length, with a
negligible degree of similarity throughout.
a) True
b) False
4. The alignment score is the sum of substitution scores and gap penalties in this type of algorithm.
a) True
b) False
8. The number of possible global alignments between two sequences of length N is _____
a) $\frac{2^{N}}{\sqrt{\pi N}}$
b) $\frac{2^{2N}}{\sqrt{\pi N}}$
c) $\frac{2^{(N-1)}}{\sqrt{\pi N}}$
d) $\frac{2^{2N}}{\sqrt{N}}$
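Expressions of this form come from Stirling's approximation. As a hedged numerical check, the sketch below compares the central binomial coefficient C(2N, N) with its Stirling estimate 2^(2N) / sqrt(pi N); treating C(2N, N) as the alignment count is an assumption about how alignments are counted, and the function names are hypothetical.

```python
import math

def central_binomial(n):
    """Exact value of C(2n, n)."""
    return math.comb(2 * n, n)

def stirling_estimate(n):
    """Stirling approximation of C(2n, n): 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)

for n in (10, 100, 300):
    exact = central_binomial(n)
    approx = stirling_estimate(n)
    print(n, exact / approx)   # ratio approaches 1 as n grows
```

Either way, the count grows exponentially with N, which is why enumerating all alignments is impractical and dynamic programming is used instead.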
10. There are two types of matrices involved in the study: score matrices and trace matrices.
a) True
b) False