MCQ A
What is the name given to the process of making DNA from RNA?
How many bones are present in the human body?
Which type of RNA carries the genetic information from the DNA to
the ribosome for protein synthesis?
31. In gene expression data analysis, what does the term
"fold change" refer to?
33. What does the acronym "NCBI" stand for in the context of
biological databases?
Describe the steps for BLAST and classify its different types.
Identify the main idea behind the Naïve Bayes classifier, and explain
how it handles feature independence.
Outline the primary advantage of the BIRCH (Balanced Iterative
Reducing and Clustering using Hierarchies) clustering algorithm.
Recall the Central Dogma of molecular biology, and define its three
main processes.
Interpret some common challenges faced in the integration of
biological data from various sources in systems biology studies.
Focus on the key points and the problems associated with using a
dot plot algorithm.
Suppose A, B, C, D are amino acid sequences of four proteins. A and
B are phylogenetically close. C and D are phylogenetically close.
Draw an arbitrary rooted phylogenetic tree with A, B, C, D.
Explain what sequence identity is.
Illustrate K-means.
Deduce the organization and functions of the cellular system within
eukaryotic cells.
peptide bond(Y)
2,
Di ester bond(N)
phosphate bond(N)
quaternary structure(Y)
tertiary structure(N)
1,
secondary structure(N)
primary structure(N)
Nitrogenous base(N)
2,
Purine or pyrimidine base + phosphorous(N)
Guanine(N)
adenine(N)
4,
thymine(N)
uracil(Y)
Peptide bonds(N)
Phosphodiester bridges(Y)
2,
Glycosidic bonds(N)
All of them(N)
1,
2 amino acids and 2 peptide bond(N)
adenine(N)
cytosine(N)
3,
thymine(Y)
uracil(N)
Electrophoresis(N)
Chromatography(N)
4,
Centrifugation(N)
X-ray crystallography(Y)
Crick and Neck(N)
2,
Watson and Franklin(N)
Transcription(Y)
Translation(N)
1,
Metabolism(N)
Reduction(N)
3,
Sugar and pyrimidines(Y)
TAAGCTAC(N)
UAAGCUAC(Y)
2,
ATTCGATG(N)
AUUCGAUG(N)
2,
UAA , UGA, UUA(N)
2,
beta sheet formation(N)
Nitrogenous base(N)
2,
Purine or pyrimidine base + phosphorous(N)
Nitrogenous base(N)
2,
Purine or pyrimidine base + phosphorous(N)
Purine or pyrimidine base + sugar + Phosphorous(N)
Conservative(N)
Non-conservative(N)
3,
Semi-conservative(Y)
None(N)
proteins(N)
RNA(N)
3,
both (a) and (b)(Y)
lipids(N)
Transcription(N)
Replication(N)
3,
translation(Y)
Reverse transcription(N)
Reverse transcriptase(N)
Reverse transcription(Y)
2,
Reverse Replication(N)
Reverse Translation(N)
206(Y)
306(N)
1,
106(N)
602(N)
Stomach(N)
Oesophagus(N)
3,
Mouth(Y)
Intestine(N)
Nostril(N)
Skin(N)
3,
Alveoli(Y)
bronchi(N)
Embryo(N)
Blastula(N)
3,
3,
Zygote(Y)
Infant(N)
3,
Biology, computer science and IT(Y)
None of them(N)
2,
mobile hard disk(N)
online library(N)
PRINT(N)
PROSITE(Y)
2,
PIR.(N)
.PDB(N)
Dayhoff(N)
Pearson(N)
4,
Richard Durbin(N)
Michael.J.Dunn(Y)
BLAST(Y)
RasMol(N)
1,
EMBOSS(N)
PROSPECT(N)
3,
3 billion base pairs(Y)
BLAST(N)
COPIA(Y)
2,
PROSPECT(N)
Pattern hunter(N)
DNA probes(N)
DNA polymerase(N)
3,
DNA microarrays(Y)
DNA fingerprinting(N)
Genomics(N)
Pharmacogenomics(Y)
2,
Pharmacogenetics(N)
Proteomics(N)
Biomolecules(N)
4,
Set of proteins(N)
Gene tracking(N)
Genome walking(N)
3,
Genome mapping(Y)
Chromosome walking(N)
Molecular fitting(N)
Molecular matching(N)
4,
Molecule affinity checking(N)
Molecular docking(Y)
Drug designing(N)
4,
Understand the relationships between organisms(N)
Flowchart(N)
Algorithm(Y)
2,
Procedure(N)
Procedure(N)
In silico(Y)
Dry lab(N)
1,
Wet lab(N)
Dry lab(N)
In vitro(N)
3,
In silico(Y)
In vivo(N)
J.D Watson(N)
Pauline Hogeweg(Y)
2,
Margaret Dayhoff(N)
Frederic Sanger(N)
DDBJ(N)
NCBI(Y)
2,
EMBL(N)
CSIR(N)
PIR(N)
PSD(N)
3,
EMBL(Y)
SWISS-PROT(N)
Entrez(Y)
SeqIn(N)
1,
Text Search(N)
STAG.(N)
A Collection of Data(N)
Analysing Data(N)
4,
Storage of Data(N)
3,
information about the locus name, length of the sequence,(Y)
None of them(N)
Insertion(N)
Selection,(N)
3,
deletion,(Y)
exchange(N)
as pseudo codes(N)
as syntax(Y)
2,
as programs(N)
as flowcharts(N)
Prosite-(Y)
UniProt(N)
1,
NCBI(N)
EMBL(N)
1,
they belong to the same fold family.(N)
2,
A cluster of protein sequences gathered from a BLAST search.(N)
3,
reflect areas of both functional and structural importance.(Y)
3,
To allow you to distinguish conserved residues and residue groups more easily(Y)
CATH(N)
SCOP(N)
4,
PDBsum(N)
PDB(Y)
divergence event(Y)
Convergence event(N)
1,
Multivergence event(N)
No eventual significance(N)
1,
m-RNA sequence(N)
2,
3.3 billion base(N)
3,
Both The techniques(Y)
None of them(N)
Electrophoresis(N)
Chromatography(N)
4,
Centrifugation(N)
X-ray crystallography(Y)
Yes(Y)
No(N)
1,
Unknown(N)
NA(N)
2,
assignment of each point to clusters(N)
Yes(Y)
No(N)
1,
Unknown(N)
NA(N)
Clover leaf(Y)
Pepal leaf(N)
1,
Banana leaf(N)
Fig leaf(N)
Kidney(N)
Liver(N)
3,
Bone marrow(Y)
Brain(N)
tRNA (N)
mRNA(Y)
2,
rRNA(N)
DNA(N)
2,
Providing a template for protein synthesis(N)
P(A) * P(B|A) / P(B)(Y)
2,
P(A|B) * P(B) / P(A)(N)
A step-by-step process(Y)
A type of computer(N)
1,
A mathematical formula(N)
41275(N)
2,
41365(Y)
18994(N)
19085(N)
0.7(N)
3,
0.6(N)
0.5(Y)
0.42(N)
1,
Regression analysis(N)
Data visualization(N)
Number of clusters(Y)
1,
Number of features(N)
Number of iterations(N)
Centroid(N)
Mean(N)
3,
Median(Y)
Mode(N)
3,
K-means uses centroids(Y)
Playing chess(N)
1,
Sorting data(N)
Data points(N)
Variables(Y)
2,
Arithmetic operations(N)
Graph edges(N)
Causal relationships(Y)
Musical notes(N)
1,
Geographical locations(N)
Cooking recipes(N)
To define probabilities(Y)
1,
To display colorful charts(N)
To play music(N)
Classification(N)
Regression(N)
3,
Clustering(Y)
Data visualization(N)
Number of clusters(Y)
1,
Number of features(N)
Number of iterations(N)
To calculate regression(N)
1,
To perform classification(N)
To visualize data(N)
Sorting animals(N)
1,
Calculating DNA sequences(N)
Types of leaves(N)
Tree branches(N)
3,
Evolutionary relationships(Y)
To classify animals(N)
1,
To predict future mutations in DNA(N)
To count the number of species(N)
Weather data(N)
1,
Social media posts(N)
Baking cookies(N)
Maximum Likelihood(Y)
2,
Solving algebraic equations(N)
Building bridges(N)
4,
Ungapped Global Alignment(N)
Local Alignment(Y)
1,
1,
Align sequences without gaps(N)
Local Alignment(N)
3,
Gapped Global Alignment(Y)
Local Alignment(N)
1,
Gapped Global Alignment(N)
1,
Counting the number of proteins(N)
1,
They synthesize proteins(N)
Structural motifs(N)
Random motifs(N)
4,
Inactive motifs(N)
Functional motifs(Y)
Protein structures(N)
4,
RNA and protein levels(N)
DNA replication(N)
1,
The study of fossils(N)
GPS coordinates(N)
RNA-Seq(Y)
2,
Recipe ingredients(N)
Weather forecasts(N)
To perform surgery(N)
3,
To understand gene function(Y)
To build bridges(N)
To bake cookies(N)
1,
To study the weather(N)
To build a computer(N)
To record TV shows(N)
To hold genetic samples(Y)
2,
To make a sandwich(N)
To measure temperature(N)
Car mileage(N)
Shoe sizes(N)
4,
Wind speed(N)
A painting(N)
1,
A recipe for a dish(N)
By listening to music(N)
4,
By reading a book(N)
Data accuracy(Y)
Rocket science(N)
1,
Political history(N)
Cooking recipes(N)
1,
To confuse users(N)
1,
Memorizing phone numbers(N)
4,
To confuse users(N)
3,
National Center for Biotechnology Information(Y)
Weather forecasts(N)
3,
Movie reviews(Y)
Historical events(N)
A type of protein(N)
1,
A specific protein sequence(N)
4,
A sequence with many gaps(N)
DNA replication(N)
1,
The study of fossils(N)
DNA replication(N)
RNA transcription(Y)
2,
Protein translation(N)
Protein transcription(N)
GenBank(Y)
UniProt(N)
1,
Enzyme Commission Database(N)
NCBI(N)
Protein concentrations(N)
2,
Enzyme activity(N)
Cell size(N)
BLAST(N)
Smith-Waterman(Y)
2,
Needleman-Wunsch(N)
K-means clustering(N)
PDB(N)
GenBank(Y)
2,
Swiss-Prot(N)
NCBI(N)
BLOSUM(Y)
FASTA(N)
1,
Smith-Waterman(N)
Gibbs sampling(N)
Needleman-Wunsch(N)
Smith-Waterman(N)
3,
ClustalW(Y)
BLAST(N)
Protein structures(N)
2,
RNA secondary structures(N)
Phylogenetic trees(N)
Valine(Y)
Tyrosine(N)
1,
Proline(N)
Arginine(N)
Energy production(N)
Information storage(N)
4,
Cellular respiration(N)
Enzyme catalysis(Y)
K-means(N)
2,
Dynamic Programming(N)
Support Vector Machine(N)
Bayes' Theorem(N)
2,
Hidden Markov Model(N)
K-medoid(N)
AGNES(N)
3,
DBSCAN(Y)
Hyper-Molecular Mapping(N)
1,
High Memory Management(N)
Gibbs Sampling(N)
BLAST(Y)
2,
MEME(N)
MotifX(N)
1,
To classify organisms(N)
DBSCAN(N)
K-means(Y)
2,
AGNES(N)
Isodata(N)
2,
A phylogenetic tree branch(N)
A sequence annotation(N)
3,
To identify the best alignment(Y)
To create phylogenetic trees(N)
BLAST(Y)
Smith-Waterman(N)
1,
Needleman-Wunsch(N)
ClustalW(N)
Prior probability(N)
Posterior probability(N)
4,
Likelihood(N)
Confidence interval(Y)
4,
National Consortium for Bioinformatics(N)
Global alignment(Y)
Local alignment(N)
1,
1,
Pairwise alignment(N)
Multiple alignment(N)
K-means(N)
AGNES(Y)
2,
DBSCAN(N)
1,
Analyzing gene expression data(N)
Probability modeling(Y)
2,
Sequence alignment(N)
Machine learning(N)
X-ray crystallography(N)
NMR spectroscopy(N)
4,
Homology modeling(N)
PCR amplification(Y)
DNA replication(N)
Protein translation(N)
4,
Information storage(N)
RNA transcription(Y)
Protein structures(N)
Gene sequences(Y)
2,
Phylogenetic trees(N)
Microarray data(N)
3,
The difference in expression levels(Y)
DNA sequencing(N)
3,
Microarray experiments(Y)
Protein purification(N)
BLOSUM(Y)
PAM(N)
1,
FASTA(N)
Gibbs sampling(N)
1,
Alignment of DNA and RNA(N)
1,
Predicting secondary structures(N)
Gene splicing(N)
Glycine(N)
Thymine(Y)
2,
Leucine(N)
Cysteine(N)
Nucleotide(N)
Polypeptide chain(Y)
2,
Carbohydrate(N)
Lipid(N)
1,
DBSCAN(N)
K-means clustering(N)
K-means(N)
Isodata(N)
Hierarchical clustering,
DIANA(N)
Hierarchical clustering(N)
To calculate p-values(N)
3,
To score amino acid substitutions(Y)
Finding highly similar regions,
Finding distant homologs(N)
The difference in expression levels,
The difference in expression levels(N)
Alignment with gaps between sequences,
Alignment of DNA and RNA(N)
Modeling random processes,
Sequence alignment(N)
Pairwise alignment(N)
Local alignment(N)
Multiple alignment,
Global alignment(N)
Multiple alignment(N)
The process of translation(N)
DNA replication(N)
RNA transcription(N)
RNA transcription,
Protein translation(N)
Protein transcription(N)
GenBank(N)
UniProt(N)
GenBank,
Enzyme Commission Database(N)
NCBI(N)
Protein concentrations(N)
Gene expression levels,
Enzyme activity(N)
Cell size(N)
BLAST(N)
Smith-Waterman(N)
Smith-Waterman,
Needleman-Wunsch(N)
K-means clustering(N)
PDB(N)
GenBank(N)
GenBank,
Swiss-Prot(N)
NCBI(N)
BLOSUM(N)
FASTA(N)
BLOSUM,
Smith-Waterman(N)
Gibbs sampling(N)
Needleman-Wunsch(N)
Smith-Waterman(N)
3,
ClustalW(Y)
BLAST(N)
Protein structures(N)
Occurrence of known sites,
RNA secondary structures(N)
Phylogenetic trees(N)
Valine(N)
Tyrosine(N)
Valine,
Proline(N)
Arginine(N)
Energy production(N)
Information storage(N)
Enzyme catalysis,
Cellular respiration(N)
Enzyme catalysis(N)
K-means(N)
Neighbor Joining Algorithm,
Dynamic Programming(N)
Support Vector Machine(N)
Bayes' Theorem(N)
Naïve Bayes Classifier,
Hidden Markov Model(N)
K-medoid(N)
AGNES(N)
DBSCAN,
DBSCAN(N)
Hyper-Molecular Mapping(N)
Hidden Markov Model,
High Memory Management(N)
Finding highly similar regions,
Finding distant homologs(N)
Gibbs Sampling(N)
BLAST(N)
BLAST,
MEME(N)
MotifX(N)
To display a family tree,
To classify organisms(N)
DBSCAN(N)
K-means(N)
K-means,
AGNES(N)
Isodata(N)
An insertion or deletion in a sequence,
A phylogenetic tree branch(N)
A sequence annotation(N)
To identify the best alignment,
To identify the best alignment(N)
BLAST(N)
Smith-Waterman(N)
BLAST,
Needleman-Wunsch(N)
ClustalW(N)
Prior probability(N)
Posterior probability(N)
Confidence interval,
Likelihood(N)
Confidence interval(N)
Global alignment(N)
Local alignment(N)
Global alignment,
Pairwise alignment(N)
Multiple alignment(N)
K-means(N)
AGNES(N)
AGNES,
DBSCAN(N)
Identifying highly conserved regions,
Analyzing gene expression data(N)
Probability modeling,
Sequence alignment(N)
Machine learning(N)
X-ray crystallography(N)
NMR spectroscopy(N)
PCR amplification,
Homology modeling(N)
PCR amplification(N)
DNA replication(N)
Protein translation(N)
Information storage,
Information storage(N)
RNA transcription(N)
Protein structures(N)
Gene sequences(N)
Gene sequences,
Phylogenetic trees(N)
Microarray data(N)
The difference in expression levels,
The difference in expression levels(N)
DNA sequencing(N)
Microarray experiments,
Microarray experiments(N)
Protein purification(N)
BLOSUM(N)
PAM(N)
BLOSUM,
FASTA(N)
Gibbs sampling(N)
Alignment with gaps between sequences,
Alignment of DNA and RNA(N)
Protein purification(N)
Identifying highly conserved regions,
Predicting secondary structures(N)
Gene splicing(N)
Glycine(N)
Thymine(N)
Thymine,
Leucine(N)
Cysteine(N)
Nucleotide(N)
Polypeptide chain(N)
Polypeptide chain,
Carbohydrate(N)
Lipid(N)
Neighbor Joining Algorithm(N)
Neighbor Joining Algorithm,
DBSCAN(N)
K-means clustering(N)
K-means(N)
Isodata(N)
Hierarchical clustering,
DIANA(N)
Hierarchical clustering(N)
To calculate p-values(N)
To score amino acid substitutions,
To score amino acid substitutions(N)
Modeling random processes,
Sequence alignment(N)
Local alignment(N)
Multiple alignment,
Global alignment(N)
Multiple alignment(N)
The five traditional BLAST programs are: BLASTN, BLASTP, BLASTX, TBLASTN, and
TBLASTX. BLASTN compares nucleotide sequences to one another (hence the N). All
other programs compare sequences at the protein level. Here are the basic steps involved in
using BLAST: Select a Query Sequence: Begin by selecting the sequence you want to
Rooted tree: makes an inference about the most recent common ancestor of the leaves or
branches of the tree. Unrooted tree: illustrates the relationships among the leaves or
branches without making any assumption regarding the most recent common ancestor.
Bifurcating tree. ... The multifurcating tree.(Y)
In bioinformatics, a phylogenetic tree is frequently used to determine the
evolutionary relationships among a group of viruses, bacteria, animals, or plants.
Phylogenetic tree is used to learn more about a new pathogen outbreak and helps in
drug discovery by using molecular sequencing technologies.(Y)
Cladograms and phylogenetic trees are functionally very similar, but they show
different things. Cladograms do not indicate time or the amount of difference
between groups, whereas phylogenetic trees often indicate time spans between
branching points.(Y)
The complete structure of a protein can be described at four different levels of
complexity: primary, secondary, tertiary, and quaternary structure. The primary
structure is comprised of a linear chain of amino acids. The secondary structure
contains regions of amino acid chains that are stabilized by hydrogen bonds from the
polypeptide backbone. These hydrogen bonds create alpha-helix and beta-pleated
sheets of the secondary structure.(Y)
A nucleotide is the basic building block of nucleic acids (RNA and DNA). A nucleotide
consists of a sugar molecule (either ribose in RNA or deoxyribose in DNA) attached to
a phosphate group and a nitrogen-containing base. The bases used in DNA are
adenine (A), cytosine (C), guanine (G) and thymine (T).(Y)
Replication is the process by which DNA is duplicated by a series of enzyme actions.
Transcription is the process of copying a gene's DNA sequence to make an RNA
molecule and translation is the process in which proteins are synthesized after the
process of transcription of DNA to RNA in the cell's nucleus.(Y)
In recent years, bioinformatics has become an essential tool in many areas of
biology, including genomics, proteomics, drug discovery, and disease diagnosis. It is
also playing an increasingly important role in public health initiatives such as
outbreak detection and surveillance.(Y)
The main difference lies in their molecular composition: nucleosides contain only a
sugar and a base, whereas nucleotides contain a sugar, a base, and a phosphate group.
Nucleotides are the building blocks of RNA and DNA, while a nucleoside is the
precursor of a nucleotide.(Y)
The study of the interactions and behaviour of the components of biological
entities—i.e. molecules, cells, organs, and organisms, is known as ‘systems biology’.
Every human being, for example, is a system. Our organs, tissues, cells, and the
molecules they are made of, as well as bacteria and other organisms that live on our
skin and in our digestive system, are all part of the system. Systems biology studies
these parts and how they work together.(Y)
The Golgi apparatus, or Golgi complex, functions as a factory in which proteins
received from the ER are further processed and sorted for transport to their eventual
destinations: lysosomes, the plasma membrane, or secretion. The ER has a central
role in lipid and protein biosynthesis. Its membrane is the site of production of all the
transmembrane proteins and lipids for most of the cell's organelles. The cell
membrane, therefore, has two functions: first, to be a barrier keeping the
constituents of the cell in and unwanted substances out and, second, to be a gate
allowing transport into the cell of essential nutrients and movement from the cell of
waste products.(Y)
Information contained in biological databases includes gene function, structure,
localization (both cellular and chromosomal), clinical effects of mutations as well as
similarities of biological sequences and structures. Four objectives of biological
databases: • To make all relevant data available in one place. • To store all relevant
information easily. • To make biological data available to scientists. • To update
existing information easily.(Y)
Primary databases store and make data available to the public, acting as repositories,
e.g. GenBank, DDBJ. Secondary databases make use of publicly available sequence
data in primary databases to provide layers of information on DNA or protein
sequence data, e.g. UniProt Knowledgebase. Composite databases are meant for
keeping records of specific datasets meant for specific purposes and applications,
e.g. OMIM.(Y)
K-means uses the mean (centroid) of data points in a cluster to represent it, whereas
k-medoid uses the actual data point (medoid) to represent the cluster.(Y)
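The difference between the two representatives can be sketched as follows. This is a minimal illustration rather than a full clustering implementation: the function names `centroid` and `medoid` and the toy points are assumptions made for this example, and Euclidean distance is assumed.

```python
def centroid(points):
    """Coordinate-wise mean of the cluster; usually not an actual data point."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def medoid(points):
    """The actual data point with the smallest total distance to all others."""
    def dist(a, b):
        # Euclidean distance (an assumption; any dissimilarity could be used)
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return min(points, key=lambda p: sum(dist(p, q) for q in points))


# Hypothetical cluster with one outlier at (30, 30)
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (30.0, 30.0)]
print(centroid(cluster))  # pulled toward the outlier
print(medoid(cluster))    # stays on a real data point
```

The outlier drags the centroid away from the bulk of the data, while the medoid remains one of the original points, which is why k-medoid is generally considered less sensitive to outliers.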
DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise." Its
primary advantage over k-means is its ability to discover clusters of arbitrary shapes
and handle noise in the data.(Y)
The Naïve Bayes classifier assumes that all features are conditionally independent,
given the class label. It calculates the probability of a data point belonging to a class
by multiplying the individual conditional probabilities of each feature given that
class.(Y)
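The multiplication of per-feature conditional probabilities can be sketched as below. This is a toy categorical version with no smoothing; the function names `train` and `predict`, the feature meanings, and the six training samples are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (feature_tuple, label). Returns the count tables."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)  # (label, position) -> value counts
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
    return class_counts, feature_counts


def predict(features, class_counts, feature_counts):
    """Score each class as prior * product of P(feature_i | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(features):
            # Independence assumption: each feature contributes its own factor
            score *= feature_counts[(label, i)][value] / count
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical data: (motif present?, GC content) -> region type
samples = [
    (("yes", "high"), "coding"),
    (("yes", "high"), "coding"),
    (("yes", "low"), "coding"),
    (("no", "low"), "noncoding"),
    (("no", "high"), "noncoding"),
    (("yes", "low"), "noncoding"),
]
class_counts, feature_counts = train(samples)
label, scores = predict(("yes", "high"), class_counts, feature_counts)
print(label, scores)
```

The scores are unnormalized posteriors; dividing each by their sum would give probabilities. A practical classifier would normally add smoothing so that an unseen feature value does not force a score to zero.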
BIRCH is known for its ability to efficiently handle large datasets due to its
hierarchical structure and space-saving data summarization techniques.(Y)
Grid-based methods rely on a grid structure to divide the data space into cells or
regions, making them suitable for handling uniformly distributed data.(Y)
Heuristic alignment algorithms aim to find reasonably good alignments quickly, often
by making simplified assumptions. An example is the BLAST (Basic Local Alignment
Search Tool) algorithm, which rapidly identifies local sequence similarities.(Y)
HMMs can model complex dependencies in biological sequences, making them
suitable for aligning sequences with various evolutionary relationships and structural
features.(Y)
Local alignment finds subsequences within sequences that are similar, even if the
sequences themselves are dissimilar elsewhere. Global alignment aligns entire
sequences from start to end, forcing the alignment to span the entire length of both
sequences.(Y)
The Neighbor Joining Algorithm is used to construct phylogenetic trees from distance
matrices. It iteratively joins pairs of taxa or clusters based on their pairwise distances
to build a hierarchical tree representing evolutionary relationships.(Y)
The Central Dogma states that genetic information flows from DNA to RNA to
protein. The three main processes are DNA replication, transcription (DNA to RNA),
and translation (RNA to protein).(Y)
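Transcription and translation can be illustrated with a short sketch. The codon table below is deliberately partial (only the codons needed for the example), the input is assumed to be the coding strand, and the names `transcribe`, `translate`, and `CODON_TABLE` are made up for this illustration.

```python
# Partial codon table: just enough for the example below (assumption);
# "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGC": "G", "UAA": "*"}


def transcribe(coding_strand):
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read codons three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCTAA"
mrna = transcribe(dna)
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```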
Challenges include data heterogeneity, differing data formats, data quality issues,
and the need for robust data integration methods to combine diverse biological
datasets effectively.(Y)
Global alignment is a method of comparing two sequences that aligns the entire
length of the sequences by maximizing the overall similarity. This method is used
when comparing sequences that are of similar length. Global alignment is based on
the Needleman-Wunsch algorithm. In global alignment, the sequences to be aligned are
assumed to be genetically similar over their entire length. Alignment is carried out from
the beginning to the end of both sequences to find the best possible alignment across the
entire length of the two sequences. The two sequences are treated as
potentially equivalent.(Y)
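A minimal sketch of the Needleman-Wunsch dynamic programming recurrence follows. It returns only the optimal global score (no traceback), and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for the example, not values taken from this question bank.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diag, up, left)
    # Global alignment: the answer is the bottom-right cell
    return score[-1][-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("GATTACA", "GATACA"))
```

Because the score is read from the bottom-right cell, both sequences are forced to be aligned end to end, which is what distinguishes the global method from the local one.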
A scoring model in bioinformatics is a mathematical system used to assign scores or
values to various biological sequence alignments. It helps determine how well two
sequences align with each other, with higher scores indicating better alignment.
Scoring models are essential in tasks such as sequence alignment, where they aid in
identifying similarities and differences between biological sequences.(Y)
Positive scores are assigned to matching nucleotides or amino acids in scoring
models because they reflect the idea that identical or similar sequences in biological
molecules are biologically significant. A positive score encourages the alignment
algorithm to prioritize regions of similarity, helping to identify homologous sequences
and functional similarities between biological molecules.(Y)
Gap penalties in scoring models for sequence alignment are used to account for the
introduction of gaps (insertions or deletions) in the alignment. Gap penalties are
typically negative values. They discourage excessive gap creation, ensuring that the
alignment algorithm favors alignments with fewer gaps. This helps to maintain
biologically meaningful alignments by penalizing gaps that may not reflect true
evolutionary relationships.(Y)
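How match scores, mismatch scores, and gap penalties combine can be sketched by scoring an alignment that has already been written out with "-" for gaps. The function name `score_alignment` and the default values are assumptions for this example; real scoring models usually take substitution scores from a matrix such as BLOSUM or PAM.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum column scores of two equal-length aligned strings ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap       # negative: discourages opening gaps
        elif x == y:
            total += match     # positive: rewards identical residues
        else:
            total += mismatch
    return total


print(score_alignment("GATTACA", "GAT-ACA"))          # six matches, one gap
print(score_alignment("GATTACA", "GACTACA"))          # six matches, one mismatch
print(score_alignment("GATTACA", "GAT-ACA", gap=-5))  # harsher gap penalty
```

Raising the magnitude of the gap penalty lowers the score of the gapped alignment while leaving the ungapped one untouched, so the aligner increasingly prefers alignments with fewer gaps.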
The NCBI database, or the National Center for Biotechnology Information database,
serves as a central repository for a wide range of biological and genetic information.
Its primary purpose is to provide researchers, scientists, and the public with access
to data related to genetics, genomics, and other biological sciences. It hosts DNA and
protein sequences, genomic data, literature references, and tools for sequence
analysis. Researchers use NCBI to study genetic variations, conduct comparative
genomics, and access valuable information for various biological research
purposes.(Y)
GenBank is a critical component of the NCBI database. It is a repository for DNA and
RNA sequences submitted by researchers worldwide. The significance of GenBank lies
in its role as a comprehensive and freely accessible collection of genetic information.
Researchers can deposit their sequences into GenBank, allowing others to access and
use this data for various research purposes, including gene discovery, phylogenetic
studies, and understanding the genetic basis of diseases. GenBank promotes data
sharing, collaboration, and scientific advancement in the field of molecular biology
and genetics.(Y)
Users can search for specific sequences or genes in the NCBI database by using
various search tools and algorithms provided on the NCBI website. One of the most
commonly used tools is the Basic Local Alignment Search Tool (BLAST), which allows
users to input a query sequence and find similar sequences in the database. This
feature is essential because it enables researchers to quickly locate relevant genetic
information, identify homologous genes or sequences, and compare their data with
existing knowledge. It aids in the discovery of new genes, understanding genetic
relationships, and conducting various types of sequence analyses, all of which are
crucial in the field of bioinformatics and molecular biology.(Y)
Hidden Markov Models (HMMs) are used in pairwise sequence alignment to
statistically model the relationships between sequences. Their main purpose is to
identify the most probable alignment between two sequences while considering the
probabilistic nature of sequence evolution. HMMs are particularly useful when dealing
with sequences that have undergone complex evolutionary processes, such as
mutations, insertions, and deletions.(Y)
In Hidden Markov Models (HMMs) for sequence alignment, a "hidden state"
represents a hypothetical state or position in the alignment that is not directly
observed but is inferred probabilistically. These hidden states correspond to various
events, such as matching a residue, inserting a gap, or deleting a residue during the
alignment process. HMMs use these hidden states to model the underlying structure
and evolution of sequences.(Y)
The use of probabilities and statistical models, such as Hidden Markov Models
(HMMs), is important in pairwise sequence alignment because it allows for a more
realistic and accurate representation of sequence evolution. Probabilistic modeling
takes into account the uncertainties and variations in biological sequences, including
mutations and indels (insertions and deletions). This approach enables researchers to
make informed decisions about the most likely alignment and better handle complex
evolutionary scenarios, resulting in more reliable sequence alignments and improved
biological insights.(Y)
The primary goal of the Neighbor-Joining algorithm in phylogenetic tree construction
is to build a tree that represents the evolutionary relationships among a set of
biological sequences (e.g., DNA, proteins) or species. This algorithm aims to find a
tree topology that minimizes the total branch length, making it a distance-based
method. It iteratively joins neighboring branches (nodes) in a way that minimizes the
overall phylogenetic distance between the sequences or species, creating a balanced
and realistic tree representation of their evolutionary history.(Y)
In the Neighbor-Joining algorithm, a "neighbor" refers to a sequence or species that
is considered a potential candidate for joining with another neighbor to form a new
node (branch) in the phylogenetic tree. The significance of neighbors lies in their role
in building the tree iteratively. At each step, the algorithm selects the pair of
neighbors with the lowest joint branch length, indicating a close evolutionary
relationship. By iteratively joining neighbors, the algorithm constructs the
phylogenetic tree in a way that reflects the relationships among all sequences or
species.(Y)
The Neighbor-Joining algorithm involves the following key steps: 1. Calculate a distance
matrix: compute pairwise distances between all sequences or species based on some
evolutionary distance measure (e.g., genetic distances, substitution rates). 2. Initialize
the tree: start with a star-like tree, where all sequences or species are connected to
a central node. 3. Identify neighbors: identify the pair of sequences or species with the
lowest joint branch length in the current tree. 4. Join neighbors: create a new node
(branch) connecting the selected neighbors to the tree, and update the distances in
the distance matrix accordingly. 5. Repeat: repeat steps 3 and 4 until the tree is fully
resolved, forming the final phylogenetic tree.(Y)
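The "identify neighbors" step can be sketched for a single iteration. In the standard formulation the pair is chosen by minimizing the criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), which is the form assumed here; the function name `nj_pick_pair` and the five-taxon distance matrix are illustrative, and joining the pair and updating the matrix are left out.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """One Neighbor-Joining selection step on a dict-of-dicts distance matrix."""
    n = len(names)
    # Total distance from each taxon to all others
    row_sum = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    best_pair, best_q = None, None
    for a, b in combinations(names, 2):
        q = (n - 2) * dist[a][b] - row_sum[a] - row_sum[b]
        if best_q is None or q < best_q:
            best_pair, best_q = (a, b), q
    a, b = best_pair
    # Branch lengths from each chosen taxon to the new internal node
    len_a = dist[a][b] / 2 + (row_sum[a] - row_sum[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    return best_pair, best_q, (len_a, len_b)


# Hypothetical symmetric distance matrix for five taxa
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: dict(zip(names, row)) for x, row in zip(names, matrix)}
pair, q, lengths = nj_pick_pair(names, dist)
print(pair, q, lengths)
```

Note that the selected pair is not simply the closest pair in the matrix: d and e are the closest here (distance 3), but a and b have the lowest Q value once distances to all other taxa are taken into account.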
In local alignment, instead of attempting to align the entire length of the sequences,
only the regions with the highest density of matches are aligned. This is useful for
identifying short conserved regions in protein or nucleotide sequences. Local
alignment programs are based on the Smith-Waterman algorithm. Local alignment
does not assume that the two sequences in question are similar over their entire length;
rather, it only finds local regions with the highest level of similarity between the two
sequences and aligns these regions without regard for the alignment of the rest of
the sequence regions. There are three primary methods for producing local
alignments: the dot matrix method, dynamic programming, and the word (k-tuple) method.
Goal: see whether a substring in one sequence aligns well with a substring in the
other. Applications: 1. Searching for local similarities in large sequences (for example,
a newly sequenced genome). 2. Searching for conserved domains or motifs.(Y)
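The dynamic programming approach to local alignment can be sketched with the Smith-Waterman recurrence. This version returns only the best local score (no traceback), and the scoring values (match +2, mismatch -1, linear gap -2) and the example sequences are assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between any substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region "ACGT" is found despite dissimilar flanking sequence
print(smith_waterman_score("TTTTACGTTTTT", "GGACGTGG"))
print(smith_waterman_score("AAAA", "TTTT"))
```

Unlike the global method, the score is the maximum over all cells rather than the bottom-right cell, so poorly matching flanks do not reduce it.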
The top (x) and left (y) axes of a rectangular array are used to represent the two
sequences to be compared. Calculation: Matrix • Columns = residues of sequence 1 •
Rows = residues of sequence 2. A dot is plotted at every coordinate where the
two residues match. Any region of similarity is revealed by a diagonal row of
dots. Isolated dots not on a diagonal show random matches.(Y)
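The matrix described above can be sketched in a few lines. This is the simplest form, with no window or threshold filtering; the function name `dot_plot` and the use of "*" for a dot and "." for an empty cell are choices made for this example.

```python
def dot_plot(seq1, seq2):
    """Rows follow seq2, columns follow seq1; '*' marks identical residues."""
    return ["".join("*" if r == c else "." for c in seq1) for r in seq2]


for row in dot_plot("GATTACA", "GATTACA"):
    print(row)
```

Comparing a sequence with itself gives an unbroken main diagonal, and the off-diagonal marks show repeated residues. With real sequences, a sliding window and a match threshold are commonly applied to suppress the isolated random matches.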
Several global sequence alignment tools are available for aligning pairs of sequences.
Some popular ones include: Needleman-Wunsch Algorithm: A dynamic programming
algorithm used for global sequence alignment. Smith-Waterman Algorithm: A
dynamic programming algorithm that is often used for local sequence alignment but
Bioinformatics is an interdisciplinary field that combines biology, computer science,
and mathematics to analyze and interpret biological data. It involves the
development and application of software tools and databases to store, retrieve, and
analyze large volumes of biological and genetic information. Here's a more detailed
The score is -18. A higher score indicates better similarity between two
sequences and can mean that they might be structurally and functionally similar to each
other. Since the score is quite low, the similarity between the sequences is low.(Y)
Depending upon the region of comparison, alignments are divided into local and
global alignment. Global alignment: Global alignment is a method of comparing two
sequences, which aligns the entire length of the sequences by maximizing the overall
similarity. This method is used when comparing sequences that are of the same
Dot plot algorithm: • As an initial example for dot plots one can imagine the same
sequence written onto two strips of checkered paper. • Every symbol of the sequence
is written consecutively into one chequer, with its index number next to it. By
overlaying a frame containing a window that allows viewing exactly one symbol of
Hidden Markov Models (HMMs) are a class of probabilistic graphical models that allow
us to predict a sequence of unknown (hidden) variables from a set of observed
variables. A simple example of an HMM is predicting the weather (hidden variable)
based on the type of clothes that someone wears (observed). Hidden Markov models
Several key features of BLAST make it a widely used tool in bioinformatics. Some of
these are: • BLAST is fast and efficient, making it possible to handle large databases
of sequences. • It is a flexible and versatile tool as it can be used to search for
similarities in both nucleotide and protein sequences. • It is highly sensitive which
Step 1: Identifying Regions. The first step is identifying regions with high
similarity by creating a lookup table for the query sequence. This step is
also called the hashing step. To create the lookup table, the query sequence is
first broken down into smaller words known as k-tuples (ktup). When the
The basic steps in any phylogenetic analysis include: 1. Assemble and align a dataset
• The first step is to identify a protein or DNA sequence of interest and assemble a
dataset consisting of other related sequences. • DNA sequences of interest can be
retrieved using NCBI BLAST or similar search tools. • Once sequences are selected
• Clustering in Bioinformatics: In bioinformatics, researchers often deal with large
datasets containing biological entities like genes, proteins, or samples. Clustering
helps in identifying groups of entities with similar characteristics or expression
patterns, which can aid in understanding biological processes and relationships.
In the DBSCAN algorithm, core points play a crucial role in identifying clusters within
a dataset. Core points are data points that have at least a minimum number of other
data points (specified by the "minPts" parameter) within a certain distance (specified
by the "eps" parameter) from them. Significance of core points in the clustering
The main objective of the BIRCH (Balanced Iterative Reducing and Clustering using
Hierarchies) algorithm in data mining is to efficiently perform clustering on large
datasets by building a compact representation of the data while preserving the
underlying cluster structures. BIRCH achieves this objective through the following
DIANA (Divisive Analysis) is a clustering algorithm in data mining that focuses on
dividing a dataset into smaller and more homogeneous subsets in a top-down or
divisive manner. Here are the key steps involved in the DIANA clustering process:
Initialization: DIANA starts with the entire dataset considered as one cluster. Initially,
Hidden Markov Models (HMMs) are widely used in various fields to model and analyze
sequential data. The concept of state transitions is a fundamental aspect of HMMs. In
HMMs, a system is assumed to exist in a finite set of states, and transitions between
these states occur over time. Here, we'll delve into the significance of state
Systems biology is an interdisciplinary field that aims to understand biological
systems as a whole, rather than focusing solely on individual components. It
combines experimental data with computational modeling to unravel the complexity
of biological processes. One example of systems biology's application is in
understanding the circadian clock. Researchers have used mathematical models and
gene expression data to uncover the regulatory mechanisms behind the circadian
rhythm, demonstrating how systems biology can reveal the intricate interplay of
genes and proteins governing such processes.(Y)
The Central Dogma states that genetic information flows from DNA to RNA to
protein. DNA replication, transcription (DNA to RNA), and translation (RNA to
protein) are the three central processes. However, exceptions and modifications have
been identified, such as the discovery of reverse transcription in retroviruses, which
involves the synthesis of DNA from RNA. This exception led to the development of
the concept of the "reverse Central Dogma." These exceptions have implications for
understanding the diversity of genetic processes in biology.(Y)
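The information flow described above can be sketched in a few lines of Python. This is only an illustration: the codon table is a deliberately partial subset, the input is assumed to be the coding strand written 5' to 3', and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"AUG": "M", "GCC": "A", "UGG": "W",
               "UAA": "*", "UAG": "*", "UGA": "*"}

def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translation: read codons from the first position until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Raises KeyError for codons outside the partial table above.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def reverse_transcribe(rna: str) -> str:
    """Reverse transcription: complementary DNA (cDNA), written 5' to 3'."""
    complement = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))

dna = "ATGGCCTGGTAA"
mrna = transcribe(dna)
print(mrna)                      # AUGGCCUGGUAA
print(translate(mrna))           # MAW
print(reverse_transcribe(mrna))  # TTACCAGGCCAT
```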
Biological databases, particularly those storing sequence data, are essential for
researchers to access and analyze genetic information. For example, the NCBI
(National Center for Biotechnology Information) database contains vast amounts of
DNA and protein sequence data, as well as associated metadata. Researchers can
utilize these databases to perform sequence alignments, identify genes, study genetic
variation, and conduct phylogenetic analyses. These resources enable scientists to
explore genetic relationships, investigate functional elements, and contribute to
advances in fields like genomics and proteomics.(Y)
Microarray experiments involve hybridizing RNA samples to microarray chips
containing thousands of gene probes, allowing simultaneous measurement of gene
expression levels. They have significantly advanced our understanding of gene
regulation by enabling the study of gene expression on a genome-wide scale.
Researchers can identify differentially expressed genes under various conditions,
discover biomarkers, and uncover regulatory networks. This technology has been
instrumental in fields such as cancer research, where it has helped identify genes
associated with specific cancer subtypes and potential therapeutic targets.(Y)
The NCBI database is a central hub for biological information, providing researchers
with access to a wide range of data and tools. It offers DNA and protein sequence
data, genomic information, literature references, and bioinformatics tools.
Researchers can use the NCBI's BLAST tool to search for sequence similarities,
retrieve genomic sequences, and access the PubMed database for literature searches.
Additionally, the NCBI hosts resources like GenBank, which stores DNA sequences,
and the Gene Expression Omnibus (GEO), which houses gene expression data. These
resources are invaluable for conducting comparative genomics, studying gene
function, and exploring gene expression patterns across various biological
contexts.(Y)
The Neighbor Joining Algorithm is used for constructing phylogenetic trees from
distance matrices. It involves the following steps: 1. Calculate the pairwise distance
matrix between all taxa. 2. Initialize a star tree with each taxon as a leaf node.
3. Find the pair of taxa that minimizes the neighbor-joining criterion, that is, the
pairwise distance corrected by each taxon's total distance to all other taxa. This is
not necessarily the pair with the smallest raw distance. 4. Create a new internal
node connected to the two selected taxa, and update the tree. 5. Recalculate the
distances between the new internal node and the remaining taxa. Repeat steps
3-5 until only two nodes remain, then join them. The Neighbor Joining Algorithm is used in
evolutionary biology to construct phylogenetic trees representing the evolutionary
relationships between species or genes. It is particularly useful when dealing with
large datasets and when accurate tree topology is needed for further analysis.(Y)
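The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name, the `node0`, `node1`, ... labels for internal nodes, and the five-taxon distance matrix are assumptions made for the example. The selection criterion is the standard one, Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r is a taxon's total distance to all other nodes.

```python
from itertools import combinations

def neighbor_joining(names, matrix):
    """Return (joins, final_edge) for a symmetric distance matrix.

    Each join is (child_a, child_b, new_node, branch_len_a, branch_len_b).
    """
    # Store distances in a dict keyed by unordered pairs of node labels.
    dist = {frozenset((names[i], names[j])): float(matrix[i][j])
            for i, j in combinations(range(len(names)), 2)}

    def d(x, y):
        return 0.0 if x == y else dist[frozenset((x, y))]

    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        total = {x: sum(d(x, y) for y in nodes) for x in nodes}
        # Step 3: pick the pair minimising Q(i, j) = (n-2)*d(i,j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d(*pair)
                   - total[pair[0]] - total[pair[1]])
        d_ij = d(i, j)
        # Step 4: branch lengths from the joined taxa to the new internal node.
        len_i = d_ij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        len_j = d_ij - len_i
        new = f"node{len(joins)}"  # hypothetical label; assumed not to clash with taxa
        # Step 5: distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((new, k))] = (d(i, k) + d(j, k) - d_ij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, new, len_i, len_j))
    final_edge = (nodes[0], nodes[1], d(nodes[0], nodes[1]))
    return joins, final_edge

# Illustrative distance matrix (made-up values).
taxa = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, final_edge = neighbor_joining(taxa, D)
print(joins[0])  # ('a', 'b', 'node0', 2.0, 3.0)
```

In this matrix the pair (d, e) has the smallest raw distance (3), yet (a, b) is joined first because it has the lowest Q value.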
Local alignment focuses on finding the most similar subsequences within two
sequences, leaving the flanking regions at either end unaligned. It is suitable for identifying
conserved regions within larger sequences. Global alignment, on the other hand,
aligns the entire length of two sequences, even if they are dissimilar in some regions.
For example, global alignment is used when comparing homologous proteins or
genes across species to study their overall similarity. Local alignment is employed to
identify protein domains or functional motifs within a protein sequence.(Y)
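The difference can be seen in a score-only dynamic programming sketch. Global mode follows the Needleman-Wunsch recurrence and local mode the Smith-Waterman recurrence. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration.

```python
def align_score(a, b, match=2, mismatch=-1, gap=-2, local=False):
    """Best alignment score; global (Needleman-Wunsch) or local (Smith-Waterman)."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the ends of the sequences.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            dp[i][j] = cell
            best = max(best, cell)
    return best if local else dp[-1][-1]

short, long_ = "ACGT", "TTACGTTT"
print(align_score(short, long_, local=False))  # 0
print(align_score(short, long_, local=True))   # 8
```

The short sequence occurs exactly inside the long one, so the local score is the full 4 matches (8). The global score is pulled down to 0 by the four gaps needed to span the whole length.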
A scoring model assigns numerical values to different alignment choices to evaluate
the quality of an alignment. It typically includes match scores for identical residues,
mismatch penalties for non-identical residues, and gap penalties for introducing gaps
in the alignment. The choice of scoring model affects the alignment results; for
example, higher gap penalties favor fewer gaps in alignments, while lower mismatch
penalties encourage alignments with more mismatches. Scoring models are critical in
guiding the alignment algorithm to find biologically meaningful alignments.(Y)
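As a small illustration of a scoring model, the sketch below scores an alignment that has already been written out with "-" for gaps. The default values (match +1, mismatch -1, gap -2) and the function name are assumptions, and the gap penalty is linear, so every gap position costs the same.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum column scores of a pairwise alignment; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score

print(alignment_score("GATTACA", "GA-TCCA"))          # 2
print(alignment_score("GATTACA", "GA-TCCA", gap=-5))  # -1
```

Raising the gap penalty from -2 to -5 makes the same alignment score worse, which is why higher gap penalties steer an aligner toward alignments with fewer gaps.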
Hidden Markov Models (HMMs) are probabilistic models that can represent the
underlying structure of biological sequences. In pairwise sequence alignment, HMMs
are used to model the probabilistic relationships between sequences. They handle
gapped alignments by allowing for insertions and deletions within the model states.
For example, HMMs are employed in profile-profile alignments to compare protein
families. HMMs represent the evolutionary information of protein families, and
aligning two HMMs allows for sensitive and accurate comparisons, even in cases of
distant homology.(Y)
Motif finding involves identifying recurring patterns (motifs) within DNA sequences
that have functional significance, such as transcription factor binding sites. Motif
models consist of position weight matrices (PWMs) that represent the probability of
each nucleotide at each position in the motif. To discover motifs, algorithms like
MEME search for overrepresented motifs in a set of sequences. For example, the
TATA box is a well-known motif in DNA sequences, crucial for initiating transcription
in eukaryotes.(Y)
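A position weight matrix of this kind can be estimated from aligned binding sites by counting nucleotides per column. The sketch below assumes four made-up TATA-like sites and a pseudocount of 1 to avoid zero probabilities. The function name is hypothetical.

```python
from collections import Counter

def position_probability_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """One dict per motif position, mapping nucleotide -> estimated probability."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({base: (counts[base] + pseudocount) / total
                       for base in alphabet})
    return matrix

# Hypothetical aligned sites, for illustration only.
sites = ["TATAAA", "TATATA", "TATAAA", "TATAAG"]
pwm = position_probability_matrix(sites)
print(pwm[0]["T"])  # 0.625
```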
To find occurrences of known motifs, bioinformaticians use tools like FIMO that scan
DNA sequences for matches to a given motif model. Motif searching is crucial for
understanding gene regulation because it helps identify potential transcription factor
binding sites and regulatory elements. By pinpointing where known motifs occur in
genomic sequences, researchers can gain insights into how genes are controlled and
their roles in cellular processes.(Y)
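The idea of scanning a sequence for matches to a motif model can be sketched with a log-odds score per window. This is not how FIMO itself is implemented. The matrix, the uniform background of 0.25, the threshold, and the helper names are all assumptions for illustration.

```python
import math

def column(consensus, p=0.7):
    """Hypothetical PWM column: probability p for the consensus base."""
    rest = (1 - p) / 3
    return {base: (p if base == consensus else rest) for base in "ACGT"}

def scan(sequence, pwm, background=0.25, threshold=4.0):
    """Return (start, window, log-odds score) for windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum of log2(P(base | motif) / P(base | background)) over positions.
        # Bases outside A, C, G, T (such as N) would raise KeyError here.
        score = sum(math.log2(col[base] / background)
                    for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

pwm = [column(base) for base in "TATA"]
hits = scan("GGCTATAAGC", pwm)
print([(start, window) for start, window, _ in hits])  # [(3, 'TATA')]
```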
Discovering new motifs is challenging due to the vast sequence space and the
potential diversity of motifs. Strategies involve motif-finding algorithms like Gibbs
Sampling that search for statistically significant motifs in sequences. Discovering
novel motifs can provide insights into previously unknown regulatory elements,
protein-binding sites, and functional regions in the genome, leading to discoveries
about gene regulation and biological processes.(Y)
CHAMELEON is a hierarchical clustering algorithm designed to discover cluster
hierarchies in large and complex datasets. It uses a combination of similarity
measures, hierarchical clustering, and graph partitioning techniques. CHAMELEON
can identify clusters at different scales, making it useful for datasets with nested or
overlapping clusters. For example, in social network analysis, CHAMELEON can
uncover communities at various levels of granularity within a network, providing a
multi-scale view of the data.(Y)
The k-means algorithm is an iterative clustering technique that aims to partition a
dataset into 'k' clusters. The key steps include initializing cluster centroids, assigning
data points to the nearest centroid, updating centroids based on assigned points, and
repeating these steps until convergence. Strengths of k-means include simplicity and
scalability. Its weaknesses are that it implicitly assumes roughly spherical, similarly
sized clusters, requires 'k' to be chosen in advance, and is sensitive to the initial
centroids and to outliers.(Y)
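The assignment and update steps can be sketched in plain Python. The data points and the starting centroids below are made up for illustration, and the starting centroids are passed in explicitly rather than chosen at random so that the run is reproducible.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters)
```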
The k-medoid algorithm is a variant of k-means that uses actual data points
(medoids) to represent cluster centers. It differs from k-means in that it is less
sensitive to outliers and can work with any pairwise dissimilarity measure, not only
Euclidean distance. Within each cluster, the medoid is the data
point that minimizes the sum of distances to the other points in that cluster.
K-medoid clustering is advantageous when dealing with datasets where
outliers can significantly impact clustering results.(Y)
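A minimal sketch of the idea follows. It uses a simple alternating scheme (assign points, then re-pick each medoid) rather than the full PAM swap procedure, and the one-dimensional data with an outlier at 100 is made up for illustration.

```python
def k_medoids(points, medoids, dist, max_iter=100):
    """Alternating k-medoids: assign to nearest medoid, then re-pick medoids."""
    medoids = list(medoids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for old, members in zip(medoids, clusters):
            if members:
                new_medoids.append(
                    min(members, key=lambda c: sum(dist(c, q) for q in members)))
            else:
                new_medoids.append(old)
        if new_medoids == medoids:
            break  # converged: no medoid changed
        medoids = new_medoids
    return medoids, clusters

points = [1, 2, 3, 10, 11, 12, 13, 100]  # 100 is an outlier
medoids, clusters = k_medoids(points, medoids=[1, 10],
                              dist=lambda a, b: abs(a - b))
print(medoids)  # [2, 12]
```

The second cluster's mean would be 29.2 because of the outlier, while its medoid stays at 12, inside the dense group.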
A rooted tree with A and B in one clade and C and D in another clade.(Y)
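One compact way to write such a tree down is the Newick format, where each pair of parentheses is a clade. The sketch below builds the string from nested tuples. The tuple representation and function name are assumptions for illustration, and branch lengths are omitted because the tree is arbitrary.

```python
def to_newick(node) -> str:
    """Convert nested tuples of taxon names into a Newick clade string."""
    if isinstance(node, str):
        return node  # a leaf
    return "(" + ",".join(to_newick(child) for child in node) + ")"

# Root with two clades: (A, B) and (C, D).
tree = (("A", "B"), ("C", "D"))
newick = to_newick(tree) + ";"
print(newick)  # ((A,B),(C,D));
```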
Sequence identity is the number of characters that match exactly between two
different sequences. Gaps are not counted, and the measurement is relative
to the shorter of the two sequences. As a result, sequence identity is not
transitive: if identity(A,B)=100% and identity(B,C)=100%, A and C are not
necessarily identical (in terms of the identity distance measure).
Example: A: AAGGCTT B: AAGGC C: AAGGCAT. Here
identity(A,B)=100% (5 identical nucleotides / 5). Identity(B,C)=100%, but
identity(A,C)≈86% (6 identical nucleotides / 7). So 100% identity does not mean
two sequences are the same.(Y)
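The example can be checked with a short Python sketch. It assumes the sequences are compared ungapped from their first position, as in the example above, and the function name is hypothetical.

```python
def sequence_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of matching positions, relative to the shorter sequence."""
    shorter = min(len(seq_a), len(seq_b))
    if shorter == 0:
        raise ValueError("both sequences must be non-empty")
    # zip stops at the end of the shorter sequence.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / shorter

A, B, C = "AAGGCTT", "AAGGC", "AAGGCAT"
print(sequence_identity(A, B))            # 1.0
print(sequence_identity(B, C))            # 1.0
print(round(sequence_identity(A, C), 3))  # 0.857
```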