Biological Database 1

Biological databases are collections of biological information organized so they can be easily accessed and updated. They store data from experiments, literature, and computational analysis. The key types are primary databases containing original sequence data, secondary databases with additional annotation, and composite databases combining data from multiple primary sources. Major databases include GenBank, EMBL, DDBJ for nucleic acid sequences and PIR, SWISS-PROT, and TrEMBL for protein sequences, which collaborate internationally for data sharing and standardization.

Uploaded by

Muhammad uzair

Biological Databases

Databases
• A database is a collection of information that is organized so that it can be easily
accessed, managed and updated

• Databases are hosted at different sites

• They exchange information on a daily basis so that they are up-to-date

Biological databases
• Biological databases are libraries of life sciences information, collected from
scientific experiments, published literature, high-throughput experiment
technology, and computational analysis
• Stores biological data in electronic form
Purpose
• Systematization of biological data
• Availability of biological data
• Analysis of computed biological data
THE ‘PERFECT’ DATABASE
• Comprehensive, but easy to search

• Annotated

• A simple, easy to understand structure

• Cross-referenced

• Minimum redundancy

• Easy retrieval of data

HISTORY
• Insulin was the first protein to be sequenced; it is composed of 51 amino acids
• The sequence was included in the “Atlas of Protein Sequence and Structure”, published in 1965 by Margaret Dayhoff
• The Atlas became the basis for the PIR database
• The first nucleic acid sequenced was yeast tRNA, composed of 77 nucleotides
• The first free-living organism whose genome was sequenced was the bacterium Haemophilus influenzae, in 1995, by Craig Venter's group
CLASSIFICATION
Types of database
Primary database
• It is also called a sequence database
• It gives information about the sequence of DNA nucleotides or protein amino acids
Secondary database
• Stores secondary structure information or the results of searches of the primary database
Composite database
• They compile and filter sequence data from different primary databases to
produce combined non-redundant sets that are more complete than the individual
databases
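As a rough illustration of the "compile and filter" step, the sketch below merges records from several sources and keeps one entry per distinct sequence. The record layout, the toy data, and the `merge_nonredundant` helper are hypothetical, and real composite databases may use redundancy criteria other than exact sequence identity.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence); hypothetical minimal record


def merge_nonredundant(sources: Dict[str, Iterable[Record]],
                       priority: List[str]) -> List[Tuple[str, str, str]]:
    """Merge records from several primary sources, one entry per sequence.

    Sources are visited in `priority` order, so when the same sequence
    appears in more than one source the highest-priority copy is kept.
    """
    seen = set()
    merged = []
    for source in priority:
        for identifier, sequence in sources.get(source, []):
            key = sequence.upper()
            if key in seen:
                continue  # identical sequence already taken from another source
            seen.add(key)
            merged.append((source, identifier, sequence))
    return merged


# Made-up toy data, for illustration only
primary = {
    "SWISS-PROT": [("SP_1", "MKTAYIAK"), ("SP_2", "MLLAVLYC")],
    "PIR": [("PIR_1", "MKTAYIAK"), ("PIR_2", "MGSSHHHH")],
}
composite = merge_nonredundant(primary, priority=["SWISS-PROT", "PIR"])
```

The priority list decides which copy survives when two sources hold the same sequence; here the PIR duplicate of `SP_1` is dropped.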
SEQUENCE DATABASES
• Sequence Databases are classified as:
 Genome sequence databases
 Nucleic acid sequence databases
 Protein sequence databases
 Amino acid sequence databases
• Sequence databases also fall into three database categories:
 Primary databases
 Secondary databases
 Composite databases
FUNDAMENTAL ELEMENTS OF SEQUENCE DATABASES
• All of the following elements represent the “ideal minimal content of an annotation entry in a sequence database”:
 Name: LOCUS, ENTRY, or ID (all unique identifiers)
 Definition: A brief, one-line, textual sequence description
 Accession: A constant data identifier
 Version
 GenInfo Identifier (GI)
 Comments & Keywords
 Source
 Organism & Taxonomy Information
 Literature References
 Features table
 Base count & Origin
 The Sequence itself
• The LOCUS field consists of five subfields:
 1a Locus Name (e.g. HSHFE) - A tag for grouping similar sequences. The first two or three letters usually designate the organism; in this case HS stands for Homo sapiens. The last several characters are associated with another group designation, such as the gene product. In this example, the last three characters are the gene symbol, HFE. Currently, the only requirement for a locus name is that it is unique
 1b Sequence Length (12146 bp) - The total number of nucleotide base pairs (or amino acid residues) in the sequence record
 1c Molecule Type (e.g. DNA) - The type of molecule that was sequenced. All sequence data in an entry must be of the same type
 1d GenBank Division (PRI) - GenBank has different divisions. In this example, PRI stands for primate sequences. Other divisions include ROD (rodent sequences), MAM (other mammal sequences), PLN (plant, fungal, and algal sequences), and BCT (bacterial sequences)
 1e Modification Date (23-July-1999) - The date of the most recent modification made to the record. The date of first public release is not available in the sequence record; it can be obtained only by contacting NCBI at info@ncbi.nlm.nih.gov
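The five subfields can be pulled apart with a few lines of code. The sketch below assumes a whitespace-separated LOCUS line with exactly the subfields described above; the example line is reconstructed from the values on the slide, its spacing and the `23-JUL-1999` date spelling are assumptions, and real records may carry extra fields that this parser rejects. `Locus` and `parse_locus` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    length: int
    unit: str               # "bp" for nucleotides, "aa" for proteins
    molecule_type: str
    division: str
    modification_date: str


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line into the five subfields (length keeps its unit)."""
    tokens = line.split()
    if len(tokens) != 7 or tokens[0] != "LOCUS":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    _, name, length, unit, molecule_type, division, date = tokens
    return Locus(name, int(length), unit, molecule_type, division, date)


# Example line assembled from the slide's values (layout is an assumption)
locus = parse_locus("LOCUS       HSHFE      12146 bp    DNA             PRI       23-JUL-1999")
```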
DEFINITION:
 It is a brief description of the sequence
 The description may include the source organism name, gene or protein name, or designation as untranscribed or untranslated sequences (e.g., a promoter region)
 For sequences containing a coding region (CDS), the definition field may also contain a “completeness” qualifier such as “complete CDS” or “exon 1”

ACCESSION (Z92910):
 It is a unique identifier assigned to a complete sequence record
 This number never changes, even if the record is modified
 An “accession number” is a combination of letters and numbers that are usually in the format of one
letter followed by five digits (e.g., M12345) or two letters followed by six digits (e.g., AC123456)
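The two accession layouts named above translate directly into a regular expression. This is a minimal sketch that checks only those two layouts (the text says "usually", so other layouts exist and would be rejected here); `looks_like_accession` is a hypothetical helper name.

```python
import re

# One letter + five digits, or two letters + six digits, as described above
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def looks_like_accession(text: str) -> bool:
    """Return True if `text` matches one of the two layouts described above."""
    return ACCESSION_RE.match(text) is not None


checks = {acc: looks_like_accession(acc)
          for acc in ["M12345", "AC123456", "Z92910", "M1234", "Z92910.1"]}
```

Note that `Z92910.1` fails the check because it is a version identifier (accession plus version suffix), not a bare accession.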
VERSION (Z92910.1):
 It is an identification number assigned to a single, specific sequence in the database
 This number is in the format “accession.version”
 If any changes are made to the sequence data, the version part of the number increases by one
 E.g. U12345.1 becomes U12345.2
 A version number of Z92910.1 for this HFE sequence indicates that the sequence data have not been altered since the original submission
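The “accession.version” rule can be sketched as two small helpers. The function names are hypothetical, and the code models only the numbering convention described above, not how a database actually assigns versions.

```python
from typing import Tuple


def split_version(identifier: str) -> Tuple[str, int]:
    """Split 'accession.version' into its accession and integer version."""
    accession, _, version = identifier.rpartition(".")
    if not accession or not version.isdigit():
        raise ValueError(f"not in accession.version format: {identifier!r}")
    return accession, int(version)


def next_version(identifier: str) -> str:
    """Identifier the record would carry after one change to its sequence."""
    accession, version = split_version(identifier)
    return f"{accession}.{version + 1}"


hfe = split_version("Z92910.1")
bumped = next_version("U12345.1")
```

The accession part stays constant across updates, which is why it can be used as a stable reference while the version tracks sequence changes.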
GenInfo Identifier (GI) (1890179):
 Also a sequence identification number
 Whenever a sequence is changed, the version number is increased and a new GI is assigned
 If a nucleotide sequence record contains a protein translation of the sequence, the translation will have
its own GI number
KEYWORDS (haemochromatosis; HFE gene) :
A “keyword” can be “any word or phrase used to describe the sequence”
SOURCE (human):
Usually contains an abbreviated or common name of the source organism
ORGANISM (Homo sapiens) :
The scientific name (usually genus & species) & phylogenetic lineage
REFERENCE :
It is a citation of publications by sequence authors that supports information
presented in the sequence record
The FEATURES Table
BASE COUNT:
Base Count gives the total number of adenine (A), cytosine (C), guanine (G), and
thymine (T) bases in the sequence
ORIGIN:
Origin contains the sequence data, which begins on the line immediately below the
field title
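The relationship between the two fields can be shown by recomputing the base count from the sequence lines. The sketch assumes each sequence line starts with a position number followed by space-separated blocks of bases, and that the record ends with `//`; the three input lines are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def sequence_from_origin(lines: Iterable[str]) -> str:
    """Join the sequence lines under ORIGIN, dropping numbers and spaces."""
    parts = []
    for line in lines:
        if line.startswith("//"):  # assumed end-of-record marker
            break
        parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts).lower()


def base_count(sequence: str) -> Dict[str, int]:
    """Count a, c, g and t, mirroring what the BASE COUNT field reports."""
    counts = Counter(sequence)
    return {base: counts.get(base, 0) for base in "acgt"}


# Made-up sequence lines, not taken from a real record
origin_lines = [
    "        1 gatcctccat atacaacggt",
    "       21 atctccacct",
    "//",
]
sequence = sequence_from_origin(origin_lines)
counts = base_count(sequence)
```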
Major Primary DB
Nucleic acid:
 EMBL (Europe)
 GenBank (USA)
 DDBJ (DNA Data Bank of Japan)
Protein:
 PIR (Protein Information Resource)
 MIPS
 SWISS-PROT (University of Geneva, now with EBI)
 TrEMBL (a supplement to SWISS-PROT)
 NRL-3D
TYPES OF NUCLEIC ACID DATABASES
PRIMARY NUCLEIC ACID DATABASES:
 Contain nucleic acid sequences, with their annotations, as submitted by researchers and genome sequencing groups
 Examples include GenBank, DDBJ and EMBL
International Nucleotide Sequence Database Collaboration (INSDC)
• GenBank, DDBJ and EMBL together make up the International Nucleotide Sequence Database Collaboration (INSDC)
• INSDC keeps the GenBank, DDBJ and EMBL databases synchronized
• Properties of INSDC include:
 Consistent accession numbers
 No legal restrictions on use, although some patented sequences are stored and managed
 Holds sequences submitted directly by scientists and genome sequencing groups, as well as sequences taken from the literature and patents
 Has very limited error checking, so there is a fair amount of redundancy
 Access is provided via FTP and WWW interfaces
 Sequences are listed in the 5’-3’ orientation
DNA Data Bank of Japan (DDBJ)
• The DNA Data Bank of Japan is a biological database that collects DNA sequences
• It has collected and supplied DNA data since its inception in 1986
• Data entry is the same as in GenBank
• It is also a member of the International Nucleotide Sequence Database Collaboration (INSDC)
• DDBJ exchanges data via the SINET3 computer network
European Molecular Biology Laboratory (EMBL)
• It is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications and directly submitted by researchers and sequencing groups
• Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ)
• It doubled in size roughly every 18 months; as of June 1994 it contained nearly 200 million bases in 182,615 sequence entries
• It is maintained by the European Bioinformatics Institute (EBI)
• Data entry is friendly to both computers and humans
• Standard English is used (explanations, descriptions, etc.)
• Sequences are stored in the database as they would occur in the biological state
THE NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION
Bethesda, MD
Created in 1988 as a part of the National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
WEB ACCESS: WWW.NCBI.NLM.NIH.GOV

NCBI DATABASES AND SERVICES
• GenBank primary sequence database
• Free public access to biomedical literature
 PubMed free Medline (3 million searches per day)
 PubMed Central full text online access
• Entrez integrated molecular and literature databases
PubMed
• PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics
• The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval
• To search PubMed: open the drop-down menu, select the PubMed option, and type the topic of interest in the search box
• A list of matching research papers then appears, from which specific papers can be selected for study
Entrez: A retrieval system
• Capable of accessing integrated information by searching many of the NCBI databases with just one query
• This replaces searching only one database per query and then repeating the same query to find information on the same topic in another NCBI database
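To make the "one query, many databases" idea concrete, the sketch below builds one search URL per database for the same term, using the esearch endpoint of NCBI's E-utilities. It only constructs the URLs and makes no network request. The search term and the choice of databases are illustrative assumptions, `esearch_urls` is a hypothetical helper, and this per-database loop is the manual repetition that a single Entrez query saves the user.

```python
from typing import Dict, List
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_urls(term: str, databases: List[str]) -> Dict[str, str]:
    """Build one esearch URL per database for the same search term."""
    return {db: f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"
            for db in databases}


# Illustrative term and database list (assumptions, not from the slides)
urls = esearch_urls("HFE hemochromatosis", ["pubmed", "nucleotide", "protein"])
```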
Genetic Sequence Databank
• GenBank is one of the fastest growing repositories of known genetic sequences
• Entries are plain-text files, human-readable and downloadable
• It is maintained by the National Center for Biotechnology Information (NCBI)
• Entry data contains information on:
 The sequence
 Accession numbers
 The scientific and gene names
 Taxonomy/phylogenetic classification of the source organism
 A feature that identifies coding regions
 References to published literature
 Transcription units
 Mutation sites
TYPES OF NUCLEIC ACID DATABASES
• SECONDARY NUCLEIC ACID DATABASES
 They contain additional information derived from analysis of data available in primary repositories
 They deal with particular classes of sequences
 Examples include UniGene, the HIV sequence database and REBASE
SECONDARY NUCLEIC ACID DATABASES
• UniGene
 It has records with unique gene clusters
 Each cluster contains sequences that represent a unique gene, plus related information, e.g. the tissue types in which the gene has been expressed
 The database is populated with Expressed Sequence Tags (ESTs)

• HIV SEQUENCE DATABASE

 The HIV Sequence Database (HSD) collects, curates & annotates HIV sequence
data.
Protein Sequence Database
PROTEIN SEQUENCE DATABASES
• They consist of:

 Protein sequences translated from nucleotide sequences, together with sequences determined by direct protein sequencing

• Three (3) types of protein sequence databases exist:

 Primary protein databases

 Secondary protein databases

 Composite protein databases

PRIMARY PROTEIN SEQUENCE DATABASES
• Primary protein sequence databases are:
 SWISS-PROT
 PIR (Protein Information Resource)
• Both SWISS-PROT and PIR are curated. This means groups of designated curators (database managers) prepare the entries from the literature and/or contacts with external experts prior to submission into the respective databases
Swiss-Prot
• It provides high-level annotations describing:
 Functions of a protein
 Protein domain structure
 Post-translational modifications
 Protein variants and other variables
• It also provides a minimum level of redundancy & a high level of integration with
other databases
• It has legal restrictions in that entries are copyrighted, but freely accessible and
usable by academic researchers
PROTEIN INFORMATION RESOURCE (PIR)
• It is a division of the National Biomedical Research Foundation (NBRF) in the US
• It produces NRL-3D, a database of sequences extracted from the three-dimensional structures in the Protein Data Bank (PDB)
• NRL-3D makes the sequence information in PDB available for similarity searches and retrieval, and provides cross-reference information for use with the other PIR protein sequence databases
• It provides comprehensive, well-organized and accurate information about proteins, such as sequence similarity
SECONDARY PROTEIN SEQUENCE
DATABASES
• Major examples of Secondary protein sequence databases are:
 TrEMBL
 Prosite
 Pfam
 TrEMBL:
• TrEMBL stands for Translation of EMBL nucleotide sequence database
• It is a computer-annotated supplement to SWISS-PROT
• It contains all translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT
• TrEMBL speeds new sequence information to the public
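The "translation" in TrEMBL's name refers to turning a nucleotide coding sequence into a protein sequence. The sketch below shows that basic operation with the standard genetic code. It is an illustration only, not TrEMBL's actual pipeline: it assumes the input is already a coding sequence in frame, and `translate` is a hypothetical helper.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon.

    Trailing bases that do not fill a codon are ignored, and codons
    containing unexpected characters are reported as 'X'.
    """
    dna = coding_sequence.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[pos:pos + 3], "X")
                   for pos in range(0, usable, 3))


protein = translate("ATGGCCTAA")  # made-up nine-base example
```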
SECONDARY PROTEIN SEQUENCE DATABASES
 PROSITE
• It is a database of protein families and domains
• It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
• It is maintained alongside Swiss-Prot, in much the same way
• It is based on regular expressions describing characteristic sub-sequences of specific protein families or domains
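Because PROSITE patterns are essentially regular expressions in their own notation, they can be converted to an ordinary regex and run against a sequence. The sketch below handles only the common pattern elements (single residues, `x`, `[...]` alternatives, `{...}` exclusions, `(n)` / `(n,m)` repeats, and `<` / `>` anchors at the ends); the full PROSITE syntax has further cases this does not cover. `prosite_to_regex` is a hypothetical helper, and the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        elif not (core.startswith("[") and core.endswith("]")) \
                and not (len(core) == 1 and core.isalpha()):
            raise ValueError(f"unsupported element {element!r}")
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


# N-glycosylation site pattern (PROSITE PS00001)
glyco = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(glyco, "AANGSAA")    # made-up sequence containing a site
miss = re.search(glyco, "AANPSAA")   # proline after N breaks the pattern
```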

 Pfam
• It is a database of protein families defined as domains
• It can be searched and used to identify domains in sequence
• It is licensed under the GNU General Public License making it available to anyone
COMPOSITE DATABASES
• They compile and filter sequence data from different primary databases to
produce combined non-redundant sets that are more complete than the individual
databases
• An example of a composite database is OWL
• OWL combines four publicly available primary sources:
 SWISS-PROT
 PIR
 GenBank (translated sequences)
 NRL-3D
Databases
• Sequence database: a) Nucleotide database: GenBank, EMBL-Bank; b) Protein database: Swiss-Prot, PIR
• Structure database: PDB, NDB, DALI, MSD
• Microarray database: ArrayExpress, MIAME
• Chemical database: PubChem
• Pathway database: KEGG, BioSilico
• Enzyme database: ExPASy, REBASE
• Disease database: OMIM, OMIA
• Literature database: PubMed, Scopus
Your Assignment
• What is curated data?
• What is a patent?
• What are a domain and a motif?
• In how many formats can nucleotide and protein sequences be downloaded?
• What are the FASTA and GFF formats?
• What is redundancy?
• What is GRCh38?
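As a starting point for the FASTA question, and for the sequence length and A, T, G, C counts requested in the practical, the sketch below reads FASTA-formatted text: each record is a header line beginning with `>` followed by one or more sequence lines. The record shown is made up, and `parse_fasta` and `summarize` are hypothetical helpers, not part of any NCBI tool.

```python
from collections import Counter
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its joined sequence."""
    records: Dict[str, list] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def summarize(sequence: str) -> Dict[str, int]:
    """Return the sequence length and the number of A, T, G and C bases."""
    counts = Counter(sequence.upper())
    summary = {"length": len(sequence)}
    summary.update({base: counts.get(base, 0) for base in "ATGC"})
    return summary


# Made-up record; a downloaded file would be read from disk instead
fasta_text = ">example_gene made-up sequence\nATGGCC\nTTAA\n"
records = parse_fasta(fasta_text)
summary = summarize(records["example_gene made-up sequence"])
```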
Practical
Genes
• AMY2B
• MGAM
• GP2
• TAS2R38
• ACMSD
• METAP2
• FABP5
• SGLT1
• SGLT3
• ARID1B
• CRYM
• FRMD6
• GALR1
• GPR139
• GRIK3
Select one gene
• Go to NCBI and collect the following information for human:
 Gene full name
 Gene ID
 Gene type
 Gene location
 Number of exons
 Number of A, T, G, C bases
 Length of gene
 Download the gene, mRNA and protein sequences in FASTA format
