2.3 - History of Biological Databases

Biological databases are categorized into sequence and structure databases, with significant milestones dating back to the 1960s, including the development of the first protein sequence database by Margaret Dayhoff. The Human Genome Project, launched in 1990, marked a pivotal moment for bioinformatics, leading to advancements in computational methods for biological analysis. Today, bioinformatics encompasses various fields, utilizing AI and machine learning to analyze large datasets and improve discoveries in biology and medicine.

Uploaded by

chawanganeshofficial


Biological databases can be broadly classified into sequence databases and structure databases.

Sequence databases cover both nucleic acid sequences and protein sequences, whereas structure
databases were originally concerned mainly with proteins. The first database was created within a
short period after the insulin protein sequence was made available in 1956. Insulin, with
just 51 residues, was the first protein to be sequenced. Around
the mid-1960s, the first nucleic acid sequence, a yeast tRNA with 77 bases, was determined. During
this period, three-dimensional structures of proteins were also being studied, and the well-known Protein
Data Bank (PDB) was established in 1971 as the first protein structure database, with seven entries.
It has since grown into a large database with over 150,000 entries. While the initial databases of
protein sequences were maintained at individual laboratories, the development of a
consolidated formal database, the SWISS-PROT protein sequence database, was initiated in
1986. It later grew to about 70,000 protein sequences from more than 5,000 organisms, a
small fraction of all known organisms. This wide variety of data resources is now
available for study and research by both academic institutions and industry. The data are made
available as public-domain information, in the wider interest of the research community, over the
internet (www.ncbi.nlm.nih.gov) and on CD-ROM (on request from www.rcsb.org). These
databases are constantly updated with additional entries.

The practice of bioinformatics can be traced back to the 1960s, when
Margaret Oakley Dayhoff, who is sometimes referred to as the mother of bioinformatics,
developed a computer program to aid in the determination of protein sequences. Dr. Dayhoff
also developed the one-letter amino acid codes to make sequences easier to enter into a computer
using punch cards. Her single-letter codes are still used to this day.
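
As a small illustration of why the one-letter codes are convenient, the sketch below converts three-letter residue names into a compact one-letter string. The mapping covers the 20 standard amino acids; the function name `to_one_letter` and the example peptide are illustrative assumptions, not something taken from Dayhoff's program.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string.

    Hypothetical helper for illustration; input case is normalised so that
    "GLY", "gly" and "Gly" are all accepted.
    """
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


# A short example peptide written both ways.
print(to_one_letter(["Gly", "Ile", "Val", "Glu", "Gln"]))  # GIVEQ
```

Five residues shrink from fifteen letters plus separators to five characters, which mattered a great deal when every character had to be punched onto a card.
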

The term “bioinformatics” itself has been around since at least 1970, when Ben
Hesper and Paulien Hogeweg used it to describe “the study of informatic processes in biotic
systems”. From then through the 1980s, however, the concept of bioinformatics shifted away
from generally describing biochemical networks to become synonymous with sequence analysis
using algorithms to compare data. In this phase of its history, two of the most important
contributors were Elvin Kabat and Tai Te Wu. They collected and aligned amino acid sequences
from humans and mice. Kabat and Wu “used a simple mathematical formula to calculate the
various amino acid substitutions at each position and predict the precise locations of segments of
the [protein]”. Their database was released in print throughout the late 70s and 80s until it
became so large that it was no longer practical to print.
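
The text does not spell out the formula. A minimal sketch, assuming it refers to the variability measure usually attributed to Wu and Kabat (number of different amino acids at a position divided by the frequency of the most common one), could look like this. The function names and the toy alignment are hypothetical, and gaps are not handled.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Variability of one alignment column.

    Assumed definition: (number of distinct residues at the position)
    divided by (relative frequency of the most common residue).
    """
    counts = Counter(column)
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq


def variability_profile(aligned):
    """Variability for every column of equal-length aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned)]


# Toy alignment: position 0 is fully conserved, position 3 differs everywhere.
aligned = ["ACDE", "ACDF", "ACEG", "AKDH"]
print(variability_profile(aligned))
```

A fully conserved position scores 1, while positions where many different residues occur with no clear favourite score much higher, which is how highly variable segments stand out in such a profile.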

By the end of the 1990s, bioinformatics had come to mean the use of computational methods
for comparative analysis in biology. This is more in line with today’s definition, but in the 90s
sequence analysis was still the major focus, largely because bioinformatics gained public
attention during the Human Genome Project (HGP). An argument can be made that the HGP was
a springboard for bioinformatics, as the effort became a dramatic scientific race. The HGP was
initiated in 1990 as a publicly funded project. With the technology of the time, sequencing all 3
billion base pairs in the human genome was a huge challenge! Scientists had to map a gene,
sequence it in small segments, and reconstruct the sequences into a whole using the map. Suffice
it to say, it was a slow process! A privately owned company called Celera arose to compete with
the public project in 1998. Headed by Dr. J. Craig Venter, Celera was a biotech company that used
computational methods to automatically match the overlapping sections of sequences, with no more
mapping or slow, grueling human assembly. This is what bioinformatics is all about!
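
To give a feel for what matching overlapping sections means, here is a toy greedy assembler. It is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats) and is not Celera's actual software; the names `overlap` and `greedy_assemble` and the example reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left to merge; remaining reads stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real genomes contain repeats and sequencing errors, so production assemblers use far more careful strategies, but the core idea of joining reads by their shared ends is the same.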

In the years since the completion of the HGP, the use of computers in biological research
has only increased. Bioinformatics has grown to encompass a huge variety of fields, from
immunology to cardiology to neuroscience and more. People working in all of these fields use
computer science to advance our understanding of life science every day! As bioinformaticians
work to hitch progress in biochemistry and medicine to the rapid pace of improvements in
computer processing power, we may be approaching a world where medical science
improves at a pace closer to Moore’s law.

With that, we have arrived at our answer, at least as it is understood today: bioinformatics
is the creation, advancement, and understanding of immense sets of data using mathematical and
computational techniques, in order to improve the quality and pace of new discoveries.

Major milestones in bioinformatics

Over the past few decades, numerous milestones in bioinformatics have significantly shaped
our understanding of biology and the development of new therapies and treatments.
• 1965: Margaret Dayhoff developed the first protein sequence database, the
Atlas of Protein Sequence and Structure. This was a major step towards
understanding the relationship between protein structure and function.
• 1970: Saul B. Needleman and Christian D. Wunsch published a sequence alignment
method, based on dynamic programming, for aligning and comparing protein or nucleotide
sequences over their full length.
• 1971: The Protein Data Bank (PDB), now operated in the US by the RCSB, was established.
• 1977: Frederick Sanger developed a rapid method for determining the base sequence of
DNA. The method was initially manual, but it later became the basis of automated
sequencing and paved the way for the Human Genome Project.
• 1981: The Smith-Waterman local sequence alignment algorithm was published. It is useful
for identifying regions of similarity that may indicate functional, structural, or
evolutionary relationships between two sequences.
• 1982: GenBank, a database of nucleotide sequences, was created by the National
Institutes of Health (NIH) as a way to store and share genetic information.
• 1984: The PIR-International Protein Sequence Database was established.
• 1990: The Human Genome Project was launched. This ambitious project aimed to
sequence the entire human genome, and it was completed in 2003.
• 1996: TrEMBL, a computer-annotated supplement to the SWISS-PROT protein sequence
database (itself created in 1986), was released.
• Late 1990s and early 2000s: The field of metagenomics was established. This field
focuses on studying the genetic material of entire microbial communities, rather than just
individual organisms.
• 2001: The first draft of the human genome was published. This was a major breakthrough
in our understanding of human biology, and it opened up new avenues for research and
drug development.
• 2002: The UniProt protein sequence database was created.
• 2010: The first bacterial cell controlled by a chemically synthesized genome was created.
This was a landmark achievement in the field of synthetic biology, and it paved the way
for the creation of organisms with custom-designed genomes.
• 2012: The CRISPR-Cas9 system was shown to work as a programmable genome-editing tool.
This technology allows scientists to edit genomes with unprecedented precision.
• 2023: The integration of artificial intelligence (AI) and machine learning (ML) into
bioinformatics tools and workflows is transforming the field. AI and ML are being
used to analyze large datasets, predict protein structures, and develop new drugs.
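
The two alignment milestones above (1970 and 1981) differ mainly in one detail, which a short sketch can make concrete. The code below computes only the alignment score, not the alignment itself, and uses a simple linear gap penalty; the scoring values and function names are assumptions chosen for illustration, not those of the original papers.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch-style score: both sequences aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman-style score: best-matching pair of subsequences.

    Scores are floored at zero, so a poor region never drags down a good
    one, and the answer is the highest value anywhere in the table.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(global_score("ACGT", "AGT"))           # 2: three matches, one gap
print(local_score("TTTACGTTT", "GGACGGG"))   # 6: the shared "ACG" region
```

Global alignment suits sequences that are expected to be related along their whole length, while local alignment finds a shared region inside otherwise unrelated sequences.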

The RCSB Protein Data Bank is a resource that provides access to information about the
three-dimensional structures of proteins, nucleic acids, and complex assemblies. It was established in
1971 as a repository for structural data and has since grown to contain over 150,000 structures.
The database is a valuable resource for researchers studying the structure and function of
biological macromolecules and has played a vital role in advancing the field of structural
biology. In recent years, the database has also begun incorporating drug discovery and design
data, making it an essential tool for the pharmaceutical industry.

In 1982, GenBank, a database of nucleotide sequences, was created under the National Institutes of
Health (NIH) to store and share genetic information. The database itself was started by Walter Goad
at Los Alamos National Laboratory. Today, the database contains millions of sequences from a
wide range of organisms, and it has been an essential tool for researchers in the field of
bioinformatics.

GenBank has undergone significant changes since its creation. In the early days, the database
was maintained manually, with researchers submitting their sequences on paper forms. However,
as the number of submissions grew, this approach became impractical.

In 1986, the NIH began accepting electronic submissions; by 1988, the entire GenBank database
was available in electronic form. The database continued to grow in the following years, and new
features were added to make it easier to search and analyze the data.

In 1992, GenBank was made available over the internet, which made it accessible to researchers
all over the world. This was a significant milestone in the history of
bioinformatics, as it allowed researchers to share and access genetic information more efficiently
than ever before.

Since then, GenBank has continued to evolve, with new data types being added and new tools
being developed to analyze the data. Today, it remains one of the most important resources for
researchers in bioinformatics, and it continues to play a critical role in advancing our
understanding of genetics and genomics.

The PIR-International Protein Sequence Database, established in 1984, was one of the earliest
protein sequence databases. It was an important milestone in bioinformatics, allowing researchers
to analyze and compare protein sequences on a large scale. The database was later incorporated
into the UniProt Knowledgebase, which is now one of the world's most widely used protein
sequence databases.

UniProt, which stands for Universal Protein Resource, is a comprehensive protein database
created in 2002 by merging three separate databases: Swiss-Prot, TrEMBL, and PIR-PSD.
Swiss-Prot was initially created as a protein sequence database in 1986 by Amos Bairoch and his
team at the University of Geneva. TrEMBL was a computer-annotated
supplement to Swiss-Prot, created in 1996.

Today, UniProt is one of the largest protein databases in the world, containing information on
millions of proteins from a wide range of species. Researchers in bioinformatics
use it widely for a variety of applications, including protein identification, characterization, and
annotation. UniProt also provides many tools and resources to help researchers analyze and
interpret protein data, making it an invaluable resource for the scientific community.
