Lecture 5

A Laboratory Information Management System (LIMS) is software that manages laboratory samples, users, instruments, and workflows, primarily used in environmental, research, and commercial labs. It integrates instruments within a lab network, allowing personnel to perform calculations and document results efficiently. While LIMS is widely adopted in industry, academic labs have been slower to implement such systems due to maintenance costs, although minimal LIMS and electronic lab notebooks are emerging for small research groups.

Uploaded by

dudalamouni

LIMS

From Wikipedia, . . .

A Laboratory Information Management System (LIMS) is software used in the
laboratory for the management of samples, laboratory users, instruments,
standards and other laboratory functions such as invoicing, plate management,
and workflow automation.

A LIMS and a Laboratory Information System (LIS) perform similar functions.
The primary difference is that LIMS are generally targeted toward
environmental, research or commercial analysis, such as pharmaceutical or
petrochemical, and LIS are targeted toward the clinical market (hospitals and
other clinical labs).

Today’s trend is to move the whole process of information gathering, decision
making, calculation, review and release out into the workplace and away from
the office.

Beckman-Coulter

The goal is to create an organization such that:

- Instruments are integrated in the lab network; they receive instructions and
  worklists from the LIMS and return finished results, including raw data, to
  a central repository.
- Lab personnel perform calculations, document and review results using online
  information from connected instruments, reference databases and other
  resources, using electronic lab notebooks (ELNs) connected to the LIMS.
- Management can supervise the lab process, react to bottlenecks in workflow
  and ensure regulatory demands are met.
- External participants (department, company) can place work requests, follow
  up on progress, and view results and other documentation.

LIMS are frequently used in industry, whereas academic labs have been slower
to adopt LIMS due to the large overhead involved in installing and maintaining
such systems. Several minimal LIMS or ELN systems for small research groups
are starting to appear, and with the advance of high-throughput technologies
and Systems Biology it is likely that more academic labs and small research
groups will adopt LIMS or ELNs.

ABcontrols.com
Nucleic Acids Research

Nucleic Acids Research is a major biological journal that publishes on all
manner of research dealing with DNA and RNA sequence biology.

As part of this mandate, each year it publishes a special “database” issue.
The following is taken from the table of contents of the 2008 issue.
Nucl. Acids Res. Database issue: Vol. 36, 2008

- The Molecular Biology Database Collection: 2008 update
- Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database
- Database resources of the National Center for Biotechnology Information
- DDBJ with new system and face
- GenBank
- GISSD: Group I Intron Sequence and Structure Database
- The Gypsy Database (GyDB) of mobile genetic elements
- TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates
- UgMicroSatdb: database for mining microsatellites from unigenes
- UTRome.org: a platform for 3’UTR biology in C. elegans
- ProSAS: a database for analyzing alternative splicing in the context of protein structures
- CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining
- COXPRESdb: a database of coexpressed gene networks in mammals
- CTCFBSDB: a CTCF-binding site database for characterization of vertebrate genomic insulators
- DBD taxonomically broad transcription factor predictions: new content and functionality
- DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information
- DBTSS: database of transcription start sites, progress report 2008
- JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update
- ORegAnno: an open-access community-driven resource for regulatory annotation
- ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes
- RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation
- TransfactomeDB: a resource for exploring the nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors
- YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae
- ARED Organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse
- GRSDB2 and GRS UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs
- The microRNA.org resource: targets and expression
- miRBase: tools for microRNA genomics
- miRGator: an integrated system for functional annotation of microRNAs
- miRNAMap 2.0: genomic maps of microRNAs in metazoan genomes
- NONCODE v2.0: decoding the non-coding
- piRNABank: a web resource on classified and clustered Piwi-interacting RNAs
- The 3D rRNA modification maps database: with interactive tools for ribosome analysis
- Vir-Mir db: prediction of viral microRNA candidate hairpins
- The Universal Protein Resource (UniProt)
- MIPS: analysis and annotation of genome information in 2007
- AAindex: amino acid index database, progress report 2008
- CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering
- MALISAM: a database of structurally analogous motifs in proteins
- MegaMotifBase: a database of structural motifs in protein families and superfamilies
- PPT-DB: the protein property prediction and testing database
- LOCATE: a mammalian protein subcellular localization database
- TOPDB: topology data bank of transmembrane proteins
- Phospho.ELM: a database of phosphorylation sites, update 2008
- The 20 years of PROSITE
- eggNOG: automated construction and annotation of orthologous groups of genes
- EPGD: a comprehensive web resource for integrating and displaying eukaryotic paralog/paralogon information
- InParanoid 6: eukaryotic ortholog clusters with inparalogs
- OPTIC: orthologous and paralogous transcripts in clades
- OrthoDB: the hierarchical catalog of eukaryotic orthologs
- PairsDB atlas of protein sequence space
- The Pfam protein families database
- SIMAP: structuring the network of protein similarities
- ATDB: a uni-database platform for animal toxins
- ChromDB: The Chromatin Database
- DB-PABP: a database of polyanion-binding proteins
- KNOTTIN: the knottin or inhibitor cystine knot scaffold in 2007
- MEROPS: the peptidase database
- NORINE: a database of nonribosomal peptides
- SelenoDB 1.0: a database of selenoprotein genes, proteins and SECIS elements
- The Telomerase Database
- ChEBI: a database and ontology for chemical entities of biological interest
- ChemBank: a small-molecule screening and cheminformatics resource database
- R.E.DD.B.: A database for RESP and ESP atomic charges, and force field libraries
- Glycoconjugate Data Bank: Structures, an annotated glycan structure database and N-glycan primary structure verification service
- Greglist: a database listing potential G-quadruplex regulated genes
- The ITS2 Database II: homology modelling RNA structure for molecular systematics
- QuadBase: genome-wide database of G4 DNA occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes
- RNA FRABASE version 1.0: an engine with a database to search for the three-dimensional fragments within RNA structures
- RNAJunction: a database of RNA junctions and kissing loops for three-dimensional structural analysis and nanodesign
- AutoPSI: a database for automatic structural classification of protein sequences and structures
- BioMagResBank
- coliSNP: database server mapping nsSNPs on protein structures
- Gene3D: comprehensive structural and functional annotation of genomes
- Data growth and its impact on the SCOP database: new developments
- Remediation of the protein data bank archive
- FunSimMat: a comprehensive functional similarity database
- The Gene Ontology project in 2008
- The HGNC Database in 2008: a resource for the human genome
- The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations
- IDBD: Infectious Disease Biomarker Database
- SuperCAT: a supertree database for combined and integrative multilocus sequence typing analysis of the Bacillus cereus group of bacteria (including B. cereus, B. anthracis and B. thuringiensis)
- GenoList: an integrated environment for comparative analysis of microbial genomes
- The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata
- KEGG for linking genomes to life and the environment
- Narcisse: a mirror view of conserved syntenies
- PhylomeDB: a database for genome-wide collections of gene phylogenies
- BioHealthBase: informatics support in the elucidation of influenza virus host-pathogen interactions and virulence
- CoVDB: a comprehensive database for comparative analysis of coronavirus genes and genomes
- The hepatitis C sequence database in Los Alamos
- AlterORF: a database of alternate open reading frames
- Enteropathogen Resource Integration Center (ERIC): bioinformatics support for research on biodefense-relevant enterobacteria
- HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection
- The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions
- IMG/M: a data management and analysis system for metagenomes
- VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics
- xBASE2: a comprehensive resource for comparative bacterial genomics
- ProtozoaDB: dynamic visualization and exploration of protozoan genomes
- ToxoDB: an integrated Toxoplasma gondii database resource
- CandidaDB: a multi-genome database for Candida species and related Saccharomycotina
- CFGP: a web-based, comparative fungal genomics platform
- PHI-base update: additions to the pathogen-host interaction database
- Gene Ontology annotations at SGD: new data sources and annotation methods
- ButterflyBase: a platform for lepidopteran genomics
- FlyBase: integration and improvements to query tools
- REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila
- SmedGD: the Schmidtea mediterranea genome database
- Upgrades to StellaBase facilitate medical and genetic studies on the starlet sea anemone, Nematostella vectensis
- WormBase 2007
- PROCOGNATE: a cognate ligand domain mapping for enzymes
- The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases
- Bacteriome.org: an integrated protein interaction database for E. coli
- The BioGRID Interaction Database: 2008 update
- The cell cycle DB: a systems biology approach to cell cycle analysis
- CORUM: the comprehensive resource of mammalian protein complexes
- DIMA 2.0: predicted and known domain interactions
- DOMINE: a database of protein domain interactions
- HotSprint: database of computational hot spots in protein interfaces
- LigASite: a database of biologically relevant binding sites in proteins with known apo-structures
- Binding MOAD, a high-quality protein-ligand database
- PepCyber:P~PEP: a database of human protein-protein interactions mediated by phosphoprotein-binding domains
- STITCH: interaction networks of chemicals and proteins
- EndoNet: an information resource about regulatory networks of cell-to-cell communication
- NetworKIN: a resource for exploring cellular phosphorylation networks
- The Molecule Pages database
- Ensembl 2008
- EuroPhenome and EMPReSS: online mouse phenotyping resource
- Gallus GBrowse: a unified genomic database for the chicken
- The Mouse Genome Database (MGD): mouse biology and model systems
- PBmice: an integrated database system of piggyBac (PB) insertional mutations and their characterizations in mice
- TreeFam: 2008 Update
- The UniTrap resource: tools for the biologist enabling optimized use of gene trap clones
- UTGB/medaka: genomic resource database for medaka biology
- The vertebrate genome annotation (Vega) database
- Xenbase: a Xenopus biology and genomics resource
- The Zebrafish Information Network: the zebrafish model organism database provides expanded support for genotypes and phenotypes
- The UCSC Genome Browser Database: 2008 update
- X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis
- Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees
- The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts
- Human PAML browser: a database of positive selection on human genes using phylogenetic methods
- MSY Breakpoint Mapper, a database of sequence-tagged sites useful in defining naturally occurring deletions in the human Y chromosome
- MutDB: update on development of tools for the biochemical analysis of genetic variation
- F-SNP: computationally predicted functional SNPs for disease association studies
- Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases
- CanGEM: mining gene copy number changes in cancer
- MethyCancer: the database of human DNA methylation and cancer
- PubMeth: a cancer methylation database combining text-mining and expert annotation
- 4DXpress: a database for cross-species expression pattern comparisons
- Cyclebase.org: a comprehensive multi-organism online database of cell-cycle experiments
- EMAGE, Edinburgh Mouse Atlas of Gene Expression: 2008 update
- Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata
- The Stanford Tissue Microarray Database
- PRIDE: new developments and new datasets
- An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data
- CEBS, Chemical Effects in Biological Systems: a public data repository integrating study design and toxicity data with microarray and proteomics data
- DrugBank: a knowledgebase for drugs, drug actions and drug targets
- GLIDA: GPCR ligand database for chemical genomics drug discovery, database and tools update
- The pharmacogenetics and pharmacogenomics knowledge base: accentuating the knowledge
- SuperTarget and Matador: resources for exploring drug-target relationships
- VIOLIN: vaccine investigation and online information network
- The plant organelles database (PODB): a collection of visualized plant organelles and protocols for plant organelle research
- Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals
- The Generation Challenge Programme comparative plant stress-responsive gene catalogue
- Gramene: a growing plant comparative genomics resource
- MetaCrop: a detailed database of crop plant metabolism
- PlantGDB: a resource for comparative plant genomics
- PlantTFDB: a comprehensive plant transcription factor database
- PlantTribes: a gene and gene family resource for comparative genomics in plants
- ppdb: a plant promoter database
- Update of ASRP: the Arabidopsis Small RNA Project database
- CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform
- GreenPhylDB: a database for plant comparative genomics
- AtPID: Arabidopsis thaliana protein interactome database, an integrative platform for plant systems biology
- The Arabidopsis Information Resource (TAIR): gene structure and function annotation
- PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor
- Oryza Tag Line, a phenotypic mutant database for the Genoplante rice insertion line library
- The Rice Annotation Project Database (RAP-DB): 2008 update
- GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data
- Panzea: an update on new content and features
- Shanghai RAPESEED Database: a resource for functional genomics studies of seed development and fatty acid metabolism of Brassica
- MUGEN mouse database; Animal models of human immunological diseases
Organism specific databases
Molecule specific databases
Molecule sub-type specific databases
Outline

- Introduction: Databases, Database systems, Data Mining
- Biological Databases: Lectures #1-#4
- Database systems: Lectures #5-#13
- Data Mining: Lectures #14-#24
Definitions

Database: “a collection of interrelated data items that are managed as a
single unit”.

It can be stored as a single file, that is, as a single physical entity
(as Access does), but usually it is not (as in all other DBMSs).

Database management system (DBMS): programs (software) that manipulate
databases.
DBMS

The major DBMSs are Access, DB2, PostgreSQL, MySQL and Oracle.

MySQL is a popular choice in Biology.
Data Abstraction

The data is considered in a hierarchical fashion. Generally it is considered
in three layers:

1. Logical = schema. The DBMS will transform the data files into useful
   related bits of information according to the logical structures determined
   by the schema.
2. Physical = data files. Access is one of the few to store data as a single
   file. This is quite limiting for large databases or those accessed by many
   people.
3. External = user views of the data. Multiple users can be looking at the
   same database (physical files) and yet be presented with completely
   different views.
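
The external layer can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table `sample`, the two views and the rows are invented for the example and are not from the lecture.

```python
import sqlite3

# In-memory database; all table, view and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, species TEXT, "
    "technician TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO sample (species, technician, cost) VALUES (?, ?, ?)",
    [("Homo sapiens", "alice", 12.5), ("Pan troglodytes", "bob", 9.0)],
)

# External layer: two different user views over the same stored data.
conn.execute("CREATE VIEW researcher_view AS SELECT sample_id, species FROM sample")
conn.execute(
    "CREATE VIEW manager_view AS "
    "SELECT technician, SUM(cost) AS total_cost FROM sample GROUP BY technician"
)

researcher_rows = conn.execute(
    "SELECT * FROM researcher_view ORDER BY sample_id"
).fetchall()
manager_rows = conn.execute(
    "SELECT * FROM manager_view ORDER BY technician"
).fetchall()
print(researcher_rows)
print(manager_rows)
```

Both queries read the same physical data, yet each user sees only the columns and aggregation that their view defines.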
Why layers?

Why should the data be looked at in this hierarchical fashion?

- Logical layers allow data to be changed, including adding entire new data
  objects, without disrupting users.
- Physical layer independence allows files to be split, moved or added
  without affecting the database structure.
- Hence, the layers permit each aspect of the database to be independent of
  the others.
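
A small sketch of this independence, assuming SQLite through Python's sqlite3 module (the table and view names are hypothetical): a user's view keeps returning the same rows after a column and a whole new table are added underneath it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO species (name) VALUES ('Pan paniscus')")

# The user only ever queries this view, not the table itself.
conn.execute("CREATE VIEW species_names AS SELECT name FROM species")
before = conn.execute("SELECT * FROM species_names").fetchall()

# Change the logical layer: add a column and an entirely new data object.
conn.execute("ALTER TABLE species ADD COLUMN habitat TEXT")
conn.execute("CREATE TABLE family (family_id INTEGER PRIMARY KEY, name TEXT)")

after = conn.execute("SELECT * FROM species_names").fetchall()
print(before == after)
```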
Database models

- Flat files: e.g. GenBank files. Not a database at all, although often used
  to input data into databases. Each flat file requires a special program to
  deal with it and the peculiarities of its format.
- Hierarchical Models: each file contains records with pointers (generally
  these were physical pointers) to other records in a one-to-many
  relationship (parent to children).
- Network Models: there is still a one-to-many relationship, but a child can
  have multiple parents. This is similar to the web, in that each page may
  have many other pages that link to it. The problem is that relationships
  become very complex and can even contain circular links.
- Relational Models: there should be no need to follow predefined links as in
  the web. The key to a relational model is to have a common data item stored
  within each record. These are called keys or IDs. Physically, the
  rows/entries of each record of data need not even be in the same file.
- Object-Orientated Model (OO): brought about by the need to handle complex
  data such as images, audio, etc. An object is a logical grouping of related
  data and associated program logic.
- Object-Relational: the OO model lacks a simple ad hoc query capability, so
  people have created hybrids of relational and object methods (e.g. Oracle,
  DB2).
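
As an illustration of the relational idea, the sketch below relates records through a shared key rather than through stored pointers. It assumes SQLite via Python's sqlite3 module; the second in-memory database attached as `taxonomy` stands in for a separate physical file, and all table names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second, separate database stands in for "a different file".
conn.execute("ATTACH DATABASE ':memory:' AS taxonomy")

conn.execute("CREATE TABLE taxonomy.family (family_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE species (species_id INTEGER PRIMARY KEY, name TEXT, family_id INTEGER)"
)
conn.execute("INSERT INTO taxonomy.family (family_id, name) VALUES (1, 'Hominidae')")
conn.executemany(
    "INSERT INTO species (name, family_id) VALUES (?, ?)",
    [("Homo sapiens", 1), ("Pan paniscus", 1)],
)

# No predefined links are followed: rows are matched on the common key family_id.
joined = conn.execute(
    "SELECT s.name, f.name FROM species AS s "
    "JOIN taxonomy.family AS f ON f.family_id = s.family_id "
    "ORDER BY s.name"
).fetchall()
print(joined)
```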
Organizing Table structure

Consider a simple table of species ...

Species                         Family     Common Name         Habitat
Homo sapiens                    Hominidae  Man                 all known
Homo erectus                    Hominidae  Ancient Man         Africa
Pan troglodytes                 Hominidae  Chimpanzee          Africa
Pan troglodytes schweinfurthii  Hominidae  Eastern Chimpanzee  Africa
Pan paniscus                    Hominidae  Pygmy Chimpanzee    Africa
Pan paniscus                    Hominidae  Bonobo              Africa

Problems?

- What if someone is old (aka experienced) and recognizes Pongidae as the
  legacy family name?
- All of this data is very repetitive.
- What happens if I misspell an entry?
- What happens if the taxonomists decide to change the names? (e.g. Pan
  paniscus used to be a subspecies of Pan troglodytes).
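
One possible way to address these problems is to store each fact once and link the rows by keys. The sketch below is an assumption about how the table could be split, not the lecture's own schema; the `family`, `species` and `common_name` tables are hypothetical, and the family name is deliberately misspelled at first to show that the fix touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family (family_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    family_id INTEGER NOT NULL REFERENCES family(family_id),
    habitat TEXT
);
CREATE TABLE common_name (
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    name TEXT NOT NULL
);
-- Deliberate misspelling of Hominidae, stored only once.
INSERT INTO family (family_id, name) VALUES (1, 'Hominidea');
INSERT INTO species (species_id, name, family_id, habitat) VALUES
    (1, 'Homo sapiens', 1, 'all known'),
    (2, 'Pan troglodytes', 1, 'Africa'),
    (3, 'Pan paniscus', 1, 'Africa');
INSERT INTO common_name (species_id, name) VALUES
    (1, 'Man'), (2, 'Chimpanzee'), (3, 'Pygmy Chimpanzee'), (3, 'Bonobo');
""")

# Correcting the spelling is one UPDATE; every species sees the fix.
conn.execute("UPDATE family SET name = 'Hominidae' WHERE family_id = 1")
families = conn.execute(
    "SELECT DISTINCT f.name FROM species AS s "
    "JOIN family AS f ON f.family_id = s.family_id"
).fetchall()

# One species row can carry several common names without being duplicated.
paniscus_names = conn.execute(
    "SELECT c.name FROM common_name AS c "
    "JOIN species AS s ON s.species_id = c.species_id "
    "WHERE s.name = 'Pan paniscus' ORDER BY c.name"
).fetchall()
print(families, paniscus_names)
```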
Table structure

The items below represent a single row, or a single set of related items,
in the table “Title”.

Title
item
item
item
item
item
item
item

Number one rule: keep it simple.

Table structure

Keys are used to identify (uniquely) each row.

Title
item ID
item
item
item
item
item
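
A minimal sketch of a key in practice, assuming SQLite through Python's sqlite3 module (the table `title` and column `item_id` simply mirror the diagram above): declaring the ID as a primary key makes the database reject a second row with the same ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (item_id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("INSERT INTO title (item_id, item) VALUES (1, 'first item')")

# A second row with the same key violates the uniqueness of the primary key.
try:
    conn.execute("INSERT INTO title (item_id, item) VALUES (1, 'duplicate key')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row = conn.execute("SELECT item FROM title WHERE item_id = 1").fetchone()
print(duplicate_rejected, row)
```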
Table structure

An example of a (very poor) table is shown here. Note that there is lots of
information potentially associated with a sequence record beyond the element
itself.

Note that the table title is “DNA SEQUENCE” rather than “DNA SEQUENCES”;
table titles are always singular. This is because it represents one object.

DNA SEQUENCE
SEQUENCE ID
GENBANK ACC NUMBER
SAMPLE
SEQUENCE TYPE (eg. mtDNA)
SEQUENCE MACHINE
SEQUENCE SOFTWARE
TECHNICIAN
TECHNICIAN LOCATION
TECHNICIAN EMAIL
SEQUENCE TEXT
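
One reason such a table is poor is that the technician's details are repeated on every sequence row. A possible improvement, offered here only as an assumption about where the design could go, is to move them into their own table. The `technician` and `dna_sequence` schemas and the sample rows below are hypothetical, and the machine and software columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE technician (
    technician_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT,
    email TEXT
);
CREATE TABLE dna_sequence (
    sequence_id INTEGER PRIMARY KEY,
    genbank_acc_number TEXT,
    sample TEXT,
    sequence_type TEXT,
    technician_id INTEGER REFERENCES technician(technician_id),
    sequence_text TEXT
);
""")
conn.execute(
    "INSERT INTO technician (technician_id, name, location, email) "
    "VALUES (1, 'A. Technician', 'Lab 1', 'old@example.org')"
)
conn.executemany(
    "INSERT INTO dna_sequence (sample, sequence_type, technician_id, sequence_text) "
    "VALUES (?, ?, ?, ?)",
    [("sample-1", "mtDNA", 1, "ACGT"), ("sample-2", "mtDNA", 1, "TTGA")],
)

# The email lives in one row, so one UPDATE is reflected for every sequence.
conn.execute("UPDATE technician SET email = 'new@example.org' WHERE technician_id = 1")
emails = conn.execute(
    "SELECT s.sample, t.email FROM dna_sequence AS s "
    "JOIN technician AS t ON t.technician_id = s.technician_id "
    "ORDER BY s.sample"
).fetchall()
print(emails)
```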
