
18-DIP - Database of Interacting Proteins-26-09-2024

The document discusses databases for biological interactions, focusing on the BioGRID, STRING, and DIP databases, which catalog protein-protein interactions and related data. BioGRID is a curated resource that includes over 1.7 million interactions from various species, while STRING integrates known and predicted protein interactions and supports advanced search features. DIP catalogs experimentally determined interactions and is part of the International Molecular Exchange Consortium, facilitating collaboration among major interaction data providers.

Uploaded by Piyush Kumar
Copyright © All Rights Reserved

Databases for biological interactions

PPI network components

• Nodes represent proteins (for example, signaling receptors in a signaling network).

• A line connecting two nodes is an edge; it represents an interaction between the two proteins.

• A hub gene is a gene that is highly connected to other genes in a network and is thought to be important for the regulation of the network as a whole.

• Seeds are the proteins used to start the analysis and around which the network is built.
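The network terms above (nodes, edges, hubs, seeds) can be made concrete with a small sketch. The edge list below is a hypothetical toy network, and the hub cutoff of 3 partners is an arbitrary assumption, since the definition above does not fix a threshold.

```python
from collections import defaultdict

# Hypothetical toy PPI network: each tuple is an undirected edge between two proteins (nodes).
EDGES = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P5", "P6"),
]

def build_adjacency(edges):
    """Map each node (protein) to the set of its interaction partners."""
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions when counting partners
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def find_hubs(adj, min_degree):
    """Return nodes with at least min_degree distinct partners (min_degree is an assumed cutoff)."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) >= min_degree)

def seed_network(adj, seeds):
    """Return the seed proteins plus their direct partners (a first-shell network around the seeds)."""
    nodes = set(seeds)
    for s in seeds:
        nodes |= adj.get(s, set())
    return sorted(nodes)

adj = build_adjacency(EDGES)
degrees = {n: len(nbrs) for n, nbrs in adj.items()}
print(find_hubs(adj, min_degree=3))   # ['P1']
print(seed_network(adj, ["P6"]))      # ['P5', 'P6']
```

In practice, hub cutoffs are often chosen relative to the network's degree distribution rather than as a fixed number.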
BioGRID (The Biological General Repository for Interaction Datasets)
BioGRID Database
• The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological
database of protein-protein interactions, genetic interactions, chemical interactions, and post-
translational modifications.
• The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is
partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the
Alliance of Genome Resources.
• The BioGRID is funded by the National Institutes of Health (NIH) and the Canadian Institutes of Health Research (CIHR).
• It is an observer member of the International Molecular Exchange Consortium (IMEx).
• In addition to collaborating with experts on themed curation projects, BioGRID works actively with model organism databases (MODs) and meta-database resources to propagate BioGRID records widely.
• BioGRID is an open-access database resource that houses manually curated protein and genetic interactions from multiple species, including yeast, worm, fly, mouse and human.
• In addition to general curation across species, BioGRID undertakes themed curation projects on specific aspects of cellular regulation, for example the ubiquitin-proteasome system, as well as on specific disease areas, such as COVID-19, caused by the SARS-CoV-2 virus.
• BioGRID also now curates gene‐phenotype relationships from genome‐wide CRISPR
screens.
• All protein and genetic interactions annotated in BioGRID are exclusively derived from
expert manual curation of experimental data reported in peer‐reviewed publications.
• BioGRID currently holds over 1,740,000 interactions curated from both high-throughput datasets and individual focused studies, derived from over 70,000 publications in the primary literature.
Partners
BioGRID Architecture and Data Model

• The BioGRID database architecture consists of three distinct components: (i) the Core, (ii) the Web and (iii) the Interaction Management System (IMS).
• Each of the three components has a specific role in driving the BioGRID
system and can easily be modified to meet rapidly changing needs in data
management without entailing major changes to the applications it supports.
• All of the BioGRID databases use MySQL 5.1.
• BioGRID provides interaction data to several model organism databases and resources, such as Entrez Gene, SGD, TAIR and FlyBase, as well as to other interaction meta-databases.
Searches in BioGRID

• BioGRID searches can be performed by clicking on the 'gene' tab on the main search page.
• To perform a search, simply enter a term; the search engine looks for matching identifiers.
• The Advanced Search option can be used to refine results.
Search options at BioGRID: wildcard searches and publication searches
Wildcard searches
• BioGRID supports wildcard searching (i.e. searching where the query is not exact but matches a range of possible terms) on the tail end of any keyword entered in the search field.
• To perform a wildcard search, simply enter your prefix (must be 3 letters or more)
followed by a star (*).
• Examples: STE*, CDC*, YAL01*, CLN*
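The tail-end wildcard rule can be sketched locally to show how a prefix query behaves. This is only an illustration: BioGRID performs the matching server-side against its own index, and the gene list here is a hypothetical sample. The function name `wildcard_search` is likewise hypothetical.

```python
# Hypothetical local list of gene names; BioGRID itself searches its own index server-side.
GENES = ["STE2", "STE3", "STE12", "CDC28", "CDC42", "CLN1", "CLN2", "YAL011W", "YAL019W", "ACT1"]

def wildcard_search(query, names):
    """Emulate a tail-end wildcard search: 'PREFIX*' matches every name starting with PREFIX."""
    if not query.endswith("*"):
        # No wildcard: fall back to an exact, case-insensitive match.
        return [n for n in names if n.upper() == query.upper()]
    prefix = query[:-1]
    if "*" in prefix:
        raise ValueError("only a single trailing '*' is supported")
    if len(prefix) < 3:
        raise ValueError("wildcard prefix must be at least 3 characters")
    return [n for n in names if n.upper().startswith(prefix.upper())]

print(wildcard_search("STE*", GENES))    # ['STE2', 'STE3', 'STE12']
print(wildcard_search("YAL01*", GENES))  # ['YAL011W', 'YAL019W']
```

The 3-character minimum keeps queries such as `S*` from matching a large fraction of all gene names.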

Publication Searches

• Publications can be searched by PubMed ID, author name or keyword terms.
Example: 9006895
BioGRID-ORCS
• A recent extension of BioGRID, named the Open
Repository of CRISPR Screens (ORCS,
orcs.thebiogrid.org), captures single mutant
phenotypes and genetic interactions from published
high throughput genome-wide CRISPR/Cas9-based
genetic screens.
• ORCS is updated on a quarterly basis and is fully searchable by gene/protein, phenotype, cell line, authors and other attributes. All data in ORCS can be downloaded in standard formats.
• BioGRID-ORCS lets users search 360 publications and 93,209 genes, covering 1,982 CRISPR screens from 5 major model organism species, 801 cell lines and 135 cell types.
Themed Curation Projects

• Due to the overwhelming size of published scientific literature containing human (Homo
sapiens) gene, protein, and chemical interactions, BioGRID has taken a targeted, project-based
approach to curation of human interaction data in manageable collections of high impact data.
• These themed curation projects represent central biological processes with disease relevance
such as chromatin modification, autophagy, and the ubiquitin-proteasome system or diseases of
interest including glioblastoma, Fanconi Anemia, and COVID-19.
• As of 18 October 2020, BioGRID themed curation project efforts have resulted in the extraction
of 424,631 interactions involving 2,361 proteins from more than 37,000 scientific articles.
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins)
STRING
• STRING is a database of known and predicted protein-protein interactions. The database aims to
integrate all known and predicted associations between proteins, including both physical
interactions as well as functional associations.
• The STRING database contains information from numerous sources, including experimental data,
computational prediction methods and public text collections.
• STRING is freely accessible.
• STRING has been developed by a consortium of academic institutions including CPR, EMBL, KU,
SIB, TUD and UZH.
• The STRING database currently covers 59,309,604 proteins from 12,535 organisms.
• Apart from the website, the database can be queried directly from within Cytoscape and from
within R (via a Bioconductor package).
Why is STRING Important?

• Proteins are essential for almost every function in a living cell. Understanding how they interact helps us figure out how cells work, how diseases develop, and how we might treat them.

• STRING helps visualize these interactions in the form of networks, making it easier to see how proteins connect with each other.
Features
• In STRING, each protein-protein interaction is annotated with one or more 'scores'.
• These scores are indicators of confidence, i.e. how likely STRING judges an interaction to be true, given the available evidence. All scores range from 0 to 1, with 1 being the highest possible confidence. A score of 0.5 would indicate that roughly every second interaction might be erroneous (i.e., a false positive).
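The probabilistic reading of the score above has a practical consequence: if each score is treated as the probability that the interaction is true, then 1 − score is its chance of being a false positive, and summing these gives the expected number of false positives in a filtered network. A minimal sketch, assuming scores are calibrated in that way; the edges are hypothetical, and the 0.7 cutoff mirrors the "high confidence" preset offered in STRING's settings.

```python
# Hypothetical STRING-style edges: (protein A, protein B, confidence score in [0, 1]).
EDGES = [
    ("A", "B", 0.95),
    ("A", "C", 0.72),
    ("B", "D", 0.50),
    ("C", "D", 0.30),
]

def filter_by_score(edges, min_score):
    """Keep only interactions whose confidence score is at least min_score."""
    return [e for e in edges if e[2] >= min_score]

def expected_false_positives(edges):
    """Treat each score as P(interaction is true); 1 - score is then its false-positive probability."""
    return sum(1.0 - s for _, _, s in edges)

kept = filter_by_score(EDGES, 0.7)
print([(a, b) for a, b, _ in kept])              # [('A', 'B'), ('A', 'C')]
print(round(expected_false_positives(kept), 2))  # 0.33
```

Raising the cutoff trades coverage for reliability: fewer edges survive, but a smaller fraction of them is expected to be wrong.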
• Results of the various computational predictions can be inspected from different designated
views.
• There are two modes of STRING: protein mode and COG mode (COG: clusters of orthologous groups).
• In COG mode, interactions are propagated by inference of orthology: an interaction described for proteins in one organism is transferred to their orthologs in other organisms.
• A web interface is available to access the data and to give a fast overview of the proteins
and their interactions.
• A plug-in for Cytoscape to use STRING data is available.
• STRING data can also be accessed through its application programming interface (API) by constructing a URL that contains the request.
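A request URL for the STRING API can be assembled with the standard library. The sketch assumes the `https://string-db.org/api/<output format>/<method>` pattern and the `identifiers`, `species` and `required_score` parameters described in STRING's API help page. Check that page for the current method list before relying on it. The helper name `string_api_url` is hypothetical, and the code only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"

def string_api_url(method, identifiers, species, output_format="tsv", required_score=None):
    """Build a STRING API request URL of the form <base>/<format>/<method>?<params> (hypothetical helper)."""
    params = {
        # Multiple identifiers are separated by a carriage return, which URL-encodes to %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy ID, e.g. 9606 for human
    }
    if required_score is not None:
        # The API expresses the confidence cutoff as an integer from 0 to 1000 (0.7 -> 700).
        params["required_score"] = int(round(required_score * 1000))
    return f"{STRING_API}/{output_format}/{method}?{urlencode(params)}"

url = string_api_url("network", ["TP53", "MDM2"], species=9606, required_score=0.7)
print(url)
# https://string-db.org/api/tsv/network?identifiers=TP53%0DMDM2&species=9606&required_score=700
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)` and parsed as tab-separated text.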
Data sources
• STRING imports data from experimentally derived protein–protein interactions through
literature curation.
• STRING also stores computationally predicted interactions from: (i) text mining of scientific texts, (ii) interactions computed from genomic features, and (iii) interactions transferred from model organisms based on orthology.
• All predicted or imported interactions are benchmarked against a common reference of
functional partnership as annotated by KEGG (Kyoto Encyclopedia of Genes and
Genomes).
• Interactions in STRING are derived from five main sources:
Imported data

STRING imports protein association knowledge from databases of physical interaction and databases
of curated biological pathway knowledge (MINT, HPRD, BIND, DIP, BioGRID, KEGG, Reactome,
IntAct, EcoCyc, NCI-Nature Pathway Interaction Database, GO).

Text mining

A large body of scientific text (SGD, OMIM, FlyBase, PubMed) is parsed to search for statistically significant co-occurrences of gene names.
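The basic idea behind co-occurrence text mining can be sketched as comparing how often two gene names appear in the same abstract with how often they would co-occur by chance. The observed/expected ratio below is a simplified stand-in, not STRING's actual scoring scheme, and the corpus is hypothetical. It assumes gene names have already been recognized in each abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus: each entry is the set of gene names recognized in one abstract.
ABSTRACTS = [
    {"CDC28", "CLN2"},
    {"CDC28", "CLN2", "SIC1"},
    {"CDC28", "CLN2"},
    {"ACT1"},
    {"SIC1", "ACT1"},
    {"CDC28"},
]

def cooccurrence_enrichment(docs):
    """For each gene pair, return observed co-mentions divided by the count expected if mentions were independent."""
    n_docs = len(docs)
    single = Counter(g for d in docs for g in d)                              # abstracts mentioning each gene
    pairs = Counter(p for d in docs for p in combinations(sorted(d), 2))  # abstracts mentioning both genes
    scores = {}
    for (a, b), observed in pairs.items():
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = observed / expected
    return scores

for pair, ratio in sorted(cooccurrence_enrichment(ABSTRACTS).items(), key=lambda kv: -kv[1]):
    print(pair, round(ratio, 2))
```

A ratio above 1 means the pair is co-mentioned more often than chance would predict. A real pipeline would also test whether that excess is statistically significant given the corpus size.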
Predicted data

• Neighborhood: similar genomic context (genes repeatedly found close together on the chromosome) in different species suggests a similar function of the proteins.
• Fusion-fission events: proteins that are fused into a single polypeptide in some genomes are very likely to be functionally linked, even in other genomes where their genes are not fused.
• Occurrence: proteins that share a function, for example by acting in the same metabolic pathway, tend to be present or absent together across genomes and therefore have similar phylogenetic profiles.
• Coexpression: predicted association between genes based on observed patterns of simultaneous expression.
(Figure: illustrations of the occurrence, neighborhood, fusion-fission and coexpression evidence types.)
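Because each interaction can carry scores from several of these sources, STRING also reports a single combined score. STRING's published description combines the per-channel probabilities under an independence assumption and corrects for the probability of a random pair interacting. The sketch below follows that idea in simplified form. The prior of 0.041 is taken from that description and should be checked against current STRING documentation. The real procedure also includes channel-specific adjustments that are omitted here.

```python
from math import prod

PRIOR = 0.041  # assumed background probability of two random proteins being linked; verify against STRING docs

def remove_prior(score, prior=PRIOR):
    """Rescale a channel score to the evidence it carries above the random-pair background."""
    score = max(score, prior)  # scores at or below the prior carry no evidence
    return (score - prior) / (1.0 - prior)

def combined_score(channel_scores, prior=PRIOR):
    """Combine per-channel scores, assuming the channels are independent pieces of evidence."""
    no_prior = [remove_prior(s, prior) for s in channel_scores]
    combined = 1.0 - prod(1.0 - s for s in no_prior)  # probability that at least one channel is correct
    return combined * (1.0 - prior) + prior            # add the background back exactly once

# Hypothetical channel scores for one protein pair (e.g. experiments, text mining, coexpression).
print(round(combined_score([0.6, 0.5, 0.3]), 3))
```

Under this scheme, a single channel reproduces its own score, and each additional independent channel can only raise the combined confidence.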
Searches in STRING
(Figure: STRING search results, with panels labelled Network, Legend, Data and Settings.)
DIP: The Database of Interacting Proteins
DIP
• The Database of Interacting Proteins (DIP) is a biological database which catalogs
experimentally determined interactions between proteins.
• It combines information from a variety of sources to create a single, consistent set of
protein–protein interactions.
• The data stored within DIP have been curated, both manually, by expert curators, and
automatically, using computational approaches that utilize the knowledge about the
protein–protein interaction networks extracted from the most reliable, core subset of the
DIP data.
• The database was initially released in 2002. As of 2014, DIP is curated by the research
group of David Eisenberg at UCLA.
DIP
• DIP is a member of the International Molecular Exchange Consortium (IMEx), a group of the major public providers of interaction data.
• Other participating databases include the Biomolecular Interaction Network Database
(BIND), IntAct, the Molecular Interaction Database (MINT), MIPS, MPact, and BioGRID.
• The databases of IMEx work together to prevent duplications of effort, collecting data from
non-overlapping sources and sharing the curated interaction data.
• DIP is useful for understanding protein function and protein–protein relationships, studying
the properties of networks of interacting proteins, benchmarking predictions of protein–
protein interactions, and studying the evolution of protein–protein interactions.
The DIP database is composed of nodes and edges:

• DIP Nodes (proteins)
Each protein in the DIP database has a unique identifier of the form <DIP:nnnN> (e.g. DIP:310N). These proteins are linked to major protein databases such as PIR, SWISS-PROT and GenBank.
• DIP Edges (interactions)
Each interaction between proteins is also given a unique identifier of the form <DIP:nnnE> (e.g. DIP:1234E). The edge record provides access to information such as the region involved in the interaction, the dissociation constant and the experimental methods used to identify and characterize the interaction.
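The identifier convention above (a number followed by N for a node or E for an edge) is regular enough to validate and classify with a regular expression. A minimal sketch; the helper name `parse_dip_id` is hypothetical, and only the `DIP:nnnN` / `DIP:nnnE` forms described above are accepted.

```python
import re

# DIP identifiers: "DIP:" + digits + N (node = protein) or E (edge = interaction).
DIP_ID = re.compile(r"DIP:(\d+)([NE])")

def parse_dip_id(identifier):
    """Split a DIP identifier into its number and record type ('protein' or 'interaction')."""
    m = DIP_ID.fullmatch(identifier.strip())
    if m is None:
        raise ValueError(f"not a DIP identifier: {identifier!r}")
    kind = "protein" if m.group(2) == "N" else "interaction"
    return int(m.group(1)), kind

print(parse_dip_id("DIP:310N"))   # (310, 'protein')
print(parse_dip_id("DIP:1234E"))  # (1234, 'interaction')
```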
Relational structure of DIP

The DIP database is composed of three linked tables: a table of protein information, a table of protein–protein
interactions, and a table describing details of experiments detecting the protein–protein interactions.

(i) The protein information table contains protein identification codes from the SWISS-PROT, PIR and GenBank
sequence databases, as well as each protein’s gene name, description, enzyme code and cellular localization,
when known.

(ii) The interaction table describes proteins that interact from the protein information table, as well as the ranges
of amino acids and the protein domains involved in the protein–protein interaction, when known.

(iii) The experimental article table details the experiments used to detect the interactions from the interaction
table and their associated literature citations. This table includes the MEDLINE standard article code
(PMID/UID), as well as the authors, title, journal and year of publication of the article. Over 20 different
experimental techniques are represented in DIP, including co-immunoprecipitation, yeast two-hybrid and in
vitro binding assays. Where determined, a dissociation constant is also included.
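The three linked tables can be sketched as a small relational schema with the standard `sqlite3` module. The table and column names below are hypothetical. The text above describes what each table holds, but not DIP's actual schema. The rows are toy placeholders, and the query illustrates a search by gene name that joins all three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (            -- (i) protein information
    dip_id       TEXT PRIMARY KEY,  -- e.g. DIP:310N
    swissprot    TEXT, pir TEXT, genbank TEXT,
    gene_name    TEXT, description TEXT,
    enzyme_code  TEXT, localization TEXT
);
CREATE TABLE interaction (        -- (ii) protein-protein interactions
    dip_id     TEXT PRIMARY KEY,    -- e.g. DIP:1234E
    protein_a  TEXT NOT NULL REFERENCES protein(dip_id),
    protein_b  TEXT NOT NULL REFERENCES protein(dip_id),
    range_a    TEXT, range_b TEXT,  -- amino-acid ranges involved, when known
    domain_a   TEXT, domain_b TEXT  -- protein domains involved, when known
);
CREATE TABLE experiment (         -- (iii) experiments and literature citations
    id              INTEGER PRIMARY KEY,
    interaction_id  TEXT NOT NULL REFERENCES interaction(dip_id),
    technique       TEXT,           -- e.g. yeast two-hybrid
    pmid            INTEGER, authors TEXT, title TEXT, journal TEXT, year INTEGER,
    kd_molar        REAL            -- dissociation constant, where determined
);
""")

# Hypothetical placeholder rows.
conn.executemany("INSERT INTO protein (dip_id, gene_name) VALUES (?, ?)",
                 [("DIP:1N", "GENEA"), ("DIP:2N", "GENEB")])
conn.execute("INSERT INTO interaction (dip_id, protein_a, protein_b) VALUES ('DIP:1E', 'DIP:1N', 'DIP:2N')")
conn.executemany("INSERT INTO experiment (interaction_id, technique) VALUES (?, ?)",
                 [("DIP:1E", "yeast two-hybrid"), ("DIP:1E", "co-immunoprecipitation")])

def interactions_of(conn, gene_name):
    """Return (interaction id, partner gene, technique) rows for one gene, joining all three tables."""
    sql = """
    SELECT i.dip_id,
           CASE WHEN pa.gene_name = :g THEN pb.gene_name ELSE pa.gene_name END AS partner,
           e.technique
    FROM interaction i
    JOIN protein pa ON pa.dip_id = i.protein_a
    JOIN protein pb ON pb.dip_id = i.protein_b
    JOIN experiment e ON e.interaction_id = i.dip_id
    WHERE pa.gene_name = :g OR pb.gene_name = :g
    ORDER BY e.technique
    """
    return conn.execute(sql, {"g": gene_name}).fetchall()

print(interactions_of(conn, "GENEB"))
# [('DIP:1E', 'GENEA', 'co-immunoprecipitation'), ('DIP:1E', 'GENEA', 'yeast two-hybrid')]
```

Keeping experiments in their own table lets one interaction be supported by several independent experiments, each with its own citation and, where measured, its own dissociation constant.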
Searching the Database
DIP can be searched in a variety of ways. One can look for interactions
involving a specific protein by entering its gene name or its accession code from
GenBank, PIR or SWISS-PROT. More general searches can be performed for
information such as organisms, protein superfamilies, keywords, experimental
techniques or literature citations.
Applications of PPI databases
(Figure: applications of PPI databases, including predicting functions of genes/proteins; network medicine: drug targets and drug repurposing; molecular mechanisms behind certain phenotypes; network biology; systems biology; model organisms; ecological systems; enhancing our understanding of biology.)
Applications
• Disease Research: PPI databases are widely used to identify protein networks involved in diseases, including cancer, neurodegenerative disorders, autoimmune diseases such as rheumatoid arthritis, and infectious diseases.

• Drug Target Identification: By exploring PPIs, researchers can identify potential drug targets
and understand the impact of drugs on cellular networks.

• Systems Biology: PPI databases aid in the construction of large-scale interaction networks, which are crucial for systems biology studies aimed at understanding complex biological systems.

• Functional Genomics: Researchers can predict the function of unknown proteins based on
their interactions with well-characterized proteins.
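The functional-genomics idea above ("guilt by association") can be sketched as a majority vote. Each annotation carried by an interaction partner counts as one vote for the uncharacterized protein. The neighbours and annotations are hypothetical. Simple vote counting is only one of several ways to do this prediction and is chosen here for clarity.

```python
from collections import Counter

# Hypothetical interaction partners of an uncharacterized protein.
NEIGHBOURS = {
    "unknown1": ["P1", "P2", "P3", "P4", "P5"],
}
# Hypothetical functional annotations of the well-characterized partners.
ANNOTATIONS = {
    "P1": {"DNA repair"},
    "P2": {"DNA repair", "cell cycle"},
    "P3": {"cell cycle"},
    "P4": {"DNA repair"},
    "P5": set(),  # poorly characterized partner: contributes no votes
}

def predict_functions(protein, neighbours, annotations, top_n=2):
    """Rank candidate functions by how many interaction partners carry each annotation."""
    votes = Counter()
    for partner in neighbours.get(protein, []):
        votes.update(annotations.get(partner, set()))
    return votes.most_common(top_n)

print(predict_functions("unknown1", NEIGHBOURS, ANNOTATIONS))
# [('DNA repair', 3), ('cell cycle', 2)]
```

Weighting each vote by the interaction's confidence score is a natural refinement when scored networks such as STRING's are used.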
