18-DIP - Database of Interacting Proteins-26-09-2024
Seeds are the proteins used to start the analysis and around which the
network is built.
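As a rough illustration of building a network around seeds, the sketch below keeps every interaction that touches at least one seed protein. The edge list, the protein names and the function name `expand_from_seeds` are hypothetical and are not tied to any particular database or tool.

```python
def expand_from_seeds(edges, seeds):
    """Return the sub-network made of interactions touching at least one seed."""
    seeds = set(seeds)
    # Keep an interaction if either partner is a seed protein.
    kept_edges = [(a, b) for a, b in edges if a in seeds or b in seeds]
    # The network nodes are the seeds plus their direct interaction partners.
    nodes = seeds | {protein for edge in kept_edges for protein in edge}
    return nodes, kept_edges


# Toy interaction list (illustrative only, not curated data).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
nodes, kept = expand_from_seeds(toy_edges, seeds=["A"])
```

With seed A, only the interactions A-B and A-D are kept, so the network contains A, B and D.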
BioGRID
(The Biological General Repository for Interaction Datasets)
BioGRID Database
• The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological
database of protein-protein interactions, genetic interactions, chemical interactions, and post-
translational modifications.
• The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is
partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the
Alliance of Genome Resources.
• The BioGRID is funded by the National Institutes of Health (NIH) and the Canadian Institutes of
Health Research (CIHR).
• It is an observer member of the International Molecular Exchange Consortium (IMEx).
• In addition to collaborating with experts in themed curation projects, BioGRID actively works
together with model organism database (MOD) and meta-database resources to facilitate the
widespread propagation of BioGRID records.
• BioGRID is an open-access database resource that houses manually curated protein and genetic
interactions from multiple species including yeast, worm, fly, mouse, and human.
• In addition to general curation across species, BioGRID undertakes themed curation
projects in specific aspects of cellular regulation, for example the ubiquitin-proteasome
system, as well as in specific disease areas, such as SARS-CoV-2, the virus that causes
COVID-19.
• BioGRID also now curates gene‐phenotype relationships from genome‐wide CRISPR
screens.
• All protein and genetic interactions annotated in BioGRID are exclusively derived from
expert manual curation of experimental data reported in peer‐reviewed publications.
• BioGRID currently holds over 1,740,000 interactions curated from both high-throughput
datasets and individual focused studies, derived from more than 70,000 publications in the
primary literature.
Partners
BioGRID Architecture and Data Model
• BioGRID searches can be performed by clicking on the ‘gene’ tab on the main search page.
• To perform a search, enter a term and the search engine will look for matching identifiers.
• The advanced search option can also be used to obtain more specific results.
Search options at BioGRID: wildcard searches and publication searches
Wildcard searches
• BioGRID supports wildcard searching (i.e., searching where the match is not exact but
covers a range of possible matches) on the tail end of any keyword entered in the search
field.
• To perform a wildcard search, enter a prefix (must be 3 letters or more)
followed by an asterisk (*).
• Examples: STE*, CDC*, YAL01*, CLN*
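The tail-end wildcard rule can be mimicked locally with a simple prefix match. This is only a sketch of the rule as described above, not BioGRID's own implementation; the function name `wildcard_match`, the case-insensitive comparison and the toy identifier list are assumptions made for illustration.

```python
def wildcard_match(pattern, identifiers):
    """Return identifiers matching a tail-end wildcard such as 'STE*'."""
    if not pattern.endswith("*"):
        # No wildcard: fall back to an exact match (case-insensitive by assumption).
        return [i for i in identifiers if i.upper() == pattern.upper()]
    prefix = pattern[:-1]
    # The star is only allowed at the tail, after a prefix of 3+ characters.
    if len(prefix) < 3 or "*" in prefix:
        raise ValueError("prefix must be at least 3 characters, with '*' only at the end")
    return [i for i in identifiers if i.upper().startswith(prefix.upper())]


# Toy identifier list for illustration.
genes = ["STE2", "STE11", "CDC28", "CLN3", "YAL012W"]
ste_hits = wildcard_match("STE*", genes)
yal_hits = wildcard_match("YAL01*", genes)
```

Here `STE*` selects STE2 and STE11, while a two-letter prefix such as `ST*` is rejected.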
Publication Searches
• Due to the overwhelming size of published scientific literature containing human (Homo
sapiens) gene, protein, and chemical interactions, BioGRID has taken a targeted, project-based
approach to curation of human interaction data in manageable collections of high impact data.
• These themed curation projects represent central biological processes with disease relevance
such as chromatin modification, autophagy, and the ubiquitin-proteasome system or diseases of
interest including glioblastoma, Fanconi Anemia, and COVID-19.
• As of 18 October 2020, BioGRID themed curation project efforts have resulted in the extraction
of 424,631 interactions involving 2,361 proteins from more than 37,000 scientific articles.
STRING
(Search Tool for the Retrieval of Interacting Genes/Proteins)
STRING
• STRING is a database of known and predicted protein-protein interactions. The database aims to
integrate all known and predicted associations between proteins, including both physical
interactions as well as functional associations.
• The STRING database contains information from numerous sources, including experimental data,
computational prediction methods and public text collections.
• STRING is freely accessible.
• STRING has been developed by a consortium of academic institutions including CPR, EMBL, KU,
SIB, TUD and UZH.
• The STRING database currently covers 59,309,604 proteins from 12,535 organisms.
• Apart from the website, the database can be queried directly from within Cytoscape and from
within R (via a Bioconductor package).
Why is STRING Important?
Proteins are essential for almost every function in a living cell. Understanding how they
interact helps us figure out how cells work, how diseases develop, and how we might treat
them.
STRING helps visualize these interactions in the form of networks, making it easier to see
how proteins connect with each other.
Features
• In STRING, each protein-protein interaction is annotated with one or more ‘scores’.
• These scores are indicators of confidence, i.e. how likely STRING judges an interaction to
be true, given the available evidence. All scores range from 0 to 1, with 1 being the highest
possible confidence. A score of 0.5 would indicate that roughly every second interaction
might be erroneous (i.e., a false positive).
• Results of the various computational predictions can be inspected from different designated
views.
• There are two modes of STRING: Protein-mode and COG-mode (clusters of orthologous
groups).
• In COG mode, predicted interactions are propagated, by inference of orthology, to proteins
in organisms other than the one in which the interaction was described.
• A web interface is available to access the data and to give a fast overview of the proteins
and their interactions.
• A plug-in that makes STRING data available in Cytoscape is provided.
• STRING data can also be accessed through the application programming interface
(API) by constructing a URL that contains the request.
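The API access mentioned in the last bullet can be sketched as plain URL construction. The layout used here (`https://string-db.org/api/<format>/<method>?<parameters>`) and the parameter names `identifiers`, `species` and `required_score` follow the STRING API documentation as I recall it and should be checked against the current documentation before use. The helper name `build_string_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"  # assumed base address


def build_string_url(identifiers, species, method="network",
                     output_format="tsv", min_score=None):
    """Build a STRING API request URL without sending it."""
    params = {
        # Several identifiers are separated by a carriage return (encoded as %0D).
        "identifiers": "\r".join(identifiers),
        # NCBI taxonomy identifier, e.g. 9606 for human.
        "species": species,
    }
    if min_score is not None:
        # Assumption: the API expects the 0-1 confidence scaled to 0-1000.
        params["required_score"] = int(round(min_score * 1000))
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(params)}"


url = build_string_url(["TP53", "MDM2"], species=9606, min_score=0.7)
```

The resulting URL can be pasted into a browser or passed to any HTTP client; keeping construction separate from the request makes the sketch easy to inspect.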
Data sources
• STRING imports data from experimentally derived protein–protein interactions through
literature curation.
• STRING also stores computationally predicted interactions from: (i) text mining of scientific
texts, (ii) interactions computed from genomic features, and (iii) interactions transferred
from model organisms based on orthology.
• All predicted or imported interactions are benchmarked against a common reference of
functional partnership as annotated by KEGG (Kyoto Encyclopedia of Genes and
Genomes).
• Interactions in STRING are derived from five main sources:
Imported data
STRING imports protein association knowledge from databases of physical interaction and databases
of curated biological pathway knowledge (MINT, HPRD, BIND, DIP, BioGRID, KEGG, Reactome,
IntAct, EcoCyc, NCI-Nature Pathway Interaction Database, GO).
Text mining
A large body of scientific text (SGD, OMIM, FlyBase, PubMed) is parsed to search for statistically
relevant co-occurrences of gene names.
Predicted data
Fusion-fission events
Coexpression
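Because one interaction can be supported by several of these sources, the per-source confidence scores have to be merged into a single value. The sketch below uses a plain noisy-OR combination (the probability that at least one source is correct). This is a simplification: STRING's published procedure also corrects for a prior probability of random association, which is omitted here. The channel names and score values are made up for illustration.

```python
from math import prod


def combine_scores(channel_scores):
    """Noisy-OR combination of independent per-source confidence scores."""
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
    # 1 minus the probability that every source is wrong.
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical evidence for one protein pair.
evidence = {"experiments": 0.5, "textmining": 0.4, "coexpression": 0.2}
combined = combine_scores(evidence.values())
```

For these toy values the combined score is 1 - (0.5 × 0.6 × 0.8) = 0.76, higher than any single source on its own.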
Searches in STRING
[Figure: STRING search results, with panels labelled Network, Legend, and Data Settings]
DIP: The Database of Interacting Proteins
DIP
• The Database of Interacting Proteins (DIP) is a biological database which catalogs
experimentally determined interactions between proteins.
• It combines information from a variety of sources to create a single, consistent set of
protein–protein interactions.
• The data stored within DIP have been curated, both manually, by expert curators, and
automatically, using computational approaches that utilize the knowledge about the
protein–protein interaction networks extracted from the most reliable, core subset of the
DIP data.
• The database was initially released in 2002. As of 2014, DIP is curated by the research
group of David Eisenberg at UCLA.
• DIP is a member of the International Molecular Exchange Consortium (IMEx), a group of the
major public providers of interaction data.
• Other participating databases include the Biomolecular Interaction Network Database
(BIND), IntAct, the Molecular Interaction Database (MINT), MIPS, MPact, and BioGRID.
• The databases of IMEx work together to prevent duplications of effort, collecting data from
non-overlapping sources and sharing the curated interaction data.
• DIP is useful for understanding protein function and protein–protein relationships, studying
the properties of networks of interacting proteins, benchmarking predictions of protein–
protein interactions, and studying the evolution of protein–protein interactions.
The DIP database is composed of nodes (proteins) and edges (interactions).
It is organized as three linked tables: a table of protein information, a table of protein–protein
interactions, and a table describing details of the experiments that detected those interactions.
(i) The protein information table contains protein identification codes from the SWISS-PROT, PIR and GenBank
sequence databases, as well as each protein’s gene name, description, enzyme code and cellular localization,
when known.
(ii) The interaction table describes proteins that interact from the protein information table, as well as the ranges
of amino acids and the protein domains involved in the protein–protein interaction, when known.
(iii) The experimental article table details the experiments used to detect the interactions from the interaction
table and their associated literature citations. This table includes the MEDLINE standard article code
(PMID/UID), as well as the authors, title, journal and year of publication of the article. Over 20 different
experimental techniques are represented in DIP, including co-immunoprecipitation, yeast two-hybrid and in
vitro binding assays. Where determined, a dissociation constant is also included.
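The three linked tables can be pictured with a small in-memory SQLite sketch. The table names, column names and rows below are hypothetical placeholders chosen for illustration; they are not DIP's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    gene_name  TEXT,
    accession  TEXT
);
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    protein_a INTEGER REFERENCES protein(protein_id),
    protein_b INTEGER REFERENCES protein(protein_id)
);
CREATE TABLE experiment (
    experiment_id  INTEGER PRIMARY KEY,
    interaction_id INTEGER REFERENCES interaction(interaction_id),
    technique TEXT,
    article   TEXT
);
""")

# Placeholder rows (not real DIP records).
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                 [(1, "GENE_A", "ACC001"), (2, "GENE_B", "ACC002")])
conn.execute("INSERT INTO interaction VALUES (1, 1, 2)")
conn.execute("INSERT INTO experiment VALUES (1, 1, 'yeast two-hybrid', 'ARTICLE_PLACEHOLDER')")


def partners_of(gene_name):
    """List (partner gene, technique) pairs for one gene name."""
    rows = conn.execute("""
        SELECT p2.gene_name, e.technique
        FROM protein p1
        JOIN interaction i ON p1.protein_id IN (i.protein_a, i.protein_b)
        JOIN protein p2 ON p2.protein_id IN (i.protein_a, i.protein_b)
                       AND p2.protein_id != p1.protein_id
        JOIN experiment e ON e.interaction_id = i.interaction_id
        WHERE p1.gene_name = ?
    """, (gene_name,))
    return rows.fetchall()


result = partners_of("GENE_A")
```

The join walks from a protein to its interactions and on to the supporting experiments, which is the kind of lookup a search by gene name relies on. Self-interactions are ignored in this simplified query.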
Relational structure of DIP
Searching The Database
DIP can be searched in a variety of ways. One can look for interactions
involving a specific protein by entering its gene name or its accession code from
GenBank, PIR or SWISS-PROT. More general searches can be performed for
information such as organisms, protein superfamilies, keywords, experimental
techniques or literature citations.
Applications of PPI databases
• Predict functions of genes/proteins
• Network medicine: drug targets, drug repurposing
• Molecular mechanisms behind certain phenotypes
• Network biology
• Systems biology
• Enhance our understanding of model organisms
• Ecological systems
Applications
• Disease Research: PPI databases are widely used to identify protein networks involved in diseases,
including cancer, neurodegenerative disorders, infectious diseases, and autoimmune diseases such as
rheumatoid arthritis.
• Drug Target Identification: By exploring PPIs, researchers can identify potential drug targets
and understand the impact of drugs on cellular networks.
• Systems Biology: PPI data aid the construction of large-scale interaction networks, which are
crucial for systems biology studies aimed at understanding complex biological systems.
• Functional Genomics: Researchers can predict the function of unknown proteins based on
their interactions with well-characterized proteins.
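The function-prediction idea in the last bullet is often called guilt by association. A minimal sketch, assuming a simple majority vote among annotated interaction partners, is given below; real methods weight the evidence more carefully. The protein names and annotations are invented for illustration.

```python
from collections import Counter


def predict_function(protein, edges, annotations):
    """Guess a function by majority vote among annotated interaction partners."""
    # Collect every partner of the query protein from the edge list.
    neighbors = {b if a == protein else a for a, b in edges if protein in (a, b)}
    neighbors.discard(protein)  # ignore self-interactions
    votes = Counter(annotations[n] for n in neighbors if n in annotations)
    if not votes:
        return None  # no annotated partner, so no prediction
    # Ties are broken arbitrarily in this sketch.
    return votes.most_common(1)[0][0]


# Toy network: X is uncharacterized, P1-P3 are annotated.
toy_edges = [("X", "P1"), ("X", "P2"), ("X", "P3"), ("P3", "P4")]
toy_annotations = {"P1": "DNA repair", "P2": "DNA repair", "P3": "transport"}
prediction = predict_function("X", toy_edges, toy_annotations)
```

Two of the three annotated partners of X are labelled "DNA repair", so that is the predicted function. P4 has no annotated partner apart from P3, so its prediction rests on a single vote.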