CMB Project Report
Date: 21 Dec 2024

Indian Statistical Institute, Kolkata

BIOACTIVITY PREDICTION IN DRUG DISCOVERY
Ranjan Kumar Choubey CS2316, Mona Kumari CS2311
Supervisor: Prof. Malay Bhattacharyya

ABSTRACT
The project “BIOACTIVITY PREDICTION APP” applies bioinformatics to drug discovery. It is used to assess the drug-likeness of compounds, to predict the potency of input molecules, and to predict their solubility. Drug discovery is pivotal to curing diseases and protecting living beings against them, and it should be as fast as possible so that lives can be saved. At present, however, drug discovery is slow, and developing curative drugs and medicines takes a long time. This slow pace is hard to accept when other fields are accelerating with advances in technology. Since bioinformatics has emerged as a prominent field in medicine, we have built on previous studies to create machine learning models that speed up the calculation of molecular potency, expressed as pIC50 values, and the prediction of molecular solubility.
Keywords: Bioinformatics, Potency, Solubility, Random Forest, Linear Regression, Drug Discovery
I. INTRODUCTION
Drug discovery is a step-by-step process through which new drugs are identified. In general, pharmaceutical companies follow established pharmacology- and chemistry-based drug discovery approaches and face various difficulties in finding new drugs [1]. Bioinformatics aims to make more drugs available in a shorter time and at lower risk [2], and a distinct, well-known field, computer-aided drug design (CADD), has emerged for this purpose [3], [4]. The exponential growth of biological data has favored the development of primary and secondary databases of nucleic acid sequences, protein sequences, and structures. Some of the most popular databases include ChEMBL, GenBank, SWISS-PROT, PDB, PIR, SCOP, and CATH; these resources are in the public domain and hosted on online servers around the world. We carried out an in-depth study of Alzheimer's disease, which attacks a single protein. When a disease affects the human body, it may either inhibit a protein or trigger its release, which creates an imbalance in the body [5]. This imbalance is the cause of illness and other effects in the body. We created a dataset in which different compounds are compared. First, the compounds are checked against the Lipinski descriptors to assess their drug-likeness. Then PaDEL descriptors are used to generate molecular fingerprints, which are fed to the model, and the results are predicted.
II. PROJECT OBJECTIVE
The proposed machine learning models are designed to be scalable, well researched, adaptive, flexible, and accurate. They predict the potency and solubility of drug-like molecules that have been filtered using Lipinski's Rule of Five. The research is intended to benefit researchers and biologists who currently carry out these processes manually to discover drugs. Manual processing increases the chance of human error, which further delays drug discovery and also increases its cost. Our machine learning algorithm considers only the molecular weight, the octanol-water partition coefficient (LogP), the number of hydrogen bond donors, and the number of hydrogen bond acceptors of a molecule, and input molecules must be supplied in SMILES notation together with their ChEMBL ID.
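The Rule of Five filter mentioned above can be sketched in a few lines. This minimal Python example assumes the four descriptor values have already been computed elsewhere (for instance by a cheminformatics toolkit) and applies the commonly cited thresholds: molecular weight at most 500 Da, LogP at most 5, at most 5 hydrogen bond donors, and at most 10 hydrogen bond acceptors. The `Descriptors` class, the function names, and the choice to tolerate one violation are illustrative assumptions, not the project's own code.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """The four Lipinski descriptors named in the text (values supplied by the caller)."""
    mol_wt: float     # molecular weight in daltons
    logp: float       # octanol-water partition coefficient (LogP)
    h_donors: int     # number of hydrogen bond donors
    h_acceptors: int  # number of hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five thresholds a molecule exceeds."""
    rules = [
        d.mol_wt > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(rules)  # True counts as 1, False as 0


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep a molecule if it breaks at most `max_violations` rules (assumed default: 1)."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    small = Descriptors(mol_wt=180.2, logp=1.2, h_donors=1, h_acceptors=4)
    large = Descriptors(mol_wt=812.0, logp=6.3, h_donors=7, h_acceptors=12)
    print(is_drug_like(small))  # True
    print(is_drug_like(large))  # False
```

Allowing a single violation follows the usual reading of Lipinski's rule (poor absorption is more likely when more than one threshold is exceeded); setting `max_violations=0` gives a stricter filter.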
The main objectives of the project are:
1. To assess the applicability of machine learning algorithms that have demonstrated their efficiency in predicting the potency and solubility of drug-like molecules with better predictive rates.
2. To apply the best machine learning procedures for prediction.
3. To develop a model that predicts the potency of drug-like molecules using Random Forest, and a model that predicts the solubility of molecules using Linear Regression.
III. WORKING PROCEDURE

Figure 1: Flow Diagram for Potency Prediction Model

Figure 2: Flow Diagram for Solubility Prediction Model
3.1 ALGORITHM FOR POTENCY PREDICTION
Step 1: Gather the data from the ChEMBL database and prepare it by removing missing values.
Step 2: Perform exploratory data analysis on the gathered data.
Step 3: Split the data into a training dataset and a testing dataset.
Step 4: Build the Random Forest model using the training data.
Step 5: Evaluate the Random Forest model using the testing data.
Step 6: Use the model to predict the potency of molecules.
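Steps 1 and 3, together with the conversion to pIC50 used for potency, can be illustrated with a small standard-library sketch. It assumes each record carries a SMILES string and an IC50 value in nanomolar (a common unit for ChEMBL bioactivity data, but an assumption here), and uses pIC50 = -log10(IC50 in mol/L), which equals 9 - log10(IC50 in nM). The record layout, field names, and split fraction are hypothetical.

```python
import math
import random


def to_pic50(ic50_nm: float) -> float:
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def clean(records):
    """Step 1: drop records whose SMILES string or IC50 value is missing."""
    return [r for r in records if r.get("smiles") and r.get("ic50_nm") is not None]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Step 3: shuffle a copy of the rows reproducibly and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical records; the IDs and values are placeholders, not ChEMBL data.
    records = [
        {"id": "mol_1", "smiles": "CCO", "ic50_nm": 1000.0},
        {"id": "mol_2", "smiles": "c1ccccc1O", "ic50_nm": None},  # missing value, dropped
        {"id": "mol_3", "smiles": "CC(=O)O", "ic50_nm": 50.0},
    ]
    rows = clean(records)
    for r in rows:
        r["pic50"] = to_pic50(r["ic50_nm"])
    train, test = train_test_split(rows, test_fraction=0.5)
    print(len(rows), len(train), len(test))  # 2 1 1
```

Converting to the logarithmic pIC50 scale keeps the regression target on a compact range, so an IC50 of 1000 nM (1 µM) maps to a pIC50 of 6 and a more potent compound gets a higher value.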

3.2 ALGORITHM FOR SOLUBILITY PREDICTION

Step 1: Gather the data from the ChEMBL database and prepare it by removing missing values.
Step 2: Split the data into a training dataset and a testing dataset.
Step 3: Build the Linear Regression model using the training data.
Step 4: Evaluate the Linear Regression model using the testing data.
Step 5: Use the model to predict the solubility of molecules.
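The report does not state which metrics were used when testing the models on held-out data. As an assumption, the sketch below shows two standard choices for regression tasks such as predicting logS or pIC50: the root-mean-square error (RMSE) and the coefficient of determination R² = 1 - SS_res / SS_tot, both written in plain Python.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between experimental and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical experimental vs. predicted logS values for a small test set.
    experimental = [-2.1, -3.4, -1.0, -4.2]
    predicted = [-2.3, -3.1, -1.2, -4.0]
    print(f"RMSE = {rmse(experimental, predicted):.3f}")
    print(f"R^2  = {r2_score(experimental, predicted):.3f}")
```

An R² close to 1 means the points in a predicted-versus-experimental plot lie close to the diagonal; an R² of 0 means the model does no better than always predicting the mean of the test set.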

3.3 RANDOM FOREST

Random Forest is a popular supervised machine learning algorithm. It can be used for both classification and regression problems. It is based on ensemble learning, which combines multiple models to solve complex problems and improve performance. As the name suggests, a random forest is an ensemble of decision trees, each built on a different random sample of the dataset, whose outputs are combined to improve predictive accuracy. Instead of depending on a single decision tree, the random forest takes a prediction from each tree and combines them (by majority vote for classification, or by averaging for regression) to produce the final result. Using a large number of trees generally improves accuracy and reduces the risk of overfitting. Random forests are also reported to maintain accuracy when a large portion of the data is missing.
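To make the bagging-and-averaging idea concrete, here is a deliberately small random forest regressor in pure Python. It is an illustrative sketch, not the model used in this project; in practice a library implementation would normally be used. Each tree is grown on a bootstrap sample, each split considers only a random subset of the features, and the forest averages the trees' outputs, as is done for a continuous target such as pIC50. The class name `RandomForestSketch`, the default depth, and the one-third-of-features rule are assumptions.

```python
import random
from statistics import mean


def best_split(X, y, features):
    """Return (sse, feature, threshold) minimising the summed squared error, or None."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a regression tree; each split inspects a random subset of n_sub features."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)  # leaf: predict the mean target of the samples that reach it
    features = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, features)
    if split is None:
        return mean(y)
    _, f, thr = split
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_sub, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


class RandomForestSketch:
    """Hypothetical minimal random forest regressor: bagging plus random feature subsets."""

    def __init__(self, n_trees=25, max_depth=4, n_sub=None, seed=0):
        self.n_trees, self.max_depth, self.n_sub, self.seed = n_trees, max_depth, n_sub, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = self.n_sub or max(1, len(X[0]) // 3)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_sub, rng))
        return self

    def predict(self, X):
        # Regression forest: average the predictions of all trees.
        return [mean(predict_tree(t, row) for t in self.trees) for row in X]
```

For example, `RandomForestSketch(seed=1).fit(X, y).predict(X_new)` fits 25 shallow trees and returns one averaged prediction per row of `X_new`, where each row would be a numeric feature vector such as a molecular fingerprint. A classification forest would replace the averaging with a majority vote.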
3.4 LINEAR REGRESSION
Linear regression helps analyze data, find relationships and patterns in it, and ultimately make informed predictions. It is one of the best-known and best-understood algorithms in statistics and machine learning. Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x). Because the relationship is linear, the model describes how the value of the dependent variable changes with the values of the independent variables. With a single independent variable, the model is a sloped straight line representing the relationship between the two variables.
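The linear model y = b0 + b1·x1 + ... + bk·xk can be fitted by ordinary least squares. The sketch below, which assumes a plain list-of-lists feature matrix and is not the project's own code, builds the normal equations (AᵀA)b = Aᵀy with an intercept column and solves them by Gaussian elimination with partial pivoting. Which molecular descriptors serve as x1, ..., xk is not specified here; a numerical library routine such as NumPy's `lstsq` would be more robust for real data.

```python
def fit_linear_regression(X, y):
    """Return [b0, b1, ..., bk] minimising squared error for y ~ b0 + sum_j bj * x_j."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    n, k = len(A), len(A[0])
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    v = [sum(A[r][i] * y[r] for r in range(n)) for i in range(k)]
    aug = [M[i] + [v[i]] for i in range(k)]  # augmented matrix [M | v]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (aug[i][k] - sum(aug[i][j] * b[j] for j in range(i + 1, k))) / aug[i][i]
    return b


def predict(b, X):
    """Apply fitted coefficients: intercept plus weighted sum of the features."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
    y = [1, 3, -2, 0, -4]
    coeffs = fit_linear_regression(X, y)
    print([round(c, 6) for c in coeffs])
    print(predict(coeffs, [[3, 1]]))
```

Solving the normal equations directly is the simplest exact method for a handful of features; with many correlated descriptors, a QR- or SVD-based solver is numerically safer.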
IV. RESULTS AND ANALYSIS
4.1 RANDOM FOREST MODEL FOR POTENCY PREDICTION

Figure 3: Predicted vs. experimental pIC50 values of the drug-like molecules.

Figure 4: Input Data

Figure 5: Output
4.2 LINEAR REGRESSION MODEL FOR SOLUBILITY PREDICTION

Figure 6: Predicted vs. experimental logS (solubility) values of the drug-like molecules.

Figure 7: Linear Regression Model Performance

Figure 8: Input Data

Figure 9: Output
V. CONCLUSION
As stated above, we have created machine learning models to predict the potency and solubility of molecules. We first trained the models on an input dataset and then used the trained models to make predictions for new molecules. As technology advances, newer models will make these predictions more accurate. Although predicting the potency and solubility of unknown compounds is not easy, such models can be of great benefit to biologists and researchers because they can substantially speed up drug discovery, leading to earlier medicines for unknown diseases and thus benefiting mankind.
VI. REFERENCES
[1] M. Iskar, G. Zeller, X.M. Zhao, V. van Noort, P. Bork, “Drug discovery in the age of systems biology: the rise of computational approaches for data integration”, Curr Opin Biotechnol 23, pp. 609–616, 2012.
[2] S.S. Ortega, L.C. Cara, M.K. Salvador, “In silico pharmacology for a multidisciplinary drug discovery process”, Drug Metabol Drug Interact 27, pp. 199–207, 2012.
[3] C.M. Song, S.J. Lim, J.C. Tong, “Recent advances in computer-aided drug design”, Brief Bioinform 10, pp. 579–591, 2009.
[4] A. Speck-Planche, M.N. Cordeiro, “Computer-aided drug design, synthesis and evaluation of new anticancer drugs”, Curr Top Med Chem [Epub ahead of print], 2013.
[5] N. Siddharthan, M. Raja Prabhu, S. Balayogan, “Bioinformatics in Drug Discovery a Revi”