MS Thesis Proposal (Revised)
Zhiyun Gong
15 December 2020
Background
Machine learning is a family of artificial intelligence algorithms that train computational models to learn patterns from data, enabling predictions and new insights into a problem, and it has been widely applied across areas of study. In biological research, researchers often need to assess the quality of experimental results; however, a single numeric quality measurement is sometimes not comprehensive enough, and when the data are high-dimensional and hard to visualize, making decisions based on them becomes challenging. For example, in 2D Differential In-Gel Electrophoresis (2D-DIGE) experiments, the result of the first dimension of separation by isoelectric focusing (IEF) directly affects the quality of the final gel electrophoresis image, yet there is no consensus on what the IEF time series that give rise to high-quality protein separation should look like. We therefore believe that supervised machine learning may be a good way to classify these time series and to decide whether the more expensive and time-consuming second dimension would be worthwhile.
Another type of problem faced in scientific research is the optimization of experimental parameters, which in real-world wet-lab work is usually approached empirically, since an exhaustive search would be too expensive. We propose that Bayesian optimization (BO) could be a good solution to this problem: it uses historical data as prior knowledge, fits a probabilistic surrogate model to the objective function, and, according to the posterior, suggests a new candidate likely to improve on the observations, as sketched in the example below.
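As a concrete illustration, the following R sketch runs one BO round for a single parameter on [0, 1], using gausspr from kernlab as the surrogate and Expected Improvement as the acquisition function. The toy history and the one-dimensional search space are assumptions made for the example, not part of the proposed package.

    library(kernlab)  # gausspr: Gaussian process regression

    # Toy history of evaluated parameter values and objective values
    history <- data.frame(x = c(0.1, 0.4, 0.7, 0.9),
                          y = c(0.3, 1.2, 0.8, 0.2))

    # Fit the surrogate; variance.model = TRUE enables predictive std. dev.
    gp <- gausspr(y ~ x, data = history, kernel = "rbfdot",
                  variance.model = TRUE)

    grid  <- data.frame(x = seq(0, 1, length.out = 200))  # candidates
    mu    <- predict(gp, grid)                       # posterior mean
    sigma <- predict(gp, grid, type = "sdeviation")  # posterior std. dev.

    # Expected Improvement over the best value observed so far
    best <- max(history$y)
    z    <- (mu - best) / sigma
    ei   <- (mu - best) * pnorm(z) + sigma * dnorm(z)

    x_next <- grid$x[which.max(ei)]  # candidate suggested for the next run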
BO is conventionally a sequential process, in which a single new candidate is suggested in each round after the previous evaluation has completed. In the wet lab, however, researchers often need to tweak several parameters to obtain results of satisfactory quality while multiple evaluations can be performed in parallel: a lab may have access to several instruments of the same model, or a single instrument may have the capacity to process multiple samples at the same time. A parallel version of Bayesian optimization, which incorporates multiple evaluations and suggests a batch of new experiments to run next, is therefore a more reasonable and efficient choice and may save a significant amount of time. The two cases call for different variants. In the first case, one instrument may finish its evaluation earlier than the others even if they all started at the same time, so it is reasonable to incorporate the newly available data point into the prior knowledge as soon as possible rather than let it sit while the others finish; this is the natural setting for asynchronous parallel optimization. In the second case, the synchronous version, which waits for the whole batch to complete, is a better fit. The sketch below illustrates the difference.
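The scheduling difference can be sketched with the future package (used here purely to illustrate the logic; it is not necessarily what the proposed package will build on). In the synchronous sketch the history is updated only after the whole batch finishes, while in the asynchronous one each finished evaluation is folded in immediately and a replacement point is launched.

    library(future)
    plan(multisession, workers = 3)  # e.g., three identical instruments

    # Synchronous: wait for the whole batch before updating the surrogate.
    run_sync_batch <- function(evaluate, batch) {
      jobs <- lapply(batch, function(x) future(evaluate(x)))
      data.frame(x = batch,
                 y = vapply(jobs, value, numeric(1)))  # blocks until all finish
    }

    # Asynchronous: fold in each result as soon as its worker finishes,
    # then immediately launch a new point proposed from the updated history.
    run_async <- function(evaluate, propose, init, n_total, workers = 3) {
      history  <- data.frame(x = numeric(0), y = numeric(0))
      running  <- lapply(init, function(x) list(x = x, job = future(evaluate(x))))
      launched <- length(init)
      while (length(running) > 0) {
        done <- which(vapply(running, function(r) resolved(r$job), logical(1)))
        if (length(done) == 0) { Sys.sleep(0.1); next }
        for (r in running[done])
          history <- rbind(history, data.frame(x = r$x, y = value(r$job)))
        running <- running[-done]
        while (launched < n_total && length(running) < workers) {
          x_new    <- propose(history)  # uses everything observed so far
          running  <- c(running, list(list(x = x_new, job = future(evaluate(x_new)))))
          launched <- launched + 1
        }
      }
      history
    }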
Several Bayesian optimization packages are currently available in the R community (Yan, 2016; Wilson, 2020; Bischl et al., 2017; Roustant, Ginsbourger & Deville, 2012; Kuhn, 2020). Among them, ParBayesianOptimization, mlrMBO, and DiceOptim support batch parallel optimization, while the others work only in sequential mode. However, all of these packages require users to be comfortable writing objective functions in R and incorporating them into the specific optimization framework, which may not be convenient for many
experimental researchers. Also, asynchronous parallel optimization is not supported by any of the existing
libraries.
Objectives
This thesis has two main objectives, summarized below. The first aim is already complete, apart from adding more data to the model in the future to improve its performance.
1. We aimed to classify the time-series data generated by IEF experiments into good and bad groups using supervised machine learning methods, which may provide researchers with a prediction of whether an IEF experiment would yield a good-quality image if the second-dimension SDS-PAGE separation were performed.
2. We also aim to implement and integrate different variants of Bayesian optimization and to provide a product that users can run either programmatically, by calling the functions directly, or through a user-friendly interface. Using this tool, the user will be able to easily pass in a tabular file containing experimental results, specify the ranges and types of all the parameters to be optimized, choose the suitable variant of the algorithm and the surrogate model, and set the batch size and the maximum number of experiments allowed. The duration of each round of the optimization depends on the nature of the actual experimental instrument, and in many cases this time can be long. So another feature of the proposed product is the ability to save interim results and load them back, resuming the optimization from where it stopped without losing the historical steps. A hypothetical session is sketched below.
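To make the intended workflow concrete, a session might look like the following sketch. The function suggest_next and all of its argument names are placeholders for the interface we propose, not an existing API, and the parameter names are invented for illustration.

    # Hypothetical usage of the proposed tool (all names are placeholders).
    results <- read.csv("experiment_runs.csv")  # tabular file of past results

    out <- suggest_next(
      data       = results,
      params     = list(voltage  = c(500, 8000),      # numeric range
                        ph_range = c("3-10", "4-7")),  # categorical levels
      variant    = "asynchronous",  # or "sequential" / "synchronous"
      surrogate  = "gp_rbf",
      batch_size = 4,
      max_evals  = 40
    )

    saveRDS(out$state, "bo_state.rds")  # save interim results ...
    state <- readRDS("bo_state.rds")    # ... and resume from them later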
Current results
1. Classification of Isoelectric Focusing data of 2D-DIGE (Spring 2020)
We used Random Forest as a baseline model and implemented a 1-dimensional convolutional neural network (CNN) and a long short-term memory (LSTM) model to map the time-series data to binary labels derived from the resulting 2D images: good or bad. Our preliminary results suggest that it may be possible to build a classifier that achieves moderate to high accuracy (80-85%) with good AUCs (~0.8) using CNNs or RNNs; both of these models outperform the Random Forest baseline.
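For reference, a minimal version of such a 1-D CNN can be written with the keras R package as below. The layer sizes and the assumed trace length of 200 time points are illustrative, not the exact architecture we trained.

    library(keras)  # requires the keras R package with a TensorFlow backend

    n_steps <- 200  # assumed length of one IEF time series (illustrative)

    model <- keras_model_sequential() %>%
      layer_conv_1d(filters = 16, kernel_size = 5, activation = "relu",
                    input_shape = c(n_steps, 1)) %>%
      layer_max_pooling_1d(pool_size = 2) %>%
      layer_flatten() %>%
      layer_dense(units = 32, activation = "relu") %>%
      layer_dense(units = 1, activation = "sigmoid")  # P(good separation)

    model %>% compile(optimizer = "adam",
                      loss = "binary_crossentropy",
                      metrics = "accuracy")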
We also constructed a prototype web app using the Shiny R package (see Fig 1). This app enables users to upload raw IEF experiment data and obtain, within seconds, predictions for all lanes of the experiment from the two trained neural network models.
Methods
1. Algorithms to implement
Sequential, synchronous, and asynchronous parallel versions of the Bayesian optimization algorithm will all be included in the package (Kandasamy et al., 2017, 2018). For the surrogate function, a Gaussian process regression model with a radial basis function (RBF) kernel is implemented (via kernlab in R and GPy in Python) to approximate the objective function from prior knowledge with uncertainty estimates (Karatzoglou et al., 2004; GPy, 2012). The next combination(s) of parameters to be evaluated will then be chosen according to an acquisition function (Expected Improvement, Upper Confidence Bound, or Probability of Improvement) or by Thompson sampling from the posterior, as sketched below. If time allows, we will also incorporate into our package a re-implementation of the IMGPO algorithm, which is currently available in Matlab, together with a batch version of it derived by Trevor (Kawaguchi, Kaelbling & Lozano-Pérez, 2016).
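To illustrate the Thompson sampling step, the sketch below draws one function from a GP posterior over a candidate grid and evaluates next wherever the draw is maximal; in the parallel variants, each idle worker simply makes its own independent draw. The tiny hand-rolled GP (RBF kernel with a fixed length-scale) is for exposition only, not the kernlab/GPy implementation the package will use.

    library(MASS)  # mvrnorm: sample from a multivariate normal

    rbf <- function(a, b, l = 0.2) exp(-outer(a, b, "-")^2 / (2 * l^2))

    gp_posterior <- function(x, y, grid, noise = 1e-6) {
      K  <- rbf(x, x) + diag(noise, length(x))
      Ks <- rbf(grid, x)
      list(mu  = drop(Ks %*% solve(K, y)),
           cov = rbf(grid, grid) - Ks %*% solve(K, t(Ks)))
    }

    x <- c(0.1, 0.5, 0.9); y <- c(0.2, 1.1, 0.4)  # toy observations
    grid <- seq(0, 1, length.out = 100)
    post <- gp_posterior(x, y, grid)

    # One Thompson sample; its maximizer is the next point to evaluate.
    draw   <- mvrnorm(1, post$mu, post$cov + diag(1e-8, length(grid)))
    x_next <- grid[which.max(draw)]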
2. Deliverables
The product will have two components. The first is the package itself, whose optimization functions the user can call programmatically. The second is a fully functional Shiny app with a more visually intuitive interface, in which users can complete multiple rounds of optimization without writing any code. A dashboard on the user interface will show the sets of parameters already evaluated together with the next set, or batch, of parameters suggested by the algorithm. Once satisfactory results are obtained, the user can stop the evaluation and download a report containing all of the parameter combinations and their corresponding objective values. A skeleton of such an interface is sketched below.
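A bare-bones skeleton of the interface, with placeholder control names and no call to the optimizer yet, might look like this:

    library(shiny)

    ui <- fluidPage(
      fileInput("results", "Upload experimental results (CSV)"),
      selectInput("variant", "Algorithm variant",
                  c("sequential", "synchronous", "asynchronous")),
      numericInput("batch_size", "Batch size", value = 4, min = 1),
      actionButton("suggest", "Suggest next batch"),
      tableOutput("dashboard")  # evaluated sets plus the suggested next batch
    )

    server <- function(input, output) {
      output$dashboard <- renderTable({
        req(input$results)
        read.csv(input$results$datapath)  # placeholder: echo the history
      })
    }

    # shinyApp(ui, server)  # launch locally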
If time allows, we also plan to deploy the Shiny app to an online server, so that users do not have to install the package and run it locally and can re-upload their previous results and resume from where they left off, which matters because a single round can take a relatively long time in real-world wet-lab settings.
Timeline
Mid December: Proposal final draft (to the committee and academic advisor)
Winter break (23 Dec 2020 – 31 Jan 2021): Documentation of the functions in the package; writing the introduction and methods sections of the thesis; comparison of different methods on existing datasets
1 – 28 February: Testing on real-world experiments; writing the results section of the thesis
Beginning of March: Progression report submission
1 – 31 March: Trying to publish the app online; running additional experiments and making adjustments according to the report feedback
1 – 30 April: Writing up, final revisions
Beginning of May: Thesis defense; thesis document submission
References
Bischl, B., Richter, J., Bossek, J., Horn, D., et al. (2017) mlrMBO: A Modular Framework for Model-
Based Optimization of Expensive Black-Box Functions. [Online]. Available from:
http://arxiv.org/abs/1703.03373.
GPy (2012) GPy: A Gaussian process framework in Python. [Online]. Available from: http://github.com/SheffieldML/GPy.
Kandasamy, K., Krishnamurthy, A., Schneider, J. & Poczos, B. (2017) Asynchronous Parallel Bayesian
Optimisation via Thompson Sampling. International Conference on Artificial Intelligence and
Statistics, AISTATS 2018. [Online] 133–142. Available from: http://arxiv.org/abs/1705.09236.
Kandasamy, K., Krishnamurthy, A., Schneider, J. & Poczos, B. (2018) Parallelised Bayesian Optimisation via Thompson Sampling. International Conference on Artificial Intelligence and Statistics. 84, 133–142.
Karatzoglou, A., Smola, A., Hornik, K. & Zeileis, A. (2004) kernlab – An S4 Package for Kernel Methods in R. Journal of Statistical Software. [Online] 11 (9), 1–20. Available from: http://www.jstatsoft.org/v11/i09/.
Kawaguchi, K., Kaelbling, L.P. & Lozano-Pérez, T. (2016) Bayesian Optimization with Exponential Convergence. Advances in Neural Information Processing Systems. [Online] 2809–2817. Available from: http://arxiv.org/abs/1604.01348.
Kuhn, M. (2020) tune: Tidy Tuning Tools. [Online]. Available from:
https://cran.r-project.org/package=tune.
Roustant, O., Ginsbourger, D. & Deville, Y. (2012) DiceKriging, DiceOptim: Two R Packages for the
Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization. Journal of
Statistical Software. [Online] 51 (1), 1–55. Available from: doi:10.18637/jss.v051.i01.
Wilson, S. (2020) ParBayesianOptimization: Parallel Bayesian Optimization of Hyperparameters.
[Online]. Available from: https://github.com/AnotherSamWilson/ParBayesianOptimization.
Wu, N.C., Dai, L., Olson, C.A., Lloyd-Smith, J.O., et al. (2016) Adaptation in protein fitness landscapes
is facilitated by indirect paths. eLife. [Online] 5 (July). Available from: doi:10.7554/eLife.16965
[Accessed: 24 June 2020].
Yan, Y. (2016) rBayesianOptimization: Bayesian Optimization of Hyperparameters. [Online]. Available
from: https://cran.r-project.org/package=rBayesianOptimization.