Sheet 14

This document is an exercise sheet for a course on Statistics in Bioinformatics at TU Dortmund for the Winter Semester 2024/2025. It contains a comprehensive list of questions covering key topics such as stochastic processes, Markov chains, evolutionary models, clustering methods, and statistical analysis techniques relevant to bioinformatics. The sheet serves as a study aid for students preparing for their written exam and does not require submission or grading.

Uploaded by

Mehedi Hasan Ridoy

TU Dortmund Winter Semester 2024/2025

Department of Statistics 28.01.2025

Prof. Dr. Jörg Rahnenführer
Dr. Franziska Kappenberg

Statistics in Bioinformatics
Exercise Sheet 14

Summary and Repetition

This list of questions revisits the most important topics of the course and may serve as a part
of the preparation for the written exam.

1. What is the central dogma (of molecular biology)?

2. What is a stochastic process?

3. Name the properties of a stochastic matrix.

4. How is a Markov chain defined?

5. What are essential properties of Markov chains and their importance in bioinformatics
(typical questions)?

6. How is a stationary distribution defined? When is a stationary distribution uniquely
determined?

7. What do the terms recurrent, transient, irreducible, periodic, and aperiodic mean?

8. What is the difference between a silent mutation, a transition, and a transversion?

9. How can parameter estimation be done in Markov chains?

10. What is a Markov process? What is the relationship between transition matrix and rate
matrix?

11. Explain the terms PEM and PAM.

12. What is meant by an evolutionary Markov process?

13. When is a Markov process called time-reversible?

14. How do you estimate a Markov process from given data?

15. What are score matrices and their applications?

16. How can score matrices be derived?

17. Name two important evolutionary DNA models and their properties.

18. Explain how time estimation can be done with evolutionary DNA models.

19. What is meant by the Jukes-Cantor correction?

20. Name the most important phylogenetic methods.

21. Explain the maximum-parsimony and the maximum-likelihood approaches to phylogeny.

22. What is the purpose of the Needleman-Wunsch algorithm and the Smith-Waterman
algorithm?

23. When comparing a sequence to a database, how are the P value and the E value defined?

24. What is BLAST and how does the algorithm work?

25. What is an MA plot?

26. Name methods for normalizing microarray experiments.

27. What is kernel-based regression estimation and how is it different from the K-nearest
neighbor approach?

28. In this context, what is meant by the bias-variance tradeoff?

29. Formulate the minimization problems in local linear and local polynomial regression.

30. How is the Nadaraya-Watson kernel estimator defined?

31. Describe the concept of a variance-stabilizing normalization (VSN).

32. What is Huber’s error model for measured intensity of gene expression data and what is
the resulting transformation?

33. What does robust regression mean and what is minimized with LTS?

34. What is the basic idea of quantile normalization?

35. Explain the difference between unsupervised and supervised learning.

36. What are objectives in cluster analysis?

37. Name three important distance measures in cluster analysis. How do they differ?

38. Explain the most popular algorithms for clustering microarray data.

39. What methods can be used to determine the number of clusters? How is the average
silhouette width defined?

40. What is the basic idea of the Rand index?

41. How can you reasonably select genes in a cluster analysis for microarray samples?

42. What is the effect of clustering after supervised feature selection?

43. What are important methods for classifying samples?

44. Briefly describe linear and quadratic discriminant analysis. How does regularized
discriminant analysis (RDA) work?

45. Explain PAM (prediction analysis for microarrays). What is the significance of the
regularization parameter ∆?

46. What is a support vector machine (SVM)?

47. What is the procedure for SVMs in the case of overlapping classes?

48. How is a classification tree constructed?

49. Given a classification tree, how does class prediction work for a new observation? What
does this look like for a regression tree?

50. Given a random forest, how do construction and prediction work?

51. How do you determine the amount of regularization needed in a classification procedure?
How does one proceed with PAM?

52. What is meant by nested cross-validation and what is it needed for?

53. Explain the terms sensitivity and specificity.

54. What are ROC curves?

55. How are AUC (area under the curve) and pAUC defined and what is a possible application
in microarray experiments?

56. What tests can be used to determine differentially expressed genes?

57. What does the volcano plot display?

58. What is the difference between the FWER (family-wise error rate) and the FDR (false
discovery rate)? Name a procedure for controlling each of these errors.

59. What is the underlying statistical model in the LIMMA methodology? Specify the
distribution assumptions.

60. List two approaches to dividing a group of patients into two subgroups based on the
distribution of expression levels of a gene.

61. Which bimodality measures from the lecture can be assigned to each of these two
approaches?

62. What are the two main approaches for enrichment analysis? Explain the two methods.

63. What is meant by Gene Ontology?

64. What is meant by Global Test? How can the test statistic be interpreted?

65. What are the steps of the STEM algorithm and how can you use it to analyze gene groups?

66. What is the most popular model for fitting dose-response curves? Specify the model and
the procedure for estimating its parameters.

67. What are the incidence matrix and the ancestor matrix of a DAG?

68. What is meant by Markov Chain Monte Carlo (MCMC)?

69. What is a Metropolis-Hastings sampler?

70. Name all classes of disease progression models from the lecture.

71. Explain oncogenetic trees. How is the probability calculated that a certain combination of
events occurs?
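
As a small illustration for question 19, the following Python sketch computes the Jukes-Cantor corrected distance from two aligned DNA sequences. It is not taken from the lecture or from a sample solution: the function names `p_distance` and `jukes_cantor_distance` and the two example sequences are assumptions made for this example. It uses the standard formula d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites and d is the estimated number of substitutions per site.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance d = -(3/4) * ln(1 - (4/3) * p)."""
    # The logarithm is only defined while 1 - (4/3) * p stays positive.
    if not 0.0 <= p < 0.75:
        raise ValueError("correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical toy alignment: 2 of the 10 sites differ.
observed = p_distance("ACGTACGTAC", "ACGTTCGTAA")
corrected = jukes_cantor_distance(observed)
print(f"observed p = {observed:.3f}, corrected d = {corrected:.3f}")
```

The corrected distance exceeds the observed proportion because multiple substitutions at the same site are hidden in a direct comparison of the sequences.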

This sheet will not be corrected or graded, thus no submission is required. Note that there will
be no sample solution for this exercise sheet.
