Sheet 14

This document is an exercise sheet for a course on Statistics in Bioinformatics at TU Dortmund for the Winter Semester 2024/2025. It contains a comprehensive list of questions covering key topics such as stochastic processes, Markov chains, evolutionary models, clustering methods, and statistical analysis techniques relevant to bioinformatics. The sheet serves as a study aid for students preparing for their written exam and does not require submission or grading.

TU Dortmund Winter Semester 2024/2025

Department of Statistics 28.01.2025


Prof. Dr. Jörg Rahnenführer
Dr. Franziska Kappenberg

Statistics in Bioinformatics
Exercise Sheet 14

Summary and Repetition


This list of questions revisits the most important topics of the course and may serve as part
of the preparation for the written exam.

1. What is the central dogma (of molecular biology)?

2. What is a stochastic process?

3. Name the properties of a stochastic matrix.

4. How is a Markov chain defined?

5. What are the essential properties of Markov chains, and why are they important in
bioinformatics (typical questions)?

6. How is a stationary distribution defined? When is a stationary distribution uniquely
determined?

7. What do the terms recurrent, transient, irreducible, periodic, and aperiodic mean?

8. What is the difference between a silent mutation, a transition, and a transversion?

9. How can parameter estimation be done in Markov chains?

10. What is a Markov process? What is the relationship between transition matrix and rate
matrix?

11. Explain the terms PEM and PAM.

12. What is meant by an evolutionary Markov process?

13. When is a Markov process called time-reversible?

14. How do you estimate a Markov process from given data?

15. What are score matrices and their applications?

16. How can score matrices be derived?

17. Name two important evolutionary DNA models and their properties.

18. Explain how time estimation can be done with evolutionary DNA models.

19. What is meant by the Jukes-Cantor correction?

20. Name the most important phylogenetic methods.

21. Explain the maximum-parsimony and the maximum-likelihood approaches to phylogeny.


22. What is the purpose of the Needleman-Wunsch algorithm and the Smith-Waterman
algorithm?

23. When comparing a sequence to a database, how are the P value and the E value defined?

24. What is BLAST and how does the algorithm work?

25. What is an MA plot?

26. Name methods for normalizing microarray experiments.

27. What is kernel-based regression estimation and how is it different from the K-nearest
neighbor approach?

28. In this context, what is meant by the bias-variance tradeoff?

29. Formulate the minimization problems in local linear and local polynomial regression.

30. How is the Nadaraya-Watson kernel estimator defined?

31. Describe the concept of a variance-stabilizing normalization (VSN).

32. What is Huber’s error model for measured intensity of gene expression data and what is
the resulting transformation?

33. What does robust regression mean and what is minimized with LTS?

34. What is the basic idea of quantile normalization?

35. Explain the difference between unsupervised and supervised learning.

36. What are objectives in cluster analysis?

37. Name three important distance measures in cluster analysis. How do they differ?

38. Explain the most popular algorithms for clustering microarray data.

39. What methods can be used to determine the number of clusters? How is the average
silhouette width defined?

40. What is the basic idea of the Rand index?

41. How can you reasonably select genes in a cluster analysis for microarray samples?

42. What is the effect of clustering after supervised feature selection?

43. What are important methods for classifying samples?

44. Briefly describe linear and quadratic discriminant analysis. How does regularized
discriminant analysis (RDA) work?

45. Explain PAM (prediction analysis for microarrays). What is the significance of the
regularization parameter ∆?

46. What is a support vector machine (SVM)?

47. What is the procedure for SVMs in the case of overlapping classes?

48. How is a classification tree constructed?


49. Given a classification tree, how does class prediction work for a new observation? What
does this look like for a regression tree?

50. Given a random forest, how do construction and prediction work?

51. How can the amount of regularization needed in a classification procedure be determined?
How does one proceed with PAM?

52. What is meant by nested cross-validation and what is it needed for?

53. Explain the terms sensitivity and specificity.

54. What are ROC curves?

55. How are AUC (area under the curve) and pAUC defined and what is a possible application
in microarray experiments?

56. What tests can be used to determine differentially expressed genes?

57. What does the volcano plot display?

58. What is the difference between the FWER (family-wise error rate) and the FDR (false
discovery rate)? Name a procedure for controlling each of these error rates.

59. What is the underlying statistical model in the LIMMA methodology? Specify the
distributional assumptions.

60. List two approaches to dividing a group of patients into two subgroups based on the
distribution of expression levels of a gene.

61. Which bimodality measures from the lecture can be assigned to each of these two
approaches?

62. What are the two main approaches for enrichment analysis? Explain the two methods.

63. What is meant by Gene Ontology?

64. What is meant by Global Test? How can the test statistic be interpreted?

65. What are the steps of the STEM algorithm and how can you use it to analyze gene groups?

66. What is the most popular model for fitting dose-response curves? Specify the model and
the procedure for estimating its parameters.

67. What are the incidence matrix and the ancestor matrix of a DAG?

68. What is meant by Markov Chain Monte Carlo (MCMC)?

69. What is a Metropolis-Hastings sampler?

70. Name all classes of disease progression models from the lecture.

71. Explain oncogenetic trees. How is the probability calculated that a certain combination of
events occurs?

This sheet will not be corrected or graded, thus no submission is required. Note that there will
be no sample solution for this exercise sheet.
