2M Biostatistics & Research Methodology Ans PDF
Pawan Dhamala
RR College of Pharmacy
SEMESTER- VIII
BP801T. BIOSTATISTICS AND RESEARCH METHODOLOGY
Question Bank
SHORT ANSWERS 10 x 2 = 20 Marks.
1. Multiple regression.
Solution:
When there is a significant correlation between two variables, and the correlation is not
spurious, the rate of change in the dependent variable for a unit change in the independent
variable can be estimated using the regression technique. Multiple regression equation:
• In bivariate correlation, there will be one dependent variable and only one
independent variable which may be related with the dependent variable.
• There may be more than one independent variable for the causation of dependent
variable.
• If the overall correlation of all these independent factors with the dependent factor is
considered then it is known as ‘Multiple correlation’.
• The multiple correlation coefficient is represented by ‘R’ and can be estimated by
simple correlation coefficient between various variables.
• R has no sign, since the correlation may be positive with one variable and
negative with another. R approaches unity as more and more of the variables responsible for
the causation of the dependent variable are considered.
For example, you could use multiple regression to understand whether exam
performance can be predicted based on revision time, test anxiety, lecture attendance
and gender.
©harishankar.17.abph@acharya.ac.in
1. It is used in pharmacy in relation to formulation and processing.
2. It is involved in formulating drug products in various forms.
3. It ensures that the final product meets not only the bioavailability requirements but also
the practical mass production criteria.
4. It helps the pharmaceutical scientist to understand the theoretical formulation and the target
processing parameters, including the ranges for each excipient and processing factor.
Thus, there would be a population of the sampled means having its distinct variance and
mean. It may be defined as the standard deviation of such sample means of all the possible
samples taken from the same given population. SEM defines an estimate of standard
deviation which has been computed from the sample. It is calculated as the ratio of the
standard deviation to the root of sample size, such as:
SEM = s/√n
where s is the standard deviation and n is the number of observations.
Solution:
1. Sample size calculation for cross-sectional studies/surveys: Cross-sectional studies or
cross-sectional surveys are done to estimate a population parameter, such as the prevalence of a
disease in a community or the average value of a quantitative variable in a
population. The sample size formulas for qualitative and quantitative variables are
different.
2. Sample size calculation for case control studies: In case control studies cases (the group
with disease/condition under consideration) are compared with controls (the group without
disease/condition under consideration) regarding exposure to the risk factor under question.
The formula for sample size calculation for this design also depends on the type of variable
(qualitative or quantitative).
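For the cross-sectional case with a qualitative variable (a prevalence), one commonly used formula is n = Z² × p(1 − p) / d², where p is the expected prevalence, d is the absolute precision and Z is the standard normal value for the chosen confidence level. The sketch below only illustrates that formula; the function name and the input values are hypothetical and are not taken from this question bank.

```python
import math
from statistics import NormalDist

def sample_size_prevalence(p, d, confidence=0.95):
    """Sample size needed to estimate a proportion p within +/- d."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    return math.ceil(n)  # round up to a whole participant

# Hypothetical inputs: expected prevalence 20%, precision +/- 5 percentage points
n = sample_size_prevalence(p=0.20, d=0.05)
print(n)
```

When nothing is known about the prevalence, p = 0.5 gives the largest (most conservative) sample size for a given precision.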
forecasting the data, analyzing the time series, and finding the causal effect dependencies
between the variables
8. Wilcoxon Rank Sum test.
The Mann Whitney U test, sometimes called the Mann Whitney Wilcoxon Test or the
Wilcoxon Rank Sum Test, is used to test whether two samples are likely to derive from the
same population (i.e., that the two populations have the same shape).
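Formula: The test statistic for the Mann Whitney U Test is denoted U and is the
smaller of U1 and U2, defined below.
U1 = n1·n2 + n1(n1 + 1)/2 − R1
U2 = n1·n2 + n2(n2 + 1)/2 − R2
Where: R1 = sum of the ranks for group 1, R2 = sum of the ranks for group 2, and n1 and
n2 are the numbers of observations in groups 1 and 2.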
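As a rough illustration of how U is obtained from the rank sums R1 and R2, the sketch below pools two small groups, ranks them (tied values share the average rank) and applies U1 = n1·n2 + n1(n1 + 1)/2 − R1 and the matching expression for U2. The helper names and the data are assumptions made for this example only.

```python
def average_ranks(values):
    """Rank values from 1 to N; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U, U1, U2) computed from the rank sums of the pooled data."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])   # R1: sum of ranks for group 1
    r2 = sum(ranks[n1:])   # R2: sum of ranks for group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2

# Hypothetical episode counts in two treatment groups
placebo = [7, 5, 6, 4, 12]
new_drug = [3, 6, 4, 2, 1]
u, u1, u2 = mann_whitney_u(placebo, new_drug)
print(u, u1, u2)
```

A useful check is that U1 + U2 always equals n1·n2. The resulting U is then compared with the tabulated critical value for the two group sizes.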
Solution: Observational studies
In an observational study, the epidemiologist simply observes the exposure and disease
status of each study participant. The two most common types of observational studies are
cohort studies and case-control studies; a third type is cross-sectional studies.
Cohort study. In a cohort study the epidemiologist records whether each study participant is
exposed or not, and then tracks the participants to see if they develop the disease of interest.
Note that this differs from an experimental study because, in a cohort study, the investigator
observes rather than determines the participants’ exposure status. After a period of time, the
investigator compares the disease rate in the exposed group with the disease rate in the
unexposed group. The unexposed group serves as the comparison group, providing an
estimate of the baseline or expected amount of disease occurrence in the community. If the
disease rate is substantively different in the exposed group compared to the unexposed group,
the exposure is said to be associated with illness.
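The comparison of disease rates in a cohort study is often summarized as a risk ratio (relative risk): the disease rate in the exposed group divided by the rate in the unexposed group. The following is a minimal sketch with made-up counts; the function names are illustrative only.

```python
def risk(cases, total):
    """Proportion of a group that developed the disease."""
    return cases / total

def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Disease rate in the exposed group divided by the rate in the unexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the disease
rr = risk_ratio(30, 100, 10, 100)
print(round(rr, 2))
```

A risk ratio close to 1 suggests no association, while a value well above 1 suggests that the exposure is associated with illness.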
or expected amount of exposure in that population. If the amount of exposure among the
case group is substantially higher than the amount you would expect based on the control
group, then illness is said to be associated with that exposure.
Cross-sectional study. In this third type of observational study, a sample of persons from a
population is enrolled and their exposures and health outcomes are measured simultaneously.
The cross-sectional study tends to assess the presence (prevalence) of the health outcome at
that point of time without regard to duration.
11. Sample size calculation for confidence interval.
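Calculation of a 95% confidence interval when n < 30 uses the t distribution. The formula:
mean ± t × (s/√n), where s is the sample standard deviation and t is the critical value of the
t distribution with n − 1 degrees of freedom.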
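A small-sample interval of the form mean ± t × (s/√n) can be sketched as follows. The data are hypothetical, and the critical value t = 2.262 (two-sided 95%, 9 degrees of freedom) is read from a standard t table rather than computed; both are assumptions of this example.

```python
import math
from statistics import mean, stdev

def t_confidence_interval(data, t_critical):
    """Confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    n = len(data)
    m = mean(data)
    sem = stdev(data) / math.sqrt(n)  # s / sqrt(n), with s the sample SD
    margin = t_critical * sem
    return m - margin, m + margin

# Hypothetical assay results (n = 10, so df = 9 and t = 2.262 for 95%)
sample = [98, 102, 101, 97, 100, 99, 103, 100, 98, 102]
low, high = t_confidence_interval(sample, t_critical=2.262)
print(round(low, 2), round(high, 2))
```

Because the margin is proportional to 1/√n, a larger sample gives a narrower interval for the same variability.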
For example, a study that has 80% power has an 80% chance of the
test giving a significant result when the effect being tested truly exists.
A high statistical power means that the test results are likely valid.
As the power increases, the probability of making a Type II error decreases.
A low statistical power means that the test results are questionable.
Statistical power helps you to determine if your sample size is large enough.
13. Pharmaceutical examples for data analysis using SPSS.
• Reduce number of outbreaks in the hospital by predicting which patients are at
highest risk of contracting the illness.
• Reduce waiting time in emergency situations and allocate staff and resources
efficiently by forecasting number of patients that will be admitted.
• Run clinical trials to accurately test effectiveness of treatment, even on a small
sample size.
• Create a measurement instrument for a psychological feeling e.g., stress.
14. Factorial design.
Many experiments involve the study of the effects of two or more factors. Factorial designs
are most efficient for this type of experiment.
• In a factorial design, all possible combinations of the levels of the factors are
investigated in each replication.
• If there are a levels of factor A, and b levels of factor B, then each replicate contains
all ab treatment combinations.
Main Effects
• The main effect of a factor is defined to be the change in response produced by a
change in the level of a factor.
• The main effect of A is the difference between the average response at A1 and A2.
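The ideas above can be sketched for a small two-factor, two-level design: list every treatment combination, then take the main effect of a factor as the difference between the average responses at its two levels. The level labels and response values below are hypothetical.

```python
from itertools import product
from statistics import mean

factor_a = ["A1", "A2"]
factor_b = ["B1", "B2"]

# All a x b treatment combinations in one replicate
runs = list(product(factor_a, factor_b))

# Hypothetical response measured for each combination
response = {
    ("A1", "B1"): 20,
    ("A1", "B2"): 30,
    ("A2", "B1"): 40,
    ("A2", "B2"): 52,
}

def main_effect(factor_index, low, high):
    """Average response at the high level minus average response at the low level."""
    high_avg = mean(y for combo, y in response.items() if combo[factor_index] == high)
    low_avg = mean(y for combo, y in response.items() if combo[factor_index] == low)
    return high_avg - low_avg

effect_a = main_effect(0, "A1", "A2")
effect_b = main_effect(1, "B1", "B2")
print(runs)
print(effect_a, effect_b)
```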
• Summary
• Background/Introduction
• Implemented Methods
• Results based on Analysis
• Deliberation
• Conclusion
16. Assumptions in chi square test.
Assumption #1: One categorical variable (i.e., the variable can be dichotomous, nominal or
ordinal).
Assumption #2: You should have independence of observations, which means that there is
no relationship between any of the cases (e.g., participants).
Assumption #3: The groups of the categorical variable must be mutually exclusive.
Assumption #4: The expected frequency in each group of your categorical variable must be
at least 5.
17. Confidence interval.
Solution:
A confidence interval, in statistics, is a range of values that is expected to contain a
population parameter for a certain proportion of times (the confidence level).
Key points:
• A confidence interval gives a pair of values around the sample mean between which the
parameter is expected to fall, with a stated level of confidence.
• Confidence intervals measure the degree of uncertainty or certainty in a sampling
method.
• They are most often constructed using confidence levels of 95% or 99%.
18. Characteristics of Normal distribution data.
The normal distribution is also referred to as Gaussian or Gauss distribution. The
distribution is widely used in natural and social sciences. It is made relevant by the Central
Limit Theorem, which states that the averages obtained from independent, identically
distributed random variables tend to form normal distributions, regardless of the type of
distributions they are sampled from.
All forms of (normal) distribution share the following characteristics:
• It is symmetric.
• The mean, median, and mode are equal. ...
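• It follows the empirical rule: about 68%, 95% and 99.7% of values lie within 1, 2 and 3
standard deviations of the mean.
• Its skewness is 0 and its kurtosis is 3.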
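The empirical rule can be checked numerically from the standard normal cumulative distribution function. This short sketch uses Python's standard library; the helper name is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```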
19. Applications of nonparametric tests.
Nonparametric statistical tests are employed to analyze the sampling results of solids
mixing.
These tests can be performed on the data with different kinds of scale of measurement,
such as nominal, ordinal, interval or ratio, without knowing the distribution of the
population.
20. Chi square test.
There are two types of chi-square tests. Both use the chi-square statistic and distribution for
different purposes:
• A chi-square goodness of fit test determines if sample data matches a population.
• A chi-square test for independence compares two variables in a contingency table to
see if they are related. In a more general sense, it tests to see whether distributions
of categorical variables differ from one another.
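Both tests rest on the same statistic, the sum of (observed − expected)² / expected over all categories. The goodness of fit case can be sketched as below; the counts are hypothetical, and the critical value 5.991 (2 degrees of freedom, 5% level) is taken from a standard chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 60 patients choosing among three dosage forms, equal preference expected
observed = [25, 15, 20]
expected = [20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
critical_value = 5.991  # df = 3 - 1 = 2, alpha = 0.05, from a chi-square table
print(chi2, chi2 > critical_value)
```

Here the statistic is below the critical value, so these made-up counts would not lead to rejecting the hypothesis of equal preference.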
21. Confidence interval.
A confidence interval expresses how much uncertainty there is with any particular statistic.
Confidence intervals are often used with a margin of error. It tells you how confident you
can be that the results from a poll or survey reflect what you would expect to find if it were
possible to survey the entire population. Confidence intervals are intrinsically connected to
confidence levels.
22. Features of normal distribution pattern.
All forms of (normal) distribution share the following characteristics:
1. It is symmetric. A normal distribution comes with a perfectly symmetrical shape.
2. The mean, median, and mode are equal.
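3. It follows the empirical rule (68%, 95% and 99.7% of values within 1, 2 and 3 standard deviations).
4. Its skewness is 0 and its kurtosis is 3.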
23. Probability.
Probability is simply how likely something is to happen.
Whenever we’re unsure about the outcome of an event, we can talk about the probabilities
of certain outcomes—how likely they are. The analysis of events governed by probability
is called statistics.
The best example for understanding probability is flipping a coin:
There are two possible outcomes—heads or tails.
What’s the probability of the coin landing on Heads? We can find out using the equation
P(H) = ?. You might
intuitively know that the likelihood is half/half, or 50%.
Probability of an event = (# of ways it can happen) / (total number of outcomes), i.e.
P(A) = (# of ways A can happen) / (Total number of outcomes).
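The ratio above can be written directly as a fraction. This is only a small illustration with two equally-likely-outcome cases (a fair coin and a fair six-sided die); the function name is hypothetical.

```python
from fractions import Fraction

def probability(ways_event_can_happen, total_outcomes):
    """P(A) = (# of ways A can happen) / (total number of equally likely outcomes)."""
    return Fraction(ways_event_can_happen, total_outcomes)

p_heads = probability(1, 2)      # one head among two faces of a coin
p_even_roll = probability(3, 6)  # 2, 4 or 6 among six faces of a die
print(p_heads, float(p_heads), p_even_roll)
```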
24. Applications of SAS.
The main SAS application is to process complex raw data and generate meaningful
insights. This helps the organization to make better decisions. It helps us to mine data from
various sources, compile it and analyze it.
1. Multivariate Analysis, 2. Business Intelligence, 3. Predictive Analytics, 4. Creating
Safe Drugs & Clinical Research and Forecasting
25. Standard error of mean.
The standard error of the mean is a method used to evaluate the standard deviation of a
sampling distribution. It is also called the standard deviation of the mean and is abbreviated
as SEM. For instance, usually, the population mean estimated value is the sample mean, in
a sample space. But, if we pick another sample from the same population, it may give a
different value.
Hence, a population of the sampled means will occur, having its different variance and
mean. Standard error of mean could be said as the standard deviation of such a sample
means comprising all the possible samples drawn from the same given population. SEM
represents an estimate of standard deviation, which has been calculated from the sample.
Formula
The formula for standard error of the mean is equal to the ratio of the standard deviation to
the root of sample size.
SEM = SD/√N
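A minimal sketch of SEM = SD/√N with made-up measurements is shown below; the sample SD is computed with the n − 1 denominator, which is an assumption about which SD is intended.

```python
import math
from statistics import stdev

def standard_error_of_mean(data):
    """SEM = SD / sqrt(N), using the sample standard deviation."""
    return stdev(data) / math.sqrt(len(data))

# Hypothetical measurements
sample = [4, 8, 6, 5, 7]
sem = standard_error_of_mean(sample)
print(round(sem, 4))
```

Quadrupling the sample size halves the SEM if the SD stays the same.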
Yes. Median is preferable particularly when you have some extreme low and high values in
the data distribution. When this is the case, the median is a better measure of central
tendency than the mean.
27. Optimization techniques.
• Optimization helps to produce the best possible formulation and reduces the cost.
• The primary objective may not be to optimize absolutely but to compromise effectively and
thereby produce the best formulation under a given set of restrictions.
28. 2² and 2³ designs.
A factorial design is one involving two or more factors in a single experiment. Such
designs are classified by the number of levels of each factor and the number of factors. So
a 2² factorial will have two factors each at two levels and a 2³ factorial will have three factors
each at two levels.
A 2² factorial design is a trial design meant to be able to more efficiently test two
interventions in one sample. For instance, testing aspirin versus placebo and clonidine
versus placebo in a randomized trial.
A related 2 × 3 example (one factor at two levels and the other at three, so not a 2³ design):
It's clear that inpatient treatment works best, day treatment is next best, and
outpatient treatment is worst of the three. It's also clear that there is no difference between
the two treatment levels (psychotherapy and behaviour modification).
29. Applications of Student's t test.
The T-test is used to compare the mean of two samples, dependent or independent. It can
also be used to determine if the sample mean is different from the assumed mean. T-test
has an application in determining the confidence interval for a sample mean.
30. One tailed and Two tailed tests.
A one-tailed test is a statistical test in which the critical area of a distribution is one-sided
so that it is either greater than or less than a certain value, but not both. If the sample being
tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead
of the null hypothesis. It is also called a directional hypothesis or directional test.
A two-tailed test, in statistics, is a method in which the critical area of a distribution is
two-sided and tests whether a sample is greater than or less than a certain range of values. It
is used in null-hypothesis testing and testing for statistical significance. If the sample being
tested falls into either of the critical areas, the alternative hypothesis is accepted instead of
the null hypothesis.
31. Pharmaceutical examples of optimization techniques.
• Optimization helps to produce the best possible formulation and reduces the cost.
• The primary objective may not be to optimize absolutely but to compromise effectively and
thereby produce the best formulation under a given set of restrictions.
• The term optimize is defined as to make as perfect, effective, or functional as possible.
It is the process of finding the best way of using the existing resources while taking
into account all the factors that influence decisions in any experiment.
• Traditionally, optimization in pharmaceuticals refers to changing one variable at a
time, so as to obtain a solution to a problematic formulation.
• Modern pharmaceutical optimization involves systematic design of experiments
(DoE) to improve formulation irregularities.
• In other words, optimization quantitates a formulation that has been
qualitatively determined. It is not a screening technique.
32. Histograms.
A histogram is a graphical representation that organizes a group of data points into
user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data
series into an easily interpreted visual by taking many data points and grouping them into
logical ranges or bins.
• A histogram is a bar graph-like representation of data that buckets a range of
outcomes into columns along the x-axis.
• The y-axis represents the number count or percentage of occurrences in the data for
each column and can be used to visualize data distributions.
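Grouping data points into bins and counting the occurrences per bin can be sketched as follows. The tablet weights, the bin width and the starting value are hypothetical, and each bin is treated as including its lower limit and excluding its upper limit.

```python
from collections import Counter

def histogram_counts(data, bin_width, start):
    """Count how many values fall in each bin [low, low + bin_width)."""
    counts = Counter((x - start) // bin_width for x in data)
    return {
        (start + b * bin_width, start + (b + 1) * bin_width): counts[b]
        for b in sorted(counts)
    }

# Hypothetical tablet weights in mg
weights = [248, 251, 253, 249, 255, 262, 258, 250, 247, 256]
bins = histogram_counts(weights, bin_width=5, start=245)

# Text rendering: one row per bin, one '#' per observation
for (low, high), count in bins.items():
    print(f"{low}-{high}: {'#' * count}")
```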
33. Differentiate between sample and population parameter.
A population is the entire group that you want to draw conclusions about.
A sample is the specific group that you will collect data from. The size of the sample is
always less than the total size of the population.
Population | Sample
Advertisements for IT jobs in the Netherlands | The top 50 search results for advertisements for IT jobs in the Netherlands on May 1, 2020
Songs from the Eurovision Song Contest | Winning songs from the Eurovision Song Contest that were performed in English
Undergraduate students in the Netherlands | 300 undergraduate students from three Dutch universities who volunteer for your psychology research study
• Factorial designs are more efficient than OFAT experiments. They provide more
information at similar or lower cost. They can find optimal conditions faster than
OFAT experiments.
• Factorial designs allow additional factors to be examined at no additional cost.
• When the effect of one factor is different for different levels of another factor, it
cannot be detected by an OFAT experiment design. Factorial designs are required to
detect such interactions. Use of OFAT when interactions are present can lead to
serious misunderstanding of how the response changes with the factors.
• Factorial designs allow the effects of a factor to be estimated at several levels of the
other factors, yielding conclusions that are valid over a range of experimental
conditions.
37. Define blinding in clinical study.
• Blinding refers to the concealment of group allocation from one or more individuals
involved in a clinical research study, most commonly a randomized controlled trial
(RCT).
• A blinded (or masked) clinical trial is a field study of a drug in which the recipient
does not know whether they are receiving the actual drug or a placebo. A double-blind
clinical trial is one in which neither the recipient nor the administrator knows
whether the recipient is receiving the actual drug.
38. Differentiate SD and SEM.
The key differences:
• The SD quantifies scatter — how much the values vary from one another.
• The SEM quantifies how precisely you know the true mean of the population. It
takes into account both the value of the SD and the sample size.
• Both SD and SEM are in the same units -- the units of the data.
• The SEM gets smaller as your samples get larger. This makes sense, because the mean of
a large sample is likely to be closer to the true population mean than is the mean of a small
sample. With a huge sample, you'll know the value of the mean with a lot of precision even
if the data are very scattered.
• The SD does not change predictably as you acquire more data. The SD you compute from
a sample is the best possible estimate of the SD of the overall population. As you collect
more data, you'll assess the SD of the population with more precision. But you can't predict
whether the SD from a larger sample will be bigger or smaller than the SD from a small
sample. (This is not strictly true. It is the variance -- the SD squared -- that doesn't change
predictably, but the change in SD is trivial and much much smaller than the change in the
SEM.)
39. Difference between nominal and ordinal type of data.
Nominal
A nominal scale describes a variable with categories that do not have a natural order or
ranking. You can code nominal variables with numbers if you want, but the order is
arbitrary and any calculations, such as computing a mean, median, or standard deviation,
would be meaningless.
Ordinal
An ordinal scale is one where the order matters but not the difference between values.
The relationship between the process inputs (material attributes and process parameters)
and the critical quality attributes can be described as the design space.
Advantages:
1. A design space can be updated over the lifecycle as additional knowledge is gained.
2. Risk assessments, as part of the risk management process, help steer the focus of
development studies and define the design space.
3. The design space associated with the control strategy ensures that the
manufacturing process produces a product that meets the Quality Target Product
Profile (QTPP) and the Critical Quality Attributes (CQAs).
4. Since design spaces are typically developed at small scale, an effective control
strategy helps manage potential residual risk after development and
implementation.
5. In developing design spaces for existing products, multivariate models can be used
for retrospective evaluation of historical production data.
6. The level of variability present in the historical data will influence the ability to
develop a design space, and additional studies might be appropriate.
7. Design spaces can be based on scientific first principles and/or empirical models.
The central composite design is the most commonly used fractional factorial design in
the response surface model. In this design, the center points are augmented with a group of
axial points called star points. With this design, first-order and second-order terms
can be estimated quickly.
50. Define bias in clinical study.
A bias in evidence-based medicine is any factor that leads to conclusions that are
systematically different from the truth. Although in general parlance “bias” has moral or
ethical implications, research bias does not refer to the researcher's character, just the validity
of the study.
51. Role of sample size in calculation of confidence interval
In order to estimate the sample size, we need approximate values of the two group proportions, p1 and p2. The values
of p1 and p2 that maximize the sample size are p1=p2=0.5. Thus, if there is no information
available to approximate p1 and p2, then 0.5 can be used to generate the most conservative,
or largest, sample sizes.
52. Advantages and disadvantages Pie charts.
Rather than just presenting a series of numbers, a simple way to visualize statistical
information for businesses is with charts and graphs. The most common of these is the pie chart.
Its name comes from its resemblance to a pie: it has a circular shape and shows data in slices.
Advantage: a pie chart is simple to create and understand, so it works well when you need to
present and measure simple data.
Disadvantage: a pie chart is not as suitable for complex needs as other
visualization tools like a bar graph.
53. Explain: Range, Interquartile range and Variance
Range: the difference between the highest and lowest values.
Interquartile range: the range of the middle half of a distribution.
Variance: average of squared distances from the mean.
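The three measures can be computed with Python's standard library, as in the sketch below. The data are hypothetical. Note two assumptions: quartiles are computed with the library's default ("exclusive") method, and other conventions can give slightly different interquartile ranges; the population variance divides by n, while the sample variance divides by n − 1.

```python
from statistics import quantiles, pvariance, variance

# Hypothetical observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Variance: average squared distance from the mean
population_var = pvariance(data)  # divides by n
sample_var = variance(data)       # divides by n - 1

print(data_range, iqr, population_var, round(sample_var, 3))
```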
54. Control Space
Control space, including Continuous Quality Overall Summary (CQOS), is also a critical
element of QbD, and should include starting materials, intermediates, and finished
products. The strategy should include every aspect known to potentially impact the
product.
The following are suggested considerations for a control space:
Manufacturing instructions:
• Set the most appropriate parameters
• Multivariate interactions need to be evaluated
• Correct ranges of operation
• Determine the proper data for trending and analysis
• Create a control strategy:
Correlation
Pearson Correlation Coefficient.
Linear Correlation Coefficient.
Sample Correlation Coefficient.
Population Correlation Coefficient.
58. Difference between ANOVA and student t test.
The Student's t test is used to compare the means between two groups, whereas ANOVA is
used to compare the means among three or more groups. A significant P value in the
ANOVA test indicates that, for at least one pair of groups, the mean difference is
statistically significant.
59. What factors qualify mode to be the best measure of central tendency?
The mode is the least used of the measures of central tendency, and it is the only one that
can be used when dealing with nominal data. For this reason, the mode will be the best measure of central
tendency (as it is the only one appropriate to use) when dealing with nominal data.
60. Define α and β error.
The probability of committing a type I error (rejecting the null hypothesis when it is actually
true) is called α (alpha); the other name for this is the level of statistical significance. The
probability of making a type II error (failing to reject the null hypothesis when it is actually
false) is called β (beta).
61. Classify observational and experimental studies.
In an observational study, values of the explanatory variable occur naturally. For example, in
a study of methods of quitting smoking, this means that the participants themselves choose a
method of trying to quit smoking. In
an experiment, researchers assign the values of the explanatory variable. In other words,
they tell people what method to use.
62. What is interventional study?
Intervention (or Experimental) studies differ from observational studies in that the
investigator assigns the exposure. They are used to determine the effectiveness of an
intervention or the effectiveness of a health service delivery. It approximates the controlled
experiment of basic science.
63. List the characteristics of observational studies.
Some of the characteristics of observation method of data collection are as follows:
1. Observation is a Systematic Method.
2. Observation is Specific.
3. Observation is Objective.
4. Observation is Quantitative.
5. Observation is an Affair of Eyes.
6. The Record of Observation is Made Immediately.
64. Define semi logarithmic plots.
A semi-log graph is useful when graphing exponential functions. Consider a function of the
form y = b·a^x. Taking logarithms gives log y = log b + x·log a, so when graphed on semi-log
paper, this function will produce a straight line
with slope log(a) that crosses the y-axis at y = b.
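The straight-line behaviour can be checked numerically: for an exponential function with base a and multiplier b, log10(y) rises by the same amount, log10(a), for every unit step in x. The values of a and b below are arbitrary choices for illustration.

```python
import math

b, a = 2.0, 3.0  # hypothetical multiplier and base, y = b * a**x
xs = [0, 1, 2, 3, 4]
ys = [b * a ** x for x in xs]

# On a semi-log plot the vertical axis shows log10(y)
log_ys = [math.log10(y) for y in ys]

# Rise in log10(y) per unit step in x; constant for a straight line
slopes = [log_ys[i + 1] - log_ys[i] for i in range(len(xs) - 1)]
print([round(s, 4) for s in slopes], round(math.log10(a), 4))
```

The value at x = 0 is log10(b), which corresponds to y = b on the logarithmic axis.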
65. Application of Post Hoc tests
Post hoc (“after this” in Latin) tests are used to uncover specific differences between three
or more group means when an analysis of variance (ANOVA) F test is significant.
66. Define surrogate & direct end point.
A surrogate endpoint of a clinical trial is a laboratory measurement or a physical sign used
as a substitute for a clinically meaningful endpoint that measures directly how a patient feels,
functions or survives.
An endpoint is the primary outcome that is being measured by a clinical trial. A cancer drug,
for example, might use survival as an endpoint, comparing the five-year survival rate of
patients using an experimental therapy against the five-year survival rate of patients using
another treatment or a placebo.