
7 - Data Collection, Analysis & Interpretation

Interpretation of epidemiology data collected and analysis of the information. Designed for undergraduates

Data collection, analysis & interpretation
Dr Peter Akera
Department of Public Health
Gulu University
29/05/2023
Data collection
• Data collection is a systematic process of gathering observations or
measurements.
• Allows you to gain first-hand knowledge and original insights into your
research problem.
• Methods and aims may differ between fields, but the overall process
of data collection remains largely the same.
Data collection
• Before you begin collecting data, you need to consider:
• The aim of the research
• The type of data that you will collect
• The methods and procedures you will use to collect, store, and process the
data
• To collect high-quality data that is relevant to your purposes, follow
these four steps.
1- Define the aim of your research
2- Choose your data collection method
3- Plan your data collection procedures
4- Collect the data
Data collection
Step 1: Define the aim of your research
• Before you start the process of data collection, you need to identify
exactly what you want to achieve.
• You can start by writing a problem statement:
• What is the practical or scientific issue that you want to address, and why does
it matter?
• A problem statement helps convey the issues and context that gave rise to the study.
• It helps readers anticipate the goals of the study.
• Problem statements are often poorly written.
Data collection
• Next, formulate one or more research questions that precisely define
what you want to find out.
• Depending on your research questions, you might need to collect
quantitative or qualitative data:
• Quantitative data is expressed in numbers and graphs and is analyzed
through statistical methods.
• Qualitative data is expressed in words and analyzed through
interpretations and categorizations
Data collection
• If your aim is to test a hypothesis, measure something precisely, or
gain large-scale statistical insights, collect quantitative data.
• If your aim is to explore ideas, understand experiences, or gain
detailed insights into a specific context, collect qualitative data.
• If you have several aims, you can use a mixed methods approach that
collects both types of data.
Data collection
Step 2: Choose your data collection method
• Based on the data you want to collect, decide which method is best
suited for your research.
• Experimental research is primarily a quantitative method.
• Interviews, focus groups, and ethnographies are qualitative methods.
• Surveys, observations, archival research and secondary data
collection can be quantitative or qualitative methods.
• Carefully consider what method you will use to gather data that helps
you directly answer your research questions.
Data collection
Step 3: Plan your data collection procedures
• When you know which method(s) you are using, you need to plan
exactly how you will implement them.
• What procedures will you follow to make accurate observations or
measurements of the variables you are interested in?
• For instance, if you’re conducting surveys or interviews, decide what
form the questions will take.
• If you’re conducting an experiment, make decisions about your
experimental design (e.g., determine inclusion and exclusion criteria).
Data collection
Operationalization
• Sometimes your variables can be measured directly: for example, you can
collect data on the average age of employees simply by asking for dates of
birth.
• However, often you’ll be interested in collecting data on more abstract
concepts or variables that can’t be directly observed.
• Operationalization means turning abstract conceptual ideas into measurable
observations.
• When planning how you will collect data, you need to translate the
conceptual definition of what you want to study into the operational
definition of what you will actually measure.
• Question to consider: how would you operationalize alcohol consumption?
Data collection
Sampling
• You may need to develop a sampling plan to obtain data systematically.
• This involves defining a population, the group you want to draw
conclusions about, and a sample, the group you will actually collect
data from.
• Your sampling method will determine how you recruit participants or
obtain measurements for your study.
• To decide on a sampling method you will need to consider factors like
the required sample size, accessibility of the sample, and timeframe of
the data collection.
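As an illustration, a simple random sample (only one of several possible sampling methods) could be drawn from a sampling frame roughly as follows. The frame of 1,000 participant IDs, the sample size of 50, and the seed are all assumptions made for this sketch.

```python
import random

# Hypothetical sampling frame: one ID number per member of the population
population_ids = list(range(1, 1001))

# A fixed seed lets the draw be reproduced and audited later
rng = random.Random(2023)

# Simple random sample without replacement: every ID has the same chance of selection
sample_ids = rng.sample(population_ids, k=50)

print(len(sample_ids), "participants selected")
print("Sampling fraction:", len(sample_ids) / len(population_ids))
```

Other designs (stratified, cluster, systematic) would change how the IDs are drawn, not the idea of separating the population from the sample.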
Data collection
Standardizing procedures
• If multiple researchers are involved, write a detailed manual to
standardize data collection procedures in your study.
• The manual lays out specific step-by-step instructions so that everyone in your
research team collects data in a consistent way.
• This helps you avoid common research biases like omitted variable
bias or information bias.
• This helps ensure the reliability of your data, and you can also use it
to replicate the study in the future.
Data collection
Creating a data management plan
• Before beginning data collection, you should also decide how you will
organize and store your data.
• If you are collecting data from people, you will likely need to anonymize
and safeguard the data to prevent leaks of sensitive information (e.g.
names or identity numbers).
• If you are collecting data via interviews or pencil-and-paper formats, you
will need to perform transcriptions or data entry in systematic ways to
minimize distortion.
• You can prevent loss of data by having an organization system that is
routinely backed up.
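One way to safeguard identities is to replace direct identifiers with study IDs before analysis (strictly speaking, pseudonymization). The sketch below assumes hypothetical field names (name, national_id, age) and an invented ID format; the linkage key would need to be stored separately with restricted access.

```python
# Hypothetical raw records containing direct identifiers
raw_records = [
    {"name": "Participant A", "national_id": "ID-0001", "age": 34},
    {"name": "Participant B", "national_id": "ID-0002", "age": 51},
]

linkage_key = {}    # study ID -> identifiers; keep apart from the analysis file
analysis_data = []  # de-identified records used for analysis

for number, record in enumerate(raw_records, start=1):
    study_id = f"S{number:04d}"  # assumed ID format, e.g. S0001
    linkage_key[study_id] = {
        "name": record["name"],
        "national_id": record["national_id"],
    }
    analysis_data.append({"study_id": study_id, "age": record["age"]})

print(analysis_data)
```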
Data collection
Step 4: Collect the data
• Finally, you can implement your chosen methods to measure or
observe the variables you are interested in.
• To ensure that high quality data is recorded in a systematic way, here
are some best practices:
• Record all relevant information as and when you obtain data.
• Double-check manual data entry for errors, for example by double data entry.
• If you collect quantitative data, you can assess the reliability and validity to
get an indication of your data quality. HOW?
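Double data entry can be sketched as two independent keyings of the same forms followed by a field-by-field comparison. The study IDs, field names, and values below are hypothetical.

```python
# Two independent entries of the same paper forms (hypothetical values)
entry_1 = {"S0001": {"age": 34, "weight_kg": 61.5}, "S0002": {"age": 51, "weight_kg": 72.0}}
entry_2 = {"S0001": {"age": 34, "weight_kg": 61.5}, "S0002": {"age": 15, "weight_kg": 72.0}}

def find_discrepancies(first, second):
    """Return (study_id, field, first_value, second_value) for every mismatch."""
    mismatches = []
    for study_id in sorted(first.keys() & second.keys()):
        for field in sorted(first[study_id]):
            a = first[study_id][field]
            b = second[study_id].get(field)
            if a != b:
                mismatches.append((study_id, field, a, b))
    return mismatches

discrepancies = find_discrepancies(entry_1, entry_2)
for item in discrepancies:
    print(item)  # each mismatch is resolved against the original form
```

Here the two entries disagree on the age for S0002, so that form would be pulled and checked.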
Data collection
Examples of collecting qualitative and quantitative data
• To collect data about perceptions of health managers, you administer
a survey with closed- and open-ended questions to a sample of 300
health workers across different departments and locations.
• The closed-ended questions ask participants to rate their health
manager’s leadership skills on scales from 1–5.
• The data produced is numerical and can be statistically analyzed for averages
and patterns.
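A minimal sketch of summarizing such ratings, using ten invented responses on the 1–5 scale rather than real survey data:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical responses to one closed-ended item (1 = poor, 5 = excellent)
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 1]

mean_rating = mean(ratings)
median_rating = median(ratings)
frequencies = sorted(Counter(ratings).items())  # (score, count) pairs

print("Mean rating:", mean_rating)
print("Median rating:", median_rating)
print("Frequency of each score:", frequencies)
```

Because the scale is ordinal, the median and the frequency table are often reported alongside (or instead of) the mean.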
Data collection
• The open-ended questions ask participants for examples of what the
manager is doing well now and what they can do better in the future.
• The data produced is qualitative and can be categorized through content
analysis for further insights.
Data Analysis and Interpretation
• Epidemiological data analysis uses specific methods to organize,
describe, infer and summarize data.
• Epidemiological research issues include disease distribution, etiology
and risk factors, diagnosis, prevention, and treatment evaluation.
• Analysis is often the most enjoyable part of carrying out an epidemiologic
study, since after all of the hard work and waiting you get the chance to find out
the answers.
• Data do not, however, “speak for themselves”.
Data Analysis and Interpretation
• A new investigator attempting to collect this reward may find
themselves alone with the dataset and no idea how to proceed,
which can cause anxiety.
• The analysis should relate to the study objectives and research questions.
• Usual analysis approach is to begin with descriptive analyses, to
explore and gain a “feel” for the data.
• Then address specific questions from the study aims or hypotheses,
from findings and questions from studies reported in the literature,
and from patterns suggested by the descriptive analyses.
Data Analysis and Interpretation
Major objectives of data analysis
1. Evaluate and enhance data quality
2. Describe the study population
3. Assess potential for bias (e.g., nonresponse, refusal, and attrition,
comparison groups)
4. Estimate measures of frequency and extent (prevalence, incidence, means,
medians)
Data Analysis and Interpretation
Major objectives of data analysis – Continued
5. Estimate measures of strength of association or effect
6. Assess the degree of uncertainty from random noise (“chance”)
7. Control and examine effects of other relevant factors
8. Seek further insight into the relationships observed or not observed
9. Evaluate impact or importance
Data Analysis and Interpretation
Data editing
• There is often a need to “edit” data, both before and after they are
computerized.
• The first step is “manual” or “visual” editing: forms are reviewed to
spot irregularities and problems that escaped notice or correction
during monitoring.
• Open-ended questions, if there are any, usually need to be coded.
Data Analysis and Interpretation
• Even forms with only closed-ended questions and precoded
response choices may require coding for such situations as unclear or
ambiguous responses, multiple responses to a single item, written
comments from the participant or data collector, and other situations
that arise.

• Visual editing also provides the opportunity to get a sense for how
well the forms were filled out and how often certain types of
problems have arisen.
Data Analysis and Interpretation
Data cleaning
• Range checks
• Detect and correct invalid values
• Note and investigate unusual values
• Note outliers (even if correct, their presence may have a bearing on which
statistical methods to use)
• Check reasonableness of distributions and also note their form, since that will
also affect choice of statistical procedures
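A range check can be sketched as a comparison of each value with a permissible interval. The variable names and limits below are assumptions for illustration; in practice they would come from the study protocol.

```python
# Assumed permissible ranges; set these from your own protocol
PERMISSIBLE_RANGES = {"age_years": (0, 120), "systolic_bp": (50, 300)}

records = [
    {"study_id": "S0001", "age_years": 34, "systolic_bp": 118},
    {"study_id": "S0002", "age_years": 510, "systolic_bp": 130},  # likely a keying error
    {"study_id": "S0003", "age_years": 47, "systolic_bp": 25},
]

def range_check(rows, ranges):
    """Return (study_id, variable, value) for every value outside its range."""
    flagged = []
    for row in rows:
        for variable, (low, high) in ranges.items():
            value = row.get(variable)
            if value is not None and not (low <= value <= high):
                flagged.append((row["study_id"], variable, value))
    return flagged

out_of_range = range_check(records, PERMISSIBLE_RANGES)
print(out_of_range)
```

Flagged values are investigated against the source forms rather than deleted automatically, since an unusual value may still be correct.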
Data Analysis and Interpretation
Data cleaning
Consistency checks
• Examine each pair (occasionally more) of related data items in
relation to the set of usual and permissible values for the variables as
a pair.
• For example, males should not have had a hysterectomy.
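The hysterectomy example can be expressed as a check on a pair of variables. The records and field names here are hypothetical.

```python
# Hypothetical records; each pair (sex, hysterectomy) is checked together
survey_records = [
    {"study_id": "S0001", "sex": "female", "hysterectomy": True},
    {"study_id": "S0002", "sex": "male", "hysterectomy": False},
    {"study_id": "S0003", "sex": "male", "hysterectomy": True},  # inconsistent pair
]

def inconsistent_sex_hysterectomy(rows):
    """Return study IDs where a male is recorded as having had a hysterectomy."""
    return [r["study_id"] for r in rows if r["sex"] == "male" and r["hysterectomy"]]

to_review = inconsistent_sex_hysterectomy(survey_records)
print(to_review)
```

The check only signals that one of the two items is wrong; which one must be settled from the original record.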
Data Analysis and Interpretation
Data coding
• Data coding means translating information into values suitable for
computer entry and statistical analysis.
• All types of data (e.g., medical records, questionnaires, laboratory
tests) must be coded, though in some cases the coding has been
worked out in advance.
• The objective is to create variables from information, with an eye
towards their analysis.
• E.g., Male = 1, Female = 2, with further codes for any other categories.
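A sketch of applying such a coding scheme, using the Male = 1 and Female = 2 codes from the example. Returning None for anything unrecognized, so that it can be reviewed by hand, is an assumption of this sketch.

```python
SEX_CODES = {"male": 1, "female": 2}

def code_sex(response):
    """Return the numeric code, or None when the response needs manual review."""
    return SEX_CODES.get(response.strip().lower())

# Hypothetical raw responses as written on the forms
responses = ["Male", "female ", "FEMALE", "unclear"]
coded = [code_sex(r) for r in responses]
print(coded)
```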
Data Analysis and Interpretation
Types of variables - levels or scales of measurement
• Constructs or factors being studied are represented by “variables”.
• Variables (also sometimes called “factors”) have “values” or “levels”.
• Variables summarize and reduce data, attempting to represent the
“essential” information.
• A continuous variable takes on all values within its permissible range,
so that for any two allowable values there are other allowable values in
between.
• A continuous variable (sometimes called a “measurement variable”)
can be used in answer to the question “how much”. Examples: height, blood
pressure (BP).
Data Analysis and Interpretation
• A discrete variable can take only certain values between its maximum
and minimum values, even if there is no limit to the number of such
values.
• Discrete variables that can take any of a large number of values are
often treated as if they were continuous.
• If the values of a variable can be placed in order, then whether the
analyst elects to treat it as discrete and/or continuous depends on the
variable’s distribution, the requirements of available analytic
procedures, and the analyst’s judgment about interpretability.
Data Analysis and Interpretation
Types of discrete variables
1. Identification – a variable that simply names each observation (e.g.,
a study identifying number) and which is not used in statistical
analysis;
2. Nominal – a categorization or classification, with no inherent
ordering; the values of the variable are completely arbitrary and
could be replaced by any others without affecting the results (e.g.,
ABO blood group, ethnicity).
Data Analysis and Interpretation
3. Ordinal – a classification in which values can be ordered or ranked;
since the coded values need only reflect the ranking they can be
replaced by any others with the same relative ranking (e.g., 1,2,5;
6,22,69; 3.5,4.2, 6.9 could all be used in place of 1,2,3). Examples are
injury severity and socioeconomic status.
4. Count – the number of entities, events, or some other countable
phenomenon, for which the question “how many” is relevant (e.g.,
parity, number of siblings); to substitute other numbers for the
variable’s value would change its meaning. In epidemiologic data
analysis, count variables are often treated as continuous, especially if
the range is large.
Data Analysis and Interpretation
Data reduction
• Data reduction seeks to reduce the number of variables for analysis
by combining single variables into compound variables that better
quantify the construct.
• Variables created during coding attempt to faithfully reflect the
original data (e.g., height, weight).
• Often these variables can be used directly for analysis, but it is also
often necessary to create additional variables to represent constructs
of interest.
• For example.
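One possible illustration (an assumption, not necessarily the example intended on this slide) is body mass index, which combines the height and weight variables into a single compound variable. The participant values are invented.

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight in kilograms divided by the square of height in metres."""
    return weight_kg / height_m ** 2

# Hypothetical participants
participants = [
    {"study_id": "S0001", "weight_kg": 61.5, "height_m": 1.65},
    {"study_id": "S0002", "weight_kg": 80.0, "height_m": 1.80},
]

for p in participants:
    p["bmi"] = round(body_mass_index(p["weight_kg"], p["height_m"]), 1)
    print(p["study_id"], p["bmi"])
```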
Data Analysis and Interpretation
Preparatory work – Exploring the data
• Try to get a “feel” for the data – inspect the distribution of each
variable.
• Examine bivariate scatterplots and cross classifications.
• Do the patterns make sense? Are they believable?
• Observe shape – symmetry vs. skewness, discontinuities
• Look within important subgroups
• Note proportion of missing values
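A first look at a single variable might be sketched as below, using ten invented ages in which None marks a missing value.

```python
from statistics import mean, median

# Hypothetical ages; None marks a missing value
ages = [23, 25, 31, None, 28, 35, 62, None, 27, 30]

observed = [a for a in ages if a is not None]
proportion_missing = (len(ages) - len(observed)) / len(ages)

print("Proportion missing:", proportion_missing)
print("Min and max:", min(observed), max(observed))
print("Mean:", mean(observed), "Median:", median(observed))
# A mean well above the median suggests a right-skewed distribution
```

A histogram or scatterplot would normally accompany these numbers; the summary alone can hide discontinuities.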
Data Analysis and Interpretation
Preparatory work – Missing values
• Missing data are a nuisance and can be a problem.
• Missing responses mean that the denominators for many analyses
differ, which can be confusing and tiresome to explain.
• Also, analyses that involve multiple variables (e.g., crosstabulations,
regression models) generally exclude an entire observation if it is
missing a value for any variable in the analysis
• An analysis that makes no adjustment for the missing data may be
biased when values are not missing at random, because certain subgroups
will be underrepresented in the available data (a form of selection bias).
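The shifting denominators can be seen in a small sketch. The five records and the variable names (age, smoker, sbp) are hypothetical, and dropping every incomplete observation is shown only because it is the default behaviour described above.

```python
# Hypothetical records; None marks a missing value
study_records = [
    {"age": 34, "smoker": True, "sbp": 118},
    {"age": 51, "smoker": None, "sbp": 130},
    {"age": None, "smoker": False, "sbp": 125},
    {"age": 47, "smoker": True, "sbp": None},
    {"age": 29, "smoker": False, "sbp": 110},
]

def n_complete(rows, variables):
    """Count observations with no missing value on any listed variable."""
    return sum(all(r[v] is not None for v in variables) for r in rows)

for variables in (["age"], ["smoker"], ["age", "smoker"], ["age", "smoker", "sbp"]):
    print(variables, "denominator:", n_complete(study_records, variables))
```

Each variable taken alone is missing for only one person, yet an analysis using all three variables keeps only two of the five observations.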
Descriptive analyses
• Exploration of the data at some point becomes descriptive analysis,
to examine and then to report measures of frequency (incidence,
prevalence) and extent (means, survival time), association
(differences and ratios), and impact (attributable fraction, preventive
fraction).
• These measures will be computed for important subgroups and
probably for the entire study population.
• Standardization or other adjustment procedures may be needed to
take account of differences in age and other risk factor distributions,
follow-up time, etc.
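A minimal sketch of measures of frequency, association, and impact computed from cohort counts. The counts are invented, and the attributable fraction shown is the one among the exposed, defined as (risk in exposed − risk in unexposed) / risk in exposed.

```python
# Hypothetical cohort counts (not from any real study)
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

# Frequency: incidence proportion (risk) in each group
risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Association: ratio and difference of the two risks
risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

# Impact: share of risk among the exposed attributable to the exposure
attributable_fraction_exposed = risk_difference / risk_exposed

print("Risk ratio:", round(risk_ratio, 2))
print("Risk difference:", round(risk_difference, 2))
print("Attributable fraction (exposed):", round(attributable_fraction_exposed, 2))
```

These are crude measures; standardization or adjustment for age and other risk factors would be applied before interpreting them.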
Evaluation of hypotheses
• After the descriptive analyses comes evaluation of the study
hypotheses, if the study has identified any.
• This is a more formal evaluation of potential confounding, other
forms of bias, and potential alternative explanations for what has been
observed.
• One aspect of both descriptive analysis and hypothesis testing,
especially of the latter, is the assessment of the likely influence of
random variability (“chance”) on the data.
• Much of this is studied in statistics
Evaluating the role of chance
• Use tests of significance to serve as guides to caution before drawing
a conclusion, before inflating the particular to the general.
• Whether chance alone could plausibly account for an observed pattern is
usually assessed by means of a statistical test.
• A statistical test of significance is a device for evaluating the amount
of numerical data on which an observed pattern is based, to answer a
question like, “How often could such a strong association arise
completely by chance in an infinite number of analogous experiments
with the same number of subjects and the same proportion of cases
(or of exposed)?”
• Common tools: p-values and 95% confidence intervals
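With the numbers of subjects, cases, and exposed all held fixed, the quoted question corresponds closely to Fisher's exact test. The sketch below computes its one-sided p-value for a 2×2 table; choosing this particular test and the small table of counts are assumptions made for illustration.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(exposed cases >= a) with all row and column totals held fixed.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    exposed, unexposed = a + b, c + d
    cases = a + c
    total = exposed + unexposed
    upper = min(exposed, cases)
    # Count the arrangements with at least `a` exposed cases
    tail = sum(comb(exposed, k) * comb(unexposed, cases - k) for k in range(a, upper + 1))
    return tail / comb(total, cases)

# Hypothetical table: 3 of 4 exposed and 1 of 4 unexposed are cases
p_value = fisher_one_sided(3, 1, 1, 3)
print("One-sided p-value:", round(p_value, 3))
```

With only eight subjects, an association this strong arises by chance in 17 of the 70 equally likely arrangements, which is why the p-value is far from small despite the large difference in proportions.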
Interpretation of results
Key questions
1. How good are the data?
2. Could chance or bias explain the results?
3. How do the results compare with those from other studies?
4. What theories or mechanisms might account for findings?
5. What new hypotheses are suggested?
6. What are the next research steps?
7. What are the clinical and policy implications?
