
Biostatistics Course

The document provides an overview of statistics, detailing its application across various fields and the importance of data collection, analysis, and interpretation. It distinguishes between descriptive and inferential statistics, explaining their roles in summarizing data and making predictions about populations based on samples. Additionally, it covers the methodologies for gathering qualitative and quantitative data, emphasizing the significance of reliable statistical methods in drawing conclusions and making informed decisions.

Uploaded by

Rahim Belhadj
Copyright
© All Rights Reserved

Section 1

Understanding Statistics

Statistics is used in virtually all scientific disciplines, including the physical and social sciences,
as well as in business, medicine, the humanities, government, and manufacturing. It is a branch of
applied mathematics that developed from the application of mathematical tools, including calculus
and linear algebra, to probability theory.
The central idea is that we can learn about the properties of a large set of objects or events (a
population) by studying the characteristics of a smaller number of similar objects or events (a
sample). In many cases, gathering comprehensive data about an entire population is too costly,
difficult, or impossible, so statistical work starts with a sample that can be conveniently or
affordably observed.
Statistics is the study of collecting data, analysing it, processing it, interpreting the results and
presenting them in such a way that the data can be understood by everyone. It is at once a science, a
method and a set of techniques. Data analysis is used to describe the phenomena studied, make
forecasts and take decisions about them. In this way, statistics is an essential tool for understanding
and managing complex phenomena.
Descriptive and Inferential Statistics
The two major areas of statistics are descriptive statistics and inferential statistics.
Descriptive statistics describes the properties of sample and population data. Inferential statistics
uses those properties to test hypotheses and draw conclusions.
Descriptive statistics include the mean (or average), variance, skewness, and kurtosis. Inferential
statistics include linear regression analysis, analysis of variance (ANOVA), logit/probit models, and
null hypothesis testing.

Understanding Descriptive Statistics

Descriptive statistics help describe and explain the features of a specific data set by giving short
summaries about the sample and measures of the data. The most recognized types of descriptive
statistics are measures of center. For example, the mean, median, and mode, which are used at
almost all levels of math and statistics, are used to define and describe a data set. The mean, or the
average, is calculated by adding all the figures within the data set and then dividing by the number
of figures within the set.
For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The mode
of a data set is the value appearing most often, and the median is the figure situated in the middle of
the data set once its values are sorted. It is the figure separating the higher figures from the lower
figures within a data set. In this example the median is also 4, and because no value repeats, the data
set has no single mode.
However, there are less common types of descriptive statistics that are still very important.
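The three measures of center can be checked with a short Python sketch. It uses only the standard `statistics` module; the second list, `grades`, is a made-up data set added here so that the mode has a single clear answer.

```python
from statistics import mean, median, multimode

# Data set from the text
data = [2, 3, 4, 5, 6]

total = sum(data)          # sum of all figures
average = mean(data)       # sum divided by the number of figures
middle = median(data)      # middle figure of the sorted data
modes = multimode(data)    # every value tied for "most frequent"

print(total, average, middle, modes)

# Hypothetical data set (not from the text) in which one value repeats
grades = [3, 4, 4, 5, 4, 2]
print(multimode(grades))
```

`multimode` returns a list because a data set can have several values tied for most frequent; when no value repeats, as in the first data set, every value is returned.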
People use descriptive statistics to condense hard-to-understand quantitative insights across a large
data set into bite-sized descriptions. A student's grade point average (GPA), for example, is a good
illustration of a descriptive statistic. A GPA takes data points from a range of individual course
grades and averages them to provide a general understanding of a student's overall academic
performance. A student's personal GPA reflects their mean academic performance.

What Are Descriptive Statistics?

Descriptive statistics are brief informational coefficients that summarize a given data set, which can
be either a representation of the entire population or a sample of a population. Descriptive statistics
are broken down into measures of central tendency and measures of variability (spread). Measures
of central tendency include the mean, median, and mode, while measures of variability
include standard deviation, variance, minimum and maximum values, kurtosis, and skewness.
Descriptive statistics focus mostly on the central tendency, variability, and distribution of sample
data. Central tendency refers to an estimate of the characteristics of a typical element of a sample or
population. It includes descriptive statistics such as mean, median, and mode.
Variability refers to a set of statistics that show how much difference there is among the elements of
a sample or population along the characteristics measured. It includes metrics such as
range, variance, and standard deviation.
The distribution refers to the overall “shape” of the data. This can be depicted on a chart such as a
histogram or a dot plot and includes properties such as the probability distribution function,
skewness, and kurtosis.
Descriptive statistics can also describe differences between observed characteristics of the elements
of a data set. They can help us understand the collective properties of the elements of a data sample
and form the basis for testing hypotheses and making predictions using inferential statistics.
Types of Descriptive Statistics
All descriptive statistics are either measures of central tendency or measures of variability, also
known as measures of dispersion.
Central Tendency
Measures of central tendency focus on the average or middle values of data sets, whereas measures
of variability focus on the dispersion of data. These two measures use graphs, tables, and general
discussions to help people understand the meaning of the analyzed data.
Measures of central tendency describe the center position of a distribution for a data set. A person
analyzes the frequency of each data point in the distribution and describes it using
the mean, median, or mode, which measures the most common patterns of the analyzed data set.
Measures of Variability
Measures of variability (or measures of spread) aid in analyzing how dispersed the distribution is for
a set of data. For example, while the measures of central tendency may give a person the average of
a data set, they do not describe how the data is distributed within the set. So while the average of the
data might be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability
help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute
deviation, and variance are all examples of measures of variability. Consider the following data set:
5, 19, 24, 62, 91, 100. The range of that data set is 95, which is calculated by subtracting the lowest
number (5) in the data set from the highest (100).
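As a minimal sketch, the range of the data set above can be computed in Python alongside two other measures of variability. The choice of the population formulas (`pvariance`, `pstdev`) is an assumption made for illustration; the text does not say whether the six values are a sample or a whole population.

```python
from statistics import pvariance, pstdev

# Data set from the text
data = [5, 19, 24, 62, 91, 100]

# Range: highest value minus lowest value
data_range = max(data) - min(data)

# Variance and standard deviation, treating the six values as the whole
# population (an assumption; use variance/stdev for a sample instead)
spread_var = pvariance(data)
spread_sd = pstdev(data)

print(data_range)
print(spread_var, spread_sd)
```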
Distribution
Distribution (or frequency distribution) refers to the number of times a data point occurs.
Alternatively, it can be how many times a data point fails to occur. Consider this data set: male,
male, female, female, female, other. The distribution of this data can be classified as:
● The number of males in the data set is 2.
● The number of females in the data set is 3.
● The number of individuals identifying as other is 1.
● The number of non-males is 4.
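The counts listed above can be reproduced with a small Python sketch that uses `collections.Counter` from the standard library; the variable names are chosen here for illustration.

```python
from collections import Counter

# Data set from the text
observations = ["male", "male", "female", "female", "female", "other"]

# Number of times each category occurs
counts = Counter(observations)

# Number of times "male" fails to occur
non_males = len(observations) - counts["male"]

for category, frequency in counts.items():
    print(category, frequency)
print("non-male", non_males)
```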
Univariate vs. Bivariate
In descriptive statistics, univariate analysis examines only one variable. It is used to identify
characteristics of a single trait and is not used to analyze any relationships or causes.
For example, imagine a room full of high school students. Say you wanted to find the average age
of the individuals in the room. This univariate data depends on only one factor: each person's
age. By gathering this one piece of information from each person, adding the ages together, and
dividing by the total number of people, you can determine the average age.
Bivariate data, on the other hand, attempts to link two variables by searching for correlation. Two
types of data are collected, and the relationship between the two pieces of information is analyzed
together. Because multiple variables are analyzed, this approach may also be referred to
as multivariate.
Let's say each high school student in the example above takes a college assessment test, and we
want to see whether older students are testing better than younger students. In addition to gathering
the ages of the students, we need to find out each student's test score. Then, using data analytics, we
mathematically or graphically depict whether there is a relationship between student age and test
scores.
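One common way to depict such a relationship mathematically is the Pearson correlation coefficient. The sketch below is only an illustration: the text does not name a specific measure, the helper `pearson_r` is written here from the standard formula, and the ages and scores are invented values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical bivariate data: one (age, score) pair per student
ages = [14, 15, 15, 16, 17, 18]
scores = [61, 64, 70, 68, 75, 80]

r = pearson_r(ages, scores)
print(round(r, 3))
```

A value of r close to +1 would suggest that older students tend to score higher, a value close to -1 the opposite, and a value near 0 no linear relationship. A correlation on its own does not establish causation.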
Inferential Statistics
Inferential statistics is a tool used by statisticians to draw conclusions about the characteristics of a
population from the characteristics of a sample. It's also used to determine how certain the
statistician can be of the reliability of those conclusions. Based on sample size and distribution,
statisticians can calculate the probability that the sample statistics provide an accurate picture of the
corresponding parameters of the whole population from which the sample is drawn.
Inferential statistics are used to make generalizations about large groups such as estimating average
demand for a product by surveying the buying habits of a sample of consumers or attempting to
predict future events. This might mean projecting the future return of a security or an asset class
based on returns in a sample period.
Regression analysis is a widely used technique of statistical inference. It's used to determine the
strength and nature of the relationship between a dependent variable and one or more explanatory or
independent variables. The output of a regression model is often analyzed for statistical significance,
meaning that a result generated by testing or experimentation isn't likely to have occurred
randomly or by chance.
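As a rough illustration of the fitting step only, the sketch below estimates a straight line by ordinary least squares for one explanatory variable. The function name `fit_line` and the data are assumptions made here, and the sketch does not test the result for statistical significance.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: explanatory variable x and dependent variable y
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]

slope, intercept = fit_line(x_values, y_values)
print(slope, intercept)
```

The slope describes the nature of the relationship (how much y changes per unit of x); judging its strength and significance requires further steps that are not shown here.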
Statistical significance suggests that the results are attributable to a specific cause explained by the
data.
The data studied can be of any kind, which makes statistics useful in all disciplinary fields and
explains why it is taught in university courses of all kinds, from economics to biology, psychology
and of course engineering sciences.
Statistics involves:
- Gathering data.
- Presenting and summarising the data.
- Drawing conclusions about the population studied and helping to make decisions.
Parametric and Nonparametric Statistics
Inferential statistics can be classified as either parametric or nonparametric. Nonparametric statistics
are most commonly used for variables at the nominal or ordinal level of measurement, which
basically means that they are used for variables that do not have a normal distribution. Statistical
significance is calculated using information contained only in the sample (rather than the population)
and may use measures of central tendency appropriate for nominal or ordinal level data (i.e., the
median rather than the mean). Parametric statistics are the most common approach to inferential
statistical analysis. Parametric statistics require that the variables be measured at the interval or ratio
level. Use of parametric statistics also relies on other assumptions, such as the expectation that values
for a given variable will be normally distributed in the population. Inferential statistics encompass a
variety of statistical significance tests that investigators can use to make inferences about their
sample data. These tests can be divided into three basic categories depending on their intended
purpose: evaluating differences, examining relationships, and making predictions. The decision of
which procedure to use is determined, in part, by the investigator’s research question or research
design.
Section 2
Introduction
Many people are familiar with the term statistics. It denotes recording of numerical facts and figures,
for example, the daily prices of selected stocks on a stock exchange, the annual employment and
unemployment of a country, the daily rainfall in the monsoon season, etc. However, statistics deals
with situations in which the occurrence of some events cannot be predicted with certainty. It also
provides methods for organizing and summarizing facts and for using information to draw various
conclusions. Historically, the word statistics is derived from the Latin word status meaning state. For
several decades, statistics was associated solely with the display of facts and figures pertaining to
economic, demographic, and political situations prevailing in a country. As a subject, statistics now
encompasses concepts and methods that are of far-reaching importance in all enquiries that
involve planning or designing of the experiment, gathering of data by a process of experimentation or
observation, and finally making inference or conclusions by analyzing such data, which eventually
helps in making the future decision. Fact finding through the collection of data is not confined to
professional researchers. It is a part of the everyday life of all people who strive, consciously or
unconsciously, to know matters of interest concerning society, living conditions, the environment,
and the world at large. Sources of factual information range from individual experience to reports in
the news media, government records, and articles published in professional journals. Weather
forecasts, market reports, cost-of-living indexes, and the results of public opinion polls are some other
examples. Statistical methods are employed extensively in the production of such reports. Reports
that are based on sound statistical reasoning and careful interpretation of conclusions are truly
informative. However, the deliberate or inadvertent misuse of statistics leads to erroneous
conclusions and distortions of truths.
In this unit, we shall talk about the basics of statistics. We shall define the terms which we shall be
using again and again throughout this course. It is possible that you have read all this before. But that
might have been some years ago. So a quick look through this unit will help you to recall the relevant
facts. In case you have never been introduced to statistics before, this unit will gradually acquaint you
with its basic concepts. You will find that most of the terms we use in statistics are part of our daily
vocabulary. But we have to know their precise meaning before we use them in statistics.
On reading this unit, you should be able to:
- distinguish between a qualitative and a quantitative character,
- differentiate between a discrete and a continuous variable,
- draw up a frequency table and get the relative frequencies, cumulative frequencies and frequency
densities,
- decide upon a suitable mode of representing a frequency distribution diagrammatically.
Objectives of descriptive (or exploratory) statistics:
- To summarise and synthesise the information contained in the statistical series and highlight its
properties.
- To suggest hypotheses about the population from which the sample is drawn.
- In the case of time-dependent data, to make predictions.
The information is summarised using:
- Tables (frequency tables, contingency tables, etc.)
- Graphs (box plots, histograms, etc.)
- Indicators (mean, correlation, etc.).
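A frequency table with relative and cumulative frequencies, as named in the unit objectives, could be built along the following lines. This is a sketch on invented data (a hypothetical count of children per household); the column layout is an assumption, not a prescribed format.

```python
from collections import Counter

# Hypothetical discrete data: number of children in ten households
values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]

counts = Counter(values)
n = len(values)

# Each row: (value, frequency, relative frequency, cumulative frequency)
table = []
cumulative = 0
for value in sorted(counts):
    frequency = counts[value]
    cumulative += frequency
    table.append((value, frequency, frequency / n, cumulative))

print("value  freq  rel.freq  cum.freq")
for value, frequency, relative, cum in table:
    print(f"{value:>5}  {frequency:>4}  {relative:>8.2f}  {cum:>8}")
```

The relative frequencies sum to 1 and the last cumulative frequency equals the number of observations, which gives two quick checks on any table built this way.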
1.1 What are statistics?
Statistics is a branch of applied mathematics that involves the collection, description, analysis, and
inference of conclusions from quantitative data. The mathematical theories behind statistics rely
heavily on differential and integral calculus, linear algebra, and probability theory.
Those who work with statistics are referred to as statisticians. They’re particularly concerned with
determining how to draw reliable conclusions about large groups and general events from the
behavior and other observable characteristics of small samples. These samples represent a portion of
the large group or a limited number of instances of a general phenomenon.
Statistics is the set of scientific methods used to organise, summarise, present and analyse data
relating to the same phenomenon and which allow conclusions to be drawn and decisions to be taken.
Note 1: Statistics, which is the science just defined, should not be confused with a statistic, which is a
set of figures on a specific subject.
Section 3
1.2 Methodology
Statisticians measure and gather data about the individuals or elements of a sample and analyze this
data to generate descriptive statistics. They can then use these observed characteristics of the sample
data to make inferences or educated guesses about the unmeasured characteristics of the broader
population. These population characteristics are known as parameters.
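The distinction between a sample statistic and a population parameter can be illustrated with a small simulation. Everything in the sketch is assumed for the sake of the example: the population is artificial (10,000 simulated heights), the sample size of 100 is arbitrary, and in real studies the parameter is usually unknown.

```python
import random
from statistics import mean

# Fixed seed so that the sketch gives the same numbers on every run
rng = random.Random(42)

# Artificial population: 10,000 simulated heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = mean(population)      # population mean (the parameter)

# Simple random sample of 100 individuals, drawn without replacement
sample = rng.sample(population, 100)
statistic = mean(sample)          # sample mean (the descriptive statistic)

print(round(parameter, 2), round(statistic, 2))
```

The sample mean serves as an educated guess at the population mean; the two values are typically close but not identical, and inferential statistics quantifies how far apart they are likely to be.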
The statistician's approach is as follows:
- Problem to be studied (objectives)
- Data collection
- Processing and analysis
- Verification
- Interpretation
1.2.1 Data collection
One of the main stages in a research study is data collection, which enables the researcher to find
answers to research questions. Data collection is the process of collecting data in order to gain
insights into the research topic. There are different types of data and, accordingly, different data
collection methods. However, it may be challenging for researchers to select the most appropriate
data collection method for the type of data used in the research.
The statistician must have data on the problem posed (either by a survey, an experiment, or a
historical study, etc.). This data may be qualitative or quantitative.
1.2.2 Data processing and analysis
Based on the data, the statistician proposes a number of methods (depending on the problem posed):
estimation, forecasting, testing, etc.
1.2.2.1 Types of data
Before selecting a data collection method, the type of data that is required for the study should be
determined. This section summarizes the possible data types, so that the different data collection
methods and sources of data can be reviewed according to these categories. First, however, we need
to understand what data is exactly.
Data is information, embodied in figures or facts, that is analyzed through various calculations to
arrive at a result that addresses the study question or tests a hypothesis (Hurrel, 2005). Data can be
categorized in different ways, including as quantitative or qualitative.
A. Qualitative Data
Non-numerical data, both nominal and descriptive, which cannot be expressed as numbers and is
recorded as words or sentences, is known as qualitative data. This type of data answers the "how" and
"why" questions in a research study and mostly covers feelings, perceptions, and emotions, collected
using unstructured approaches such as interviews. Researchers use methods such as audiotapes,
sketches, notes, and photographs to gather these data. Qualitative data is suitable for obtaining
further information, for exploring and determining new effects and consequences of programs on the
research, and ultimately for enhancing the quality of quantitative results. However, collecting it takes
considerable cost and time, and the results may not be generalizable: the findings of a case study
apply only to the same issue and cannot be taken as general patterns for different studies. Qualitative
methods encompass three main categories: observations, document reviews, and in-depth interviews,
although there are also less common ways to gather qualitative data. The methods of data collection
are discussed in the next section.
B. Quantitative Data
Numerical data which is mathematically generated and computed is known as quantitative data.
There are different scales for measuring quantitative data, including nominal, ordinal, interval, and
ratio scales (Kabir, 2016). Scales can also be categorized into two general types: rating scales and
attitude scales. Rating scales assign a numerical value to points or categories in order to evaluate
them. Attitude scales are more complex methods that determine the predisposition of people toward
an individual, phenomenon, or object (Taherdoost, 2016b). A quantitative method addresses the
“what” question type in a study. These approaches employ structured data collection methods and
are based on random sampling. In comparison to qualitative methods, they are regarded as cheaper,
and the findings can be standardized to achieve other results based on criteria such as size. The
findings can be easily generalized and summarized as well, and a simple comparison between results
is also possible. Nevertheless, these methods can also face unexpected differences and some
difficulties, as their implementation and investigation capacity are limited. These approaches use
methods such as experiments and structured interviews for data collection, which are discussed in
the data collection methods section.
1.3. Data collection methods
Generally, data collection methods are divided into two main categories: primary data collection
methods and secondary data collection methods. Figure 1 shows some of the data collection methods
for primary and secondary data.
Primary data is first-hand information that has not yet been published and has not been changed by
any individual. In other words, researchers use different approaches to gather and collect primary
data for a specific purpose. Thus, the validity, reliability, objectivity, and authenticity of primary
data are higher than those of secondary data. These qualities are important in some types of
research, such as statistical surveys, where the information needed is specific to a problem and
cannot be obtained from published references. Thus, although research can be conducted based on
secondary data, it is not possible to achieve a reliable result without using primary data as well,
because secondary data has been manipulated and changed by others. Using primary sources helps
to gain high-quality data, which can improve results, and you also have the opportunity to add
further data when required during the research. Primary data collection, however, can run into
difficulties in defining aspects of the collection, for example the reasons behind data collection,
what to collect, when to collect it, and which data collection method to use. It is also an expensive
approach that consumes the majority of the research budget and may require funding from different
agencies. You need to ensure the standard of the collected data by collecting it accurately,
eliminating unnecessary data, and not using fake or cooked-up data. Primary data can be obtained
from sources such as experiments, surveys, interviews, and questionnaires (Kabir, 2016;
Taherdoost, 2021).
Secondary data is data gathered from published sources, meaning that it was already gathered by
someone else for another reason and can be reused for other purposes in research. In all papers, the
literature review section is based on secondary data sources. Thus, secondary data is an essential
part of research: it provides information from past studies as a basis for conducting research or as
the required background information. It can also help to design a study and provide a baseline
against which to compare primary results. However, researchers need to re-examine the validity and
reliability of these sources to obtain authentic results.
1.3.1. Primary data collection methods
Primary data collection comprises the processes by which you gather data yourself for your own
study; no one else has access to this data until it is published. Both qualitative and quantitative
approaches are used for this purpose. The main primary data collection methods are discussed here;
14 different types are listed in figure 1. The most common types (questionnaires, interviews, focus
groups, observation, surveys, case studies, and experimental methods) are explained in detail first.
Then, other methods are reviewed briefly.
1.3.1.1. Questionnaire Method
The questionnaire is one of the most common devices for collecting information. It is a form or
instrument containing a set of questions designed to secure answers, which respondents (from a
specific population) fill in to give the researcher the information needed for the study. The data
obtained from a questionnaire cannot be obtained from secondary sources (Pandey & Pandey, 2015).
These forms are suitable for gathering both qualitative and quantitative data. Although they are not
the most common method in qualitative research, they are useful when a study involves a large
sample.
Sir Francis Galton designed a questionnaire for the first time. A questionnaire can be used for
different purposes, although it is commonly used to gather statistical data. It can be designed to
measure separate variables such as behaviors, preferences, and facts (Kabir, 2016). Although
preparing and administering a questionnaire is not hard, specific points in these processes should be
observed. This form is normally used when it is not possible to speak with each participant
personally (Pandey & Pandey, 2015). Thus, it helps to gather data easily from different individuals,
groups, and companies. Questionnaires can be categorized by aspects such as types of questions and
administration modes.
A) Types of Questions
First, questions can be designed to measure separate variables, for example in a survey, or to be
aggregated into indexes or scales, for instance in tests. Second, question types can be categorized
into closed-ended and open-ended questions. In closed-ended questions, the respondents choose from
a specific range of answers, whereas in open-ended questions the respondent is asked to formulate
their own answers. Qualitative questions are open-ended, so the answers must afterwards be coded
into a response scale. In comparison to open-ended questions, therefore, closed-ended ones are
pre-coded, which makes the work quicker to carry out.

For closed-ended questions, there are four types of response options:
● Two response options, known as a dichotomous scale.
● More than two unordered options, known as a nominal-polytomous scale.
● More than two ordered options, known as an ordinal-polytomous scale.
● A continuous or bounded type, which uses a continuous scale as the possible response.
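To make the idea of pre-coded responses concrete, the sketch below maps the answers to one closed-ended, ordinal-polytomous question onto numeric codes. The five-point agreement scale, its labels, and the helper `code_response` are all assumptions chosen for illustration; they do not come from the cited sources.

```python
from statistics import median

# Hypothetical ordinal-polytomous scale: labels mapped to ordered codes
AGREEMENT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def code_response(answer, scale):
    """Return the numeric code for an answer, ignoring case and spacing."""
    return scale[answer.strip().lower()]

# Invented answers from four respondents
responses = ["Agree", "neutral", "Strongly agree", "agree"]
codes = [code_response(answer, AGREEMENT_SCALE) for answer in responses]

print(codes)
print(median(codes))
```

Because the codes are only ordered labels, the median is used as the summary here rather than the mean.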
B) The Mode of Administration
Questionnaires can be administered in different ways. A face-to-face mode gives the chance to
present the questions orally; paper-and-pencil questionnaires present the items on paper; and
computerized questionnaires can also be used for data collection (Kabir, 2016). Questionnaires can
also be administered by telephone, online, or by post. An online questionnaire is a cost-efficient
option; however, you should consider the possibility of missing part of the sample due to problems
with internet access. For online administration, different survey services can be used which provide
questionnaires for the purpose of the study, and the collected data can then be easily loaded into the
analysis software. In all these choices, it is important to address ethical concerns such as the
confidentiality of participants. On the other hand, participants should try to answer the questions
politely and clearly.
C) General Rules for Constructing a Questionnaire
● Use simple and short questions as much as possible;
● Guide respondents clearly to avoid any difficulty and to motivate participants while answering
questions;
● Use understandable, simple, and clear statements for all respondents with different educational
levels;
● Use positive sentences;
● Do not put more than one question (double-barreled) in one item;
● Add an open-answer possibility after the listed answers where possible;
● Avoid making assumptions on behalf of the respondents;
● Try to increase reliability by appropriate word selection;
● Avoid directing the respondent toward any answer: use objective questions free of clues,
suggestions, and hints.
There are also specific challenges and concerns in designing an appropriate questionnaire. The
response rate should be maximized while securing as much reliability and validity as possible.
Response rates can be maximized when you:
● Can convince respondents that you will secure their information and protect their interests;
● Can reward their cooperation.
On the other hand, you can gain an accurate data set by attending to two points:
● Prepare a suitable set of questions;
● Select an appropriate sample size and type, which can avoid biases and unanswered questions.
D) Advantages of Questionnaires
Questionnaires offer several merits in comparison to other survey methods, as listed below:
● Collecting a large amount of data from a large sample;
● Time-saving;
● Cost-effective;
● Highly structured;
● The possibility of gaining highly accurate data;
● The possibility of being carried out by people other than the researcher without affecting
reliability and validity, and the possibility of group administration;
● Analyzing the results easily by entering the data into the software quickly in the majority of
cases;
● The opportunity for more objective and scientific analysis;
● The quantitative data obtained can be used to compare and contrast the results of the study with
others to measure changes;
● The possibility of achieving comprehensive design and tests, and administering the research with
the required details;
● Creating novel theories and/or testing an existing hypothesis using the quantitative data obtained;
● Suitable in a wide range of study fields;
● Suitable and reliable in special cases.
E) Disadvantages of Questionnaires
However, there are also several demerits that are not negligible. Researchers may face the
following difficulties when using questionnaires:
● Gathered data that is hard or inadequate to interpret in some cases, such as emotions, feelings,
and behavioral changes;
● Human errors, for example if the respondent is forgetful and cannot consider the whole concept
truly;
● Determining the reliability of answers is not possible;
● The possibility of misunderstanding the questions, which can distort the answers;
● The effects of differences in human beliefs on answers in some cases, since even a standard
subject can be considered good by one group and bad by others (Kabir, 2016);
● Difficulties when participants need clarification of particular questions in impersonal
administrations, and the possibility of failing to answer those questions (Taherdoost, 2021);
● Low response rates if respondents' lack of interest in answering the questions cannot be
addressed (Frechtling, 2002);
● The possibility of illegible answers;
● Useless and wrong answers are prevalent (Pandey & Pandey, 2015).
1.2.3 Verification
The results obtained are used to draw conclusions from the starting data.
1.2.4 Interpretation
The statistician presents the significance of the results obtained and proposes solutions with
associated risk assessments, to help the user choose between the different decisions.
1.3 Statistical vocabularies
1.3.1. Observations
These are the target data relating to a phenomenon in the course of an investigation, an experiment,
etc. They must be grouped together, corrected and ordered.
1.3.2. Interviews
In an interview, a fundamental form of social interaction, questions are asked and data is collected
from the answers given. This contrasts with the questionnaire, in which data is collected indirectly.
Interviews therefore offer a chance of obtaining confidential data from interviewees; however, they
require special skills that are not necessary for questionnaires. Researchers can conduct interviews
in different ways: face to face, individually or in groups, or remotely, for example by telephone or
computer.
1.3.3. Observational Methods
In these techniques, first-hand data is gathered through the observation of events, behaviors,
interactions, processes, etc. directly to obtain an understanding of the concepts. For example,
observation is an appropriate technique for evaluating teaching methods in classes. It can be used
when focus groups and interviews cannot help to gather data, for example when participants:
● Are not aware of the concept;
● Are not able to talk about the concept;
● Do not prefer to discuss the concept.
It can also be used to explore whether a study is progressing as planned, or whether the study has
been successful. In the evaluation of studies, these two phases are known as formative and
summative, respectively. It can also be helpful when the concept is unexplored or not well known. If
a subject must be explored in its natural setting, and reported information may differ from what is
found in the real setting, an observational technique should be used.
This method can collect both qualitative and quantitative data. The qualitative data is gathered as a
description of events in the setting. The quantitative data can be obtained by recording the duration
or frequency of particular events. During this kind of systematic observation, formal and structured
instruments and protocols based on nominal, ordinal, interval, and ratio scales are used. Template
coding sheets with specific guides can be used to record the findings if the observer is not the main
researcher. In addition, data obtained through this method can be used in conjunction with the
quantitative findings of other methods.
Generally, observation helps the researcher to find out what is going on in the surrounding
environment; however, as a data collection method, it goes further than just listening and looking.
This method requires engagement with the setting, a clear description of the events, technical
improvisation, close attention, and good recording.
A) Advantages and Disadvantages
The observational method also possesses several pros and cons. In this section, the most important
ones are listed. The advantages are as the following:
● Gathering direct information;
● The participation of evaluators in the natural setting;
● Flexible and natural atmosphere;
● Free from biases;
● Can be generalized as large samples can be covered in the studies;
● Highly reliable and precise data can be obtained.
These techniques also present some difficulties:
● They can be time-consuming and uneconomical;
● Observers must be trained;
● Observers can be selective and distort data;
● They can sometimes be unreliable because qualitative data may be misrepresented in measurement;
● They do not consider processes and the changes within them, and may not be appropriate for new
concepts.
B) Special Notes
After conducting an observation, the researcher should first analyze the collected data. For this
purpose, the data is summarized in a process known as data reduction and coded into specific
categories according to particular criteria. The reliability of the data, assessed through the
agreement of independent observers, should also be considered to show how accurately the behaviors
are measured. It should be noted that participants can act differently when they are in the research
setting. These effects should be controlled using techniques for controlling reactivity such as
indirect observation, the adaptation of participants, and unobtrusive measurements.
Observer bias is another important point to consider: an observer's bias can affect which behaviors
are chosen and recorded; however, it can be minimized by keeping observers unaware of the aims of
the study.
1.3.4 Survey Methods
A survey is an appropriate method for determining feelings, opinions, and thoughts. The aim of
a survey can be either general or specific. Surveys can provide a large volume of data using
telephone calls, emails, or face-to-face interviews.
Data can be collected in self-completion surveys or by an interviewer. A survey
can be used to explore social behaviors, such as measuring the behavior of political candidates and
professional people in educational institutions. However, it is not useful when evaluating people for
government programs, since in these programs all members of the population should be studied.
Overall, in both the formative and summative phases of a study, surveys are useful when information
must be collected from a large target population and detailed, in-depth data are not necessary for
the project.
In a survey, a set of questions is given to a sample chosen from a specific target
population. This sample represents the characteristics and behaviors of the population. Surveys are
conducted to explore a population's attitudes, to compare the behavior of different populations,
and to discover possible changes over time by repeating surveys at regular intervals.
Thus, sample selection is an important stage in this process and can strongly affect the findings.
Samples should be chosen so that every member of the population has a non-zero chance of being
selected. Therefore, samples need to be chosen using a non-volunteer and non-haphazard selection
technique.
The steps of the sampling process can be listed as follows:
● Defining the target population, such as the individuals living in the country;
● Selecting a sampling frame, i.e. the actual list of cases from which the sample is drawn;
● Choosing the method of sampling, which can be either a random or a non-random technique;
● Determining the appropriate sample size, using the relevant formula, to avoid biases and sampling
errors.
Biases of participants can commonly happen, especially when they need to answer about sensitive
subjects, or when the individuals need to trust the team before giving the right answer.
Questions can be designed in different ways as the following:
● Open-ended questions in which participants answer questions in their own ways;
● Close-ended questions, mostly based on yes/no or true/false answers;
● Multiple-choice questions, in which participants choose their preferred option.
Here, as discussed for questionnaires, the questions should be written with attention to several
aspects, ranging from their language to their length and their order of presentation. For example,
sensitive questions should be placed among the final questions. A cover letter and an introduction
should also be provided, as discussed for the other methods.
1.4. Statistical series
A set of measurements of one or more variables made on a population or sample of individuals.
1.5. Population
This is the set of elements on which the statistical study will be carried out; it is denoted Ω.
A population is also a set of homogeneous elements (with the same characteristics) in which we are
interested and on which our statistical study is based.
In statistics, we work with populations. The term comes from the fact that demography, the study of
human populations, played a central role in the early days of statistics, particularly through
population censuses. However, in statistics, the term population is applied to any statistical object
under study, whether students (at a university or in a country), households or any other group on
which statistical observations are made.
For example: the students in a section, the bacteria in a petri dish, etc.
1.6. The sample
In statistics, a sample is a set of individuals representative of a population, drawn randomly and
exhaustively.
Exhaustive drawing of an individual: drawing without replacement; the individual is not returned
to the population after being drawn.
Non-exhaustive drawing of an individual: drawing with replacement; an individual can be selected
several times.
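The difference between the two kinds of drawing can be sketched in a few lines of Python. This is only an illustration: the population of ten labelled individuals and the fixed random seed are assumptions made for the example, not part of the course material.

```python
import random

# Hypothetical population of 10 labelled individuals (illustrative only).
population = [f"individual_{i}" for i in range(1, 11)]

rng = random.Random(42)  # fixed seed so that the draw can be reproduced

# Exhaustive drawing (without replacement): no individual can appear twice.
sample_without = rng.sample(population, k=5)

# Non-exhaustive drawing (with replacement): the same individual may be drawn again.
sample_with = rng.choices(population, k=5)

print("without replacement:", sample_without)
print("with replacement:   ", sample_with)
```

With the exhaustive draw the five selected individuals are always distinct, whereas the non-exhaustive draw may contain repeats.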
1.7. Statistical units
A population is made up of individuals. The individuals that make up this statistical population are
called statistical units.
The elements that make up a sample are also called statistical units.
1.8. Size of a population (or sample)
Represents the number of individuals in a sample or population. It is symbolised by n in the case of a
sample and by N in the case of a population.
1.9. Character (statistical variable)
Characters fall into two broad categories.
There are certain characters which take varying forms for different individuals but cannot be
expressed numerically. The brand name of motor cars plying in an Indian city is such a character; it
may be Ambassador Contessa, Premier Padmini Deluxe, Standard Herald Gazelle, Maruti 1000 or
other. The employees in a city hospital may be observed for their smoking habits; any given
employee will then be recorded as a smoker or a non-smoker. Such a character, whose possible forms
can be distinguished verbally, but not numerically, is called a qualitative character (or attribute). On
the other hand, we can express characters like the size of families, age of teachers, height of students,
weight of eggs, etc., in numerical or quantitative terms. The size of a family (i.e., the number of
members in the family) will be a positive integer: 1, 2, 3, etc. The age of a teacher may be given in
years or in years and months. The height of a student may be given in centimetres and may be
rounded off to the nearest centimetre. The weight of an egg may be recorded in grams and again may
be rounded off to the nearest tenth of a gram. Such characters are called quantitative characters (or
variables). A qualitative character, too, ultimately yields numerical data. This is because we will
finally note how many of the individuals under study have any given form of the character. This is
the particular aspect that we wish to study.
A statistical variable or characteristic: Observations concerning a particular theme have been made
on these individuals.
We have classified characters into two categories: qualitative and quantitative. Now quantitative
characters or variables, in their turn, may be classified as discrete and continuous. A discrete variable
is one that can conceivably assume only some discrete, or isolated, values. The size of families, the
proportion or the number of males in each group of 25 students, or the length of a word are variables
of this type. The size of a family or the length of a word may take values like 1, 2, 3, etc., but no
values in between. The number of males in a group of 25 students may be 0, 1, 2, ..., 24 or 25, while
the proportion of males may be 0, 0.04, 0.08, ..., 0.96 or 1; values in between these numbers are
inconceivable. A continuous variable, on the other hand, can possibly take any value in some
interval. For example, the age (in years) of teachers, the height (in cm) of students, the weight (in
grams) of eggs are all continuous variables. Supposing the minimum age at which a person can join
the teaching profession is α years and that every member of the teaching community has to retire on
reaching the age β years, then the age of teachers must vary between α and β and can take any value
within the interval [α, β]. Indeed, the actual age of a teacher may well be 32.119237 years! However,
there will be hardly any need to record the age with this much precision! The enquirer may be
satisfied by taking the age correct to the second decimal place so that the teacher's age may be
recorded as 32.12 years. This is an example of how limitations of the measuring instruments can
introduce a discreteness into the observations of a continuous variable. Similarly, the actual monthly
income of an Indian, which is a continuous variable, has to be expressed in rupees or in rupees and
paise, since the paisa happens to be the smallest denomination coin in the Indian system of currency.
This is also the case with the score in an examination of students taking the examination. The score is
invariably expressed in integers and yet it has to be regarded as a continuous variable. This is because
the score is supposed to measure the proficiency of the students in the subject concerned, and the
proficiency may be taken to vary in a continuous manner (say, between 0 and 100).
The distinction between a discrete and a continuous variable is important. Quite often, the statistical
analysis of the data will differ accordingly. In fact, there are some techniques of statistical inference
which are based on the assumption that the variable under study is continuous. These are clearly
inapplicable to data on a discrete variable. In the next section, we shall discuss the concept of
frequency distributions of qualitative characters and variables.
The series of observations forms what is known as a statistical variable.
For example: students' marks in the Statistics exam, the grades they obtained in their A-levels, their
sex, the colour of their eyes, the turnover per SME, the number of children per household, etc.
For example: in the case of a group of people, we may be interested in their age, sex, height, etc.
1.10. Modalities
These are the different possible situations of the characteristic.
Example: Sex is a characteristic with two states: female or male.
Example: As for the number of children per family, the states of this characteristic can be 0, 1, 2, ...,
10, ....
Note: The states of a characteristic must be incompatible and exhaustive; every individual must have
one and only one state.
Note: It is customary to distinguish between the two types of characteristic.
1.10.1.Qualitative characteristic
Its modalities are not expressed by a number.
Example : Coat colour, blood groups, different nucleotides in DNA, ... .
1.10.2.Quantitative characteristic
Its modalities are numerical.
Example: The number of cells in a culture, the blood sugar level, the number of white or red blood
cells, ... .
1.11. Statistical variable (SV)
1.11.1. Discrete statistical variable
X is said to be discrete if its set of values is E = {x1, ..., xn}, a finite or countably infinite set
of isolated values, usually integers.
For example: the number of houses per neighbourhood in a town, the number of children per
household can only be 0, or 1, or 2, or 3, ... .
1.11.2 Continuous statistical variable
X is said to be continuous if E = [a0, a1[ ∪ ... ∪ [an-1, an[, where ai ∈ ℝ for all i = 0, ..., n.
For example: the weight of students in a section, the height of students in a school, laboratory tests
(glucose levels, cholesterol levels, etc.).
1.11.3. Qualitative variable
A variable is qualitative when its modalities are not measurable, i.e. when the values it takes are
designated by names or a code.
In general, the modalities of a qualitative characteristic cannot be compared numerically, i.e. if we
consider two modalities taken at random, we cannot say that one is numerically less than or equal to
the other.
For example, the modalities of the variable Sex are: Male and Female; the modalities of the variable
Eye Colour are: Blue, Brown, Black and Green; the modalities of the variable Mention au Bac
(Baccalaureate grade) are: TB, B, AB and P.
There are two types of qualitative variables:
- Ordinal qualitative variables
- Nominal qualitative variables.
More precisely, a qualitative variable is said to be ordinal when its terms can be classified in a certain
natural order (this is the case, for example, with the variable Mention au Bac);
a qualitative variable is said to be nominal, when its terms cannot be classified in a natural way (this
is the case, for example, with the variable Eye Colour or the variable Sex).
1.12 Numbers and frequencies, cumulative numbers and frequencies
1.12.1 Ungrouped Frequency Distributions
We use ungrouped frequency distributions when the data is of a qualitative nature, or
when the variable under consideration is discrete. Here, we will take one example of each situation
for illustration.
Frequency Distribution of a Qualitative Character
A botanist obtained a variety of linseed by cross-breeding of two pure varieties. She observed the
colour of flowers of plants grown through inbreeding of the new mixed type (called plants of the F2
generation). On the basis of these observations, she prepared the following table.
Table 1: Classification of flowers in an F2 population of linseed by colour
Colour   Number of flowers (frequency)   Relative frequency
Blue     169                             0.538
Lilac    61                              0.194
White    62                              0.197
Pink     22                              0.070
Total    314                             0.999
The figures in the second column of Table 1 are called the frequencies of the four classes (or of the
four colours). So 'frequency' indicates how frequently the corresponding form of the character under
study (viz., colour) occurs in the collected data. The sum of the frequencies, 314 in this case, is said
to be the total frequency. The first two columns in Table 1 constitute a frequency table. Since these
indicate the manner in which the total frequency 314 (or the total number of individuals) is
distributed among the four classes, they are also said to represent the frequency distribution of colour
for the 314 flowers. Perhaps a better expression is 'the frequency distribution of the 314 flowers by
colour'. Alternatively, we can also write the frequency distribution in terms of the proportions of
blue, lilac, white and pink flowers in the group. These proportions give the relative frequencies, and
are shown in the third column of Table 1. By definition,
$\text{relative frequency of a class} = \dfrac{\text{frequency of the class}}{\text{total frequency}} \qquad (1)$
What, then, is the total relative frequency? One, of course. But you
can see that in Table 1 the relative frequencies do not add up exactly to 1. This is because the
individual figures are all approximate, rounded off to a certain number of decimal places. Note that
while the distribution of frequencies answers questions of the type 'How many flowers in the given
group are blue?', the relative frequency has to do with questions like 'What is the proportion (or
percentage) of blue flowers in the group?' Further, in any situation, a frequency must be a non-negative
integer. The value 0 is admissible, for in the above situation it is conceivable that we might have a
fifth flower colour, say yellow, which was absent in the sample. A relative frequency, on the other
hand, must be a rational number in the interval [0,1]. The simplest type of classification of a group of
individuals by a qualitative character is a dichotomy, i.e., a classification with just two classes. A
group of students may thus be classified by sex as boys and girls or by performance at an
examination as successful and unsuccessful.
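As a quick check of formula (1), the relative frequencies of Table 1 can be recomputed from the flower counts. The short Python sketch below is only an illustration; the variable names are chosen for the example.

```python
# Flower counts taken from Table 1.
counts = {"Blue": 169, "Lilac": 61, "White": 62, "Pink": 22}

total = sum(counts.values())  # total frequency

# Relative frequency of a class = frequency of the class / total frequency.
relative = {colour: n / total for colour, n in counts.items()}

for colour, n in counts.items():
    print(f"{colour:<6} {n:>4} {relative[colour]:.3f}")
print(f"{'Total':<6} {total:>4} {sum(relative.values()):.3f}")
```

Before rounding, the relative frequencies add up to 1; the total of 0.999 in Table 1 comes only from rounding each figure to three decimal places.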
1.12.2 Headcount, cumulative headcount
The headcount of a class (or of a value) designates the number of individuals associated with this
class (or with this value).
If, in a statistical series, the values of a characteristic can be ordered, the cumulative number of
individuals for the value x is the sum of the numbers of all the values less than or equal to x.
This is an increasing cumulative number, but a decreasing cumulative number could also be defined
by taking the sum of the numbers of all the values greater than or equal to x. The number of
individuals with the characteristic xi, denoted ni, is called the headcount of xi.
1.12.3. Relative frequencies, cumulative frequencies
A relative frequency is the proportion (often expressed as a percentage) of the total headcount
represented by the headcount of a data item.
A cumulative relative frequency is the proportion of the total headcount represented by the
cumulative headcount of a data item.
The number fr = ni / N is called the frequency of characteristic xi.
Note: 0 ≤ fr ≤ 1.
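The definitions above can be illustrated with a short Python sketch. The headcounts are borrowed from the number-of-children example used in the arithmetic mean section of this course (values 0 to 5 with headcounts 16, 18, 14, 11, 3 and 2); the variable names are assumptions made for the illustration.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]          # number of children per family
headcounts = [16, 18, 14, 11, 3, 2]  # n_i for each value
N = sum(headcounts)                  # total headcount

# Increasing cumulative headcount: individuals with a value <= x.
cum_increasing = list(accumulate(headcounts))

# Decreasing cumulative headcount: individuals with a value >= x.
cum_decreasing = list(accumulate(reversed(headcounts)))[::-1]

# Relative and cumulative relative frequencies.
rel_freq = [n_i / N for n_i in headcounts]
cum_rel_freq = [c / N for c in cum_increasing]

print(" x  n_i  cum<=  cum>=  f_i    F_i")
for x, n_i, up, down, f, F in zip(values, headcounts, cum_increasing,
                                  cum_decreasing, rel_freq, cum_rel_freq):
    print(f"{x:>2} {n_i:>4} {up:>6} {down:>6}  {f:.3f}  {F:.3f}")
```

The last increasing cumulative headcount and the first decreasing one both equal N, and the last cumulative relative frequency equals 1.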
2. Statistical test
The aim of descriptive statistics is to study the characteristics of a set of observations, such as the
measurements obtained in an experiment. The experiment is the preliminary stage in any statistical
study. It involves making ‘contact’ with the observations. Generally speaking, the statistical method
is based on the following concept. The statistical test is an experiment that is provoked.
2.1.Graphical representation of a variable
2.1.1 Table presentation
When the first data on a given phenomenon are gathered, it is difficult to exploit them in raw
form, which is why we try to present them in the form of a table and then a graph. The steps to follow
to draw up a table are:
* Calculate the range E = Xmax - Xmin of the statistical distribution.
* Choose the number of classes k. Class constitution rule: the number of classes should be no less
than 5 and no more than 20 (it generally varies between 6 and 15).
This choice depends on the number of observations n and their dispersion. In practice,
the Sturges formula can be used: k = 1 + 3.32 log10(n),
or the Yule formula: k = 2.5 n^(1/4),
or
k = √n.
* Calculate the class length a = E / k.
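The three rules for the number of classes can be compared with a minimal Python sketch. The Yule rule is taken here as k = 2.5 n^(1/4), and the sample size and extreme values (n = 15, Xmin = 8.75, Xmax = 18) come from the Statistics exam scores of the 15-student example in this chapter; the function names are hypothetical.

```python
import math

def sturges(n):
    """Sturges formula: k = 1 + 3.32 * log10(n)."""
    return 1 + 3.32 * math.log10(n)

def yule(n):
    """Yule formula, taken here as k = 2.5 * n^(1/4)."""
    return 2.5 * n ** 0.25

def sqrt_rule(n):
    """Square-root rule: k = sqrt(n)."""
    return math.sqrt(n)

def class_length(x_min, x_max, k):
    """Class length a = E / k, where E = x_max - x_min is the range."""
    return (x_max - x_min) / k

n = 15
for name, rule in [("Sturges", sturges), ("Yule", yule), ("sqrt", sqrt_rule)]:
    print(f"{name:<8} k = {rule(n):.2f}")

k = round(sturges(n))  # rounded to a whole number of classes
print("class length:", class_length(8.75, 18, k))
```

The rules give only a starting point; the result is rounded to a whole number and can be adjusted so that the class limits are convenient values.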
For example :
For a group of 15 students, we observed the values of the variables: Eye Colour, Sex, Bac Mark and
Statistics Exam Mark, and obtained the following data table. This data will be used frequently in this
chapter.
Data Table
Individual  Eye colour  Sex     Baccalaureate score  Statistics exam score
X1          green       female  TB (very good)       18
X2          black       female  B (good)             14
X3          blue        male    P (passable)         10
X4          black       male    AB (quite good)      12
X5          black       male    B (good)             15
X6          green       female  P (passable)         8.75
X7          black       female  AB (quite good)      10
X8          black       female  TB (very good)       17.5
X9          black       male    B (good)             13.75
X10         brown       male    P (passable)         9
X11         brown       male    TB (very good)       18
X12         black       male    B (good)             14
X13         black       male    B (good)             14.75
X14         black       female  AB (quite good)      12
X15         green       female  P (passable)         11

Note
Generally speaking, an individual belongs to one and only one modality of a qualitative variable.
Very often, among the categories of a qualitative variable, there is an Other category
(non-respondents, missing values or similar) in which we place the individuals that we are
unable to fit into another category of this variable.
Let's look at the example of the Eye Colour variable.
We start by counting, in the data table, the number of individuals belonging to each of the
modalities of this variable: nblue = 1 individual has blue eyes, nbrown = 2 have brown eyes,
nblack = 9 have black eyes and ngreen = 3 have green eyes; all this can be summarised in the
following summary table:
Colour     Blue  Brown  Black  Green
Headcount  1     2      9      3
Let's take the example of the variable ‘Mention au Bac’ (Baccalaureate score); counting in the data
table, we obtain the following summary table:
Baccalaureate score       Headcount  Frequency            Percentage
Very good (Très Bien)     3          fTB = 3/15 = 0.200   20.0%
Good (Bien)               5          fB = 5/15 = 0.333    33.3%
Quite good (Assez Bien)   3          fAB = 3/15 = 0.200   20.0%
Passable (Passable)       4          fP = 4/15 = 0.267    26.7%
Total                     N = 15     1                    100%
We can see that the students are unevenly distributed between the different modalities of the variable
‘Mention au Bac’. In general terms, the frequency of a modality ‘M’ of a qualitative variable is
calculated using the following formula:
fM = (frequency of the modality ‘M’) = (headcount corresponding to ‘M’) / (total headcount).
In addition, pM = (percentage of individuals corresponding to modality ‘M’) = fM × 100.
Finally, we have (sum of the frequencies of all the modalities of a qualitative variable) = 1 and
(sum of all the percentages corresponding to the modalities of a qualitative variable) = 100.
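The counting procedure can be sketched in Python using the standard `collections.Counter` class. The list below reproduces, individual by individual (X1 to X15), the Baccalaureate score column of the data table; the function name `frequency_table` is a hypothetical helper introduced for this illustration.

```python
from collections import Counter

# Baccalaureate score of individuals X1 ... X15, read from the data table.
mentions = ["TB", "B", "P", "AB", "B", "P", "AB", "TB",
            "B", "P", "TB", "B", "B", "AB", "P"]

def frequency_table(observations, order):
    """Return (modality, headcount, frequency, percentage) for each modality."""
    counts = Counter(observations)
    total = len(observations)
    return [(m, counts[m], counts[m] / total, 100 * counts[m] / total)
            for m in order]

table = frequency_table(mentions, ["TB", "B", "AB", "P"])
for modality, n_m, f_m, p_m in table:
    print(f"{modality:<3} {n_m:>3} {f_m:.3f} {p_m:5.1f}%")

# The frequencies sum to 1 and the percentages to 100.
print(sum(row[2] for row in table), sum(row[3] for row in table))
```

The same function can be applied to the Eye Colour or Sex columns by passing the corresponding list of observations and modalities.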
3. Position and dispersion parameters
Positional parameters (central tendency characteristic)
The statistical indicators of central tendency (also known as positional indicators) frequently
considered are the mean, the median and the mode.
3.1. The mode
The mode of a statistical series is the most frequently occurring value. It is denoted by Mo.
N.B.: A statistical series can have two modes; it is then called a bi-modal series.
Three modes: tri-modal series.
More than three: multi-modal series.
For data grouped into classes, the mode is located in the modal class (the class with the highest
frequency) and is given by:
$Mo = l_i + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times a_i$
$l_i$: the lower limit of the modal class.
$\Delta_1$: the difference between the frequency of the modal class and that of the class before.
$\Delta_2$: the difference between the frequency of the modal class and that of the class after.
$a_i$: the length of the modal class.
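The grouped-data formula for the mode can be tried out with the following Python sketch. The headcounts are those of the grouped example used in the arithmetic mean section (class centres 0.88 to 1.18 g/l); the class limits, obtained by assuming equal classes of length 0.06 centred on those values, are an assumption made for the illustration, as is the function name.

```python
def grouped_mode(lower_limits, length, headcounts):
    """Mode of grouped data: Mo = l_i + d1 / (d1 + d2) * a_i."""
    # Index of the modal class (first class with the highest headcount).
    i = max(range(len(headcounts)), key=headcounts.__getitem__)
    before = headcounts[i - 1] if i > 0 else 0
    after = headcounts[i + 1] if i < len(headcounts) - 1 else 0
    d1 = headcounts[i] - before  # difference with the class before
    d2 = headcounts[i] - after   # difference with the class after
    return lower_limits[i] + d1 / (d1 + d2) * length

# Assumed class limits: classes of length 0.06 centred on 0.88, 0.94, ..., 1.18.
lower_limits = [0.85, 0.91, 0.97, 1.03, 1.09, 1.15]
headcounts = [3, 4, 7, 8, 6, 4]

mo = grouped_mode(lower_limits, 0.06, headcounts)
print(f"Mo = {mo:.3f}")
```

Here the modal class is the one starting at 1.03 (headcount 8), with differences of 1 and 2 with its neighbours, so the mode lies one third of the way into the class.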
3.2. The median
The median is the value of the variable such that half of the values are greater than or equal to it and
the other half of the values are less than or equal to it. There are two cases depending on the parity of
n. The median of a set of numbers arranged in ascending order is:
* the value in the middle if the number of data is odd,
* the arithmetic mean of the two values in the middle if the number of data is even.
Even number: $Me = \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2}$
Odd number: $Me = x_{((n+1)/2)}$
* Example:
Consider the following series: 2.50; 2.75; 3.20; 2.18; 1.85; 5.40; 3.65; 4.25; 5.65; 6.30. Sorting in
ascending order gives: 1.85; 2.18; 2.50; 2.75; 3.20; 3.65; 4.25; 5.40; 5.65; 6.30.
The number of observations is therefore n = 10 (even), and the median is:
$Me = \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} = \dfrac{x_{(5)} + x_{(6)}}{2} = \dfrac{3.20 + 3.65}{2} = 3.425$
* Example: the set of numbers 3, 4, 4, 5, 6, 8, 8, 8 and 10 has a median of 6.
* Example: the set of data 5, 5, 7, 9, 11, 12, 15 and 18 has the median (9 + 11)/2 = 10.
To determine the median in the continuous case, it is necessary to consider the cumulative increasing
or decreasing numbers and to find, if necessary by interpolation, the value of the characteristic
corresponding to 50% of the total headcount.
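The two cases (odd and even n) can be written as a short Python function. This is a minimal sketch of the rule stated above, applied to two of the series from the examples; the function name is chosen for the illustration.

```python
def median(values):
    """Median of a list of numbers, following the odd/even rule."""
    xs = sorted(values)          # arrange in ascending order
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]        # middle value, x_((n+1)/2) with 1-based ranks
    # Mean of the two middle values, x_(n/2) and x_(n/2 + 1) with 1-based ranks.
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

series = [2.50, 2.75, 3.20, 2.18, 1.85, 5.40, 3.65, 4.25, 5.65, 6.30]
print(median(series))                         # n = 10, even
print(median([5, 5, 7, 9, 11, 12, 15, 18]))   # n = 8, even
```

Python's standard library offers the same computation as `statistics.median`, which can be used to cross-check the result.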
3.3. Arithmetic mean
Let $x_1, x_2, x_3, \ldots, x_n$ be a finite sequence of numbers. The arithmetic mean of X is the quantity:
$\bar{x} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
If each value $x_i$ appears $n_i$ times in the series, we can also write: $\bar{x} = \dfrac{1}{n}\sum_i n_i x_i$.
Noting that $n_i / n$ is the relative frequency $f_i$ corresponding to the value $x_i$, we also have:
$\bar{x} = \sum_i f_i x_i$
Example: in example 1, the average number of children per family is:
$\bar{x} = \dfrac{(16 \times 0) + (18 \times 1) + (14 \times 2) + (11 \times 3) + (3 \times 4) + (2 \times 5)}{64} \approx 1.58$
$= (0.25 \times 0) + (0.281 \times 1) + (0.219 \times 2) + (0.172 \times 3) + (0.047 \times 4) + (0.031 \times 5) \approx 1.58$.
In the case of data grouped into classes, the values of the $x_i$ are taken to be the centres of the classes.
In the example we have:
$\bar{x} = \dfrac{3 \times 0.88 + 4 \times 0.94 + 7 \times 1.00 + 8 \times 1.06 + 6 \times 1.12 + 4 \times 1.18}{32} = 1.04$ g/l
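Both examples can be checked with a few lines of Python. The sketch below simply applies the weighted form of the mean (each value multiplied by its headcount, divided by the total headcount); the function name is chosen for the illustration.

```python
def weighted_mean(values, headcounts):
    """Arithmetic mean of values x_i that each appear n_i times."""
    n = sum(headcounts)
    return sum(x * n_i for x, n_i in zip(values, headcounts)) / n

# Number of children per family: values 0..5 with their headcounts.
children = weighted_mean([0, 1, 2, 3, 4, 5], [16, 18, 14, 11, 3, 2])

# Grouped data: class centres (g/l) with their headcounts.
grouped = weighted_mean([0.88, 0.94, 1.00, 1.06, 1.12, 1.18], [3, 4, 7, 8, 6, 4])

print(f"{children:.2f}")  # about 1.58
print(f"{grouped:.2f}")   # about 1.04
```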
3.4. Comparison of different position parameters:
* The arithmetic mean is not very sensitive to sampling fluctuations. It lends itself well to
comparisons. However, outliers can alter it significantly.
* The median is more sensitive to sampling fluctuations and less sensitive to outliers. However, it
lends itself less well to algebraic calculations.
* The mode is representative of the value of the most common, most typical characteristic, but it can
be somewhat ambiguous.
Comparing the three gives a more complete idea of the distribution (if the three are approximately
equal, then the statistical series is approximately symmetrical).
Pearson's empirical relationship: Karl Pearson notes that, for a moderately asymmetrical
distribution, the following approximation can be used:
mode + 2 × mean ≈ 3 × median
3.5.The quartile:
The quartiles divide the statistical series into 4 equal parts.
- The first quartile, Q1, is the smallest value in the series such that at least 25% of the values are
less than or equal to Q1.
- The third quartile, Q3, is the smallest value in the series such that at least 75% of the values are
less than or equal to Q3.
3.6. Discrete quantitative variable
Q1 is the value of rank N/4 (25% of N): a quarter of the values taken by X are less than or equal to Q1.
Q2 = Me is the value of rank N/2 (50% of N): it is the median.
Q3 is the value of rank 3N/4 (75% of N): a quarter of the values taken by X are greater than or equal
to Q3.
The interquartile range (IQR) is the difference between the third quartile and the first quartile; it is
written: IQR = Q3 - Q1.
3.6.1. Continuous quantitative variable
$Q_1 = b_{inf} + \dfrac{b_{sup} - b_{inf}}{n_0}\left[\dfrac{N}{4} - F_1\right]$
$Q_2 = Me = b_{inf} + \dfrac{b_{sup} - b_{inf}}{n_0}\left[\dfrac{N}{2} - F_1\right]$
$Q_3 = b_{inf} + \dfrac{b_{sup} - b_{inf}}{n_0}\left[\dfrac{3N}{4} - F_1\right]$
where $b_{inf}$ and $b_{sup}$ are the lower and upper bounds of the class containing the quartile, $n_0$ is
the headcount of that class and $F_1$ is the cumulative headcount of the classes before it.
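The interpolation formulas for a continuous variable can be sketched in Python as follows. The reading of the symbols is an assumption: binf and bsup are taken as the bounds of the class containing the quartile, no as its headcount and F1 as the cumulative headcount before that class. The class bounds used below (classes of length 0.06 centred on 0.88 to 1.18 g/l) are also assumed for the illustration; only the headcounts come from the grouped example of the arithmetic mean section.

```python
def grouped_quantile(bounds, headcounts, p):
    """Quantile of order p by linear interpolation inside its class.

    bounds: list of (b_inf, b_sup) class bounds; headcounts: n_i per class.
    """
    N = sum(headcounts)
    target = p * N            # rank to reach: N/4, N/2 or 3N/4
    cumulative = 0            # F1: cumulative headcount before the class
    for (b_inf, b_sup), n_i in zip(bounds, headcounts):
        if n_i > 0 and cumulative + n_i >= target:
            return b_inf + (b_sup - b_inf) / n_i * (target - cumulative)
        cumulative += n_i
    return bounds[-1][1]

# Assumed class bounds (length 0.06) and headcounts of the grouped example.
bounds = [(0.85, 0.91), (0.91, 0.97), (0.97, 1.03),
          (1.03, 1.09), (1.09, 1.15), (1.15, 1.21)]
headcounts = [3, 4, 7, 8, 6, 4]

q1 = grouped_quantile(bounds, headcounts, 0.25)
q2 = grouped_quantile(bounds, headcounts, 0.50)
q3 = grouped_quantile(bounds, headcounts, 0.75)
print(f"Q1 = {q1:.3f}, Me = {q2:.3f}, Q3 = {q3:.3f}, IQR = {q3 - q1:.3f}")
```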
5. Dispersion parameters (variability)
A dispersion parameter refers to the difference between two values of the character, whereas a
position parameter represents one value of the character.
The usual statistical indicators of dispersion are :
range, variance and standard deviation.
5.1.Range (Etendue)
The difference between the largest and smallest values of the characteristic, given by the quantity E =
Vmax - Vmin, is called the statistical variable's range. Calculating the range is very simple. It gives
an initial idea of the dispersion of the observations. It is a very rudimentary indicator.
5.2. Arithmetic mean deviation: given by
$\bar{E} = \dfrac{1}{n}\sum_i n_i \, |x_i - \bar{x}|$
5.3.Variance and standard deviation
The variance of the statistical series X in the sample, noted $s^2_{sample}$, is the number
$s^2_{sample} = \dfrac{1}{n}\sum_i n_i (x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$.
The variance of the character in the population, denoted $\sigma^2$, is generally unknown.
The estimator of the variance of the population, denoted $s^2$, is given by
$s^2 = \dfrac{n}{n-1}\, s^2_{sample}$.
Mean and empirical variance
Given a sample of size n and $(x_1, x_2, \ldots, x_n)$ the n observed values.
The empirical mean is:
$m = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
The empirical variance is:
$s^2 = \dfrac{(x_1 - m)^2 + \cdots + (x_n - m)^2}{n}$
Given a sample of size n, the n observed values $(x_1, x_2, \ldots, x_n)$ of the characteristic are
considered to be the values of n independent random variables $X_1, X_2, \ldots, X_n$ following the same
distribution F with expectation µ and standard deviation σ.
The set of averages for samples of size n is the random variable $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$.
If the observed result is $x_1, x_2, \ldots, x_n$, then the observed value of $\bar{X}$ is the empirical mean m:
$m = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$.
The set of variances of samples of size n is the random variable
$S^2 = \dfrac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n}$.
If the observed result is $x_1, x_2, \ldots, x_n$, then the observed value of $S^2$ is the empirical variance $s^2$:
$s^2 = \dfrac{(x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2}{n}$.
5.4. Standard deviation: the standard deviation is the square root of the variance, $s = \sqrt{s^2}$.
Example: (for example 1) $s^2_{sample} = 7.75 \times 10^{-3}$, $s = 0.089$.
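The figures of this example can be reproduced with a short Python sketch, assuming that they refer to the grouped data of the arithmetic mean section (class centres 0.88 to 1.18 g/l with headcounts 3, 4, 7, 8, 6 and 4), which is the data set that gives these values. The variable names are chosen for the illustration.

```python
import math

centres = [0.88, 0.94, 1.00, 1.06, 1.12, 1.18]  # class centres (g/l)
headcounts = [3, 4, 7, 8, 6, 4]                 # n_i

n = sum(headcounts)
mean = sum(n_i * x for x, n_i in zip(centres, headcounts)) / n

# Variance in the sample: (1/n) * sum of n_i * (x_i - mean)^2.
var_sample = sum(n_i * (x - mean) ** 2 for x, n_i in zip(centres, headcounts)) / n

# Equivalent form: mean of the squares minus the square of the mean.
mean_of_squares = sum(n_i * x ** 2 for x, n_i in zip(centres, headcounts)) / n
var_alt = mean_of_squares - mean ** 2

# Estimator of the population variance and corresponding standard deviation.
var_estimate = n / (n - 1) * var_sample
s = math.sqrt(var_estimate)

print(f"sample variance = {var_sample:.5f}")  # about 7.75e-3
print(f"s = {s:.3f}")                         # about 0.089
```

Note that the value s = 0.089 is obtained from the estimator (division by n - 1); taking the square root of the sample variance itself gives about 0.088.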
5.5. Coefficient of variation:
The coefficient of variation is a unitless value which expresses the ratio (given in %) between the
value of the dispersion (standard deviation) and the value of the central tendency (mean).
High values of this coefficient indicate great heterogeneity in the group studied
in relation to the variable. The formula for this coefficient is: $C.V = \dfrac{\sigma_x}{\bar{X}} \times 100$ (%)
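As an illustration, the coefficient of variation can be computed with a one-line Python function. The rounded values s = 0.089 and mean = 1.04 g/l are taken from the examples of this chapter, on the assumption that both describe the same grouped data; the function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in %: standard deviation / mean * 100."""
    return std_dev / mean * 100

cv = coefficient_of_variation(0.089, 1.04)
print(f"C.V = {cv:.1f} %")
```

A value below 10% such as this one suggests a fairly homogeneous group with respect to the variable studied.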
6. Graphical representations of a statistical series
Data representation:
This method involves presenting the results in the form of illustrative diagrams, and should be as
clear and concise as possible.
Graphical representations are very important in descriptive statistics. They have the advantage of
providing immediate information about the general shape of the distribution. They make it easier to
interpret the data collected.
Data are represented in two main ways:
* Tables: which represent a list of the results of observations, possibly grouped together and
presented in an appropriate way,
* Graphs: which can take many different forms depending on the nature and number of variables
taken into consideration.
6.1.Study of a quantitative variable:
6.1.1.Discrete quantitative variable
Based on the observation of a discrete quantitative variable, two diagrams can be used to represent
this variable: the bar chart and the cumulative diagram.
Bar chart:
The values of X are placed on the x-axis. Above each value, a bar is drawn whose length on the y-axis
equals the frequency (or headcount) of that value.
[Figure: bar chart]
Example: To illustrate, we take the previous example (number of children per family). We recall the
associated statistical table.
xi    1    2    3    4    5    6
ni   32   66   41   32    9    2
We want to represent this distribution in the form of a bar chart. Each value corresponds to a bar. The
heights of the bars are proportional to the numbers represented.
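The table above can be turned into a rough text version of the bar chart, together with the cumulative counts needed for the cumulative diagram. This is only a sketch: the bars are drawn horizontally for convenience, and the helper `text_bar_chart` and its `scale` parameter are hypothetical.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]        # x_i: number of children
counts = [32, 66, 41, 32, 9, 2]    # n_i: number of families


def text_bar_chart(values, counts, scale=2):
    """Return one line per value; each '#' stands for about `scale` observations."""
    lines = []
    for x, c in zip(values, counts):
        bar = "#" * round(c / scale)
        lines.append(f"{x:>2} | {bar} {c}")
    return lines


lines = text_bar_chart(values, counts)
print("\n".join(lines))

# Cumulative counts: the heights of the steps in the cumulative diagram.
cumulative = list(accumulate(counts))
print(cumulative)  # [32, 98, 139, 171, 180, 182]
```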

6.1.2.Continuous quantitative variable:


Histograms:
These are juxtaposed rectangles, each with a base equal to the class width and a height proportional
to the frequency (or headcount). Generally, the height is taken as the frequency divided by the class
width, so that the area of the histogram is equal to 1.
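The sketch below illustrates this normalization, assuming the height of each rectangle is the relative frequency of the class divided by its width. The classes and counts are hypothetical, and `histogram_heights` is an illustrative helper.

```python
def histogram_heights(class_bounds, counts):
    """Return the height of each rectangle: relative frequency / class width."""
    total = sum(counts)
    heights = []
    for (low, high), c in zip(class_bounds, counts):
        width = high - low
        heights.append((c / total) / width)
    return heights


# Hypothetical classes with unequal widths, for illustration only.
class_bounds = [(0, 10), (10, 20), (20, 40)]
class_counts = [5, 10, 5]

heights = histogram_heights(class_bounds, class_counts)
# Total area = sum of height * width over all rectangles.
area = sum(h * (high - low) for h, (low, high) in zip(heights, class_bounds))
print(heights, area)
```

Dividing by the width matters when the classes are unequal: the last class here is twice as wide as the first, so its rectangle is half as tall even though both classes hold the same number of observations.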
6.1.3. The frequency polygon:
This is obtained by joining the midpoints of the upper sides of each rectangle. The area thus obtained
is identical to that of the histogram.
The graphical representation of a distribution of cumulative numbers or cumulative frequencies is a
graph called a cumulative curve when the variable studied is continuous and the classes are
represented by intervals (it is called an integral diagram when the variable studied is discrete).
6.1.4. Qualitative distribution
Based on the observation of a qualitative variable, two diagrams can be used to represent this
variable: the band diagram (known as an organ-pipe diagram) and the angular sector diagram (known
as a pie chart).
Pie chart
Circular or semi-circular diagrams divide a disc or half-disc into slices, or sectors, one for each
modality observed. The area of each sector is proportional to the headcount, or the frequency, of
the modality.

[Figure: sector diagram]
For a full disc, the angle of sector i is $d_i = \frac{n_i \times 360}{N}$ degrees, where n_i is the
headcount of modality i and N is the total headcount.
The band diagram (organ pipes)
The categories are placed on the x-axis in an arbitrary order. Above each category, a rectangle is
drawn whose length is proportional to the headcount, or frequency, of that modality.

[Figure: organ pipes]
Example
The breakdown of the number of mobile phone subscribers in Algeria in 2014 is given below.
Operators                             Djezzy   Ooredoo   Mobilis   Others   Total
Number of subscribers (in millions)   9.86     19.72     29.68     1.74     61
Frequencies in %                      16.16    32.327    48.655    2.852    100
Corresponding angle (in degrees)      58.17    116.377   175.158   10.26    360
Complete the table above.
Construct below the appropriate circular diagram showing the distribution of subscribers, specifying
for each sector: the Operator, the percentage and the corresponding angle.
Example
We counted 1000 leukocytes in an individual and looked at their shape (category).
Category of leukocytes   Neutrophils   Eosinophils   Basophils   Lymphocytes   Monocytes
Number ni                600           20            10          110           260
- What is the characteristic being studied and what is its nature?
- Graph this statistical series.
Solution
Characteristic studied: category of leukocytes
Nature: a qualitative characteristic
Graphical representation
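To draw the pie chart for this series, each count is first converted into a percentage and a sector angle with di = ni × 360 / N. The sketch below does only that arithmetic; the helper name `sector_angles` is hypothetical.

```python
categories = ["Neutrophils", "Eosinophils", "Basophils", "Lymphocytes", "Monocytes"]
counts = [600, 20, 10, 110, 260]   # n_i, from the table above


def sector_angles(counts):
    """Return the angle in degrees of each sector: n_i * 360 / N."""
    total = sum(counts)
    return [c * 360 / total for c in counts]


total = sum(counts)
angles = sector_angles(counts)
for name, c, angle in zip(categories, counts, angles):
    percent = c * 100 / total
    print(f"{name:<12} {c:>4}  {percent:5.1f} %  {angle:6.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check before drawing the sectors.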
