lec4-EDA2025

Statistics is the collection and analysis of data, essential for informed decision-making in various fields. It encompasses methods for data collection, processing, and interpretation, highlighting the importance of clear questioning to avoid misleading results. Key concepts include descriptive statistics, statistical inference, and various types of data and distributions, which can be visually represented through graphs and charts.

Uploaded by

tupdumpybudzzy


STATISTICS

- A collection of data.
- The totality of methods used in the collection, processing, analysis or interpretation of
any kind of data.

Reasons why statistics is important:

1) The increasingly quantitative approach employed in all the sciences, as well as in
business and many other activities which directly affect our lives.
2) The amount of data that is collected, processed and disseminated to the public for one
reason or another has increased almost beyond comprehension.
3) The results of costly surveys can be useless if questions are ambiguous or asked in the
wrong way.

Example:
To determine the public sentiment about the government’s program against drug-related
cases, an interviewer asks a respondent the question: “Do you feel that this wasteful program of
the government regarding inhuman killings related to drug cases should be stopped or not?”

- This is called “begging the question” and may well yield misleading results, because the
interviewer suggests that the program is in fact wasteful.
- The question asked should be clear and neutrally worded, so that the answer reflects the
respondent’s own view.

Statistical model – the necessary methods, which apply regardless of whether the data are IQs, tax
payments, reaction times, test scores and so on.

Origin
a) Government
b) Games of chance
Descriptive Statistics – methods which originally consisted mainly of presenting data in the
forms of tables and charts. This includes anything done to data which is designed to summarize
or describe them without attempting to infer anything that goes beyond the data themselves.
Statistical Inference – methods in which analysis will require generalizations which go beyond
the data.

Probability Theory – was applied to many problems in the behavioural, natural and social
sciences and provides an important tool for the analysis of any situation which in some way
involves an element of uncertainty or chance.

Statistical Data – are the raw material of statistical investigation and they arise whenever
measurements are made or observations are classified.

Nominal Data – data in which numbers merely represent a coding of various categories. In this
artificial or nominal way, categorical data can be made into numerical data.

Ordinal data – data for which inequalities can be set up.

Interval data – data for which we can form differences but cannot multiply or divide.

Ratio Data – data for which we can also form quotients; such data are not difficult to find.
FREQUENCY DISTRIBUTION
The most common method of summarizing data is to present them in condensed form in
tables or charts. When we deal with large sets of data, a good overall picture and sufficient
information can often be conveyed by grouping the data into a number of classes. Tables like
these are called frequency distributions.

Numerical or Quantitative Distribution – if data are grouped according to numerical size.

Categorical or Qualitative Distribution - if data are grouped into non-numerical categories.

Frequency distributions present data in relatively compact form, give a good overall
picture and contain information that is adequate for many purposes, but some things which can
be determined from the original data cannot be determined from a distribution.

Frequency distributions present raw or unprocessed data in a more readily usable form, and
the price we pay for this, the loss of certain information, is usually a fair exchange.

The construction of frequency distribution consists essentially of three steps:

1) Choosing the Classes ( intervals or categories )
2) Tallying the data into these classes, and
3) Counting the number of items/tallies in each class.

Since the last two steps are purely mechanical, we shall concentrate on the first, namely the
problem of choosing a suitable classification. The two things we must consider in choosing a
classification scheme for a numerical distribution are how many classes we should use and the
range of values each class should cover, that is, from where to where each class should go.

RULES FOR STEP ONE:

1) We seldom use fewer than six or more than fifteen classes; the exact number
we use in a given situation will depend mainly on the number of measurements or
observations we have to group.
2) We always make sure that each item ( measurement/observation ) will go into one and
only one class.
3) Whenever possible, we make the classes the same length; that is, we make them cover equal
ranges of values.

Open Class – any class of the “less than”, “or less”, “more than” or “or more” type. If a set of data
contains a few values which are much greater than or much smaller than the rest, open classes
are quite useful in reducing the number of classes required to accommodate the data.
However, we usually avoid open classes because they make it impossible to calculate certain
values of interest, such as an average or a total.

Example:

Construct a distribution of the following amounts of sulphur oxides ( in tons ) emitted by an
industrial plant on 80 days.

15.8 26.4 17.3 11.2 23.9 24.8 18.7 13.9 22.7 9.8
6.2 14.7 17.5 26.1 12.8 26.8 22.7 18.0 20.5 11.0
20.9 15.5 19.4 19.1 15.2 22.9 26.6 20.4 21.4 19.2
21.6 18.5 23.0 24.6 20.1 16.2 18.4 7.8 13.5 14.6
29.6 19.4 17.2 20.9 24.6 22.5 24.6 8.3 21.9 12.3
22.3 13.3 11.8 19.2 20.4 25.9 10.3 15.1 27.5 18.1
17.9 9.8 24.1 13.2 10.8 14.5 31.9 9.0 16.7 23.5
25.7 23.7 19.1 18.4 28.6 17.7 16.8 18.4 20.1 6.8
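
The three construction steps can be sketched in Python for the 80 measurements above. The class limits used here (seven classes of width 4, the first being 5.0 - 8.9) are an assumption chosen to satisfy the rules for step one, not a choice stated in these notes, and the function name is hypothetical.

```python
# The 80 sulphur oxide emissions (tons), copied from the table above.
DATA = [
    15.8, 26.4, 17.3, 11.2, 23.9, 24.8, 18.7, 13.9, 22.7, 9.8,
    6.2, 14.7, 17.5, 26.1, 12.8, 26.8, 22.7, 18.0, 20.5, 11.0,
    20.9, 15.5, 19.4, 19.1, 15.2, 22.9, 26.6, 20.4, 21.4, 19.2,
    21.6, 18.5, 23.0, 24.6, 20.1, 16.2, 18.4, 7.8, 13.5, 14.6,
    29.6, 19.4, 17.2, 20.9, 24.6, 22.5, 24.6, 8.3, 21.9, 12.3,
    22.3, 13.3, 11.8, 19.2, 20.4, 25.9, 10.3, 15.1, 27.5, 18.1,
    17.9, 9.8, 24.1, 13.2, 10.8, 14.5, 31.9, 9.0, 16.7, 23.5,
    25.7, 23.7, 19.1, 18.4, 28.6, 17.7, 16.8, 18.4, 20.1, 6.8,
]

LOWER = 5.0   # assumed lower limit of the first class
WIDTH = 4.0   # assumed common class interval
K = 7         # assumed number of classes


def frequency_distribution(values):
    """Tally each value into exactly one of K equal classes."""
    counts = [0] * K
    for v in values:
        # Work in integer tenths so class limits are compared exactly.
        tenths = round(v * 10)
        index = (tenths - round(LOWER * 10)) // round(WIDTH * 10)
        counts[index] += 1
    return counts


freqs = frequency_distribution(DATA)
for i, f in enumerate(freqs):
    lo = LOWER + i * WIDTH
    print(f"{lo:4.1f} - {lo + WIDTH - 0.1:4.1f}   {f:3d}")
print("total", sum(freqs))
```

A different choice of class limits would give a different, equally valid distribution; the only requirements are that every value falls into one and only one class and that the frequencies add up to 80.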
Class Marks – are simply the midpoints of the classes. They are found by adding the upper and
lower limits of a class and dividing by two.

Class Interval – is merely the length of a class, or the range of values it contains, and it is given by
the difference between its boundaries. If the classes of a distribution are all equal in length,
their common class interval is also given by the difference between any two successive class
marks.

PERCENTAGE DISTRIBUTION

To convert a distribution into a percentage distribution; divide each class frequency by the
total number of items grouped and then multiply by 100.

CUMULATIVE DISTRIBUTION

The other way of modifying a frequency distribution is to convert it into a “less than”,
“or less”, “more than” or “or more” cumulative distribution. To this end, we simply add the class
frequencies, starting either at the top or at the bottom of the distribution.

Note : In the same way; we can also convert a percentage distribution into a cumulative
percentage distribution. We simply add the percentages starting either at the top or at the
bottom of the distribution.
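
As a minimal sketch of both conversions, the following Python snippet turns a frequency distribution into a percentage distribution and into cumulative distributions. The five class frequencies are hypothetical and serve only to illustrate the arithmetic.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
freqs = [4, 10, 14, 8, 4]
n = sum(freqs)

# Percentage distribution: each class frequency divided by n, times 100.
percentages = [100 * f / n for f in freqs]

# "Or less" cumulative distribution: add frequencies starting at the top.
or_less = list(accumulate(freqs))

# "Or more" cumulative distribution: add frequencies starting at the bottom.
or_more = list(accumulate(reversed(freqs)))[::-1]

# Cumulative percentage distribution, built the same way from the percentages.
cumulative_pct = list(accumulate(percentages))

print(percentages)
print(or_less)
print(or_more)
print(cumulative_pct)
```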

GRAPHICAL PRESENTATION

When frequency distributions are constructed primarily to condense large sets of data and
display them in an easy to digest form, it is usually advisable to present them graphically.

A) HISTOGRAM

The most common form of graphical presentation of statistical data is the histogram. A
histogram is constructed by representing the measurements or observations that are grouped
on a horizontal scale and the class frequencies on a vertical scale, and drawing rectangles whose
bases equal the class interval and whose heights are determined by the corresponding class
frequencies. The markings on the horizontal scale can be the class limits, the class boundaries,
the class marks or arbitrary key values. For easy readability, it is usually better to indicate the
class limits, although the rectangles actually go from one class boundary to the next. Histograms
cannot be used in connection with frequency distributions having open classes, and they must
be used with extreme care if the class intervals are not all equal.

B) BAR GRAPHS

The heights of the rectangles or bars again represent the class frequencies, but there is no
pretense of having a continuous horizontal scale.

C) FREQUENCY POLYGON

The class frequencies are plotted at the class marks and the successive points are
connected by means of straight lines. Note that classes with zero frequencies are added
at both ends of the distribution to “tie down” the graph to the horizontal scale.
D) OGIVE

If we apply the same technique to a cumulative distribution, we obtain what is called
an Ogive. However, the cumulative frequencies are plotted at the class boundaries
instead of the class marks. It stands to reason that the cumulative frequency
corresponding, say, to “less than 13” should be plotted at 12.95, the class boundary,
since “less than 13” actually includes everything up to 12.95.

E) PICTOGRAMS

Pictograms present distributions more dramatically, and often more effectively ( they are
often seen in newspapers, magazines and reports of various sorts ).

F) PIE CHARTS

Categorical or qualitative distributions are often presented graphically as pie charts,
where a circle is divided into sectors ( pie-shaped pieces ) which are proportional in size
to the corresponding frequencies or percentages. To construct a pie chart, we first
convert the distribution into a percentage distribution. Then, since a complete circle
corresponds to 360 degrees, we obtain the central angles of the various sectors by
multiplying the percentages by 3.6.

POPULATIONS AND SAMPLES

Population – a set of data that consists of all conceivably possible observations of a certain
phenomenon.
Sample – a set of data that contains only a part of these observations.

MEASURES OF LOCATION

a) Arithmetic mean ( $\bar{x}$ ) – the mean of a set of values is the sum of the values divided by
their number. ( In everyday language, the mean is often called the “average”. )

$$ \text{Sample mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n} $$

where: x – the values in the sample

n – the sample size

Similarly,

$$ \text{Population mean} = \mu = \frac{\sum x}{N} $$

where N is the population size.
Properties:

1) It can be calculated for any set of numerical data, so it always exists.
2) A set of numerical data has one and only one mean, so it is always unique.
3) It lends itself to further statistical treatment; for example, the means of several sets of
data can always be combined into the overall mean of all the data.
4) It is relatively reliable in the sense that the means of many samples drawn from the same
population usually do not fluctuate, or vary, as widely as other statistical measures used
to estimate the mean of a population.
5) It takes into account every item of a set of data.

b) Median ( $\tilde{x}$ ) – the median of a set of data is the value of the middle item, or the mean
of the values of the two middle items, when the data are arranged according to size.

c) Mode ( $\hat{x}$ ) – the value which occurs with the highest frequency. Its two
main advantages are that it requires no calculations, only counting, and that it can be
determined even for qualitative or nominal data.
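
The three measures of location can be illustrated with Python's standard `statistics` module. The sample values and the list of colours below are hypothetical.

```python
import statistics

# Hypothetical sample of seven measurements.
sample = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(sample)      # sum of the values divided by their number
median = statistics.median(sample)  # middle item once the data are sorted
mode = statistics.mode(sample)      # value with the highest frequency

# The mode needs only counting, so it also works for nominal data.
colours = ["red", "blue", "red", "green"]
colour_mode = statistics.mode(colours)

print(mean, median, mode, colour_mode)
```

With an even number of items, `statistics.median` returns the mean of the two middle items, which matches the definition given above.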

WEIGHTED MEAN

In general, the weighted mean $\bar{x}_w$ of a set of numbers $x_1, x_2, x_3, \ldots, x_n$,
whose relative importance is expressed numerically by a corresponding set of numbers
$w_1, w_2, w_3, \ldots, w_n$, is given by:

$$ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w x}{\sum w} $$

If all the weights are equal, the formula reduces to that of the ordinary arithmetic
mean. A special application of the formula for the weighted mean arises when we must
find the overall mean or grand mean of k sets of data having the means
$\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_k$ and consisting of
$n_1, n_2, n_3, \ldots, n_k$ measurements or observations.
The result is given by:

$$ \bar{\bar{x}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + n_3 + \cdots + n_k} = \frac{\sum n \bar{x}}{\sum n} $$
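
A short Python sketch of the weighted mean, used here to compute a grand mean. The three group means and group sizes are hypothetical, and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical: three sets of data with these means and sizes.
means = [70.0, 80.0, 90.0]
sizes = [20, 30, 50]

# The grand mean weights each mean by the size of its set of data.
grand_mean = weighted_mean(means, sizes)
print(grand_mean)

# With equal weights the formula reduces to the ordinary arithmetic mean.
print(weighted_mean([2, 4, 9], [1, 1, 1]))
```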
MEASURES OF VARIATION
A measure of variation is a statistical measure of the extent to which data are
dispersed or spread out.
Example:

Suppose that in a hospital each patient’s pulse rate is taken in the morning, at noon and
in the evening, and that on a certain day the pulse rates of patient A are 72, 76 and 74
while those of patient B are 72, 91 and 59. The mean pulse rates of the two patients are the
same, namely 74. But observe the difference in variability: patient A’s pulse rate is
stable, whereas that of patient B fluctuates widely.

RANGE – the difference between the two extremes ( the largest value minus the smallest ). The
range is easy to calculate and easy to understand, but despite these advantages, it is generally
not a very useful measure of variation. Its main shortcoming is that it tells us nothing
about the dispersion of the values which fall between the two extremes.

Example:
Sample 1 : 6 18 18 18 18 18 18 18 18 18
Sample 2: 6 6 6 6 6 18 18 18 18 18
Sample 3: 6 7 9 11 12 14 15 16 17 18
All three samples have a range ( R ) = 18 – 6 = 12, but the dispersion is quite different in each case.

In some cases, when the sample size is quite small, the range can be an adequate measure
of variation. For instance, it is used widely in industrial quality control to keep a close check on
the consistency of raw materials or products, or on the uniformity of a process, on the basis of
small samples taken at regular intervals of time.

VARIANCE AND STANDARD DEVIATION

If a set of numbers $x_1, x_2, x_3, \ldots, x_n$, constituting a sample, has the mean $\bar{x}$, the
differences $x_1 - \bar{x}, x_2 - \bar{x}, x_3 - \bar{x}, \ldots, x_n - \bar{x}$ are called the Deviations from the
Mean, and it suggests itself that we might use their average as a measure of the variation of the
sample. Because the deviations from the mean always add up to zero, their signs have to be
ignored.
If we add the deviations from the mean as if they were all positive or zero and divide by n,
we obtain the statistical measure which is called the Mean Deviation.
If we average the squared deviations from the mean and take the square root of the
result, we get

$$ \sqrt{\frac{\sum (x - \bar{x})^2}{n}} $$

and this is how, traditionally, the standard deviation used to be defined.
Expressing literally what we have done here mathematically, it is also called the Root-Mean-
Square Deviation. Nowadays, it is customary to modify this formula by dividing the sum of the
squared deviations from the mean by ( n − 1 ) instead of n. Therefore:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$ ( sample standard deviation )

and its square is the Sample Variance:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$ ( sample variance )

Also, by applying the rules of summation:

$$ s = \sqrt{\frac{n \left( \sum x^2 \right) - \left( \sum x \right)^2}{n(n - 1)}} $$

Rule:
1) Find $\bar{x}$.
2) Determine the n deviations from the mean, $x - \bar{x}$.
3) Square the deviations.
4) Add all the squared deviations.
5) Divide by ( n – 1 ).
6) Take the square root of the result obtained in step 5.

For the population Standard Deviation:

$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$
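
The six-step rule and the computing formula can be checked against each other in Python, using the pulse rates of patients A and B from the earlier example. The function names are illustrative only.

```python
import math
import statistics


def sample_sd_by_steps(xs):
    """Sample standard deviation following the six-step rule."""
    n = len(xs)
    mean = sum(xs) / n                        # step 1: find the mean
    deviations = [x - mean for x in xs]       # step 2: deviations from the mean
    squared = [d * d for d in deviations]     # step 3: square them
    total = sum(squared)                      # step 4: add them
    return math.sqrt(total / (n - 1))         # steps 5 and 6


def sample_sd_shortcut(xs):
    """Sample standard deviation from the computing formula."""
    n = len(xs)
    numerator = n * sum(x * x for x in xs) - sum(xs) ** 2
    return math.sqrt(numerator / (n * (n - 1)))


patient_a = [72, 76, 74]
patient_b = [72, 91, 59]

for name, rates in (("A", patient_a), ("B", patient_b)):
    print(name, sample_sd_by_steps(rates), sample_sd_shortcut(rates))

# statistics.stdev divides by n - 1; statistics.pstdev divides by N.
print(statistics.stdev(patient_a), statistics.pstdev(patient_a))
```

Both patients have the same mean of 74, but the standard deviations differ widely, which is exactly the difference in variability the pulse-rate example describes.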

APPLICATION

CHEBYSHEV’s THEOREM
For any set of data ( population or sample ) and any constant k greater than 1, at least

$$ 1 - \frac{1}{k^2} $$

of the data must lie within k standard deviations on either side of the mean.

The value of k for a given value x is found from its distance to the mean:

$$ k \cdot s = |\bar{x} - x| $$

STANDARD UNIT OR Z – SCORES

In general, if x is a measurement belonging to a set of data having the mean $\bar{x}$ and the
standard deviation s, then its value in standard units, denoted by z, is:

$$ z = \frac{x - \bar{x}}{s} $$

COEFFICIENT OF VARIATION
Expresses the standard deviation as a percentage of what is being measured, at least
on the average.

$$ V = \frac{s}{\bar{x}} \times 100 $$
Example:

1) If all the 1-lb cans of coffee filled by a food processor have a mean weight of 16 ounces
with a standard deviation of 0.02 ounce, at least what percentage of the cans must
contain between 15.8 and 16.2 ounces of coffee?
2) Suppose that the final examination in a French course consists of two parts, vocabulary
and grammar, and that a certain student got 66 points in the vocabulary part and 80
points in the grammar part. In which part does the student stand higher
relative to the rest of the class, if all the students in the class averaged 51 in the
vocabulary part with a standard deviation of 12 and averaged 72 in the grammar part
with a standard deviation of 16?
3) In recent months, the price of sirloin steak averaged $2.87 with a standard deviation of
$0.13 and the price of T-bone steak averaged $3.90 with a standard deviation of $0.16.
For which of these two cuts of beef is the price relatively more variable?
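
The three examples can be worked through with a few lines of Python. This is only a sketch of the arithmetic; the helper names are hypothetical, and the numbers are those given in the examples above.

```python
def chebyshev_lower_bound(k):
    """At least this fraction of the data lies within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def z_score(x, mean, sd):
    """Value of x in standard units."""
    return (x - mean) / sd


def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Example 1: 15.8 and 16.2 are each 0.2 ounce from the mean of 16.
k = (16.2 - 16.0) / 0.02
print("k =", k, "at least", chebyshev_lower_bound(k) * 100, "percent")

# Example 2: compare the student's standing in the two parts.
z_vocabulary = z_score(66, 51, 12)
z_grammar = z_score(80, 72, 16)
print("z vocabulary =", z_vocabulary, "z grammar =", z_grammar)

# Example 3: compare relative variability of the two prices.
v_sirloin = coefficient_of_variation(0.13, 2.87)
v_tbone = coefficient_of_variation(0.16, 3.90)
print("V sirloin =", v_sirloin, "V T-bone =", v_tbone)
```

The larger z-score marks the part in which the student stands higher relative to the class, and the larger coefficient of variation marks the relatively more variable price.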

DESCRIPTION OF GROUPED DATA

As we have already seen, the grouping of data entails some loss of information. Each item
loses its identity, so to speak; we only know how many items there are in each class. In the
case of the mean and the standard deviation, we can usually get good approximations by
assigning to each item falling into a class the value of the class mark.

To write general formulas for the mean and the standard deviation of a distribution with k
classes, let us denote the successive class marks by $x_1, x_2, x_3, \ldots, x_k$ and the
corresponding class frequencies by $f_1, f_2, f_3, \ldots, f_k$. Then the sum of all the
measurements or observations is given by $\sum x \cdot f$, the sum of their squares is given by
$\sum x^2 \cdot f$, and the formula for $\bar{x}$ and the corresponding formula for s can be written as:

$$ \bar{x} = \frac{\sum x \cdot f}{n} \quad \text{and} \quad s = \sqrt{\frac{n \left( \sum x^2 \cdot f \right) - \left( \sum x \cdot f \right)^2}{n(n - 1)}} $$

CODING

If the calculations are tedious, we can simplify them by coding the class marks so that we have
smaller numbers to work with. When the class intervals are all equal, this coding consists of
assigning the value zero to one of the class marks ( preferably at or near the center of the
distribution ) and representing all the class marks by means of successive integers. ( For
instance, if a distribution has nine classes and the class mark of the middle class is assigned the
value zero, the successive class marks of the distribution are assigned the values -4, -3, -2, -1, 0,
1, 2, 3 and 4. )
Of course, when we code the class marks in this way, we must account for it in the formulas
for the mean and the standard deviation. Referring to the new ( coded ) class marks as u’s, we
write:

$$ \bar{x} = x_0 + \frac{\sum u \cdot f}{n} \cdot c $$

$$ s = c \sqrt{\frac{n \left( \sum u^2 \cdot f \right) - \left( \sum u \cdot f \right)^2}{n(n - 1)}} $$

where:
$x_0$ – class mark in the original scale to which we assign zero in the new scale.

n – number of items grouped.

c – class interval.
$\sum u \cdot f$ – sum of the products obtained by multiplying the new class marks by the
corresponding class frequencies.
$\sum u^2 \cdot f$ – sum of the products obtained by multiplying the squares of the new class marks by
the corresponding class frequencies.
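
As a check that coding changes only the amount of arithmetic and not the result, the sketch below computes the mean and standard deviation of a grouped distribution both directly from the class marks and from the coded class marks. The distribution (seven classes of width 4 with class marks 6.95 to 30.95) is an assumed example, not one given in these notes.

```python
import math

# Assumed grouped distribution: class marks and class frequencies.
marks = [6.95, 10.95, 14.95, 18.95, 22.95, 26.95, 30.95]
freqs = [4, 10, 14, 25, 17, 8, 2]
n = sum(freqs)
c = 4.0  # class interval

# Direct formulas using the class marks x.
sum_xf = sum(x * f for x, f in zip(marks, freqs))
sum_x2f = sum(x * x * f for x, f in zip(marks, freqs))
mean_direct = sum_xf / n
sd_direct = math.sqrt((n * sum_x2f - sum_xf ** 2) / (n * (n - 1)))

# Coded formulas: zero is assigned to the middle class mark.
x0 = marks[3]
codes = [-3, -2, -1, 0, 1, 2, 3]
sum_uf = sum(u * f for u, f in zip(codes, freqs))
sum_u2f = sum(u * u * f for u, f in zip(codes, freqs))
mean_coded = x0 + sum_uf / n * c
sd_coded = c * math.sqrt((n * sum_u2f - sum_uf ** 2) / (n * (n - 1)))

print(mean_direct, mean_coded)
print(sd_direct, sd_coded)
```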

MEDIAN OF A DISTRIBUTION

Once a set of data has been grouped, we cannot find the exact value of the median
because of the loss of information which results from the act of grouping. So, we define the
median as follows:

“The median of a distribution is the number which is such that half the total area of the
rectangles of the histogram of the distribution lies to its left and the other half lies to its right.”

In general, if L is the lower boundary of the class into which the median must fall, f is its
frequency, c is the class interval and j is the number of items we still lack when we reach L, then
the median of the distribution is given by:

$$ \tilde{x} = L + \frac{j}{f} \cdot c $$

Also, we can find the median of a distribution by starting to count at the other end
( beginning with the largest values ) and subtracting an appropriate fraction of the class interval
from the upper boundary U of the class into which the median must fall. The corresponding
formula is given as:

$$ \tilde{x} = U - \frac{j}{f} \cdot c $$

where j is now the number of items we still lack when we reach U.
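
Both formulas can be sketched in Python and should give the same median. The distribution below (seven classes of width 4 whose boundaries start at 4.95) is an assumed example, and the function names are hypothetical.

```python
def median_from_below(lower_boundaries, freqs, c):
    """Median of a distribution, counting from the smallest values."""
    half = sum(freqs) / 2
    below = 0  # items counted before reaching the current class
    for L, f in zip(lower_boundaries, freqs):
        if below + f >= half:
            j = half - below  # items still lacking when we reach L
            return L + j / f * c
        below += f
    raise ValueError("empty distribution")


def median_from_above(upper_boundaries, freqs, c):
    """Median of a distribution, counting from the largest values."""
    half = sum(freqs) / 2
    above = 0  # items counted before reaching the current class
    for U, f in zip(reversed(upper_boundaries), reversed(freqs)):
        if above + f >= half:
            j = half - above  # items still lacking when we reach U
            return U - j / f * c
        above += f
    raise ValueError("empty distribution")


# Assumed distribution: class boundaries 4.95, 8.95, ..., 32.95.
c = 4.0
freqs = [4, 10, 14, 25, 17, 8, 2]
lowers = [4.95 + c * i for i in range(len(freqs))]
uppers = [b + c for b in lowers]

m_below = median_from_below(lowers, freqs, c)
m_above = median_from_above(uppers, freqs, c)
print(m_below, m_above)
```

Here half of the 80 items is 40; 28 items lie below the class with boundaries 16.95 and 20.95, so j = 12 from below and j = 13 from above, and either route leads to the same value.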

FRACTILES/QUARTILES
- A fractile is a value at or below which a given fraction of the data must lie.

a) Quartiles ( Q1 up to Q3 )
b) Deciles ( D1 up to D9 )
c) Percentiles ( P1 up to P99 )

PROBABILITY DISTRIBUTION

Random Variables – are usually classified according to the number of values which they may
assume. Strictly speaking, random variables are functions and not variables.
Example:

1) Annual production of coffee.
2) Number of persons visiting a famous park every week.
3) Wind velocity at NAIA.

Probability Distribution – is a correspondence which assigns probabilities to the values of a
random variable. A probability distribution is often expressed by means of a mathematical formula
which enables us to calculate directly the probabilities associated with the various values of a
random variable.

RULE:
1) Since the values of a probability distribution are probabilities, they must be numbers on
the interval from zero to one.
2) Since a random variable has to take on one of its values, the sum of all the values of a
probability distribution must be equal to one.

BINOMIAL DISTRIBUTION
Assumption:
The number of trials is fixed; the probability of a success is the same for each trial; and
the trials are all independent ( that is, what happens in any one trial does not affect the
probability of a success in any other trial ).
The probability of getting x successes in n independent trials is:

$$ f(x) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n $$

where:

p – is the constant probability of a success for each trial.
x – is the number of successes.
n – x – is the number of failures.
$\binom{n}{x}$ – the number of combinations of x objects selected from a set of n objects.

HYPERGEOMETRIC DISTRIBUTION
Suppose that n objects are to be chosen from a set of a objects of one kind ( successes )
and b objects of another kind ( failures ) and that we are interested in the probability of getting
“x successes and n − x failures”. The x successes can be chosen in $\binom{a}{x}$ ways, the
n − x failures can be chosen in $\binom{b}{n-x}$ ways and hence x successes and n − x failures
can be chosen in $\binom{a}{x}\binom{b}{n-x}$ ways. Also, n objects can be chosen from the whole
set of a + b objects in $\binom{a+b}{n}$ ways, and if we regard all these possibilities as equally
likely, it follows that for sampling without replacement the probability of getting
“x successes in n trials” is:

$$ f(x) = \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} \quad \text{for } x = 0, 1, 2, 3, \ldots, n $$

POISSON DISTRIBUTION
If n is large and p is small, binomial probabilities are often approximated by means of the
formula:

$$ f(x) = \frac{(np)^x \, e^{-np}}{x!} \quad \text{for } x = 0, 1, 2, 3, \ldots $$

Example:

1) If the probability is 0.8 that a cleaning fluid will remove any one spot, what is the
probability that it will remove exactly six out of eight spots?

2) A mailroom clerk is supposed to send six of fifteen packages to Europe by airmail, but he
gets them all mixed up and randomly puts airmail postage on six of the packages. What
is the probability that only three of the packages which are supposed to go by airmail
will go by airmail?

3) Of the 2500 cars that pass along EDSA, 2% cause traffic because of flat tires. What
is the probability that at most 5 of these cars cause traffic on EDSA?
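
The three formulas translate directly into Python with `math.comb`, `math.exp` and `math.factorial`. The sketch below applies them to the three examples under one reading of each problem: binomial for the cleaning fluid, hypergeometric with a = 6 and b = 9 for the packages, and the Poisson approximation with np = 50 for the cars. The function names are illustrative.

```python
from math import comb, exp, factorial


def binomial(x, n, p):
    """Probability of x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def hypergeometric(x, n, a, b):
    """Probability of x successes when n objects are drawn without
    replacement from a successes and b failures."""
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)


def poisson(x, np_):
    """Poisson approximation to the binomial with mean np_."""
    return np_ ** x * exp(-np_) / factorial(x)


# Example 1: exactly six of eight spots removed, p = 0.8.
p_spots = binomial(6, 8, 0.8)

# Example 2: exactly three of the six airmail packages get airmail postage.
p_packages = hypergeometric(3, 6, 6, 9)

# Example 3: at most 5 of 2500 cars, p = 0.02, so np = 50.
p_cars = sum(poisson(x, 2500 * 0.02) for x in range(6))

print(p_spots, p_packages, p_cars)
```

Because np = 50 in the third example, a count of at most 5 lies far below the mean and the resulting probability is extremely small.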

MEAN OF A PROBABILITY DISTRIBUTION

If a random variable takes on the values $x_1, x_2, x_3, \ldots, x_k$ with the probabilities
$f(x_1), f(x_2), f(x_3), \ldots, f(x_k)$, its expected value is given by:

$$ x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \cdots + x_k f(x_k) $$

and it is customary to refer to this quantity as the Mean of the Random variable or the Mean of
its Distribution. Using the $\sum$ notation, we write:

$$ \mu = \sum x \, f(x) $$

where:
$\mu$ is the mean of the distribution

Also, $\mu = np$ ( mean of binomial distribution )

and $\mu = \frac{na}{a + b}$ ( mean of hypergeometric distribution )

STANDARD DEVIATION OF A PROBABILITY DISTRIBUTION

For probability distributions, we measure variability in almost the same way, but instead of
averaging the squared deviations from the mean, we calculate their expected value. If x is a
value of some random variable whose probability distribution has the mean $\mu$, the deviation
from the mean is $x - \mu$ and we define the variance of the probability distribution as the
expected value of the squared deviation from the mean, namely as:

$$ \sigma^2 = \sum (x - \mu)^2 f(x) $$

The square root of the variance defines the Standard Deviation of a Probability Distribution and
we write:

$$ \sigma = \sqrt{\sum (x - \mu)^2 f(x)} $$

Also, using the computing formula:

$$ \sigma^2 = \sum x^2 f(x) - \left( \sum x \, f(x) \right)^2 $$

We can also write:

$$ \sigma = \sqrt{np(1 - p)} $$ ( standard deviation of binomial distribution )
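
The definitions can be checked numerically against the special formulas for the binomial distribution. The sketch below uses a hypothetical binomial distribution with n = 4 and p = 0.5.

```python
from math import comb, sqrt

n, p = 4, 0.5  # hypothetical binomial parameters

# Probability distribution f(x) for x = 0, 1, ..., n.
xs = list(range(n + 1))
f = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in xs]

# Mean: expected value of x.
mu = sum(x * fx for x, fx in zip(xs, f))

# Variance: expected value of the squared deviation from the mean.
variance = sum((x - mu) ** 2 * fx for x, fx in zip(xs, f))
sigma = sqrt(variance)

# Computing formula for the variance.
variance_shortcut = sum(x * x * fx for x, fx in zip(xs, f)) - mu ** 2

print(sum(f), mu, sigma, variance_shortcut)
print(n * p, sqrt(n * p * (1 - p)))  # special binomial formulas
```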

CONTINUOUS DISTRIBUTION

Continuous curves are the graphs of functions called probability densities or, informally,
continuous distributions. A probability density is characterized by the fact that:
“The area under the curve between any two values a and b gives the probability that a
random variable having the continuous distribution will take on a value on the interval from a
to b.”

NORMAL DISTRIBUTION

The graph of a normal distribution is a bell-shaped curve that extends indefinitely in both
directions; the curve comes closer and closer to the horizontal axis without ever reaching it, no
matter how far we go in either direction away from the mean.
An important feature of the normal distribution is that its mathematical equation is such
that we can determine the area under the curve between any two points on the horizontal
scale if we know its mean and its standard deviation; in other words, there is one and only
one normal distribution with a given mean $\mu$ and a given standard deviation $\sigma$.
In practice, we find areas under the graphs of normal distributions, or simply areas under
normal curves, in special tables. As it is physically impossible, and also unnecessary, to construct
separate tables of normal-curve areas for all conceivable pairs of values of $\mu$ and $\sigma$, we
tabulate these areas only for the normal distribution with $\mu = 0$ and $\sigma = 1$, called the Standard
Normal Distribution. Then, we obtain areas under any normal curve by performing the change
of scale which converts the units of measurement from the original scale, or x-scale, into
standard units, standard scores or z-scores by means of the formula:

$$ z = \frac{x - \mu}{\sigma} $$
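
In place of a printed table, the change of scale can be illustrated with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 100, the standard deviation of 15 and the interval from 85 to 130 are hypothetical values chosen only for the sketch.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
a, b = 85.0, 130.0        # hypothetical interval on the x-scale

# Change of scale: convert the end points into z-scores.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

# Area under the standard normal curve between the two z-scores.
standard = NormalDist(0.0, 1.0)
area_from_z = standard.cdf(z_b) - standard.cdf(z_a)

# The same area taken directly from the normal curve on the x-scale.
original = NormalDist(mu, sigma)
area_from_x = original.cdf(b) - original.cdf(a)

print(z_a, z_b)
print(area_from_z, area_from_x)
```

The two areas agree, which is why a single table for the standard normal distribution is enough for any pair of values of the mean and the standard deviation.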

Chapter 8 Solution To Example Exercises PDF
3 pages
Sport Obermeyer
No ratings yet
Sport Obermeyer
4 pages
Business Statistics I Essentials
From Everand
Business Statistics I Essentials
Louise Clark
5/5 (5)
