Lecture - 1 Introduction To Statistics

The document outlines the IS 141 course on Statistics and Probability, detailing its contents, expected learning outcomes, and various statistical concepts such as descriptive statistics, probability theory, and regression analysis. Students will learn to apply statistical methodologies to analyze data and draw conclusions. Additionally, the course includes practical applications using statistical software tools.


IS 141: Statistics and Probability

Dr. Emmanuel-UDSM

March 27, 2024


Table of contents

Course Contents

Basic Concepts in Statistics


Introduction to Statistics

Descriptive Statistics
Describing data using graphs and tables
Grouped Data
A: Expected Learning Outcomes

Upon completion of this course, students will be able to:

(a) Use statistical methodology and tools in the problem-solving
process.
(b) Demonstrate understanding of basic concepts derived from
probability and statistics.
(c) Demonstrate the ability to manage and organize data, and
identify appropriate statistical analyses.
(d) Model and analyze information and arrive at reasonable
conclusions; estimate univariate confidence intervals and
multivariate confidence regions.
B: Content Coverage

I Basic Concepts in Statistics: Introduction to statistics,


Need and objectives of Statistical investigation. Divisions of
statistics, Importance and Limitations of statistics, Variables
and types, Collection of data, presentation of data, frequency
distributions.
II Measures of Central Tendency and Variation: Measuring
center: mean, mode and median. Measuring spread: range,
interquartile range, standard deviation and mean deviation.
Measuring position: quartiles, percentiles, z-scores. Moments
and their relationships, Sheppard's correction, skewness and
kurtosis, measuring kurtosis for a distribution, calculating
the coefficient of variation.
III Probability Theory: Introduction to probability, operations
with events, mutually exclusive events, sample space, the classical
definition of probability and the relative frequency definition,
rules of probability, total probability and Bayes'
Theorem. Random Variables and Probability Distributions:
Probability density functions for random variables (discrete
and continuous), discrete distributions (Bernoulli, Binomial,
Hypergeometric, Poisson and Geometric distributions),
continuous distributions (exponential and normal
distributions). Expected value, variance and moment
generating functions. Sampling: Sampling distributions of
means and of proportions, confidence intervals and hypothesis
testing.
IV Chi-squared Goodness-of-fit Test: Chi-squared goodness-of-fit
tests for various distributions.
V Regression and Correlation Analysis: Scatter diagrams,
correlation measures, simple regression analysis, estimation
of regression coefficients, forecasting with simple regression,
the relationship between correlation and regression, and multiple
regression analysis. Application of technology: Introduction to
statistical packages (e.g. R, Python, SPSS, MATLAB).
C: Reading List:
1. Ross, S. M. (2014). Introduction to probability and statistics
for engineers and scientists. Academic Press.
2. Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2012).
Introduction to probability and statistics. Cengage Learning.
3. Balakrishnan, N., Voinov, V., & Nikulin, M. S. (2013).
Chi-squared goodness of fit tests with applications. Academic
Press.
Introduction to Statistics

Statistics is the art of learning from data.

It is concerned with the collection, description and analysis of data,
which often leads to the drawing of conclusions.

Statistics can be divided into two major categories:

▶ Descriptive statistics: dealing with the description and
summarization of data, including averages.
▶ Inferential statistics: dealing with drawing conclusions from
data, where we must take into account the possibility of
chance (probabilities).
For instance, suppose there are two groups of students in IS141, each
group taught using a different method, and suppose that the average
score of members of the first group is somewhat higher than that of
the second group.
Can we conclude that this increase is due to the teaching method
used?
Or is it possible that the teaching method was not responsible for
the increased scores, and that the higher scores of the first group
were just a chance occurrence?

To draw logical conclusions from data, we usually make some
assumptions about the chances (or probabilities) of obtaining the
different data values.
The totality of these assumptions is referred to as a probability
model for the data.

Sometimes the nature of the data suggests the form of the
probability model that is assumed.
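The role of chance in the two-group comparison above can be illustrated with a short simulation. This is a hypothetical sketch (the group sizes, the mean of 70 and the standard deviation of 10 are made-up numbers, not from the course): both groups are drawn from the same score distribution, so any difference in their sample averages is pure chance.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

# Two groups drawn from the SAME score distribution (mean 70, sd 10):
# any difference between their sample averages is chance alone.
group1 = [random.gauss(70, 10) for _ in range(100)]
group2 = [random.gauss(70, 10) for _ in range(100)]

diff = mean(group1) - mean(group2)
print(f"Group 1 average: {mean(group1):.2f}")
print(f"Group 2 average: {mean(group2):.2f}")
print(f"Difference due to chance alone: {diff:.2f}")
```

Even with identical teaching "methods", the two averages will rarely coincide exactly; inferential statistics asks whether an observed difference is too large to be explained this way.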
For instance, suppose that a scientist wants to find out what
proportion of water bottles produced by a new method will be
defective.

The scientist might select a group of these bottles, with the
resulting data being the number of defective bottles in this group.

Provided that the bottles selected were "randomly" chosen, it is
reasonable to suppose that each one of them is defective with
probability p, where p is the unknown proportion of all the bottles
produced by the new method that will be defective. The resulting
data can then be used to make inferences about p.

Population and Samples

A population is the total collection of elements/things under
consideration.

A subgroup of a population is called a sample.

Describing data

The numerical results/findings of a study should be presented
clearly, concisely, and in such a manner that someone can quickly
obtain the essential characteristics of the data.

Data are often described by using tables and graphs. They reveal
important features such as the range, the degree of concentration,
and the symmetry of the data.

Frequency Tables and Graphs

A data set having a relatively small number of distinct values can
be conveniently presented in a frequency table. For example,

Starting salary | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 56 | 57 | 60
Frequency       |  4 |  1 |  3 |  5 |  8 | 10 |  0 |  5 |  2 |  3 |  1

Table: Frequency table for starting yearly salaries (thousands USD) of 42
recently graduated students with a B.Sc. degree in environmental science.
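A frequency table like the one above can be computed directly from raw data, for example with Python's `collections.Counter` (Python being one of the packages listed in the course content). Here the 42 raw salaries are reconstructed from the table itself, purely for illustration.

```python
from collections import Counter

# Reconstruct the 42 starting salaries from the frequency table above.
given = {47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
         53: 0, 54: 5, 56: 2, 57: 3, 60: 1}
salaries = [s for s, f in given.items() for _ in range(f)]

# A frequency table is just a count of each distinct value.
freq = Counter(salaries)
for salary in sorted(freq):
    print(f"{salary}: {freq[salary]}")
```

`Counter` only lists values that actually occur, so a value with frequency 0 (such as 53 here) simply does not appear among the counts.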
Line Graph
Data from a frequency table can be graphically represented by a
line graph, which plots each distinct data value against its frequency.

Figure: Line graph


Bar Graph
Also, data from a frequency table can be graphically represented
by a bar graph.

Figure: Bar graph


Frequency polygon

Another type of graph used to represent a frequency table is the
frequency polygon.

It plots the frequencies (vertical axis) against the data values
(horizontal axis), and then connects the plotted points with
straight lines.
Figure: Frequency polygon
Relative frequency tables and graphs

Consider a data set consisting of n values. If f is the frequency of a
particular value, then the ratio f/n is called its relative frequency.

The relative frequency of a data value is the proportion of the data
that have that value.

The relative frequencies can be represented graphically by a relative
frequency line or bar graph, or by a relative frequency polygon.

Starting salary    |   47 |   48 |   49 |   50 |   51 |    52 | 53 |   54 |   56 |   57 |   60
Relative frequency | 4/42 | 1/42 | 3/42 | 5/42 | 8/42 | 10/42 |  0 | 5/42 | 2/42 | 3/42 | 1/42

Table: Relative frequency table for starting yearly salaries (thousands
USD) of 42 graduated students.
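The relative frequencies f/n can be computed from the frequency counts in one step; by construction they sum to 1. This sketch reuses the salary counts from the table above.

```python
from collections import Counter

# Frequency counts from the starting-salary table above.
freq = Counter({47: 4, 48: 1, 49: 3, 50: 5, 51: 8, 52: 10,
                53: 0, 54: 5, 56: 2, 57: 3, 60: 1})
n = sum(freq.values())  # 42 salaries in total

# The relative frequency of each value is f/n; the ratios sum to 1.
rel_freq = {value: f / n for value, f in freq.items()}
for value in sorted(rel_freq):
    print(f"{value}: {freq[value]}/{n} = {rel_freq[value]:.3f}")
print("Total:", sum(rel_freq.values()))
```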
A pie chart

This is often used to indicate relative frequencies when the data
are not numerical in nature.

A circle is constructed and then sliced into different sectors, one
for each distinct type of data value.

The relative frequency of a data value is indicated by the area of
its sector, this area being equal to the total area of the circle
multiplied by the relative frequency of the data value.

Example: The following data relate to the different types of
cancers affecting the 200 most recent patients to enroll at a
clinic specializing in cancer. These data are represented in the pie
chart presented as follows:

Type of Cancer | Number of New Cases | Relative Frequency
Lung           | 42                  | 0.21
Breast         | 50                  | 0.25
Colon          | 32                  | 0.16
Prostate       | 55                  | 0.275
Melanoma       | 9                   | 0.045
Bladder        | 12                  | 0.06

Figure: Pie chart
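Since each sector's area is the circle's area times the relative frequency, each sector's central angle is the relative frequency times 360 degrees. A short sketch using the cancer counts from the table above:

```python
# Cancer-type counts from the table above (200 new cases in total).
cases = {"Lung": 42, "Breast": 50, "Colon": 32,
         "Prostate": 55, "Melanoma": 9, "Bladder": 12}
total = sum(cases.values())

# Each sector's central angle is its relative frequency times 360 degrees,
# so the angles across all sectors add up to a full circle.
for cancer_type, count in cases.items():
    rel = count / total
    angle = rel * 360
    print(f"{cancer_type:9s} rel. freq. {rel:.3f}  sector {angle:.1f} deg")
```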
Grouped data

When the number of distinct values in the data set is too large, it
is useful to divide the values into groupings or class intervals, and
then present the number of data values in each class interval.

The appropriate number of class intervals is a subjective choice,
though 5 to 10 are typical, depending on the number of values in
the data set.

It is common, although not essential, to choose class intervals of
equal length.

The endpoints of a class interval are called the class boundaries.

We will adopt the left-end inclusion convention, which stipulates
that a class interval contains its left-end but not its right-end
boundary point.

Thus, for instance, the class interval 20-30 contains all values that
are both greater than or equal to 20 and less than 30.
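The left-end inclusion convention makes assigning a value to a class interval a simple arithmetic rule. A minimal sketch (the helper function `class_interval` is ours, not from the course, and assumes equal-length intervals):

```python
# Assign a value to an equal-length class interval, using the left-end
# inclusion convention: [lower, upper) contains its left boundary but
# not its right one.
def class_interval(value, start, width):
    """Return the (lower, upper) boundaries of the interval holding value."""
    index = int((value - start) // width)
    lower = start + index * width
    return (lower, lower + width)

# With intervals 20-30, 30-40, ... the boundary value 30 belongs to
# 30-40, not to 20-30, exactly as the convention stipulates.
print(class_interval(29.9, 20, 10))  # (20, 30)
print(class_interval(30.0, 20, 10))  # (30, 40)
```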
Consider the following set of data for the lifetimes in hours of 200 lamps.
A grouped frequency table for the lifetimes of the 200 lamps is given by

The class intervals are of length 100, with the first one starting at
500.
A histogram for the grouped frequency is presented as

Figure: Histogram for the grouped frequency table of 200 lamps


Cumulative frequency graph

The cumulative frequency of a class interval is the total of the
frequencies of that interval and all class intervals below it.

A cumulative frequency graph plots the upper class boundaries of
the class intervals (horizontal axis) against the cumulative
frequencies (vertical axis).

The relative cumulative frequency graph (ogive) for the lifetimes of
the 200 lamps is given by
Figure: Relative cumulative frequency graph for lifetime in hours of 200
Lamps
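Cumulative frequencies are a running sum of the interval frequencies, which `itertools.accumulate` computes directly. The class boundaries below match the lamp example (length 100, starting at 500), but the frequency counts are made-up numbers for illustration, since the actual lamp table appears only as an image.

```python
from itertools import accumulate

# Class intervals of length 100 starting at 500, as in the lamp example.
boundaries = [600, 700, 800, 900, 1000]   # upper class boundaries
frequencies = [12, 40, 75, 50, 23]        # assumed counts (total 200)

# Cumulative frequency of an interval = its own frequency plus the
# frequencies of all class intervals below it.
cum = list(accumulate(frequencies))
n = cum[-1]
for b, c in zip(boundaries, cum):
    print(f"< {b}: {c}  (relative {c / n:.3f})")
```

Plotting the upper boundaries against `cum` gives the cumulative frequency graph; dividing each entry by n gives the relative version (the ogive).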
