Statistics and Probability - Power Point Slides
Objective
To inculcate in you an attitude of statistical and probabilistic thinking. To give you some very basic techniques for applying statistical analysis to real-world situations and problems.
WHAT IS STATISTICS?
That science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis.
A tool for data-based research.
Also known as Quantitative Analysis.
Any scientific enquiry in which you would like to base your conclusions and decisions on real-life data requires statistical techniques!
Nowadays, in the developed countries of the world, there is an active movement for Statistical Literacy.
Application Areas
Statistics has many applications in a wide variety of disciplines: Agriculture, Anthropology, Astronomy, Biology, Economics, Engineering, Environment, Geology, Genetics, Medicine, Physics, Psychology, Sociology, Zoology. Virtually every single subject from Anthropology to Zoology. A to Z!
STATISTICS
DESCRIPTIVE STATISTICS
INFERENTIAL STATISTICS
WEEKS    1 TO 5                    6 TO 10        11 TO 15
         1 TO 15                   16 TO 30       31 TO 45
         DESCRIPTIVE STATISTICS    PROBABILITY    INFERENTIAL STATISTICS
EXAMS    MID-TERM I                MID-TERM II    FINAL EXAM
GRADING
There will be two term exams and one final exam. In addition, there will be 15 homework assignments. The final examination will be comprehensive in nature. (Approximately 25-30% of the final exam paper will be on the material covered up to the Mid-Term-II Exam.) These components will contribute the following percentages to the final grade:
Mid-Term-I: 20%
Mid-Term-II: 20%
Final Exam: 30%
Homework Assignments: 30%
Meaning of Statistics
The word Statistics comes from the Latin word STATUS, meaning political state.
EXAMPLES OF DATA
Data are collected in many aspects of everyday life. Statements given to a police officer, physician or psychologist during an interview are data. So are the correct and incorrect answers given by a student on a final examination. Almost any athletic event produces data: the time required by a runner to complete a marathon, or the number of errors committed by a baseball team in nine innings of play.
And, of course, data are obtained in the course of scientific inquiry: The positions of artifacts and fossils in an archaeological site, The number of interactions between two members of an animal colony during a period of observation, The spectral composition of light emitted by a star.
Types of Data
Data
Quantitative (Numeric)
Variable
A quantity that varies from individual to individual.
QUANTITATIVE & QUALITATIVE VARIABLES
Variables may be classified as quantitative or qualitative according to the form of the characteristic of interest. A variable is called a quantitative variable when the characteristic can be expressed numerically, such as age, weight, income or number of children. On the other hand, if the characteristic is non-numerical, such as education, sex, eye-colour, quality, intelligence, poverty or satisfaction, the variable is referred to as a qualitative variable. A qualitative characteristic is also called an attribute. An individual or an object with such a characteristic can be counted or enumerated after having been assigned to one of several mutually exclusive classes or categories.
Variable
Quantitative (Numeric)
Continuous
Discrete
Continuous Variable
Discrete Variable
Gaps, Jumps
Measurement Scales
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
MEASUREMENT SCALES
By measurement, we usually mean the assigning of numbers to observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly mentioned below:
NOMINAL SCALE
The classification or grouping of observations into mutually exclusive qualitative categories or classes is said to constitute a nominal scale. For example, students are classified as male and female. The numbers 1 and 2 may also be used to identify these two categories. Similarly, rainfall may be classified as heavy, moderate and light, and we may use the numbers 1, 2 and 3 to denote the three classes of rainfall. When numbers are used only to identify the categories of the given scale, they carry no numerical significance and there is no particular order for the grouping.
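The point that nominal codes carry no numerical significance can be illustrated with a small Python sketch. The observations and the two coding schemes below are hypothetical and chosen only for illustration. Counting the categories gives the same answer under either coding, while the average of the codes changes when the labels are swapped, so it tells us nothing.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations on a nominal variable (illustration only).
observations = ["male", "female", "female", "male", "female"]

# Two equally valid coding schemes: the numbers act purely as labels.
coding_a = {"male": 1, "female": 2}
coding_b = {"male": 2, "female": 1}

codes_a = [coding_a[value] for value in observations]
codes_b = [coding_b[value] for value in observations]

# Category counts are meaningful and do not depend on the coding used.
counts = Counter(observations)

# The "mean code" depends on which label got which number,
# so it has no interpretation on a nominal scale.
mean_a = mean(codes_a)
mean_b = mean(codes_b)

print(counts)
print(mean_a, mean_b)
```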
Example
Chemical and manufacturing plants sometimes discharge toxic-waste materials such as DDT into nearby rivers and streams. These toxins can adversely affect the plants and animals inhabiting the river and the river bank.
A study of fish was conducted in the Tennessee River in Alabama and its three tributary creeks: Flint creek, Limestone creek and Spring creek. A total of 144 fish were captured, and the following variables were measured for each one:
1. River/Creek from where fish was captured
2. Species of fish (Channel fish, Largemouth bass or smallmouth buffalo fish)
3. Length of fish (centimeters)
4. Weight of fish (grams)
5. DDT concentration in the bodily system of the fish (parts per million)
Classify each of the five variables as quantitative or qualitative, and identify the type of measurement scale for each.
Solution
The variables length, weight and DDT concentration are quantitative variables because each is measured on a numerical scale (length in centimeters, weight in grams and DDT in parts per million).
All three of these variables are being measured on the Ratio Scale.
Rationale
Whenever we speak about the weight of an object, if our measuring instrument reads zero, this means that the object being measured has zero weight, and in this sense the zero is a true zero. Exactly the same argument holds for the length of an object.
As far as DDT concentration in the bodily system of the fish is concerned, if there is absolutely no DDT in the fish, then the DDT concentration reads zero, and this particular zero reading is a true zero.
As explained above, the three variables length of fish, weight of fish and DDT concentration in the bodily system of the fish are quantitative variables measured on the ratio scale. In contrast:
Data on the river/creek from which the fish were captured and on the species of fish are qualitative data. Both of these variables are measured on the nominal scale.
Rationale
The river/creek from which the fish were captured and the species of fish are qualitative data because they cannot be measured quantitatively; they can only be classified into categories (i.e. Channel fish, Largemouth bass or smallmouth buffalo fish for the species, and Tennessee River, Flint creek, Limestone creek or Spring creek for the location).
The Statistical methods for describing, reporting and analyzing data depend on the type of data measured (i.e. whether data are quantitative or qualitative).
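As a rough illustration of how the choice of method depends on the type of data, the Python sketch below records the classification given in the solution and picks a summary accordingly: counts for qualitative variables, and mean, minimum and maximum for quantitative ones. The variable keys, the `summarize` helper and the sample values are hypothetical; they are not taken from the study.

```python
from collections import Counter

# Classification of the five variables as given in the solution:
# (type of variable, measurement scale). Key names are hypothetical.
VARIABLES = {
    "river_or_creek": ("qualitative", "nominal"),
    "species": ("qualitative", "nominal"),
    "length_cm": ("quantitative", "ratio"),
    "weight_g": ("quantitative", "ratio"),
    "ddt_ppm": ("quantitative", "ratio"),
}


def summarize(name, values):
    """Choose a summary that suits the type of the named variable."""
    kind, _scale = VARIABLES[name]
    if kind == "quantitative":
        # Numerical data: arithmetic summaries are meaningful.
        return {
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
    # Qualitative data: we can only count how many fall in each category.
    return dict(Counter(values))


# Made-up values for three fish, for illustration only.
species = ["Largemouth bass", "Channel fish", "Largemouth bass"]
lengths = [40.0, 50.0, 45.0]

print(summarize("species", species))
print(summarize("length_cm", lengths))
```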
ERRORS OF MEASUREMENT
Experience has shown that a continuous variable can never be measured with perfect fineness because of certain habits and practices, methods of measurement, instruments used, etc. The measurements are thus always recorded correct to the nearest unit and hence are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a student's weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between 59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie between 59.995 kg and 60.005 kg. Thus there is a difference, however small it may be, between the measured value and the true value. This sort of departure from the true value is technically known as the error of measurement. In other words, if the observed value and the true value of a variable are denoted by $x$ and $x + \varepsilon$ respectively, then the difference $(x + \varepsilon) - x$, i.e. $\varepsilon$, is the error. This error involves the unit of measurement of $x$ and is therefore called an absolute error. An absolute error divided by the true value is called the relative error. Thus the relative error is $\varepsilon / (x + \varepsilon)$, which, when multiplied by 100, is the percentage error. The relative and percentage errors are independent of the units of measurement of $x$. It ought to be noted that an error has both magnitude and direction, and that the word error in statistics does not mean a mistake, which is a chance inaccuracy.
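The error definitions above can be sketched in a few lines of Python. The function names and the true weight of 60.3 kg are hypothetical, chosen only for illustration, since in practice the true value is not known. The sketch assumes the observed value is x and the true value is x plus the error, as in the text.

```python
def true_value_bounds(recorded, decimals=0):
    """Interval containing the true value when `recorded` is correct
    to the given number of decimal places."""
    half_unit = 0.5 * 10 ** (-decimals)
    return recorded - half_unit, recorded + half_unit


def measurement_errors(observed, true_value):
    """Return the absolute, relative and percentage errors."""
    absolute = true_value - observed   # the error: (x + e) - x
    relative = absolute / true_value   # absolute error / true value
    percentage = 100 * relative
    return absolute, relative, percentage


# A weight recorded as 60 kg, correct to the nearest kilogram.
low, high = true_value_bounds(60, decimals=0)
print(low, high)

# Hypothetical true weight of 60.3 kg, for illustration only.
absolute, relative, percentage = measurement_errors(60, 60.3)
print(round(absolute, 3), round(relative, 5), round(percentage, 3))
```

Note that the absolute error carries the unit (kilograms here), while the relative and percentage errors are pure numbers.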
Statistical Inference
A statistical inference is an estimate, prediction or some other generalization about a population based on information contained in a sample. That is, we use the information contained in the sample to learn about the larger population.
To understand the concept of reliability, a very important point is that making an inference about the population from the sample is only part of the story. We also need to know its reliability, that is, how good our inference is.
Measure of Reliability
A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.
The point to be noted is that the only way we can be certain that an inference about a population is correct is to include the entire population in our sample. However, because of resource constraints (i.e. insufficient time and/or money), we usually cannot work with the whole population, so we base our inference on just a portion of the population (i.e. a sample).
Consequently, whenever possible, it is important to determine and report the reliability of each inference made. As such, reliability is the fifth element of statistical inference problems.
Example
A large paint retailer has had numerous complaints from customers about under-filled paint cans. As a result, the retailer has begun inspecting incoming shipments of paint from suppliers. Shipments with under-fill problems will be sent back to the supplier.
A recent shipment contained 2,440 gallon-size cans. The retailer sampled 50 cans and weighed each on a scale capable of measuring weight to four decimal places. Properly filled cans weigh 10 pounds.
a) Describe the population.
b) Describe the variable of interest.
c) Describe the sample.
d) Describe the inference.
e) Describe a measure of uncertainty of our inference.
Solution
a) The population is the set of units of interest to the retailer, which is the shipment of 2,440 cans of paint.
b) The weight of the paint cans is the variable the retailer wishes to evaluate.
c) The sample is a subset of the population. In this case, it is the 50 cans of paint selected by the retailer.
d) The inference of interest involves the generalization of the information contained in the sample of paint cans to the population of paint cans.
In particular, the retailer wants to learn about the extent of the under-fill problem (if any) in the population. This might be accomplished by finding the average weight of the cans in the sample and using it to estimate the average weight of the cans in the population.
e) As far as the measure of reliability of our inference is concerned, the point to be noted is that, using statistical methods, we can determine a bound on the estimation error.
This bound is a measure of the uncertainty of our inference or, in other words, the reliability of the statistical inference. The crux of the matter is that an inference is incomplete without a measure of its reliability.
When the weights of 50 paint cans are used to estimate the average weight of all the cans, the estimate will not exactly mirror the entire population. For Example:
If the sample of 50 cans yields a mean weight of 9 pounds, it does not follow (nor is it likely) that the mean weight of the population of cans is also exactly 9 pounds.
Nevertheless, we can use sound statistical reasoning to ensure that our sampling procedure will generate an estimate that is almost certainly within a specified limit of the true mean weight of all the cans.
For example, such reasoning might assure us that the estimate of the population mean from the sample is almost certainly within 1 pound of the actual population mean. The implication is that the actual mean weight of the entire population of cans is between 9 - 1 = 8 pounds and 9 + 1 = 10 pounds, that is, (9 ± 1) pounds. This interval represents a measure of reliability for the inference.
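The idea of an estimate with a bound on its error can be sketched in Python. The slides do not say how the bound is obtained, so this sketch assumes one common large-sample choice: two standard errors of the sample mean, which gives roughly 95% coverage. The 50 weights are simulated, not the retailer's data, and the function name `mean_with_bound` is hypothetical.

```python
import math
import random
from statistics import mean, stdev


def mean_with_bound(sample, multiplier=2.0):
    """Return the sample mean and a bound on the estimation error.

    Assumption: bound = multiplier * (sample std. dev. / sqrt(n)),
    a large-sample approximation that is not taken from the slides.
    """
    n = len(sample)
    estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return estimate, multiplier * standard_error


# Simulated weights (pounds) for 50 cans, recorded to four decimals.
rng = random.Random(0)
weights = [round(rng.gauss(9.9, 0.2), 4) for _ in range(50)]

estimate, bound = mean_with_bound(weights)
lower, upper = estimate - bound, estimate + bound
print(f"estimated mean weight: {estimate:.4f} +/- {bound:.4f} pounds")
print(f"interval: {lower:.4f} to {upper:.4f} pounds")
```

The interval plays the same role as (9 ± 1) pounds in the example: the estimate alone is incomplete, and the bound states how far it is likely to be from the population mean.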
Methods of data collection
In other words, you will begin your journey in a subject with reference to which it has been said that statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.