Uploaded by
rehmat ullah
Probability and Statistics
Unit 1: Introduction to Statistics

Statistics
* The word statistics has two meanings.
* In the most common usage, statistics refers to numerical facts.
* Examples are the numbers that represent:
  a) annual income
  b) age
  c) the percentage of students who scored grade A
  d) the starting salary of a typical college graduate
* What will be other examples of statistics? ......

The following examples present some statistics:
* Approximately 30% of Google’s employees were female in July 2014 (USA TODAY, July 24, 2014).
* In 2013, author James Patterson earned $90 million from the sale of his books (Forbes, September 29, 2014).
* As per the CBS report, the hotel and restaurant, manufacturing, and transportation sectors of Nepal will witness negative growth of 16.3 percent, 1.1 percent, and 2.3 percent, respectively, in the current fiscal year (The Himalayan Times, April 30, 2020).

* The second meaning of statistics refers to the field or discipline of study.
* Statistics is the science of collecting, analyzing, presenting, and interpreting data, as well as of making decisions based on such analyses.
* A comprehensive definition given by Croxton and Cowden is: “Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data.”

* Statistical methods help us make scientific and intelligent decisions.
* Decisions made by using statistical methods are called educated guesses.
* Decisions made without using statistical (or scientific) methods are called pure guesses and, hence, may prove to be unreliable.
* For example: .......

Applications
Accounting: The number of individual accounts receivable is generally large, and checking the validity of each one is time-consuming. Auditors therefore draw conclusions from sample data as to whether the accounts receivable amount shown on the client's balance sheet is acceptable.
Finance: Financial analysts use a variety of statistical information and methods to guide their investment recommendations.
Economics: Economists use a variety of statistical information and methods in forecasting, planning, and formulating economic policies, for example price index numbers, unemployment rates, manufacturing capacity utilization, human development indices, and quality control charts.

Basic Terms
Population or target population: The collection of all elements/members whose characteristics are being studied. For example: ..
Sample: A portion/fraction of the population of interest. For example: ..
Fig. 1. The relation between population and sample.

Goal of Sample: Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
[Figure: THE POPULATION (all of the individuals of interest) and THE SAMPLE (the individuals selected to participate in the research study). The sample is selected from the population, and the results from the sample are generalized to the population.]

Basic terms continued...
Survey: A survey is a research method used for collecting data from a predefined group of respondents to gain information and insights into various topics of interest.
Census: A procedure of systematically calculating, acquiring, and recording information about the members of a given population.
Sample Survey: A procedure of systematically calculating, acquiring, and recording information from only a portion of a population of interest.

* Variable
  - A variable is a characteristic under study that assumes different values for different elements.
  - A variable is often denoted by the letters x, y, or z.
  - The value of a variable for an element is called an observation or measurement.
* Data
  - A collection of information/observations.
  - The goal of statistics is to help researchers organize and interpret the data.

Types of Variables
Some variables (such as the height of a person or the price of groceries) can be measured numerically, whereas others (such as occupation or income sources) cannot. Variables are classified into two types:
a) Quantitative Variable
b) Qualitative Variable

i) Quantitative Variable
* A variable that can be measured numerically is called a quantitative variable.
* The data collected on a quantitative variable are called quantitative data.
* Example: Number of workers: 23, 24, 25, 15, 19, 18
* Other examples:
  - Annual gross sales
  - No. of accidents
  - Weight of a laptop
  - Temperature
  - No. of gadgets owned
* As the above examples show, the values that a quantitative variable can assume may be countable or noncountable.
* Quantitative variables may be classified into two categories:
  a) Discrete Variable
  b) Continuous Variable

A) Discrete Variable
* A variable whose values are countable.
* In other words, a discrete variable can assume only certain values with no intermediate values.
* For example:
  - No. of accidents
  - The no. of daily admissions in a general hospital
  - The no. of people who visit a bank on any day
  - The no. of books in a library

B) Continuous Variable
* A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable.
* Example:
  - Price of a book: USD 105.6
  - Annual salary
  - Body temperature
  - Expenditure on food on any day
  - The time it takes to complete a certain task

ii) Qualitative or Categorical Variable
* A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable.
* The data collected on such a variable are called qualitative data.
* Examples:
  - Gender of a person
  - A person’s blood type
  - Occupation
  - Modes of transportation

Figure 1.1 summarizes the different types of variables.
Variable
  - Quantitative
      - Discrete (e.g., number of houses, cars, accidents)
      - Continuous (e.g., length, age, height, weight, time)
  - Qualitative or categorical (e.g., make of a computer, opinions of people, gender)
Figure 1.1. Types of variables.

Measuring Variables
* To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
* The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.

Four Types of Measurement Scales
[Figure: the four measurement scales, from the strongest to the weakest form of measurement. Ratio: measurements with a true zero. Interval: measurements but no true zero. Ordinal: ordered categories (rankings, order, or scaling). Nominal: categories with no ordering or direction.]

Nominal data: Categorical data and numbers that are simply used as identifiers or names represent a nominal scale of measurement.
Examples: Gender: a) male b) female

Ordinal data: An ordinal scale of measurement represents an ordered series of relationships or rank order. Individuals competing in a contest may be fortunate to achieve first, second, or third place. First, second, and third place represent ordinal data.
Examples: organizational chart, post, educational qualification

Interval data:
* A scale that represents quantity and has equal units, but for which zero is simply an additional point of measurement, is an interval scale.
* Example: Temperature, pH, SAT score, IQ test

Ratio data:
* The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units.
* However, this scale also has an absolute zero (no numbers exist below the zero).
* Very often, physical measures will represent ratio data (for example, height and weight).

Example: Scale of measurement
Scale      Example
Nominal    Numbers assigned to runners
Ordinal    Rank order finish of winners
Interval   Performance rating on a 0 to 10 scale (e.g., 8.2, 3.8)
Ratio      Time to finish, in ...

Branches of Statistics
* Descriptive statistics are methods for organizing and summarizing data.
* For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data.
* A descriptive value for a population is called a parameter, and a descriptive value for a sample is called a statistic.
* Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.
* Because a sample is typically only a part of the whole population, sample data provide only limited information about the population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.

Things to remember...
* A descriptive study may be performed either on a sample or on a population. Only when an inference is made about the population, based on information obtained from the sample, does the study become inferential.
* Descriptive statistics and inferential statistics are interrelated.
You must almost always use techniques of descriptive statistics to organize and summarize the information obtained from a sample before carrying out an inferential analysis.

Describing Data with Numerical Measures
a) Measure of central tendency and location
b) Measure of variability

Topics:
* Compute and interpret the mean, median, and mode for a set of data
* Compute the range, variance, and standard deviation and know what these values mean
* Construct and interpret a box and whiskers plot
* Compute and explain the coefficient of variation
* Use numerical measures along with graphs, charts, and tables to interpret data

Summary Measures
[Figure: Describing Data Numerically. Center and location: mean, median, mode. Other measures of location: percentiles, quartiles. Variation: range, interquartile range, standard deviation, coefficient of variation.]

Measures of Center and Location
[Figure: Overview of center and location: mean, median, mode, weighted mean, with their formulas.]

Measures of Center for Ungrouped and Grouped Data

a) Mean
Calculating Mean for Ungrouped Data
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,
Mean for population data:
\[ \mu = \frac{\sum x}{N} \]
Mean for sample data:
\[ \bar{x} = \frac{\sum x}{n} \]
where $\sum x$ is the sum of all values, $N$ is the population size, $n$ is the sample size, $\mu$ is the population mean, and $\bar{x}$ is the sample mean.

Calculating Mean for Grouped Data
Mean for population data:
\[ \mu = \frac{\sum m f}{N} \]
Mean for sample data:
\[ \bar{x} = \frac{\sum m f}{n} \]
where $m$ is the midpoint and $f$ is the frequency of a class.

b) Median
* In an ordered array, the median is the “middle” number.
* If n or N is odd, the median is the middle number.
* If n or N is even, the median is the average of the two middle numbers.
* The advantage of using the median as a measure of central tendency is that it is not influenced by outliers.
* When outliers exist, use the median instead of the mean as a measure of central tendency.
* The median is the value of the middle term in a data set that has been ranked in increasing order.
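The mean formulas and the median rule above can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function names `grouped_mean` and `median_by_rank`, the scores, and the class midpoints are all made up for the example.

```python
from statistics import mean, median

def grouped_mean(midpoints, frequencies):
    """Mean for grouped data: sum of m*f divided by the total frequency."""
    n = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

def median_by_rank(values):
    """Median as the middle term of the ranked data.

    For an even number of values, the two middle terms are averaged.
    """
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

# Hypothetical ungrouped data with one outlier (illustrative values only).
scores = [4, 7, 5, 6, 48]
print(mean(scores))            # 14, pulled up by the outlier 48
print(median_by_rank(scores))  # 6, not influenced by the outlier

# Hypothetical grouped data: class midpoints and class frequencies.
print(grouped_mean([5, 15, 25], [2, 5, 3]))  # 16.0
```

The outlier moves the mean from the middle of the data to 14 while the median stays at 6, which is the behaviour the median bullets describe.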
\[ \text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in the ranked data set} \]

Example: Find the median for these data:
173,175; 49,723; 20,352; 10,824; 40,911; 18,038; 61,848
Ranked in increasing order:
10,824; 18,038; 20,352; 40,911; 49,723; 61,848; 173,175
There are n = 7 values, so the median is the 4th term, 40,911.

Example: 2010 Total Compensation (millions of dollars)
[Table residue: total compensation by CEO and company, e.g. Michael D. White (DirecTV): 32.9. The remaining rows are not recoverable.]
Median = 28.1 = $28.1 million

Calculating median for grouped data
\[ \text{Median} = l + \frac{\frac{n}{2} - cf}{f} \times h \]
where
l = lower limit of the median class
n/2 = median position
cf = cumulative frequency preceding the median class
f = frequency of the median class
h = class width of the median class

c) Mode
Mode for ungrouped data
* The mode is the value that occurs with the highest frequency in a data set.
* Example: .......
* Advantages:
  - Can be used for both qualitative and quantitative data, whereas the mean and median can be calculated only for quantitative data
  - Not affected by outliers
* Disadvantages (dependent on the nature of the data set):
  - There may be no mode
  - There may be several modes

Calculating mode for grouped data
\[ M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h \]
where
L = lower limit of the modal class
$\Delta_1 = (f_1 - f_0)$ = difference between the highest frequency and the preceding frequency
$\Delta_2 = (f_1 - f_2)$ = difference between the highest frequency and the succeeding frequency
h = class width or class interval

d) Weighted Mean
* The weighted mean is an average computed by giving different weights to some of the individual values. If all the weights are equal, then the weighted mean is the same as the arithmetic mean.
* It represents the average of the given data and is similar to the arithmetic (simple) mean. It is used when the values do not all carry the same importance or frequency.
* For a given set of non-negative data $x_1, x_2, x_3, \ldots, x_n$ with non-negative weights $w_1, w_2, w_3, \ldots, w_n$, the weighted mean is given by
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w x}{\sum w} \]
where w = given weight.

Example: Sample of 26 Repair Projects
Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2
Weighted mean days to complete:
\[ \bar{x}_w = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days} \]

Which measure of location is the “best”?
* The mean is generally used, unless extreme values (outliers) exist.
* Then the median is often used, since the median is not sensitive to extreme values.

Relationships Among the Mean, Median, and Mode
[Figure: three distribution shapes.]
* Symmetric: the right and left sides are mirror images; Mean = Median = Mode.
* Positively skewed: the longer tail is toward the high values; Mean > Median > Mode.
* Negatively skewed: the longer tail is toward the low values; Mean < Median < Mode.

Partition values
* The values of the variate that divide the total number of observations into a number of equal parts are known as partition values.
* If the values of the variate are arranged in ascending or descending order of magnitude, the median is the value of the variate that divides the total frequency into two equal parts. Similarly, the given series can be divided into four, ten, and a hundred equal parts.
* Quartile: The values of the variate that divide the total frequency into four equal parts are called quartiles. There are three quartiles: the first quartile ($Q_1$), the second quartile ($Q_2$), and the third quartile ($Q_3$).
* Decile: Deciles are the values that divide a set of observations into ten equal parts. There are therefore nine deciles, written $D_1, D_2, D_3, \ldots, D_9$.
* Percentile: Percentiles divide a set of observations into 100 equal parts. The percentiles or centiles are written $P_1, P_2, P_3, \ldots, P_{99}$.

Percentiles
* The $p$th percentile in an ordered array of n values is the value in the $i$th position, where
\[ i = \frac{p}{100}(n + 1) \]
* Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
\[ i = \frac{60}{100}(19 + 1) = 12 \]

Calculating partition values for grouped data
* Quartile:
\[ Q_i = L + \frac{\frac{i\,n}{4} - cf}{f} \times h, \quad i = 1, 2, 3 \]
* Decile:
\[ D_i = L + \frac{\frac{i\,n}{10} - cf}{f} \times h, \quad i = 1, 2, 3, \ldots, 9 \]
* Percentile:
\[ P_i = L + \frac{\frac{i\,n}{100} - cf}{f} \times h, \quad i = 1, 2, 3, \ldots, 99 \]
Here L, cf, f, and h are the lower limit, preceding cumulative frequency, frequency, and width of the class that contains the partition value, as in the grouped median formula.
Note: Median = $Q_2$ = $D_5$ = $P_{50}$

Interquartile Range
* Some outlier problems can be eliminated by using the interquartile range.
* Eliminate some high- and low-valued observations and calculate the range from the remaining values.
* Interquartile range = 3rd quartile - 1st quartile

Interquartile Range Example:
[Figure: minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, maximum = 70; each of the four segments holds 25% of the values.]
Interquartile range = 57 - 30 = 27

Box and Whisker Plot
* A graphical display of data using the 5-number summary:
  Minimum -- Q1 -- Median -- Q3 -- Maximum
[Figure: example box and whisker plot.]

Features of Box and Whisker plot:
* Gives a graphic presentation of data using five measures: the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
* Can help visualize the center, the spread, and the skewness of a data set. It also helps detect outliers.
* Its summary values are always located at actual data points, are quickly computable (originally by hand), and have no tuning parameters.
* Box plots are particularly useful for comparing distributions across groups.
[Figure: box plots for symmetric, right-skewed, and left-skewed distributions.]

Why Use a Boxplot?
* A boxplot provides an alternative to a histogram, a dot plot, and a stem-and-leaf plot. Among the advantages of a boxplot over a histogram are ease of construction and convenient handling of outliers. In addition, the construction of a boxplot does not involve subjective judgements, as does a histogram. That is, two individuals will construct the same boxplot for a given set of data, which is not necessarily true of a histogram, because the number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the histogram provides.
* Dot plots and stem plots retain the identity of the individual observations; a boxplot does not. Many sets of data are more suitable for display as boxplots than as a stem plot.
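The weighted mean, the percentile position rule, and the interquartile range described above can be sketched in Python. This is an illustrative sketch, not the slides' own method: the function names are hypothetical, the 19-value data set is made up, and the linear interpolation used for fractional positions is an assumption, since the slides only cover whole-number positions.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def percentile_position(p, n):
    """Position i = (p / 100) * (n + 1) of the p-th percentile among n ranked values."""
    return p * (n + 1) / 100

def percentile_value(values, p):
    """Value at the p-th percentile position.

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighbouring ranked values.
    """
    ranked = sorted(values)
    n = len(ranked)
    i = percentile_position(p, n)
    whole = int(i)        # whole part of the 1-based position
    frac = i - whole      # fractional part of the position
    if whole < 1:
        return ranked[0]
    if whole >= n:
        return ranked[-1]
    return ranked[whole - 1] + frac * (ranked[whole] - ranked[whole - 1])

# Repair-project example from the slides: days to complete and frequencies.
print(round(weighted_mean([5, 6, 7, 8], [4, 12, 8, 2]), 2))  # 6.31

# 60th percentile position in an ordered array of 19 values.
print(percentile_position(60, 19))  # 12.0

# Hypothetical ordered data set of 19 values: 2, 4, ..., 38.
data = [2 * k for k in range(1, 20)]
q1 = percentile_value(data, 25)   # value in position 5
q3 = percentile_value(data, 75)   # value in position 15
print(q3 - q1)                    # interquartile range of the made-up data: 20.0
```

Statistical packages use several different percentile conventions, so results for small data sets may differ slightly from the (n + 1) rule shown here.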
Both a boxplot and a stem plot are useful for making side-by-side comparisons.

Measures of Variation
[Figure: Variation: range, interquartile range, variance (population variance, sample variance), standard deviation (population standard deviation, sample standard deviation), coefficient of variation.]

Variation
* Measures of variation give information on the spread or variability of the data values.
[Figure: distributions with the same center but different variation.]

Measures of Dispersion for Grouped and Ungrouped Data

Range
* Range = Largest value - smallest value
[Table residue: total area (square miles) by state, including Oklahoma and Texas; the smallest value is 49,651 and the largest is 267,277.]
Range = Largest value - smallest value = 267,277 - 49,651 = 217,626 square miles

Disadvantages of the Range
* Ignores the way in which data are distributed.
[Figure: two dot plots on a 7 to 12 axis with different shapes; in both, Range = 12 - 7 = 5.]
* Sensitive to outliers:
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
  Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
  Range = 120 - 1 = 119

Variance
* Average of squared deviations of values from the mean (individual series).
* Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
* Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

Standard Deviation
* Most commonly used measure of variation
* Shows variation about the mean
* Has the same units as the original data
* Sample standard deviation (ungrouped data):
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]
* Population standard deviation (ungrouped data):
\[ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \]

For grouped data, the standard deviation is computed by using the following relationships, where x is the class midpoint and f is the class frequency.
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum f (x - \mu)^2}{N}} = \sqrt{\frac{\sum f x^2}{N} - \frac{\left(\sum f x\right)^2}{N^2}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}} \]

Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis. Data A: mean = 15.5, s = 3.338. Data B: summary values not recoverable. Data C: mean = 15.5, s = 4.57.]

Coefficient of Variation (CV)
* The C.V. is the most widely used relative measure of dispersion for comparing two or more distributions.
* When comparing two or more distributions, the lower the C.V., the more homogeneous (more consistent, uniform, regular, or stable) the distribution.
* The C.V. is used to compare two or more distributions with respect to their variability, consistency, uniformity, homogeneity, equitability, stability, etc.

Coefficient of Variation (CV)
Note: A low CV indicates that there is low variation in the data set and hence higher consistency.
\[ CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)} \]
\[ CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)} \]

* E.g. 1. Consider the distribution of the yields (per plot) of two paddy varieties, with the information given below:
             Variety 1   Variety 2
Mean (kg)    60          50
S.D. (kg)    10          9
C.V. for Variety 1 = (10 / 60) × 100 = 16.7%, so less variability and more consistent
C.V. for Variety 2 = (9 / 50) × 100 = 18.0%
* But in terms of S.D. alone, the interpretation could be the reverse.

The Empirical Rule
* If the data distribution is bell-shaped, then the interval:
* $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
* $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
* $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample

Tchebysheff’s Theorem
* Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean.
* Examples:
  - At least $(1 - 1/1^2) = 0\%$ of the values fall within k = 1 standard deviation of the mean
  - At least $(1 - 1/2^2) = 75\%$ fall within k = 2 standard deviations
  - At least $(1 - 1/3^2) = 89\%$ fall within k = 3 standard deviations
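The coefficient of variation and Tchebysheff's bound above can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the data set is the outlier sequence from the range slide, treated here as a population so that `statistics.pstdev` applies.

```python
from statistics import mean, pstdev

def coefficient_of_variation(sd, avg):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return sd * 100 / avg

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

# Paddy example from the slides: mean and S.D. of the two varieties.
print(round(coefficient_of_variation(10, 60), 1))  # 16.7
print(round(coefficient_of_variation(9, 50), 1))   # 18.0

# Outlier data from the range slide: eleven 1s, eight 2s, four 3s, a 4 and 120.
data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]
print(chebyshev_lower_bound(2))   # 0.75
print(fraction_within(data, 2))   # 0.96
```

Even with the extreme value 120, the observed 96% within two standard deviations is above the 75% guaranteed by the theorem. The empirical rule's 95% figure is not guaranteed here, because this data set is far from bell-shaped.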