Biostatistics Biochemistry 1

Data analysis

Uploaded by

Binte Abdullah
Copyright
© All Rights Reserved
Chapter 1: Definition of Statistics

Statistics is concerned with scientific methods for collecting, organising, summarising, presenting and analysing data, as well as deriving valid conclusions and making reasonable decisions on the basis of this analysis.

A variable is a characteristic or attribute that can assume different values.

Data consist of information coming from observations, counts, measurements or responses.

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

A population is the collection of all outcomes, responses, measurements, or counts that are of interest. A sample is a subset, or part, of a population.

Example 1:
All the students of Jazan University: population.
The students of Computer Science at Jazan University: sample.

Example 2: In a recent survey, 1500 adults in the United States were asked if they thought there was solid evidence of global warming. Eight hundred fifty-five of the adults said yes. Identify the population and the sample. Describe the sample data set.

The population consists of the responses of all adults in the United States. The sample consists of the responses of the 1500 adults in the United States in the survey. The sample is a subset of the responses of all adults in the United States. The sample data set consists of 855 yeses and 645 noes.

A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.

Example 3: Decide whether the numerical value describes a population parameter or a sample statistic. Explain your reasoning.
1. A recent survey of 200 college career centers reported that the average starting salary for petroleum engineering majors is $83,121.
2. The 2182 students who accepted admission offers to Northwestern University in 2009 have an average SAT score of 1442.
3.
In a random check of a sample of retail stores, the Food and Drug Administration found that 34% of the stores were not storing fish at the proper temperature.

Solution
1. It is a sample statistic, because the average of $83,121 is based on a subset of the population.
2. It is a population parameter, because the SAT score of 1442 is based on all the students who accepted admission offers in 2009.
3. It is a sample statistic, because the percent of 34% is based on a subset of the population.

Variables and Types of Data

Data sets can consist of two types of data: qualitative data and quantitative data.
Qualitative data consist of attributes, labels, or nonnumerical entries.
Quantitative data consist of numerical measurements or counts.

Example 4: The suggested retail prices of several Ford vehicles are shown in the table. Which data are qualitative data and which are quantitative data?

Model            Suggested retail price
Focus Sedan      $15,995
Fusion           $19,270
Mustang          $20,995
Edge             $26,920
Flex             $28,495
Escape Hybrid    $32,260
Expedition       $35,085
F450             [illegible]

The information shown in the table can be separated into two data sets. One data set contains the names of vehicle models, and the other contains the suggested retail prices of vehicle models. The names are nonnumerical entries, so these are qualitative data. The suggested retail prices are numerical entries, so these are quantitative data.

Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute. For example, if subjects are classified according to gender (male or female), then the variable gender is qualitative.

Quantitative variables are numerical and can be ordered or ranked. For example, the variable age is numerical, and people can be ranked in order according to the value of their ages.

Quantitative variables can be further classified into two groups: discrete and continuous.

Discrete variables assume values that can be counted.
Examples:
* The number of children in a family.
* The number of students in a classroom.
* The number of calls received by a switchboard operator each day for a month.
Note: Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be countable.

Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

Examples:
* Temperature is a continuous variable, since the variable can assume an infinite number of values between any two given temperatures.
Note: Continuous variables, by comparison, can assume an infinite number of values in an interval between any two specific values.

Example: Classify each variable as discrete or continuous.
a. Number of pizzas sold by Pizza Express each day. (Discrete)
b. Relative humidity levels in operating rooms at local hospitals. (Continuous)
c. Number of bananas in a bunch at several local supermarkets. (Discrete)
d. Lifetimes (in hours) of 15 iPod batteries. (Continuous)
e. Weights of the backpacks of first graders on a school bus. (Continuous)
f. Number of students each day who make appointments with a math tutor at a local college. (Discrete)
g. Blood pressures of runners in a marathon. (Continuous)
h. Temperature of a room. (Continuous)
i. Results of rolling two dice. (Discrete)
j. The length of a leaf. (Continuous)
k. A dog's weight. (Continuous)

Example: Classify each variable as qualitative or quantitative.
a. Marital status of nurses in a hospital. (Qualitative)
b. Time it takes to run a marathon. (Quantitative)
c. Weights of lobsters in a tank in a restaurant. (Quantitative)
d. Colors of automobiles in a shopping center parking lot. (Qualitative)
e. Ounces of ice cream in a large milkshake. (Quantitative)
f. Capacity of the NFL football stadiums. (Quantitative)
g. Ages of people living in a personal care home.
(Quantitative)
h. The age of your car. (Quantitative)
i. The number of hairs on your knuckle. (Quantitative)
j. The softness of a cat. (Qualitative)
k. The color of the sky. (Qualitative)

Chapter 2: Frequency Distributions and Graphs

Organizing Data

Definition: A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Definition: A frequency distribution is a table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

Example of a Frequency Distribution

Class    Frequency f
1-5      5
6-10     8
11-15    6
16-20    8
21-25    5
26-30    4

In the frequency distribution shown there are six classes. The frequencies for each of the six classes are 5, 8, 6, 8, 5, and 4. Each class has a lower class limit, which is the least number that can belong to the class, and an upper class limit, which is the greatest number that can belong to the class. In the frequency distribution shown, the lower class limits are 1, 6, 11, 16, 21, and 26, and the upper class limits are 5, 10, 15, 20, 25, and 30. The class width is the distance between lower (or upper) limits of consecutive classes. For instance, the class width in the frequency distribution shown is 6 - 1 = 5.

Constructing a Frequency Distribution from a Data Set
1. Decide on the number of classes to include in the frequency distribution. The number of classes should be between 5 and 20; otherwise, it may be difficult to detect any patterns.
2. Find the range = highest value - lowest value. Then find the class width, $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$, rounded up to the next whole number.
3. Find the class limits. You can use the minimum data entry as the lower limit of the first class. To find the remaining lower limits, add the class width to the lower limit of the preceding class. Then find the upper limit of the first class. Remember that classes cannot overlap. Find the remaining upper class limits.
4. Make a tally mark for each data entry in the row of the appropriate class.
5.
Count the tally marks to find the total frequency f for each class.

Example 1: Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the world. The researcher first would have to get the data on the ages of the people. In this case, these ages are listed in Forbes Magazine. When the data are in original form, they are called raw data and are listed next.

49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
48, 81, 68, 37, 43, 67, 52, 56, 81, 77,
79, 85, 35, 85, 59, 80, 60, 71, 57, 61,
69, 61, 83, 90, 87, 74, 82, 43, 64, 78

Construct a frequency distribution for the data with 8 classes.

Solution:
Range = 90 - 35 = 55, class width = 55/8 = 6.875, rounded up to 7.
Lower limit of first class = 35, upper limit of first class = 35 + 7 - 1 = 41.

Class    Tally          Frequency
35-41    |||            3
42-48    |||            3
49-55    ||||           4
56-62    ||||| |||||    10
63-69    ||||| |||||    10
70-76    |||||          5
77-83    ||||| |||||    10
84-90    |||||          5
Sum                     50

Definitions:
1. Lower boundary = lower limit - 0.5, upper boundary = upper limit + 0.5. The class boundaries column lists each class from its lower boundary up to its upper boundary.
2. The midpoint of a class is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called the class mark.
   $\text{Midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
3. The relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n.
   $\text{Relative frequency} = \dfrac{\text{class frequency}}{\text{sample size}} = \dfrac{f}{n}$
4. The cumulative frequency of a class is the sum of the frequencies of that class and all previous classes.
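The construction steps and definitions above can be checked with a short Python sketch applied to the age data of Example 1. The helper name `frequency_table` and the dictionary keys are illustrative assumptions, not part of the notes. The class width is taken as the next whole number above range / number of classes, which agrees with the rounding used in the example (55/8 = 6.875 gives 7).

```python
# Ages of the top 50 wealthiest people (raw data from Example 1).
AGES = [
    49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
    54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
    48, 81, 68, 37, 43, 67, 52, 56, 81, 77,
    79, 85, 35, 85, 59, 80, 60, 71, 57, 61,
    69, 61, 83, 90, 87, 74, 82, 43, 64, 78,
]


def frequency_table(data, num_classes):
    """Grouped frequency table for integer data (hypothetical helper)."""
    low, high = min(data), max(data)
    # Next whole number above range / number of classes, so the classes
    # always reach the largest value (55 / 8 = 6.875 -> 7).
    width = (high - low) // num_classes + 1
    n = len(data)
    rows, cumulative = [], 0
    for k in range(num_classes):
        lower = low + k * width
        upper = lower + width - 1  # classes must not overlap
        f = sum(1 for x in data if lower <= x <= upper)
        cumulative += f
        rows.append({
            "class": (lower, upper),
            "frequency": f,
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "relative": f / n,
            "cumulative": cumulative,
        })
    return rows


table = frequency_table(AGES, 8)
for row in table:
    print(row)
```

Each printed row holds the class limits, frequency, class boundaries, midpoint, relative frequency and cumulative frequency for one class, so it can be compared line by line with a hand-built table.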
The cumulative frequency of the last class is equal to the sample size n.

In the previous example: find the class boundaries, midpoint, relative frequency and cumulative frequency.

Class    Frequency    Class boundaries    Midpoint          Relative frequency    Cumulative frequency
35-41    3            34.5-41.5           (35+41)/2 = 38    3/50 = 0.06           0 + 3 = 3
42-48    3            41.5-48.5           (42+48)/2 = 45    3/50 = 0.06           3 + 3 = 6
49-55    4            48.5-55.5           (49+55)/2 = 52    4/50 = 0.08           6 + 4 = 10
56-62    10           55.5-62.5           (56+62)/2 = 59    10/50 = 0.20          10 + 10 = 20
63-69    10           62.5-69.5           (63+69)/2 = 66    10/50 = 0.20          20 + 10 = 30
70-76    5            69.5-76.5           (70+76)/2 = 73    5/50 = 0.10           30 + 5 = 35
77-83    10           76.5-83.5           (77+83)/2 = 80    10/50 = 0.20          35 + 10 = 45
84-90    5            83.5-90.5           (84+90)/2 = 87    5/50 = 0.10           45 + 5 = 50
Sum      Σf = 50                                            1.00

Then the final table is:

Class              Cumulative frequency
Less than 34.5     0
Less than 41.5     3
Less than 48.5     6
Less than 55.5     10
Less than 62.5     20
Less than 69.5     30
Less than 76.5     35
Less than 83.5     45
Less than 90.5     50

Example 2: These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 states. Construct a grouped frequency distribution for the data using 7 classes.

112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
120, 113, 120, 117, 105, 110, 118, 112, 114, 114

Solution:
Range = 134 - 100 = 34, class width = 34/7 = 4.857, rounded up to 5.
Lower limit of first class = 100, upper limit of first class = 100 + 5 - 1 = 104.

Class      Tally                      Frequency
100-104    ||                         2
105-109    ||||| |||                  8
110-114    ||||| ||||| ||||| |||      18
115-119    ||||| ||||| |||            13
120-124    ||||| ||                   7
125-129    |                          1
130-134    |                          1
Sum                                   50

Find the midpoint, relative frequency and cumulative frequency.
Class      Frequency    Class boundaries    Midpoint               Relative frequency    Cumulative frequency
100-104    2            99.5-104.5          (100+104)/2 = 102      2/50 = 0.04           0 + 2 = 2
105-109    8            104.5-109.5         (105+109)/2 = 107      8/50 = 0.16           2 + 8 = 10
110-114    18           109.5-114.5         (110+114)/2 = 112      18/50 = 0.36          10 + 18 = 28
115-119    13           114.5-119.5         (115+119)/2 = 117      13/50 = 0.26          28 + 13 = 41
120-124    7            119.5-124.5         (120+124)/2 = 122      7/50 = 0.14           41 + 7 = 48
125-129    1            124.5-129.5         (125+129)/2 = 127      1/50 = 0.02           48 + 1 = 49
130-134    1            129.5-134.5         (130+134)/2 = 132      1/50 = 0.02           49 + 1 = 50
Sum        Σf = 50                                                 1.00

Then the final table is:

Class               Cumulative frequency
Less than 99.5      0
Less than 104.5     2
Less than 109.5     10
Less than 114.5     28
Less than 119.5     41
Less than 124.5     48
Less than 129.5     49
Less than 134.5     50

GRAPHS OF FREQUENCY DISTRIBUTIONS

Definition (frequency histogram): A frequency histogram is a bar graph that represents the frequency distribution of a data set. A histogram has the following properties.
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars must touch.

Because consecutive bars of a histogram must touch, bars must begin and end at class boundaries instead of class limits.
> Class boundaries are the numbers that separate classes without forming gaps between them.
> If data entries are integers, subtract 0.5 from each lower limit to find the lower class boundaries.
> To find the upper class boundaries, add 0.5 to each upper limit.
> The upper boundary of a class will equal the lower boundary of the next higher class.

Construct a frequency histogram
Step 1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the y axis is always the vertical axis.
Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each class.

Example 1:

Class    Frequency    Class boundaries
35-41    3            34.5-41.5
42-48    3            41.5-48.5
49-55    4            48.5-55.5
56-62    10           55.5-62.5
63-69    10           62.5-69.5
70-76    5            69.5-76.5
77-83    10           76.5-83.5
84-90    5            83.5-90.5

[Figure: histogram "Ages of Top 50 Wealthiest People"; x axis: class boundaries 34.5 to 90.5; y axis: frequency]

Example 2:

Class      Frequency    Class boundaries
100-104    2            99.5-104.5
105-109    8            104.5-109.5
110-114    18           109.5-114.5
115-119    13           114.5-119.5
120-124    7            119.5-124.5
125-129    1            124.5-129.5
130-134    1            129.5-134.5
Sum        50

[Figure: histogram of the record high temperatures; x axis: class boundaries 99.5 to 134.5; y axis: frequency]

Definition (frequency polygon): The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.

Remark:
* The frequencies are represented by the heights of the points.
* A frequency polygon is a line graph that emphasizes the continuous change in frequencies.

Construct a frequency polygon
Step 1 Find the midpoints of each class.
Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a suitable scale on the y axis for the frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located.

Example 1:

Class    Frequency    Midpoint
35-41    3            38
42-48    3            45
49-55    4            52
56-62    10           59
63-69    10           66
70-76    5            73
77-83    10           80
84-90    5            87

[Figure: frequency polygon "Ages of Top 50 Wealthiest People"; x axis: Age, midpoints 31 to 94; y axis: frequency]

Example 2:

Class      Frequency    Midpoint
100-104    2            102
105-109    8            107
110-114    18           112
115-119    13           117
120-124    7            122
125-129    1            127
130-134    1            132

[Figure: frequency polygon "Record High Temperatures"; x axis: Temperature (°F), midpoints 97 to 137; y axis: frequency]

Definition (the ogive, or cumulative frequency graph): The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Construct a cumulative frequency graph
Step 1 Find the cumulative frequency for each class.
Step 2 Draw the x and y axes. Label the x axis with the class boundaries. Use an appropriate scale for the y axis to represent the cumulative frequencies. (Depending on the numbers in the cumulative frequency columns, scales such as 0, 1, 2, 3, ..., or 5, 10, 15, 20, ..., or 1000, 2000, 3000, ... can be used. Do not label the y axis with the numbers in the cumulative frequency column.)
In this example, a scale of 0, 5, 10, 15, ... will be used.
Step 3 Plot the cumulative frequency at each upper class boundary. Upper boundaries are used since the cumulative frequencies represent the number of data values accumulated up to the upper boundary of each class.
Step 4 Starting with the first upper class boundary, connect adjacent points with line segments.

Example 1: Ages of Top 50 Wealthiest People

Class              Cumulative frequency
Less than 34.5     0
Less than 41.5     3
Less than 48.5     6
Less than 55.5     10
Less than 62.5     20
Less than 69.5     30
Less than 76.5     35
Less than 83.5     45
Less than 90.5     50

[Figure: ogive "Ages of Top 50 Wealthiest People"; x axis: Age, class boundaries 34.5 to 90.5; y axis: cumulative frequency]

Example 2: Record High Temperatures

Class               Cumulative frequency
Less than 99.5      0
Less than 104.5     2
Less than 109.5     10
Less than 114.5     28
Less than 119.5     41
Less than 124.5     48
Less than 129.5     49
Less than 134.5     50

[Figure: ogive "Record High Temperatures"; x axis: Temperature (°F), class boundaries 99.5 to 134.5; y axis: cumulative frequency]

Definition (the pie graph): A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Construct a pie graph
Step 1 Since there are 360° in a circle, the frequency for each class must be converted into a proportional part of the circle. This conversion is done by using the formula
   $\text{Degrees} = \dfrac{f}{n} \cdot 360^\circ$
where f = frequency for each class and n = sum of the frequencies. The degrees should sum to 360°.
Step 2 Each frequency must also be converted to a percentage by using the formula
   $\% = \dfrac{f}{n} \cdot 100$
Step 3 Next, using a protractor and a compass, draw the graph using the appropriate degree measures found in step 1, and label each section with the name and percentages.

Example 1:

Class    Frequency    Degrees                   Percentage
35-41    3            (3/50) · 360 = 21.6       (3/50) · 100 = 6%
42-48    3            (3/50) · 360 = 21.6       (3/50) · 100 = 6%
49-55    4            (4/50) · 360 = 28.8       (4/50) · 100 = 8%
56-62    10           (10/50) · 360 = 72        (10/50) · 100 = 20%
63-69    10           (10/50) · 360 = 72        (10/50) · 100 = 20%
70-76    5            (5/50) · 360 = 36         (5/50) · 100 = 10%
77-83    10           (10/50) · 360 = 72        (10/50) · 100 = 20%
84-90    5            (5/50) · 360 = 36         (5/50) · 100 = 10%
Sum      Σf = 50      360                       100%

[Figure: pie graph "Ages of Top 50 Wealthiest People"]

Example 2:

Class      Frequency    Degrees                    Percentage
100-104    2            (2/50) · 360 = 14.4        (2/50) · 100 = 4%
105-109    8            (8/50) · 360 = 57.6        (8/50) · 100 = 16%
110-114    18           (18/50) · 360 = 129.6      (18/50) · 100 = 36%
115-119    13           (13/50) · 360 = 93.6       (13/50) · 100 = 26%
120-124    7            (7/50) · 360 = 50.4        (7/50) · 100 = 14%
125-129    1            (1/50) · 360 = 7.2         (1/50) · 100 = 2%
130-134    1            (1/50) · 360 = 7.2         (1/50) · 100 = 2%
Sum        Σf = 50      360                        100%

[Figure: pie graph "Record High Temperatures"]

Example 3: Twenty-five army inductees were given a blood test to determine their blood type. The data set is

A, B, B, AB, O, O, O, B, AB, B, B, B, O, A, O, A, O, O, O, AB, AB, A, O, B, A

1- Construct a frequency distribution for the data.
2- Draw the histogram, polygon and pie graph.

Solution:

Class    Frequency    Degrees                    Percentage
A        5            (5/25) · 360 = 72          (5/25) · 100 = 20%
B        7            (7/25) · 360 = 100.8       (7/25) · 100 = 28%
O        9            (9/25) · 360 = 129.6       (9/25) · 100 = 36%
AB       4            (4/25) · 360 = 57.6        (4/25) · 100 = 16%
Sum      Σf = 25      360                        100%

[Figures: histogram, frequency polygon and pie graph of the blood types]

Sources of data

Data sources are classified as:
+ Primary Data Sources
+ Secondary Data Sources

Primary Data Sources: Data which are collected for the first time by the investigators for their own purpose are known as primary data. For example, to check the blood group of the students of a certain class, the principal asks the school doctors to carry out the blood test on the students and make a record of it in their diary. The data so collected will be primary data for the school. Any one or a combination of the following methods can be chosen to collect primary data:
i. Direct personal observations
ii. Direct or indirect oral interviews
iii. Administering questionnaires

Secondary Data Sources: When an investigator uses data which have already been collected by others, such data are called secondary data. The difference between primary and secondary data is only a matter of degree. The same data may be secondary in the hands of one, and primary in the hands of others.
Thus, data are primary to the source who collects and processes them for the first time and are secondary for all other sources who later use such data. Sources of such data are as follows:
i. Newspapers and business magazines
ii. Government and non-government publications, such as Reserve Bank of India Bulletins, the wholesale price index published by the Ministry of Commerce and Industry, etc.
iii. International organizations which publish data. For example, the International Labour Organization publishes data on employment, unemployment, wages and consumer prices, and the Organisation for Economic Co-operation and Development publishes data on foreign trade, and so on.

Data: Types and Presentation

If the leaves of a plant are always formed in pairs, then only even integers are possible values of the variable. The ratio of number of wings to number of legs of insects is a discrete variable that may only have a value of 0, 0.3333..., or 0.6666... (i.e., 0/6, 2/6, or 4/6, respectively). Ratio-, interval-, and ordinal-scale data may be either continuous or discrete. Nominal-scale data by their nature are discrete.

Note: The ellipsis marks (...) may be read as "and so on." Here they indicate that 1/3 and 2/3 are repeating decimal fractions, which could have been written as 0.3333333333333... and 0.6666666666666..., respectively.

ACCURACY AND SIGNIFICANT FIGURES

Accuracy is the nearness of a measurement to the true value of the variable being measured. Precision is not a synonymous term but refers to the closeness to each other of repeated measurements of the same quantity. Figure 1 illustrates the difference between accuracy and precision of measurements.

[FIGURE 1: Accuracy and precision of measurements. A 3-kilogram animal is weighed 10 times. The 10 measurements shown in sample (a) are relatively accurate and precise; those in sample (b) are relatively accurate but not precise; those of sample (c) are relatively precise but not accurate; and those of sample (d) are relatively inaccurate and imprecise.]

Human error may exist in the recording of data. For example, a person may miscount the number of birds in a tract of land or misread the numbers on a heart-rate monitor. Or, a person might obtain correct data but record them in such a way (perhaps with poor handwriting) that a subsequent data analyst makes an error in reading them. We shall assume that such errors have not occurred, but there are other aspects of accuracy that should be considered.

Accuracy of measurement can be expressed in numerical reporting. If we report that the hind leg of a frog is 8 cm long, we are stating the number 8 (a value of a continuous variable) as an estimate of the frog's true leg length. This estimate was made using some sort of a measuring device. Had the device been capable of more accuracy, we might have declared that the leg was 8.3 cm long, or perhaps 8.32 cm long. When recording values of continuous variables, it is important to designate the accuracy with which the measurements have been made. By convention, the value 8 denotes a measurement in the range of 7.50000... to 8.49999..., the value 8.3 designates a range of 8.25000... to 8.34999..., and the value 8.32 implies that the true value lies within the range of 8.31500... to 8.32499.... That is, the reported value is the midpoint of the implied range, and the size of this range is designated by the last decimal place in the measurement. The value of 8 cm implies an ability to determine length within a range of 1 cm, 8.3 cm implies a range of 0.1 cm, and 8.32 cm implies a range of 0.01 cm. Thus, to record a value of 8.0 implies greater accuracy of measurement than does the recording of a value of 8, for in the first instance the true value is said to lie between 7.95000... and 8.04999... (i.e., within a range of 0.1 cm), whereas 8 implies a value between 7.50000... and 8.49999... (i.e., within a range of 1 cm). To state 8.00 cm implies a measurement that ascertains the frog's limb length to be between 7.99500... and 8.00499... cm (i.e., within a range of 0.01 cm). Those digits in a number that denote the accuracy of the measurement are referred to as significant figures. Thus, 8 has one significant figure, 8.0 and 8.3 each have two significant figures, and 8.00 and 8.32 each have three.

In working with exact values of discrete variables, the preceding considerations do not apply. That is, it is sufficient to state that our frog has four limbs or that its left lung contains thirteen flukes. The use of 4.0 or 13.00 would be inappropriate, for the counts are exact.

But there are instances where significant figures and implied accuracy come into play with discrete data. An entomologist may report that there are 72,000 moths in a particular forest area. In doing so, it is probably not being claimed that this is the exact number but an estimate of the exact number, perhaps accurate to two significant figures. In such a case, 72,000 would imply a range of accuracy of 1000, so that the true value might lie anywhere from 71,500 to 72,500. If the entomologist wished to convey the fact that this estimate is believed to be accurate to the nearest 100 (i.e., to three significant figures), rather than to the nearest 1000, it would be better to present the data in the form of scientific notation, as follows: If the number $7.2 \times 10^4$ (= 72,000) is written, a range of accuracy of $0.1 \times 10^4$ (= 1000) is implied, and the true value is assumed to lie between 71,500 and 72,500. But if $7.20 \times 10^4$ were written, a range of accuracy of $0.01 \times 10^4$ (= 100) would be implied, and the true value would be assumed to be in the range of 71,950 to 72,050. Thus, the accuracy of large values (and this applies to continuous as well as discrete variables) can be expressed succinctly using scientific notation.
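The convention described above (the reported value is the midpoint of a range whose size is set by the last recorded digit) can be sketched in Python as follows. This is only an illustration: the function name `implied_range` is an assumption, and values are passed as strings so that trailing zeros such as those in 8.00 or 7.20E4 are not lost.

```python
from decimal import Decimal


def implied_range(reported):
    """Return (low, high) implied by a value written as a string.

    Hypothetical helper: the range is half a unit of the last recorded
    digit on either side of the reported value. The upper end is a limit
    that is approached but not reached (8.34999... for "8.3").
    """
    value = Decimal(reported)
    exponent = value.as_tuple().exponent       # -1 for "8.3", 3 for "7.2E4"
    half_unit = Decimal(10) ** exponent / 2    # half of the last-digit unit
    return value - half_unit, value + half_unit


def significant_figures(reported):
    """Count the digits recorded in the coefficient of the value."""
    return len(Decimal(reported).as_tuple().digits)


for text in ["8", "8.3", "8.32", "8.0", "8.00", "7.2E4", "7.20E4"]:
    low, high = implied_range(text)
    print(text, significant_figures(text), low, high)
```

A plain "72000" would be counted as five significant figures by this sketch, which is exactly the ambiguity that writing the value in scientific notation avoids.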
Calculators and computers typically yield results with more significant figures than are justified by the data. However, to avoid rounding error it is good practice to retain many significant figures until the last step in a sequence of calculations, and on attaining the result of the final step to round off to the appropriate number of figures.

BASIC CONCEPT OF SAMPLE SURVEYS

A sample survey is a method of drawing an inference about the characteristics of a population by observing a part of the population. For example, when one has to make an inference about a large lot and it is not practicable to examine each individual member of the lot, one takes the help of sample surveys; that is to say, one examines only a few members of the lot and, on the basis of this sample information, makes decisions about the whole lot. Thus, a person wanting to purchase a basket of oranges may examine a few oranges from the basket and on that basis make his purchase.

Such methods are extensively used by government bodies throughout the world for assessing different characteristics of the national economy, as are required for taking decisions on matters such as taxes, prices and wages, and for purposes such as the estimation of yield rates and acreages under different crops, the number of unemployed persons in the labour force, the construction of cost of living indices for persons in different professions, and so on.

Sample survey techniques are extensively used in market research surveys for assessing the preferential pattern of consumers for different types of products, the potential demand for a new product which a company wishes to introduce, scope for any diversification in the production schedule, and so on.

Thus, sampling may become unavoidable because we may have limited resources in terms of money and/or man hours, or it may be preferred because of practical convenience.

Sampling is first broadly classified as Subjective and Objective.
Any type of sampling which depends upon the personal judgment or discretion of the sampler himself is called Subjective. But the sampling method which is fixed by a sampling rule or is independent of the sampler's own judgment is Objective sampling.

Non-probabilistic, Probabilistic and Mixed sampling: In non-probabilistic objective sampling, there is a fixed sampling rule but there is no probability attached to the mode of selection, e.g. selecting every m-th individual from a list. If, however, the selection of the first individual is made in such a manner that each of the first m gets an equal chance of being selected, it becomes a case of mixed sampling. If for each individual there is a definite pre-assigned probability of being selected, the sampling is said to be probabilistic.

Elementary unit or simply unit: It is an element or a group of elements on which observations can be made or from which the required statistical information can be ascertained according to a well defined procedure. Examples of units are a person, family, household, farm, factory, tree, or a period of time such as an hour, day, etc.

Population: The collection of all units of a specified type in a given region at a particular point or period of time is termed a population or universe. For example, a population of persons, families, farms, cattle, houses or automobiles in a region, or a population of trees or birds in a forest, etc. A population is said to be a finite population or an infinite population according to whether the number of units in it is finite or infinite.

Sampling units: Elementary units or groups of such units which, besides being clearly defined, identifiable and observable, are convenient for purposes of sampling are called sampling units. For example, in a family budget enquiry, usually a family is considered as a sampling unit, since it is found to be convenient for sampling and for ascertaining the required information. In a crop survey, a farm or a group of farms owned or operated by a household may be considered as the sampling unit.
Sampling frame: For using sampling methods in the collection of data, it is essential to have a frame of all the sampling units belonging to the population to be studied, with their proper identification particulars; such a frame is called the sampling frame. This may be a list of units with their identification particulars. As the sampling frame forms the basic material from which a sample is drawn, it should be ensured that the frame contains all the sampling units of the population under consideration but excludes units of any other population.

Sample: A sample is a subset of a population selected to obtain information concerning the characteristics of the population. In other words, it consists of one or more sampling units selected from a population according to some specified procedure.

Random sample: A random or probability sample is a sample drawn in such a manner that each unit in the population has a predetermined probability of selection.

Estimator: An estimator is a statistic obtained by a specified procedure for estimating a population parameter. The estimator is a random variable, as its value differs from sample to sample and the samples are selected with specified probabilities. The particular value which the estimator takes for a given sample is known as an estimate.

The difference between the estimator $t$ and the parameter $\theta$ is called error. An estimator $t$ is said to be an unbiased estimator for the parameter $\theta$ if $E(t) = \theta$; otherwise it is biased. Thus bias is given by

$$B(t) = E(t) - \theta$$

The mean of squares of error taken from $\theta$ is called mean square error (MSE). Mathematically it is

$$MSE(t) = E(t - \theta)^2$$

The MSE may be considered to be a measure of the accuracy with which the estimator $t$ estimates the parameter $\theta$.

The expected value of the squared deviation of the estimator from its expected value is the sampling variance. It is a measure of the divergence of the estimator from its expected value and can be given by

$$V(t) = E[t - E(t)]^2$$

This measure of variability may be termed the precision of the estimator.

The relation between MSE and sampling variance, or between accuracy and precision, can be obtained as

$$MSE(t) = E(t - \theta)^2 = E[\,t - E(t) + E(t) - \theta\,]^2 = E[t - E(t)]^2 + [E(t) - \theta]^2 = V(t) + [B(t)]^2$$

where the cross-product term vanishes because $E[t - E(t)] = 0$. This shows that the MSE of $t$ is the sum of the sampling variance and the square of the bias. However, if $t$ is an unbiased estimator of $\theta$, the MSE and the sampling variance are the same. The square root of the sampling variance is termed the standard error of the estimator. The ratio of the standard error of the estimator to the expected value of the estimator is known as the relative standard error or the coefficient of variation of the estimator.

Sample space: The collection of all possible samples (sequences, sets) is called the sample space.

Sampling design: The combination of the sample space and the associated probability measure is called a sampling design. For example, let N = 4, n = 2 and let the probabilities of selection for the different samples be:

Sample         (1,2)    (1,3)    (1,4)    (2,3)    (2,4)    (3,4)
Probability    1/6      1/6      1/6      1/6      1/6      1/6

The above table gives the sampling design.

Sampling and complete enumeration

The total count of all units of the population for a certain characteristic is known as complete enumeration, also termed census survey. The money, man-power and time required for carrying out complete enumeration will generally be large, and there are many situations with limited means where complete enumeration will not be possible and where recourse to the selection of a few units will be helpful. When only a part, called a sample, is selected from the population and examined, it is called sample enumeration or sample survey.
A sample survey will usually be less expensive than a census survey, and the desired information will be obtained in less time. This does not imply that economy is the only consideration in conducting a sample survey. It is most important that a degree of accuracy of results is also maintained. Occasionally, the technique of sample survey is applied to verify the results obtained from census surveys. The main advantages or merits of a sample survey over a census survey may be outlined as follows:
i) Reduced cost of survey,
ii) Greater speed of getting results,
iii) Greater accuracy of results,
iv) Greater scope.

The sample survey has its own limitations, and the advantages of sampling over complete enumeration can be derived only if
i) the units are drawn in a scientific manner,
ii) an appropriate sampling technique is used, and
iii) the size of the units selected in the sample is adequate.

Basic principles of sample surveys

Two basic principles for sample surveys are
i) Validity
ii) Optimization

The principle of optimization takes into account the factors of
a) Efficiency
b) Cost

By validity, we mean that the sample should be so selected that the results could be interpreted objectively in terms of probability. This principle will be satisfied by selecting a probability sample, which ensures that there is some definite, pre-assigned probability for each individual of the population.

Efficiency is measured by the inverse of the sampling variance of the estimator. Cost is measured by the expenditure incurred in terms of money or man-hours. The principle of optimization ensures that a given level of efficiency will be reached with minimum cost, or that the maximum possible efficiency will be attained with a given level of cost.

Sampling and non-sampling errors

The error which arises because only a sample (a part of the population) is used to estimate the population parameters and draw inferences about the population is termed sampling error or sampling fluctuation. Whatever the degree of caution in selecting a sample, there will always be a difference between the parameter and its corresponding estimate. This error is inherent and unavoidable in any and every sampling scheme.
A sample with the smallest sampling error will always be considered a good representative of the population. This error can be reduced by increasing the size of the sample (the number of units selected in the sample). In fact, the sampling error is inversely proportional to the square root of the sample size, and the relationship can be examined graphically as below.

[Figure: sampling error plotted against sample size]

When the sample survey becomes a census survey, the sampling error becomes zero.

Non-sampling error

The non-sampling errors primarily arise at the following stages: observational errors due to defective measurement technique, and errors introduced in editing, coding and tabulating the results.

Non-sampling errors are present in both the complete enumeration survey and the sample survey. In practice, the census survey results may suffer from non-sampling errors although they are free from sampling error. The non-sampling error is likely to increase with an increase in sample size, while the sampling error decreases with an increase in sample size.

SIMPLE RANDOM SAMPLING

A procedure of selecting a sample of size n out of a finite population of size N, in which each of the possible distinct samples has an equal chance of being selected, is called random sampling or simple random sampling.

We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr),
ii) Simple random sampling without replacement (srswor).

Simple random sampling with replacement (srswr)

In sampling with replacement, a unit is selected from the population consisting of N units, its content is noted, and it is then returned to the population before the next draw is made. The process is repeated n times to give a sample of n units. In this method, at each draw, each of the N units of the population gets the same probability $1/N$ of being selected.
Here the same unit of the population may occur more than once in the sample (the order in which the sample units are obtained is regarded). There are $N^n$ samples, and each has an equal probability $1/N^n$ of being selected.

Note: If the order in which the sample units are obtained is ignored (unordered), then the number of possible samples will be $\binom{N+n-1}{n}$.

Simple random sampling without replacement (srswor)

Suppose the population consists of N units. In simple random sampling without replacement, a unit is selected, its content is noted, and the unit is not returned to the population before the next draw is made. The process is repeated n times to give a sample of n units. In this method, at the r-th drawing, each of the $N - r + 1$ remaining units of the population gets the same probability $1/(N - r + 1)$ of being included in the sample. Here no unit of the population can occur more than once in the sample (order is ignored). There are $\binom{N}{n}$ possible samples, and each such sample has an equal probability $1/\binom{N}{n}$ of being selected.

Example: For a population of size N = 5 with values 1, 3, 6, 8 and 9, make a list of all possible samples of size n = 3 by both methods [srswr (unordered) and srswor].

Solution: By the srswr method (order ignored), the number of possible samples will be $\binom{N+n-1}{n} = \binom{7}{3} = 35$, which are as follows:

(1,1,1), (1,1,3), (1,1,6), (1,1,8), (1,1,9), (1,3,3), (1,3,6), (1,3,8), (1,3,9), (1,6,6), (1,6,8), (1,6,9), (1,8,8), (1,8,9), (1,9,9),
(3,3,3), (3,3,6), (3,3,8), (3,3,9), (3,6,6), (3,6,8), (3,6,9), (3,8,8), (3,8,9), (3,9,9),
(6,6,6), (6,6,8), (6,6,9), (6,8,8), (6,8,9), (6,9,9),
(8,8,8), (8,8,9), (8,9,9),
(9,9,9)

By the srswor method, the number of possible samples will be $\binom{N}{n} = \binom{5}{3} = 10$, which are as follows:

(1,3,6), (1,3,8), (1,3,9), (1,6,8), (1,6,9), (1,8,9), (3,6,8), (3,6,9), (3,8,9), (6,8,9)
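The sample counts in the example can be reproduced by direct enumeration. The following is a minimal sketch using Python's standard `itertools` module; the variable names are illustrative choices and are not part of the notes.

```python
from itertools import combinations, combinations_with_replacement, product
from math import comb

values = [1, 3, 6, 8, 9]  # population from the example, N = 5
N = len(values)
n = 3                     # sample size

# srswr with order regarded: every ordered n-tuple, N**n in total.
srswr_ordered = list(product(values, repeat=n))

# srswr with order ignored: multisets of size n, C(N + n - 1, n) in total.
srswr_unordered = list(combinations_with_replacement(values, n))

# srswor (order ignored): subsets of size n, C(N, n) in total.
srswor = list(combinations(values, n))

assert len(srswr_ordered) == N ** n == 125
assert len(srswr_unordered) == comb(N + n - 1, n) == 35
assert len(srswor) == comb(N, n) == 10

for sample in srswor:
    print(sample)
```

Because `values` is sorted, `combinations` and `combinations_with_replacement` emit samples in the same non-decreasing order used in the lists above, which makes a manual listing easy to cross-check.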
