Unit 4 V Statistics
Statistics
• Statistics is the science of analyzing, reviewing, and drawing
conclusions from data.
• Some basic statistical measures include:
• Mean, median and mode
• Minimum and maximum value
• Percentiles
• Variance and Standard Deviation
• Covariance and Correlation
• Probability distributions
• The R language was developed by two statisticians. It has many
built-in functions, along with libraries designed specifically for
statistical analysis.
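As a rough illustration of the measures listed above, the sketch below applies a few of R's built-in functions to a small numeric vector. The vector x and its values are hypothetical, made up only for this example. Covariance, correlation and probability distributions are not shown here.

```r
# Hypothetical sample values, made up purely for illustration
x <- c(2, 4, 4, 5, 7, 9)

x_mean   <- mean(x)     # arithmetic mean
x_median <- median(x)   # middle value of the sorted data
x_range  <- range(x)    # minimum and maximum as a length-2 vector
x_quart  <- quantile(x, probs = c(0.25, 0.5, 0.75))  # 25th, 50th, 75th percentiles
x_var    <- var(x)      # sample variance
x_sd     <- sd(x)       # sample standard deviation (square root of the variance)

# Base R has no built-in function for the statistical mode.
# One common workaround: tabulate the values and pick the most frequent one.
x_mode <- as.numeric(names(which.max(table(x))))

print(c(mean = x_mean, median = x_median, mode = x_mode))
```

Note that mode() exists in R but returns the storage type of an object, not the most frequent value, which is why the table() workaround is used here.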
Data Set
• A data set is a collection of data, often presented in a table.
• R includes a popular built-in data set called "mtcars" (Motor Trend Car Road Tests),
which was extracted from the 1974 Motor Trend US magazine.
Information About the Data Set
• You can use the question mark (?) to get information about the mtcars data set:
# Use the question mark to get information about the data set
?mtcars
Get Information
• Use the dim() function to find the dimensions of the data set
(number of rows and columns), and the names() function to view the
names of the variables.
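A minimal sketch of both calls on the built-in mtcars data frame. The variable names mtcars_dims and mtcars_vars are chosen here for illustration only.

```r
# dim() returns the number of rows and columns of the data frame
mtcars_dims <- dim(mtcars)
print(mtcars_dims)   # 32 rows (observations) and 11 columns (variables)

# names() returns the column (variable) names
mtcars_vars <- names(mtcars)
print(mtcars_vars)   # "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
```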
Get Information
• Use the rownames() function to get the name of each row. In mtcars
the row names are the car names; they are printed like a first
column but are not one of the variables.
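For example, the row names can be listed like this. The name car_names is an illustrative choice, and head() is used only to keep the printed output short.

```r
# rownames() returns the row labels of the data frame (here: the car names)
car_names <- rownames(mtcars)

# Show only the first three names
print(head(car_names, 3))   # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

# There is one name per observation
print(length(car_names))    # 32
```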
Statistics
• The output of dim(), names() and rownames() shows that the data set
has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc.)
and 11 variables (mpg, cyl, disp, etc.).
• A variable is defined as something that can be measured or counted.
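• Here is a brief explanation of the variables in the mtcars data set (as documented in ?mtcars):
• mpg: Miles per (US) gallon
• cyl: Number of cylinders
• disp: Displacement (cubic inches)
• hp: Gross horsepower
• drat: Rear axle ratio
• wt: Weight (1000 lbs)
• qsec: 1/4 mile time (seconds)
• vs: Engine shape (0 = V-shaped, 1 = straight)
• am: Transmission (0 = automatic, 1 = manual)
• gear: Number of forward gears
• carb: Number of carburetors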
Statistics
• Print Variable Values
• To print all values that belong to a variable, access the data
frame column with the $ operator followed by the name of the
variable, for example mtcars$cyl (cylinders).
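A short sketch of the $ access described above. The name cyl_values is used here only for illustration.

```r
# Extract the cyl (cylinders) column from mtcars as a numeric vector
cyl_values <- mtcars$cyl

# Print all values of the variable, one per car
print(cyl_values)
```

The result is an ordinary vector, so it can be passed on to other functions such as mean() or sort().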
• Sort Variable Values: To sort the values, use the sort() function
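For instance, the cylinder values can be sorted as sketched below. The names cyl_sorted and cyl_sorted_desc are illustrative, and the decreasing argument is an optional extra that the text above does not mention.

```r
# sort() returns the values in ascending order by default
cyl_sorted <- sort(mtcars$cyl)
print(cyl_sorted)

# Optional: pass decreasing = TRUE to sort from largest to smallest
cyl_sorted_desc <- sort(mtcars$cyl, decreasing = TRUE)
print(cyl_sorted_desc)
```

sort() does not modify mtcars itself; it returns a new sorted vector.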