Ex1a & 1b

The document describes analyzing the diamond dataset using exploratory and descriptive statistics methods in R. The diamond dataset contains prices and attributes of over 54,000 diamonds. Descriptive statistics and data visualization techniques are used to analyze price distribution, relationship between price and cut/color/carat, and more. Key steps include importing data, viewing summaries, and creating histograms, boxplots, and scatterplots to explore patterns in the data.

Uploaded by

AQ

15UCS707 – Data Science Laboratory

Ex.No.1a Basic Data Analytic Methods using R

Aim:
To study various R commands and their purposes.

Description:

Technically, R is an expression language with a very simple syntax. It is case sensitive, as are most UNIX-based packages, so A and a are different symbols and refer to different variables. The set of symbols that can be used in R names depends on the operating system and country within which R is being run (technically, on the locale in use). Normally all alphanumeric symbols are allowed (and in some countries this includes accented letters) plus ‘.’ and ‘_’, with the restriction that a name must start with ‘.’ or a letter, and if it starts with ‘.’ the second character must not be a digit. Names are effectively unlimited in length.

Elementary commands consist of either expressions or assignments. If an expression is given as a command, it is evaluated, printed (unless specifically made invisible), and the value is lost. An assignment also evaluates an expression and passes the value to a variable, but the result is not automatically printed.

Commands are separated either by a semicolon (‘;’) or by a newline. Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). Comments can be put almost anywhere: starting with a hashmark (‘#’), everything to the end of the line is a comment. If a command is not complete at the end of a line, R gives a different prompt, by default +, on the second and subsequent lines and continues to read input until the command is syntactically complete. This prompt may be changed by the user. Command lines entered at the console are limited to about 4095 bytes (not characters).
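
The rules above can be tried directly at the console. The following is a minimal sketch; all variable names are chosen only for illustration.

```r
# Names are case sensitive: A and a are two different variables.
A <- 10
a <- 2

# A name may start with '.' as long as the second character is not a digit.
.rate <- 0.5

# An expression given as a command is evaluated and printed; its value is then lost.
A + a

# An assignment evaluates the expression and stores the value without printing it.
total <- A + a

# Two commands on one line, separated by a semicolon.
x <- 1; y <- 2

# Braces group commands into one compound expression;
# its value is the value of the last command inside.
z <- { x <- x + 1; x * y }
```

Typing `total` or `z` on a line of its own prints the stored value.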

Basic Commands and Purpose

Command: help()
Purpose : Obtain documentation for a given R command

Command: str()
Purpose : Display internal structure of an R object

Command: c(), scan()
Purpose : Enter data manually to a vector in R

Command: seq()
Purpose : Make arithmetic progression vector

Command: View()
Purpose : View dataset in a spreadsheet-type format

Command: read.csv(), read.table()
Purpose : Load an existing data file into a data.frame

Command: dim()
Purpose : See dimensions (# of rows/cols) of data.frame

Command: length()
Purpose : Give length of a vector

Command: names()
Purpose : Lists names of variables in a data.frame

Command: hist()
Purpose : Command for producing a histogram

Command: barplot()
Purpose : Produces a bar graph

Command: barchart()
Purpose : Lattice command for producing bar graphs

Command: boxplot()
Purpose : Produces a boxplot

Command: plot()
Purpose : Produces a scatterplot

Command: sum()
Purpose : Add up all values in a vector

Command: cut()
Purpose : Groups values of a variable into larger bins

Command: mean(), median()
Purpose : Identify "center" of distribution

Command: summary()
Purpose : Display 5-number summary and mean
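
Several of the commands listed above can be combined in a short session. This is only a sketch: the numbers and the names weights, steps, size and toy are invented for the illustration and do not come from any dataset.

```r
# Hypothetical data, invented for this illustration.
weights <- c(0.3, 0.5, 0.7, 1.0, 1.5)   # c() builds a vector by hand
steps   <- seq(0, 2, by = 0.5)          # arithmetic progression from 0 to 2

length(weights)    # number of elements
sum(weights)       # total of all values
mean(weights)      # arithmetic mean
median(weights)    # middle value
summary(weights)   # minimum, quartiles, median, mean, maximum

# cut() groups a numeric variable into bins; by default each
# interval is open on the left and closed on the right.
size <- cut(weights, breaks = c(0, 0.5, 1, 2),
            labels = c("small", "medium", "large"))

toy <- data.frame(weight = weights, size = size)
dim(toy)     # number of rows and columns
names(toy)   # column names
str(toy)     # internal structure of the data.frame
```

Because the intervals are closed on the right, the value 0.5 falls in the "small" bin and 1.0 in the "medium" bin.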

Result:
Thus the various commands in R have been executed successfully.

Ex.No.1b Basic Data Analytic Methods using R

Aim:
To analyze the diamond dataset using Exploratory and Descriptive Statistics Methods.

Problem Statement:

The diamonds dataset contains the prices and other attributes of almost 54,000 diamonds and is included in the ggplot2 package. Use the methods of descriptive statistics and exploratory analysis to visualize and analyze the data.

Content

price    price in US dollars ($326–$18,823)
carat    weight of the diamond (0.2–5.01)
cut      quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color    diamond colour, from J (worst) to D (best)
clarity  a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x        length in mm (0–10.74)
y        width in mm (0–58.9)
z        depth in mm (0–31.8)
depth    total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table    width of top of diamond relative to widest point (43–95)
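
The depth formula can be checked by hand. The sketch below uses made-up measurements (not a row from the dataset) and multiplies by 100 to express the ratio as a percentage, which is an assumption consistent with the stated 43–79 range.

```r
# Hypothetical measurements in mm, invented for this illustration.
x <- 4.0   # length
y <- 4.0   # width
z <- 2.5   # depth

# depth = z / mean(x, y) = 2 * z / (x + y), expressed as a percentage
depth_pct <- 100 * 2 * z / (x + y)
depth_pct

# In R the mean of two numbers is written mean(c(x, y));
# mean(x, y) would pass y as the trim argument instead.
depth_check <- 100 * z / mean(c(x, y))
```

Both expressions give 62.5 for these values.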

Algorithm:

1. Import the data.

2. Read the data into R.

3. Look at the data.

4. Look at the summary of the data.

5. Print the entire dataset to the screen.

6. Apply visualization techniques to the data.

7. Plot the graphs.

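library(ggplot2)           # provides the diamonds dataset and the plotting functions

data(diamonds)             # load the dataset
View(diamonds)             # open it in a spreadsheet-style viewer
nrow(diamonds)             # number of rows
ncol(diamonds)             # number of columns
levels(diamonds$color)     # colour grades

subset(diamonds, price == max(price))   # most expensive diamond(s)
subset(diamonds, price == min(price))   # cheapest diamond(s)

summary(diamonds$price)    # five-number summary and mean of price
mean(diamonds$price)
median(diamonds$price)
summary(diamonds)          # summary of every column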

# Histogram of price in bins of 500
ggplot(data = diamonds) +
  geom_histogram(binwidth = 500, aes(x = price)) +
  ggtitle("Diamond Price Distribution") +
  xlab("Diamond Price US$") + ylab("Frequency") + theme_minimal()

# Boxplot of price by cut, zoomed to prices up to 7500 without dropping data
ggplot(diamonds, aes(factor(cut), price, fill = cut)) +
  geom_boxplot() +
  ggtitle("Diamond Price according to Cut") +
  xlab("Type of Cut") + ylab("Diamond Price US$") + coord_cartesian(ylim = c(0, 7500))

# Boxplot of price per carat by colour
ggplot(diamonds, aes(factor(color), price / carat, fill = color)) +
  geom_boxplot() +
  ggtitle("Diamond Price per Carat according to Color") +
  xlab("Color") + ylab("Diamond Price per Carat US$")

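# Scatterplot of price against depth; low alpha reveals dense regions
ggplot(diamonds, aes(x = depth, y = price)) + geom_point(alpha = 1/20)

# Scatterplot of price against carat; xlim/ylim drop points above the 99th percentile
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "blue") +
  xlim(0, quantile(diamonds$carat, 0.99)) +
  ylim(0, quantile(diamonds$price, 0.99)) +
  ggtitle("Diamond price vs. carat")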
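
A numeric counterpart to the price-per-carat boxplot is the median price per carat within each colour grade. The sketch below uses a small invented data frame (toy) so that it runs without ggplot2; its values are not taken from the diamonds dataset.

```r
# Hypothetical data, invented for this illustration.
toy <- data.frame(
  color = factor(c("D", "D", "E", "E", "J", "J")),
  carat = c(0.5, 1.0, 0.5, 1.0, 0.5, 1.0),
  price = c(2000, 6000, 1800, 5000, 1000, 4000)
)

# Price per carat for every row
toy$price_per_carat <- toy$price / toy$carat

# Median price per carat within each colour grade
med <- tapply(toy$price_per_carat, toy$color, median)
med
```

With ggplot2 loaded, the same call should work on the real data as tapply(diamonds$price / diamonds$carat, diamonds$color, median).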
