Exp 9

The document outlines an experiment focused on data visualization using the Titanic dataset, specifically plotting a box plot to analyze the distribution of age by gender and survival status. It explains the importance of analyzing numerical data and provides definitions and explanations for histograms, distplots, and boxplots. The conclusion emphasizes the need to draw inferences from the statistical analysis performed.

Uploaded by

chincholkar.sam
Copyright
© All Rights Reserved

Experiment No.

Aim: Data Visualization II


Problem Statement:
1. Use the inbuilt dataset 'titanic' as used in the above problem. Plot a box plot for distribution of
age with respect to each gender along with the information about whether they survived or
not. (Column names : 'sex' and 'age')
2. Write observations on the inference from the above statistics.
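
A minimal sketch of the plot asked for in item 1, assuming seaborn and matplotlib are installed. The function name `plot_age_by_sex_and_survival` and the `main` wrapper are illustrative choices, not part of the original experiment. The 'survived' column is assumed to be the one that carries the survival information in seaborn's copy of the dataset.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_age_by_sex_and_survival(df):
    """Draw a box plot of 'age' for each 'sex', split by 'survived'."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # x groups the boxes by gender; hue splits each group by survival status
    sns.boxplot(data=df, x="sex", y="age", hue="survived", ax=ax)
    ax.set_title("Age distribution by sex and survival")
    return ax


def main():
    # load_dataset fetches the data over the network on first use
    titanic = sns.load_dataset("titanic")
    plot_age_by_sex_and_survival(titanic)
    plt.show()
```

Calling `main()` loads the dataset and displays the figure. Keeping the plotting logic in its own function lets it be reused on any DataFrame that has the same three columns.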
Theory:

Numerical Data

Analyzing numerical data is important because understanding how each variable is distributed
guides how the data should be processed further. Numerical columns often contain
inconsistencies such as missing values or outliers, so explore them before further processing.

1) Histogram
A histogram plots the distribution of values in a numerical column. It divides the range of
values into bins and draws one bar per bin, with the bar height equal to the number of values in
that bin, so we can see how the values are distributed.
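
The binning step described above can be sketched in plain Python. This is only an illustration of the idea, not how any plotting library implements it. The helper `histogram_counts` and the `ages` list are made-up for the example and are not taken from the Titanic data.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter()
    for v in values:
        start = (v // bin_width) * bin_width  # left edge of the bin holding v
        counts[start] += 1
    return dict(sorted(counts.items()))


# Toy ages, invented for illustration only
ages = [2, 8, 19, 22, 24, 27, 29, 31, 35, 38, 41, 54, 63]
counts = histogram_counts(ages, 10)

# Text rendering: one row per bin, one '#' per value in the bin
for start, n in counts.items():
    print(f"{start:>3}-{start + 10:<3} {'#' * n}")
```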

2) Distplot
Distplot is sometimes described as a second histogram because it is a slightly improved version
of the histogram. It overlays a KDE (Kernel Density Estimate) curve on the histogram. The KDE is
a smooth estimate of the column's PDF (Probability Density Function), which shows how likely
values are to fall near each point in the column's range.
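
To make the KDE idea concrete, here is a small sketch of a Gaussian kernel density estimate in plain Python. The function `gaussian_kde`, the bandwidth of 5.0, and the `ages` list are assumptions chosen for illustration, and plotting libraries select the bandwidth automatically. Note that seaborn deprecated `distplot` in version 0.11 in favor of `histplot` and `displot`.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the probability density of `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on every sample
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy ages, invented for illustration only
ages = [2, 8, 19, 22, 24, 27, 29, 31, 35, 38, 41, 54, 63]
density = gaussian_kde(ages, bandwidth=5.0)

# A density integrates to 1; check this with a simple Riemann sum
step = 0.5
grid = [-50 + i * step for i in range(401)]  # covers -50 .. 150
area = sum(density(x) for x in grid) * step
print(round(area, 3))  # should be very close to 1
```

The value `density(x)` is a density, not a probability: probabilities come from the area under the curve over an interval.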

3) Boxplot

A boxplot draws the five-number summary of a column: minimum, first quartile (Q1), median, third
quartile (Q3), and maximum. To read it, we need to define the following terms.

● Median – The middle value of the series after sorting.
● Percentile – The value below which a given percentage of the observations fall. For
example, if 50 values lie below the 25th percentile, then those 50 values make up the lowest
25% of the data.
● Minimum and Maximum – In a boxplot these are not the actual minimum and maximum values
of the data. They are the lower and upper boundaries beyond which points are treated as
outliers, and they are calculated from the interquartile range (IQR).

IQR = Q3 - Q1
Lower_boundary = Q1 - 1.5 * IQR
Upper_boundary = Q3 + 1.5 * IQR
Here Q1 and Q3 are the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
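
The formulas above can be checked with a short sketch that uses only the standard library. The helper `box_stats` and the `ages` list are made-up for illustration. The "inclusive" quantile method is one common convention, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Compute quartiles, IQR boundaries, and outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # Lower_boundary
    upper = q3 + 1.5 * iqr  # Upper_boundary
    outliers = [v for v in values if v < lower or v > upper]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower": lower, "upper": upper, "outliers": outliers,
    }


# Toy ages, invented for illustration only
ages = [2, 19, 22, 24, 27, 29, 31, 35, 80]
stats = box_stats(ages)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this toy list, Q1 = 22 and Q3 = 31, so IQR = 9 and the boundaries are 8.5 and 44.5. The ages 2 and 80 fall outside them and would be drawn as outlier points.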

Conclusion: Hence we have plotted the box plot of age by gender and survival status and drawn inferences from the resulting statistics.
