EDA (Omkar Mane 67)
November
May, 202322, 2022 2
Reading the file: read_csv, head, and info
read_csv (or read_excel, read_sql, etc.):
• Function: Reads data from various file formats
(CSV, Excel, SQL database, etc.) and creates a
DataFrame.
• Usage: pd.read_csv('file_path.csv') for CSV files.
Similar functions exist for other formats.
• Example: df = pd.read_csv('data.csv')
head:
• Function: Displays the first few rows of the
DataFrame.
• Usage: df.head(n), where n is the number of rows
to display (default is 5).
• Example: df.head(), df.head(10)
info:
• Function: Provides concise summary information
about the DataFrame, including column names,
data types, non-null counts, and memory usage.
• Usage: df.info()
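The three calls above can be sketched together as follows. This is a minimal illustration only: the CSV text and its column names (Gender, State, Age, Amount) are made-up stand-ins for a real file such as 'data.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as 'data.csv'.
csv_text = """Gender,State,Age,Amount
F,Maharashtra,28,2500
M,Karnataka,35,1200
F,Uttar Pradesh,31,4100
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; df.head() would return up to 5.
first_rows = df.head(2)
print(first_rows)

# info() prints column names, dtypes, non-null counts, and memory usage.
df.info()
```

With a real file, the io.StringIO wrapper would simply be replaced by the file path.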
Measures of Central Tendency
• Mean (Average):
• Definition: The sum of all values divided by the total
number of values.
• Advantages: Easy to calculate, uses every value in the dataset.
• Disadvantages: Sensitive to outliers and skewed distributions.
• Median:
• Definition: The middle value of a sorted dataset.
• For an odd number of values, it's the middle value.
• For an even number of values, it's the average of the two
middle values.
• Advantages: Not influenced by outliers, better for skewed
distributions.
• Mode:
• Definition: The most frequent value in a dataset.
• Advantages: Easy to understand and compute, applicable to
any type of data, including categorical data.
• Disadvantages: Not unique, may not exist or be meaningful
in some datasets.
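As a small worked example of the three measures, the sketch below uses a made-up series with one outlier (101) to show how the mean shifts while the median and mode do not. The values are illustrative and not taken from the project data.

```python
import pandas as pd

# Made-up values; 101 is a deliberate outlier.
amounts = pd.Series([10, 12, 12, 15, 101])

mean_value = amounts.mean()      # sum of values / number of values = 150 / 5
median_value = amounts.median()  # middle value of the sorted data
mode_values = amounts.mode()     # returns a Series, since the mode may not be unique

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_values.tolist())
```

Here the mean (30.0) is pulled far above most of the values by the outlier, while the median (12.0) and the mode (12) stay near the bulk of the data.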
Plotting bar chart on basis of gender and states.
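A bar chart of this kind could be produced roughly as follows. This is a sketch under assumptions: the column names Gender and State, the sample rows, and the output file name are hypothetical, and the original slide's plotting code is not known.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows; the real dataset and its column names may differ.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "F", "M", "F"],
    "State": ["Uttar Pradesh", "Maharashtra", "Karnataka",
              "Uttar Pradesh", "Karnataka", "Maharashtra"],
})

# Count how many rows fall into each category.
gender_counts = df["Gender"].value_counts()
state_counts = df["State"].value_counts()

# Draw one bar chart per category, side by side.
fig, (ax_gender, ax_state) = plt.subplots(1, 2, figsize=(10, 4))
gender_counts.plot(kind="bar", ax=ax_gender, title="Count by gender")
state_counts.plot(kind="bar", ax=ax_state, title="Count by state")
ax_gender.set_ylabel("Count")
fig.tight_layout()
fig.savefig("gender_state_bars.png")  # hypothetical output file name
```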
Skewness and Kurtosis
• Skewness: Skewness measures the
asymmetry of the distribution of a
variable. A skewness value of 0
indicates a perfectly symmetrical
distribution. Positive skewness
indicates a right-skewed distribution
(tail on the right), while negative
skewness indicates a left-skewed
distribution (tail on the left).
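• Kurtosis: Kurtosis measures how heavy
the tails of a variable's distribution
are (often described as peakedness or
flatness) compared to a normal
distribution. A normal distribution has
a kurtosis of 3, i.e. an excess
kurtosis of 0.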
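Both measures are available on a pandas Series. The sketch below uses made-up values to illustrate the sign of the skewness. Note that Series.kurt() reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0 rather than 3.

```python
import pandas as pd

# Made-up data: one symmetric series and one with a long right tail.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 2, 2, 3, 20])

sym_skew = symmetric.skew()        # approximately 0 for symmetric data
right_skew = right_skewed.skew()   # positive: tail on the right
left_skew = (-right_skewed).skew() # mirrored data: tail on the left, negative

# Excess kurtosis (normal distribution is approximately 0).
right_kurt = right_skewed.kurt()

print("symmetric skew:", sym_skew)
print("right-skewed skew:", right_skew)
print("left-skewed skew:", left_skew)
print("excess kurtosis:", right_kurt)
```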
Histograms and Scatter Plots.
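A histogram shows the distribution of one numeric variable, and a scatter plot shows the relationship between two. The sketch below is illustrative only: the Age and Amount columns, their values, and the output file name are assumptions rather than the data behind the original plots.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numeric data; the real dataset and its column names may differ.
df = pd.DataFrame({
    "Age": [22, 25, 27, 29, 31, 33, 35, 41, 48, 56],
    "Amount": [800, 1500, 2300, 2100, 4100, 3900, 3600, 2000, 1700, 900],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how many rows fall into each of 5 equal-width age bins.
counts, bin_edges, _ = ax_hist.hist(df["Age"], bins=5)
ax_hist.set_xlabel("Age")
ax_hist.set_ylabel("Frequency")
ax_hist.set_title("Age distribution")

# Scatter plot: one point per row, Age against Amount.
ax_scatter.scatter(df["Age"], df["Amount"])
ax_scatter.set_xlabel("Age")
ax_scatter.set_ylabel("Amount")
ax_scatter.set_title("Amount vs. age")

fig.tight_layout()
fig.savefig("hist_scatter.png")  # hypothetical output file name
```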
Conclusion
• Married women aged 26-35 from Uttar Pradesh, Maharashtra, and Karnataka
who work in IT, Healthcare, and Aviation are more likely to buy products in
the Food, Clothing, and Electronics categories.
Thank You