
Sma Exp4 Ayu

The document outlines the importance of Exploratory Data Analysis (EDA) and data visualization in business analytics, emphasizing their roles in understanding data structures and uncovering insights. It discusses various visualization techniques such as line charts, bar charts, pie charts, and scatter plots, along with Python libraries like Matplotlib and Seaborn for implementing these visualizations. The conclusion highlights that EDA and data visualization are crucial for informed decision-making in businesses.

Uploaded by ayushiijii
Copyright © All Rights Reserved

SMA EXPERIMENT NO. 4
Roll No.: B856
Date:
Aim: To study exploratory data analysis (EDA) and visualization of social media data for business using
Python, with charts such as histograms, line charts, pie charts, and scatter plots
Theory:
Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is an essential step in the data analysis process that involves examining,
summarizing, and visualizing data to understand its structure, detect patterns, and uncover hidden insights. It
serves as the foundation for data-driven decision-making and is commonly used in business analytics, machine
learning, and statistical modeling.
EDA is not just about looking at numbers; it is about understanding the story behind the data. It helps analysts
identify errors, missing values, outliers, and relationships between variables, so that the data can be cleaned
and prepared before further analysis. Without a thorough EDA process, analysts risk making inaccurate
assumptions or drawing misleading conclusions.

Importance of EDA
EDA plays a critical role in transforming raw data into meaningful information. It allows businesses to make
informed decisions by answering key questions about the dataset. For example, in the retail sector, EDA can
help businesses understand customer buying patterns, sales trends, and product demand fluctuations. In
finance, it can help detect fraudulent transactions, while in healthcare, it can identify risk factors for diseases.
EDA is particularly valuable for:
• Identifying missing or inconsistent data
• Detecting outliers and unusual patterns
• Understanding data distributions
• Examining relationships between different variables
• Validating assumptions before applying predictive models
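Several of the checks listed above can be sketched with pandas on a small table. The sketch below is illustrative only: the DataFrame, its column names (`post_id`, `likes`, `platform`), and the helper `eda_report` are hypothetical and not part of the experiment's dataset. It counts missing values per column, counts fully duplicated rows, and flags category labels that differ only by letter case.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a social media export;
# the column names and values are made up for illustration.
df = pd.DataFrame({
    "post_id": [1, 2, 3, 4, 4, 6],
    "likes": [120, 95, np.nan, 110, 110, 5000],
    "platform": ["tiktok", "TikTok", "instagram", "tiktok", "tiktok", None],
})


def eda_report(frame: pd.DataFrame) -> dict:
    """Collect a few basic data-quality checks used early in EDA (hypothetical helper)."""
    return {
        # Missing values per column
        "missing": frame.isna().sum().to_dict(),
        # Rows that repeat an earlier row exactly
        "duplicate_rows": int(frame.duplicated().sum()),
        # Distinct labels that collapse together once lower-cased (e.g. "TikTok" vs "tiktok")
        "case_variants": int(frame["platform"].nunique() - frame["platform"].str.lower().nunique()),
    }


report = eda_report(df)
print(report)
```

In a real project the same checks would run on the loaded dataset; the extreme `likes` value of 5000 is left in on purpose, since outlier detection is better handled with the quartile-based approach used by box plots.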

Data Visualization
Data visualization is the graphical representation of data and information. It uses visual elements such as
charts, graphs, and maps to make complex data more accessible, understandable, and useful for decision-making.
Because businesses and organizations now generate vast amounts of data, visualization plays a crucial role in
turning raw numbers into meaningful insights.
People can usually spot trends, patterns, and anomalies in a well-designed chart far more quickly than in rows
of raw text or numbers. This makes data visualization an essential tool in data analysis, as it lets stakeholders
grasp at a glance what might be difficult to detect in spreadsheets or databases.

Importance of Data Visualization


Data visualization is an integral part of data analysis and business intelligence. It helps in identifying trends,
understanding relationships between variables, and detecting outliers. Organizations use visualization
techniques to improve decision-making by presenting data in a way that is easy to interpret and analyze.
For instance, in the business world, sales reports presented in bar charts or line graphs allow executives to
quickly see how revenue has changed over time. Similarly, marketing teams use pie charts to analyze customer
demographics, while finance departments use heatmaps to track financial performance across different
regions.

Types of Data Visualization
Different types of data visualization methods are used depending on the nature of the data and the insights
needed. Some of the most commonly used visualization techniques include:
1. Line Charts
Line charts are used to display trends over time. They connect data points with a continuous line, making it
easy to observe upward or downward trends. Businesses often use line charts to track monthly sales, stock
market movements, or website traffic over time.
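A minimal Matplotlib sketch of a line chart follows. The monthly follower counts are hypothetical numbers chosen only for illustration, and the non-interactive `Agg` backend is an assumption so the snippet runs without a display (call `plt.show()` instead when working interactively).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly follower counts for one account (illustrative numbers only)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
followers = [1200, 1350, 1300, 1600, 1750, 1900]

fig, ax = plt.subplots(figsize=(8, 4))
# Connecting the points in time order makes the upward trend and the March dip visible
ax.plot(months, followers, marker="o", linestyle="-")
ax.set_title("Followers per Month (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Followers")
fig.tight_layout()
```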
2. Bar Charts
Bar charts represent categorical data with rectangular bars, where the length of each bar is proportional to the
value it represents. These charts are useful for comparing different categories, such as sales performance across
various regions or customer preferences for different products.
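The bar chart below compares a handful of categories. The platform names and engagement totals are hypothetical, and `Axes.bar_label` assumes Matplotlib 3.4 or newer.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical total engagements per platform (illustrative numbers only)
platforms = ["TikTok", "Instagram", "YouTube", "X"]
engagements = [5400, 4100, 2900, 1800]

fig, ax = plt.subplots(figsize=(8, 4))
# Each bar's height is proportional to the value it represents
bars = ax.bar(platforms, engagements, color="steelblue")
ax.bar_label(bars)  # write each bar's value above it (Matplotlib >= 3.4)
ax.set_title("Engagements by Platform (hypothetical data)")
ax.set_ylabel("Engagements")
fig.tight_layout()
```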

Figure 4.1: Types of Data Visualization

3. Pie Charts
Pie charts divide data into slices, showing the proportion of each category in a dataset. They are commonly
used in market share analysis, financial reports, and customer segmentation studies. However, pie charts
should be used carefully, as too many slices can make interpretation difficult.
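One common way to keep a pie chart readable is to merge very small categories into a single "Other" slice. The sketch below assumes a 5% threshold; the helper `group_small_slices` and the category counts are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def group_small_slices(counts: pd.Series, min_share: float = 0.05) -> pd.Series:
    """Merge categories below `min_share` of the total into one 'Other' slice (hypothetical helper)."""
    shares = counts / counts.sum()
    small = shares < min_share
    grouped = counts[~small]
    if small.any():
        grouped = pd.concat([grouped, pd.Series({"Other": counts[small].sum()})])
    return grouped


# Hypothetical post counts per content category (illustrative numbers only)
counts = pd.Series({"Comedy": 420, "Dance": 310, "Food": 150, "Travel": 80,
                    "Pets": 20, "DIY": 12, "News": 8})
grouped = group_small_slices(counts)

fig, ax = plt.subplots(figsize=(6, 6))
# autopct prints each slice's percentage of the total
ax.pie(grouped, labels=grouped.index, autopct="%1.1f%%", startangle=140)
ax.set_title("Posts by Category (hypothetical data)")
```

Here the three smallest categories (4% of posts combined) become one slice, leaving five readable slices instead of seven.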
4. Scatter Plots
Scatter plots are used to show relationships between two numerical variables. Each point in the plot represents
an observation, helping analysts identify correlations or patterns. For example, a scatter plot can show how
advertising spend is related to sales revenue.
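The advertising example can be sketched as a scatter plot paired with a Pearson correlation coefficient, which puts a number on the linear relationship the points suggest. The spend and revenue figures are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly ad spend and revenue (illustrative numbers only)
data = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400, 500, 600],
    "revenue": [1100, 1900, 3200, 3900, 5100, 6000],
})

# Pearson correlation: +1 is a perfect increasing linear relationship, 0 means none
r = data["ad_spend"].corr(data["revenue"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(data["ad_spend"], data["revenue"])  # one point per week
ax.set_title(f"Ad Spend vs. Revenue (Pearson r = {r:.2f})")
ax.set_xlabel("Ad spend")
ax.set_ylabel("Revenue")
fig.tight_layout()
```

A high `r` shows that the two variables move together; on its own it does not show that the spending caused the revenue.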
5. Histograms
Histograms display the distribution of numerical data by dividing it into bins or intervals. This helps in
understanding the spread and shape of the data, making it useful for analyzing customer age distribution,
income levels, or exam scores.
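The binning step behind a histogram can be shown explicitly with `numpy.histogram`, which returns the count in each bin and the bin edges. The follower ages below are hypothetical values drawn from a seeded random generator, not real data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages of 500 followers from a seeded generator (not real data)
rng = np.random.default_rng(seed=0)
ages = rng.normal(loc=28, scale=6, size=500).clip(13, 65)

# Split the range into 10 equal-width bins and count the values in each
counts, edges = np.histogram(ages, bins=10)

fig, ax = plt.subplots(figsize=(8, 4))
# Reuse the same edges so the plotted bars match the computed counts
ax.hist(ages, bins=edges, edgecolor="black")
ax.set_title("Follower Age Distribution (hypothetical data)")
ax.set_xlabel("Age")
ax.set_ylabel("Number of followers")
fig.tight_layout()
```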
6. Box Plots (Box-and-Whisker Plots)
Box plots summarize data distributions and highlight key statistical measures such as the median, quartiles,
and outliers. They are particularly useful for comparing multiple datasets and identifying unusual values.
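The statistics a box plot draws can be computed directly. The sketch below uses the common 1.5 × IQR rule for outliers, which is also Matplotlib's default whisker length (`whis=1.5`). The view counts are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical view counts for nine videos (illustrative numbers only)
views = np.array([120, 150, 160, 170, 180, 200, 210, 230, 5000])

# Quartiles, median and the 1.5 * IQR fences that bound the whiskers
q1, median, q3 = np.percentile(views, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = views[(views < lower) | (views > upper)]

fig, ax = plt.subplots(figsize=(4, 5))
bp = ax.boxplot(views, whis=1.5)  # points beyond the fences are drawn as "fliers"
ax.set_title("Video Views (hypothetical data)")
ax.set_ylabel("Views")
```

With Q1 = 160 and Q3 = 210, the upper fence is 285, so only the 5000-view video is flagged.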

Data Visualization Tools and Libraries
Several tools and libraries are available for creating data visualizations. Python, one of the most popular
programming languages for data analysis, offers powerful libraries for visualization, including:
• Matplotlib: A foundational library for creating static, animated, and interactive visualizations.
• Seaborn: Built on Matplotlib, it provides advanced statistical visualization with aesthetically pleasing
themes.
• Plotly: Used for interactive and dynamic visualizations, making it ideal for web applications.
• Bokeh: Specialized in interactive and web-based visualizations.

Code:
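import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the TikTok dataset (adjust the path to wherever the CSV is stored)
df = pd.read_csv(r"C:\Users\Ayushi\Desktop\submissions\SMA\tiktok_dataset.csv")

# Column names, dtypes and non-null counts; info() prints directly and returns None
print("Dataset Info:")
df.info()

print("\nFirst 5 rows of the dataset:")
print(df.head())

# Count, mean, std, min, quartiles and max of each numeric column
print("\nSummary Statistics:")
print(df.describe())

# Number of missing values per column
print("\nMissing Values:")
print(df.isnull().sum())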

# Histogram: distribution of video length, with a KDE curve overlaid
plt.figure(figsize=(10, 6))
sns.histplot(df['video_duration_sec'], bins=30, kde=True)
plt.title('Histogram of Video Duration (seconds)')
plt.xlabel('Video Duration (seconds)')
plt.ylabel('Frequency')
plt.show()

# Line chart of the first 1,000 rows. The x-axis is the row index, which only
# stands in for time if the rows are stored in chronological order.
df_line_graph = df.head(1000)
plt.figure(figsize=(10, 6))
plt.plot(df_line_graph.index, df_line_graph['video_view_count'], marker='o', linestyle='-')
plt.title('Line Chart of Video Views by Row Index (first 1,000 rows)')
plt.xlabel('Row Index')
plt.ylabel('Video Views')
plt.show()

# Pie chart: share of each claim_status value
claim_status_counts = df['claim_status'].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(claim_status_counts, labels=claim_status_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Pie Chart of Claim Status Distribution')
plt.show()

# Scatter plot: likes vs. shares, coloured by the author's verified status
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['video_like_count'], y=df['video_share_count'], hue=df['verified_status'])
plt.title('Scatter Plot of Video Likes vs. Video Shares')
plt.xlabel('Video Likes')
plt.ylabel('Video Shares')
plt.show()

# Box plot: spread of view counts for each author ban status
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['author_ban_status'], y=df['video_view_count'])
plt.title('Box Plot of Video Views by Author Ban Status')
plt.xlabel('Author Ban Status')
plt.ylabel('Video Views')
plt.show()

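# Correlation heatmap of the numeric columns; the video_id identifier is dropped
# because correlating an ID with the other columns carries no meaning
numeric_df = df.select_dtypes(include=['number']).drop(columns=['video_id'], errors='ignore')

plt.figure(figsize=(12, 8))
sns.heatmap(numeric_df.corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix Heatmap")
plt.show()

# Horizontal bar chart of the 10 most-viewed videos. video_id is converted to text
# so seaborn treats it as a category label rather than a number.
top_videos = df.nlargest(10, 'video_view_count')

plt.figure(figsize=(12, 6))
sns.barplot(y=top_videos['video_id'].astype(str), x=top_videos['video_view_count'],
            palette="viridis", orient="h")
plt.title('Top 10 Videos by View Count')
plt.xlabel('View Count')
plt.ylabel('Video ID')
plt.show()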
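As a numeric companion to the box plot of views by author ban status, a grouped summary gives the count, median, and mean per group. The sketch below runs on a small hypothetical `sample` table that reuses the dataset's column names `author_ban_status` and `video_view_count`. The rows are made up, and on the real data `sample` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical rows reusing the TikTok dataset's column names (values are made up)
sample = pd.DataFrame({
    "author_ban_status": ["active", "active", "banned", "under review", "banned", "active"],
    "video_view_count": [1000, 3000, 90000, 20000, 110000, 2000],
})

# Count, median and mean views per ban status, highest median first
summary = (
    sample.groupby("author_ban_status")["video_view_count"]
    .agg(["count", "median", "mean"])
    .sort_values("median", ascending=False)
)
print(summary)
```

The median is listed next to the mean because view counts are usually heavily skewed, and a few viral videos can pull the mean far above a typical value.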

Output:

Conclusion
EDA and data visualization are essential for business analytics, providing insights that drive decision-making.
Using Python libraries such as pandas, Matplotlib, and Seaborn, businesses can analyze data efficiently,
identify trends, and improve performance.