M05 Lecture Notes

The document provides an overview of data visualization, detailing its importance in simplifying complex data, revealing patterns, and supporting decision-making. It covers various types of visualizations for numerical, textual, and geospatial data, including charts and maps, and discusses tools like the Plotly graphing library for implementation. Additionally, it includes examples of visualizing influencer marketing data and restaurant reviews, emphasizing the significance of data preprocessing and analysis.

Uploaded by

Berly Brigith

Visualization

© Faculty of Management
Data Visualization
Data visualization is the graphical
representation of information and data.

• Simplifies complex data
• Reveals patterns and trends
• Supports decision-making
• Enhances communication
Data Visual
Representation
Visualization Type
• Numerical Data
• Textual Data
• Geospatial Data
Visualizing Numerical Data

• Scatter Plots
• Line Charts
• Bar Charts
• Pie Charts

Bar Chart: Numerical Data
US Influencer Marketing
Spending by Platform

Bar chart showing influencer marketing spending for Instagram, TikTok, YouTube, Facebook, and Snapchat for 2023 and 2024.

Source: Samet, A. (2023, September 18). Instagram leads influencer marketing, even as marketers spread budgets across social channels. eMarketer.
https://www.emarketer.com/content/instagram-leads-influencer-marketing-even-marketers-spread-budgets-across-social-channels
Line Chart: Numerical Data
Marketers Using Instagram Reels
vs TikTok

Line chart showing the percentage of marketers using Instagram Reels, TikTok, Facebook, and YouTube for influencer marketing over time.

Source: Enberg, J. (2023, December 22). Influencer Marketing by Platform 2023: Half of US Marketers Use TikTok, but Most Build
Their Strategies Around Instagram. eMarketer. https://content-na1.emarketer.com/influencer-marketing-by-platform-2023
Visualizing Textual Data
• Word Clouds: Highlight the most frequent words in a text
• Bar Charts: Compare the frequency of specific terms
• Line Charts: Visualize the trend of word or text usage
• Heatmaps: Visualize the density or frequency of text usage


Word Clouds: Textual Data
Highlight the most frequent
words in a text.
Bar Charts: Textual Data
Compare the frequency
of specific terms.

Source: Bruin, E. (2019). Text mining the Clinton and Trump election Tweets. Kaggle. https://www.kaggle.com/code/erikbruin/text-mining-the-clinton-and-trump-election-tweets
Line Charts: Textual Data
Visualize the trend of word
or text usage

Source: Bruin, E. (2019). Text mining the Clinton and Trump election Tweets. Kaggle. https://www.kaggle.com/code/erikbruin/text-mining-the-clinton-and-trump-election-tweets
Visualizing Geospatial Data
• Scatter Maps: Use dots on a map to show the distribution and density of data points across a geographical area
• Heatmaps: Use color gradients to indicate the density or intensity of data points in specific areas
• Choropleth Maps: Use colors to represent the data density or values across regions

Citation: Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87.
https://www.sciencedirect.com/science/article/pii/S2666683920300092
Scatter Maps

Citation: Fig. 6. National spatial segmentation of the COVID-19 epidemic risk. In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Heatmaps

Citation: Spatial distribution of help and donation information of COVID-19 during the epidemic period (2020/01/09 - 2020/02/10). In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Choropleth Maps

Citation: Fig. 4. Rapid mapping based on multi-scale templates. In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Visualizing
Numerical Data
Implementation

Plotly Graphing Library
• Open source graphing library
• Creates many types of graphs:
• Scatterplots
• Line Charts
• Bar Charts
• Etc.

https://plotly.com/python/
Analyzing Restaurant Reviews
Visualization Tasks
1. Create a Basic Bar Chart of Average Ratings
2. Enhance the Bar Chart with Labels and Colors
3. Create a Bar Chart of Average Ratings by Category
4. Create a Pie Chart of Restaurant Price Levels
5. Create a Scatter Plot of Price Level vs. Average Ratings
Dataset for Visualization
Business_ID  Restaurant_Name  Category       Avg_Rating  Num_Reviews  Price
1            Le Gourmet       Fine Dining    4.5         250          $$$
2            Pizza Haven      Fast Food      3.8         123          $
3            Burger World     Fast Food      4.2         180          $$
4            Sushi Zen        Fine Dining    4.8         200          $$$
5            Pasta Palace     Casual Dining  4.0         160          $$
Demo: Import Modules
Demo: DataFrame
Demo Task 1:
Create a Basic Bar Chart of Average Ratings
Demo Task 2:
Enhance the Bar Chart with Labels and Colors
Demo Task 3:
Create a Bar Chart of Avg Ratings by Category
Demo Task 3:
Group by output
Demo Task 3:
reset_index method
Demo Task 3:
Visualization
Demo Task 4:
Create a Pie Chart of Restaurant Price Levels
Demo Task 5:
Create a Scatter Plot - Price Level vs Avg Ratings
Visualizations to Explore

Source: Plotly Graphing Libraries. https://plotly.com/python/


Visualizing
Geospatial Data
Implementation

Geospatial Data Handling

Plotly (2024). Basic Examples with Plotly Express. In Scatter Plots on Mapbox in Python. Plotly
Graphing Libraries. https://plotly.com/python/scattermapbox/#basic-example-with-plotly-express
Geospatial Data Types
• City Names: Full names of cities (e.g., Montreal)
• Country Names: Full names of countries (e.g., Canada)
• ISO-3 Country Codes: Three-letter country codes (e.g., CAN for Canada)
• ISO-2 Country Codes: Two-letter country codes (e.g., CA for Canada)
• State Codes: Abbreviations for states within a country (e.g., QC for Quebec)
• Latitude and Longitude: Coordinates used to specify precise locations on Earth's surface (e.g., latitude: 45.05, longitude: -73.57)
Scatter Plot

Plotly graphical library (n.d.). plotly.express.scatter_geo. https://plotly.com/python-api-reference/generated/plotly.express.scatter_geo


Dataset: Top 5 Michelin Restaurants
Restaurant             Country        Country ISO-3  Country ISO-2
Osteria Francescana    Italy          ITA            IT
El Celler de Can Roca  Spain          ESP            ES
Eleven Madison Park    United States  USA            US
Mirazur                France         FRA            FR
Noma                   Denmark        DNK            DK
Demo: Country Name
Demo: Result
Demo: ISO-3 Codes
Demo: Result
Demo: ISO-2 Codes (with pycountry)
Demo: Result
Selecting Visualization Type
Scatter Plots: Plotting each restaurant
individually on a map
If you know the number of Michelin
restaurants per country:
• Heatmaps: Visualizing the density of data
points in a specific area with color gradients
• Choropleth maps: Visualizing the
concentration of data points across a region
with different colors
Heatmap Dataset

Country        Latitude  Longitude  Number of Restaurants
Italy          41.8719   12.5674    10
Spain          40.4637   -3.7492    8
United States  37.0902   -95.7129   15
France         46.6034   1.8883     20
Denmark        56.2639   9.5018     5
Demo : Heatmap
Example of density_mapbox
Choropleth Map Dataset

Country        Country ISO-3  Number of Restaurants
Italy          ITA            10
Spain          ESP            8
United States  USA            15
France         FRA            20
Denmark        DNK            5
Demo: Choropleth Map
Example of choropleth
Demo:
Influencer Recommendation
System & Visualization

Data Collection
& Data Pre-Processing

Dataset: top200_instagrammers.xlsx

Variable             Description
Username             Name of the influencer's account
Channel name         Name of the channel
Country              Influencer's country
Url                  Instagram URL
Main Topic           Main topic of the page
Main Video Category  Category of the reels and videos
Like                 Total likes count
Likes Avg.           Average likes
Post                 Total posts
Followers            Total number of followers
Boost Index          Boost index value
Comments Avg.        Average number of comments
Views Avg.           Average views
Avg. 1 Day           Average views per day
Avg. 3 Day           Average views for 3 days
Avg. 7 Day           Average views for 7 days
Avg. 14 Day          Average views for 14 days
Avg. 30 Day          Average views for 30 days
Engagement Rate      Percentage of engagement with users
Import Datasets
import pandas as pd

• Import a CSV file (.csv)
df = pd.read_csv('path_to_your_file.csv')

• Import an Excel file (.xlsx)
df = pd.read_excel('path_to_your_file.xlsx')

• Import JSON data (.json)
df = pd.read_json('path_to_your_file.json')
Data Pre-Processing
• Explore data features
• Check for missing values
• Verify and rename column names if needed
• Create or drop any columns necessary for the analysis
Inspection: Dimensions & Columns
• An attribute in pandas that returns a tuple (rows, columns) representing the dimensions of a DataFrame
df.shape

• An attribute in pandas that returns a pandas Index object containing the column labels of the DataFrame
df.columns
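
For illustration, here is what the two attributes return on a small frame. The usernames and follower counts are hypothetical values, not rows from the influencer dataset.

```python
import pandas as pd

# Hypothetical sample rows for illustration only
df = pd.DataFrame({
    "Username": ["account_a", "account_b", "account_c"],
    "Followers": [25_000_000, 80_000_000, 150_000_000],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels as a plain list
```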
Inspection: Info Summary
• Provides a concise summary of a DataFrame

df.info()
Inspection: Head & Tail
• Explore the first few rows of a DataFrame
df.head() # Displays the first 5 rows
df.head(10) # Displays the first 10 rows

• Explore the last few rows of a DataFrame
df.tail() # Displays the last 5 rows
df.tail(10) # Displays the last 10 rows
Inspection: Largest or smallest values
• Returns the first n rows with the largest values in a column, sorted in descending order.
df.nlargest(n, columns)

• Returns the first n rows with the smallest values in a column, sorted in ascending order.
df.nsmallest(n, columns)
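
A short sketch of both methods on hypothetical data (the usernames and follower counts are made up for illustration):

```python
import pandas as pd

# Hypothetical sample rows for illustration only
df = pd.DataFrame({
    "Username": ["account_a", "account_b", "account_c", "account_d"],
    "Followers": [25_000_000, 80_000_000, 150_000_000, 40_000_000],
})

top2 = df.nlargest(2, "Followers")      # two largest, descending order
bottom2 = df.nsmallest(2, "Followers")  # two smallest, ascending order

print(top2)
print(bottom2)
```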
Handling Missing Values
Handling missing values is a critical step in data preprocessing, especially when preparing your data for analysis or machine learning models.

Before handling missing values,
• Check whether the dataset contains any missing values. The result is the number of missing values in each column.
df.isnull().sum()
Removing/Filling Missing Values
• Drop rows where any column has a missing value
df_cleaned = df.dropna()

• Fill missing values with a specific value
df_filled = df.fillna(0)

• Fill missing values with the mean of each numeric column
df_filled = df.fillna(df.mean(numeric_only=True))
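
The three options can be compared on a small frame with gaps. The data is hypothetical. `numeric_only=True` is used when computing the means so that text columns such as `Username` are skipped.

```python
import pandas as pd

# Hypothetical sample rows with missing values, for illustration only
df = pd.DataFrame({
    "Username": ["account_a", "account_b", "account_c"],
    "Followers": [100.0, None, 300.0],
    "LikesAvg": [10.0, 20.0, None],
})

print(df.isnull().sum())  # count of missing values per column

df_cleaned = df.dropna()                          # keep only complete rows
df_zero = df.fillna(0)                            # replace gaps with 0
df_mean = df.fillna(df.mean(numeric_only=True))   # replace gaps with column means

print(df_mean)
```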
Rename Column(s)
Inspect the current column names to decide which ones should be renamed for easier use in code.

df.columns
Identifying Columns to Rename

- Capitalize the first character of each word
- Remove all dots ('.') from the column names
- Remove all open parentheses ('(') from the column names
- Remove all close parentheses (')') from the column names
- Remove all spaces from the column names
Method Chaining
In Python, parentheses () can be used to group expressions or to
continue statements across multiple lines for better readability.

df.columns = (df.columns
    .str.title()                        # Capitalize the first character of each word
    .str.replace('.', '', regex=False)  # Remove dots
    .str.replace('(', '', regex=False)  # Remove open parentheses
    .str.replace(')', '', regex=False)  # Remove close parentheses
    .str.replace(' ', '', regex=False)) # Remove spaces
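
To see the effect of the chain, it can be applied to a few column names from the variable description table. The empty DataFrame here is only a stand-in for the real dataset.

```python
import pandas as pd

# Stand-in frame that carries only column names from the variable table
df = pd.DataFrame(columns=["Likes Avg.", "Comments Avg.", "Avg. 1 Day",
                           "Main Video Category", "Engagement Rate"])

df.columns = (df.columns
    .str.title()                        # capitalize each word
    .str.replace('.', '', regex=False)  # remove dots
    .str.replace('(', '', regex=False)  # remove open parentheses
    .str.replace(')', '', regex=False)  # remove close parentheses
    .str.replace(' ', '', regex=False)) # remove spaces

print(list(df.columns))
```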
Data Analysis

Data Analysis
1. Explore the dataset for content-based
recommendation
2. Visualization for interaction
Summarize Data
Summary statistics for all numerical columns
print(df.describe())
Visualization: Histogram
Distribution of influencers by number of followers

import plotly.express as px

fig = px.histogram(df, x='Followers', title='Distribution of Followers')
fig.show()
Demo: Histogram
Code Execution
Visualization: Scatter plot
Likes avg vs Followers
Comments avg vs Followers
ViewsAvg vs Followers
Summary: Histogram
Summary: Scatter plot
Summary Range
Topic
Insight Sharing

Follower-Based Category
Follower-Based Category Selection:
• Chatbot: "Would you like to explore influencers based on their follower count?"
• User: "Yes."
• Chatbot: "Please select a category: 20M-50M, 50M-100M, or 100M-500M."
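
One way the follower categories could be derived is with `pandas.cut`. This is a sketch on hypothetical data; the column name `FollowerCategory` and the choice to treat each range as including its lower bound (`right=False`) are assumptions.

```python
import pandas as pd

# Hypothetical sample rows for illustration only
df = pd.DataFrame({
    "Username": ["account_a", "account_b", "account_c", "account_d"],
    "Followers": [25_000_000, 50_000_000, 120_000_000, 480_000_000],
})

bins = [20_000_000, 50_000_000, 100_000_000, 500_000_000]
labels = ["20M-50M", "50M-100M", "100M-500M"]

# right=False makes each bin include its lower edge: [20M, 50M), [50M, 100M), ...
df["FollowerCategory"] = pd.cut(df["Followers"], bins=bins,
                                labels=labels, right=False)

# Rows matching the category the user selected in the chatbot
selected = df[df["FollowerCategory"] == "100M-500M"]
print(selected)
```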
Influencer by Topic
Influencer Categories by Topic:
• Chatbot: "Now, let's look at the main topics influencers cover. Would you like to
see the distribution of main topics?"
• User: "Yes."
• Chatbot: (Displays pie chart of main topics)
• Chatbot: "Would you like to see the top influencers in a specific category?"
• User: "Yes, show me the top influencers in fitness."
• Chatbot: "Here are the top 10 influencers in fitness based on engagement:"
• (Displays list of top 10 fitness influencers with their engagement metrics)
Influencer by Country
Influencer Categories by Country:
• Chatbot: "Let's explore the geographical distribution of influencers. Would you like
to see the proportion of influencers from different countries?"
• User: "Yes."
• Chatbot: (Displays pie chart of country proportions)
• Chatbot: "Here is the geographical distribution of influencers:"
• (Displays scatter geo plot)
