M05 Lecture Notes
© Faculty of Management
Data Visualization
Data visualization is the graphical
representation of information and data.
Source: Samet, A. (2023, September 18). Instagram leads influencer marketing, even as marketers spread budgets across social channels. eMarketer.
https://www.emarketer.com/content/instagram-leads-influencer-marketing-even-marketers-spread-budgets-across-social-channels
Line Chart: Numerical Data
Marketers Using Instagram Reels
vs TikTok
Source: Enberg, J. (2023, December 22). Influencer Marketing by Platform 2023: Half of US Marketers Use TikTok, but Most Build
Their Strategies Around Instagram. eMarketer. https://content-na1.emarketer.com/influencer-marketing-by-platform-2023
Visualizing Textual Data
• Word Clouds: Highlight the most frequent words in a text
Source: Bruin, E. (2019). Text mining the Clinton and Trump election Tweets. Kaggle. https://www.kaggle.com/code/erikbruin/text-mining-the-clinton-and-trump-election-tweets
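A word cloud is driven by word frequencies, so the counting step can be sketched on its own. The snippet below is a minimal illustration using only the Python standard library; the sample text and the tiny stop-word list are assumptions made up for this example, and drawing the cloud itself would need a separate plotting package.

```python
import re
from collections import Counter

# Hypothetical sample text; any collection of tweets or reviews would work.
text = "Great food, great service. The food was fresh and the staff was friendly."

# A tiny illustrative stop-word list (an assumption, not a standard list).
stop_words = {"the", "and", "was", "a", "of"}

# Lowercase the text, split it into alphabetic tokens, and drop stop words.
tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]

# Count occurrences; a word cloud draws the most frequent words largest.
word_counts = Counter(tokens)
top_words = word_counts.most_common(3)
print(top_words)
```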
Line Charts: Textual Data
Visualize the trend of word
or text usage
Source: Bruin, E. (2019). Text mining the Clinton and Trump election Tweets. Kaggle. https://www.kaggle.com/code/erikbruin/text-mining-the-clinton-and-trump-election-tweets
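To plot word usage over time, the text first has to be reduced to one count per period. The sketch below shows one possible way to do that with pandas; the tweets, dates, and keyword are invented for illustration and are not taken from the cited dataset.

```python
import pandas as pd

# Hypothetical tweets with dates (illustrative values only).
tweets = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-10-01", "2016-10-01", "2016-10-02", "2016-10-03", "2016-10-03"]
    ),
    "text": [
        "Vote today",
        "Great rally tonight",
        "Please vote",
        "Vote early, vote often",
        "Thank you Ohio",
    ],
})

keyword = "vote"  # the word whose usage we want to track

# Count how many times the keyword appears in each tweet (case-insensitive).
tweets["mentions"] = tweets["text"].str.lower().str.count(keyword)

# Sum the mentions per day; this is the series a line chart would plot.
daily_mentions = tweets.groupby("date")["mentions"].sum().reset_index()
print(daily_mentions)
```

The resulting table has one row per day, so it could be passed to a line chart with date on the x-axis and mentions on the y-axis.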
Visualizing Geospatial Data
• Scatter Maps: Uses dots on a map to show the distribution and
density of data points across a geographical area
Citation: Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87.
https://www.sciencedirect.com/science/article/pii/S2666683920300092
Scatter Maps
Citation: Fig. 6. National spatial segmentation of the COVID-19 epidemic risk. In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS
with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Heatmaps
Citation: Spatial distribution of help and donation information of COVID-19 during the epidemic period (2020/01/09 - 2020/02/10). In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H.
(2020). COVID-19: challenges to GIS with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Choropleth Maps
Citation: Fig. 4. Rapid mapping based on multi-scale templates. In Zhou, C., Su, F., Pei, T., Zhang, A., Du, Y., Luo, B., ... & Xiao, H. (2020). COVID-19: challenges to GIS
with big data. Geography and sustainability, 1(1), 77-87. https://www.sciencedirect.com/science/article/pii/S2666683920300092
Visualizing
Numerical Data
Implementation
Plotly Graphing Library
• Open source graphing library
• Creates many types of graphs:
• Scatterplots
• Line Charts
• Bar Charts
• Etc.
https://plotly.com/python/
Analyzing Restaurant Reviews
Visualization Tasks
1. Create a Basic Bar Chart of Average Ratings
2. Enhance the Bar Chart with Labels and Colors
3. Create a Bar Chart of Average Ratings by Category
4. Create a Pie Chart of Restaurant Price Levels
5. Create a Scatter Plot of Price Level vs. Average Ratings
Dataset for Visualization
Business_ID  Restaurant_Name  Category       Avg_Rating  Num_Reviews  Price
1            Le Gourmet       Fine Dining    4.5         250          $$$
2            Pizza Haven      Fast Food      3.8         123          $
3            Burger World     Fast Food      4.2         180          $$
4            Sushi Zen        Fine Dining    4.8         200          $$$
5            Pasta Palace     Casual Dining  4.0         160          $$
Demo: Import Modules
Demo: DataFrame
Demo Task 1:
Create a Basic Bar Chart of Average Ratings
Demo Task 2:
Enhance the Bar Chart with Labels and Colors
Demo Task 3:
Create a Bar Chart of Avg Ratings by Category
Demo Task 3:
Group by output
Demo Task 3:
reset_index method
Demo Task 3:
Visualization
Demo Task 4:
Create a Pie Chart of Restaurant Price Levels
Demo Task 5:
Create a Scatter Plot - Price Level vs Avg Ratings
Visualizations to Explore
Geospatial Data Handling
Plotly (2024). Basic Examples with Plotly Express. In Scatter Plots on Mapbox in Python. Plotly
Graphing Libraries. https://plotly.com/python/scattermapbox/#basic-example-with-plotly-express
Geospatial Data Types
• City Names: Full names of cities (e.g., Montreal)
• Country Names: Full names of countries (e.g., Canada)
• ISO-3 Country Codes: Three-letter country codes (e.g., CAN for Canada)
• ISO-2 Country Codes: Two-letter country codes (e.g., CA for Canada)
• State Codes: Abbreviations for states within a country (e.g., QC for Quebec)
• Latitude and Longitude: Coordinates used to specify precise locations on
Earth’s surface (e.g., latitude: 45.05, longitude: -73.57)
Scatter Plot
Country        Latitude  Longitude  Number of Restaurants
Italy          41.8719   12.5674    10
Spain          40.4637   -3.7492    8
United States  37.0902   -95.7129   15
France         46.6034   1.8883     20
Denmark        56.2639   9.5018     5
Demo : Heatmap
Example of density_mapbox
Choropleth Map Dataset
Country        Country ISO-3  Number of Restaurants
Italy          ITA            10
Spain          ESP            8
United States  USA            15
France         FRA            20
Denmark        DNK            5
Demo: Choropleth Map
Example of choropleth
Demo:
Influencer Recommendation
System & Visualization
Data Collection
& Data Pre-Processing
Dataset: top200_instagrammers.xlsx
Variable             Description
Username             Name of the influencer's account
Channel name         Name of the channel
Country              Influencer's country
Url                  Instagram URL
Main Topic           Main topic of the page
Main Video Category  Category of the reels and videos
Like                 Total likes count
Likes Avg.           Average likes
Post                 Total posts
Followers            Total number of followers
Boost Index          Boost index value
Comments Avg.        Average number of comments
Views Avg.           Average views
Avg. 1 Day           Average views per day
Avg. 3 Day           Average views for 3 days
Avg. 7 Day           Average views for 7 days
Avg. 14 Day          Average views for 14 days
Avg. 30 Day          Average views for 30 days
Engagement Rate      Percentage of engagement with users
Import Datasets
• Import a CSV file (.csv)
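import pandas as pd
df = pd.read_csv('path_to_your_file.csv') # Replace with the path to your file
df.info() # Lists column names, data types, and non-null counts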
Inspection: Head & Tail
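• Explore the first and last few rows of a DataFrame
df.head() # Displays the first 5 rows
df.head(10) # Displays the first 10 rows
df.tail() # Displays the last 5 rows
df.nsmallest(n, columns) # Returns the n rows with the smallest values in the given columns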
Handling Missing Values
Handling missing values is a critical step in data
preprocessing, especially when preparing your
data for analysis or machine learning models.
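A short sketch of common pandas options for missing values: counting them, dropping incomplete rows, and filling gaps with a placeholder. The tiny DataFrame and the placeholder value "Unknown" are assumptions for illustration; which strategy fits depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical influencer rows with a few gaps.
df = pd.DataFrame({
    "Username": ["user_a", "user_b", "user_c", "user_d"],
    "Followers": [25_000_000, np.nan, 120_000_000, 60_000_000],
    "Main Topic": ["Fitness", "Music", None, "Fashion"],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop the rows where a key numeric column is missing.
df_dropped = df.dropna(subset=["Followers"])

# Option 2: keep every row and fill gaps in a text column with a placeholder.
df_filled = df.fillna({"Main Topic": "Unknown"})
```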
Identifying Columns to Rename
df.columns
df.columns = (df.columns
.str.title() # Capitalize the first character of each word
.str.replace('.', '', regex=False) # Remove dots
.str.replace('(', '', regex=False) # Remove open parentheses
.str.replace(')', '', regex=False) # Remove close parentheses
.str.replace(' ', '', regex=False)) # Remove spaces
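To see what the renaming chain produces, it can be tried on a few sample headers. The column names below are illustrative stand-ins, not necessarily the exact headers in the file.

```python
import pandas as pd

# An empty DataFrame with a few sample column names (assumed for illustration).
df = pd.DataFrame(columns=["channel name", "Likes Avg.", "Avg. 1 Day", "Views (Avg.)"])

df.columns = (df.columns
              .str.title()                         # Capitalize the first character of each word
              .str.replace('.', '', regex=False)   # Remove dots
              .str.replace('(', '', regex=False)   # Remove open parentheses
              .str.replace(')', '', regex=False)   # Remove close parentheses
              .str.replace(' ', '', regex=False))  # Remove spaces

print(list(df.columns))
```

Note that `str.title()` also lowercases the remaining letters of each word, so a header such as "URL" would become "Url".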
Data Analysis
Data Analysis
1. Explore the dataset for content-based
recommendation
2. Visualization for interaction
Summarize Data
Summary statistics for all numerical columns
print(df.describe())
Visualization: Histogram
Distribution of Number of Influencers
Follower-Based Category
Follower-Based Category Selection:
• Chatbot: "Would you like to explore influencers based on their follower count?"
• User: "Yes."
• Chatbot: "Please select a category: 20M-50M, 50M-100M, or 100M-500M."
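The follower ranges offered by the chatbot can be created with `pd.cut`. This is a sketch under stated assumptions: the sample accounts are made up, the column name `FollowerCategory` is introduced here, and the bin edges simply follow the three ranges named in the dialogue.

```python
import pandas as pd

# Hypothetical influencers (illustrative values only).
df = pd.DataFrame({
    "Username": ["user_a", "user_b", "user_c", "user_d"],
    "Followers": [25_000_000, 75_000_000, 150_000_000, 48_000_000],
})

# Bin edges and labels matching the categories in the dialogue.
bins = [20_000_000, 50_000_000, 100_000_000, 500_000_000]
labels = ["20M-50M", "50M-100M", "100M-500M"]

# Assign each influencer to a follower range; values below 20M or above
# 500M would get a missing category.
df["FollowerCategory"] = pd.cut(df["Followers"], bins=bins, labels=labels,
                                include_lowest=True)

# Simulate the user's selection and filter the matching influencers.
selected = "50M-100M"
matches = df[df["FollowerCategory"] == selected]
print(matches)
```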
Influencer by Topic
Influencer Categories by Topic:
• Chatbot: "Now, let's look at the main topics influencers cover. Would you like to
see the distribution of main topics?"
• User: "Yes."
• Chatbot: (Displays pie chart of main topics)
• Chatbot: "Would you like to see the top influencers in a specific category?"
• User: "Yes, show me the top influencers in fitness."
• Chatbot: "Here are the top 10 influencers in fitness based on engagement:"
• (Displays list of top 10 fitness influencers with their engagement metrics)
Influencer by Country
Influencer Categories by Country:
• Chatbot: "Let's explore the geographical distribution of influencers. Would you like
to see the proportion of influencers from different countries?"
• User: "Yes."
• Chatbot: (Displays pie chart of country proportions)
• Chatbot: "Here is the geographical distribution of influencers:"
• (Displays scatter geo plot)