Social Suggest Team Report
Entertainment Platform
Title: Social Suggest
By: Team 2
1. MODELING TEAM
The aim of this part is to model the data correctly across several domains, based on
users' hobbies, habits, and preferences while browsing media and entertainment
platforms. This supports a recommendation system for a media and entertainment
platform that engages the audience by suggesting content based on their
preferences and demographic details such as location.
Libraries imported :-
The imported libraries are numpy, pandas, sklearn.metrics.pairwise (for the
calculation of cosine similarity), and sklearn.preprocessing (for
StandardScaler and OneHotEncoder). Together they cover data manipulation,
similarity calculations, and preprocessing.
We load the dataset from a CSV file using the read_csv function from the Pandas
library. The dataset contains information about users, their ratings for different
items, and other relevant details. Reading it into a Pandas DataFrame enables the
subsequent analysis and processing of the dataset.
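The loading step can be sketched as follows. The column names and sample rows are assumptions made for illustration, since the real file and its schema are not shown in this report; only the use of read_csv comes from the text.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
# The column names below are assumptions, not the project's actual schema.
SAMPLE_CSV = """User ID,Item ID,Rating,Type,Age,Gender
1,101,9,Movie,23,Male
1,102,7,Game,23,Male
2,101,8,Movie,58,Female
"""

# With a real file, pass its path instead of the in-memory buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)
print(df.dtypes)
```

Inspecting the shape and column types right after loading is a quick way to confirm that the file was parsed as expected.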
Rows (index): Each row represents a unique user (User ID).
Columns (columns): Each column represents a unique item (Item ID).
Values (values): The cell values are the ratings that users have given to
items. If a user hasn't rated an item, the fill_value=0 parameter fills the cell
with 0.
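A minimal sketch of building such a user-item matrix with pandas' pivot_table is shown below. The column names "User ID", "Item ID", and "Rating" and the sample ratings are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ratings in long format: one row per (user, item) rating.
ratings = pd.DataFrame({
    "User ID": [1, 1, 2, 3],
    "Item ID": [101, 102, 101, 103],
    "Rating": [9, 7, 8, 6],
})

# Rows are users, columns are items, cells are ratings.
# fill_value=0 marks items a user has not rated.
user_item_matrix = ratings.pivot_table(
    index="User ID", columns="Item ID", values="Rating", fill_value=0
)

print(user_item_matrix)
```

Note that pivot_table averages duplicate (user, item) pairs by default, which is usually acceptable when a user rates the same item more than once.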
Subset Selection:
user_features = user_features.loc[common_users] selects only the rows in
user_features corresponding to common_users.
user_item_matrix = user_item_matrix.loc[common_users] does the same
for user_item_matrix.
User demographic features and user-item ratings are initially sourced from
different parts of the dataset. This step ensures that the subsequent
computations consider only those users who have both demographic data
and rating data. Ensuring both matrices have the same set of users is crucial
for accurate similarity computation and recommendation generation. It
avoids mismatches and errors that could arise from having differing sets of
users in the two matrices.
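The alignment step can be sketched as follows. How common_users is computed is not shown in this report, so taking the intersection of the two indexes is an assumption, as are the sample values.

```python
import pandas as pd

# Hypothetical demographic features, indexed by user ID.
user_features = pd.DataFrame(
    {"Age": [23, 58, 41], "Gender_Male": [1, 0, 1]},
    index=pd.Index([1, 2, 4], name="User ID"),
)

# Hypothetical user-item ratings, indexed by user ID.
user_item_matrix = pd.DataFrame(
    [[9, 7], [8, 0], [0, 6]],
    index=pd.Index([1, 2, 3], name="User ID"),
    columns=[101, 102],
)

# Assumption: common users are those present in both indexes.
common_users = user_features.index.intersection(user_item_matrix.index)

# Keep only the shared users, in the same order, in both matrices.
user_features = user_features.loc[common_users]
user_item_matrix = user_item_matrix.loc[common_users]

print(list(common_users))
```

After this step row i of user_features and row i of user_item_matrix describe the same user, which the similarity and prediction steps rely on.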
User similarity is then calculated from the users' demographic features.
Predicting Ratings :-
return predicted_ratings
Generating Recommendation :-
Next, the system calculates the cosine similarity between users based on
their demographic features, creating a user similarity matrix. This matrix
quantifies how similar each pair of users is. Using this similarity
information, the system predicts ratings for items that users haven't rated
yet. This is done by taking a weighted average of ratings from similar users,
ensuring that more similar users have a greater influence on the predicted
ratings.
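The similarity and prediction steps described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function name predict_ratings, the toy inputs, and the choice to average only over users who rated an item (a rating of 0 is treated as "not rated") are all assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def predict_ratings(features, ratings):
    """Similarity-weighted average of other users' ratings (hypothetical helper)."""
    # Pairwise cosine similarity between users, shape (n_users, n_users).
    similarity = cosine_similarity(features)
    # 1.0 where a rating exists, 0.0 where the cell is the fill value 0.
    rated = (ratings > 0).astype(float)
    # Numerator: ratings weighted by user similarity.
    weighted_sum = similarity @ ratings
    # Denominator: total similarity weight of the users who rated each item.
    weight_total = np.abs(similarity) @ rated
    weight_total[weight_total == 0] = 1.0  # avoid division by zero
    predicted_ratings = weighted_sum / weight_total
    return predicted_ratings


# Toy data: users 0 and 1 have identical features, user 2 is orthogonal.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
ratings = np.array([[9.0, 0.0], [7.0, 6.0], [0.0, 3.0]])

predicted = predict_ratings(features, ratings)
print(predicted)
```

In the toy data, user 0's prediction for the second item comes entirely from user 1, the only similar user who rated it, which is the behaviour the weighted average is meant to capture.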
Example Usage :-
desired_type = 'Game'
In this step, the gender is one-hot encoded (1 for male and 0 for
female), and the age is standardized using the previously fitted
StandardScaler.
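A minimal sketch of encoding a target demographic is given below. The helper name encode_target, the feature order (age first, gender flag second), and the sample ages are assumptions; the report only states that gender becomes 1 for male and 0 for female and that age is standardized with the previously fitted StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical ages of existing users, used to fit the scaler once.
existing_ages = np.array([[18.0], [30.0], [42.0]])
scaler = StandardScaler().fit(existing_ages)


def encode_target(age, gender, fitted_scaler):
    """Return a (1, 2) feature row: [standardized age, gender flag]."""
    # 1 for male, 0 for female, as described in the text.
    gender_flag = 1.0 if gender == "Male" else 0.0
    # Reuse the already fitted scaler so the target is on the same scale.
    scaled_age = fitted_scaler.transform(np.array([[age]], dtype=float))[0, 0]
    return np.array([[scaled_age, gender_flag]])


target = encode_target(30, "Male", scaler)
print(target)
```

Reusing the fitted scaler, rather than fitting a new one on the single target row, keeps the target comparable with the existing user profiles.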
Next, the system calculates the cosine similarity between the target
demographic and the existing user profiles. This allows the system to
identify users with similar demographic characteristics.
The target user is interested in 'Games', so we set the desired type to 'Game'.
Using the calculated similarities, the system predicts ratings for all items in
the user-item matrix, focusing specifically on items categorized as 'Games'.
Finally, the recommended items are printed out, showing the title, genre,
and type for each recommended item. This provides the target user with a
list of personalized game recommendations based on their demographic
profile and the preferences of similar users. This process ensures that the
recommendations are relevant and tailored to the user's interests, enhancing
their overall experience with the recommendation system.
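The example-usage flow above can be sketched end to end as follows. The function name recommend_for_target, the item catalogue columns ("Item ID", "Title", "Genre", "Type"), and all sample values are assumptions for illustration; the normalisation by total similarity is one simple choice and may differ from the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_for_target(target, user_features, user_item_matrix,
                         items, desired_type, top_n=5):
    """Rank items of one type by similarity-weighted ratings (hypothetical helper)."""
    # Similarity between the target demographic and every existing user.
    sims = cosine_similarity(target, user_features)[0]
    total = np.abs(sims).sum()
    weights = sims / total if total > 0 else np.zeros_like(sims)
    # Predicted score per item: weighted average over existing users.
    scores = pd.Series(weights @ user_item_matrix.to_numpy(),
                       index=user_item_matrix.columns)
    # Keep only items of the requested type and attach their scores.
    candidates = items[items["Type"] == desired_type].copy()
    candidates["Predicted Rating"] = candidates["Item ID"].map(scores)
    return candidates.sort_values("Predicted Rating", ascending=False).head(top_n)


# Toy data: the target matches the first user exactly.
user_features = np.array([[0.0, 1.0], [1.0, 0.0]])
target = np.array([[0.0, 1.0]])
user_item_matrix = pd.DataFrame([[9, 4, 7], [2, 8, 1]], columns=[101, 102, 103])
items = pd.DataFrame({
    "Item ID": [101, 102, 103],
    "Title": ["Game A", "Movie B", "Game C"],
    "Genre": ["Adventure", "Drama", "Adventure"],
    "Type": ["Game", "Movie", "Game"],
})

desired_type = "Game"
recs = recommend_for_target(target, user_features, user_item_matrix,
                            items, desired_type)
print(recs[["Title", "Genre", "Type"]])
```

Filtering by type after scoring keeps the scoring step generic, so the same function can serve other types such as 'Movie' by changing desired_type.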
1. OVERVIEW
Social Suggest Logo:
The logo for our project, "Social Suggest," symbolizes the core mission and vision of
the platform. The design elements and colours were carefully chosen to reflect the
project's emphasis on community-driven recommendations and user engagement.
Social Suggest aims to leverage the collective wisdom of its user base to provide
personalized and highly reliable recommendations for movies. The platform is designed
to help users discover new content based on the preferences and ratings of a large,
diverse community. By focusing on community-driven insights, Social Suggest aims to
revolutionize the way users discover and enjoy new content.
Cards:
This card displays the total number of users who have interacted with the model.
This metric is crucial for understanding the size and scope of the dataset, as
well as the breadth of user engagement: it indicates how many unique users have
rated, reviewed, or otherwise interacted with the platform.
This card shows the title with the highest average rating given by users. It
represents the most well-received title according to user reviews and ratings.
The top rated title is a key indicator of user preferences and trends within the
dataset. It highlights the most appreciated content and can be used for
recommendations or promotional purposes.
This card highlights the director whose titles have received the highest average
ratings from users. It identifies the most favoured director based on user
feedback. The top rated director card showcases the filmmaker whose work
resonates the most with the audience. It reflects the director’s ability to
consistently produce highly rated content. Often, the top rated movie is directed
by the top rated director, indicating a strong correlation between the director's
influence and the movie's success.
This card displays the year in which the top rated movie was released. It
provides context to the top rated title by situating it within a specific timeframe.
Knowing the year of release helps understand the movie's reception in its
contemporary context and its enduring popularity. This can be useful for
analyzing trends over time, such as which years produced the most highly rated
content and how audience preferences have evolved.
Together, these cards offer a comprehensive view of the highest quality content in the
dataset, from the individual movie to the creative mind behind it, and the era it
belongs to.
Item Ratings: Stacked Column Chart Visualization
Result: Rating 9 has the highest count of Item IDs, 37 in total. By type, this is
13 TV Shows, 9 Books, 8 Movies, 4 Music items, and 3 Games.
Top 5 Genres with Most User Interactions: Stacked Bar Chart
Visualization
• The "Top 5 Genres with Most User Interactions" visualization presents a
stacked bar chart that displays the top five genres with the highest number of
user interactions. Each bar represents a specific genre on the y-axis, while the
x-axis shows the count of interaction IDs.
Result: The genre with the highest user interactions is fiction, with a count of 115,
followed by horror, which has 75 user interactions.
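The ranking behind this visual can be sketched in pandas as follows. The dashboard itself computes this inside the chart, so the code is only an equivalent illustration; the column names and sample interactions are assumptions.

```python
import pandas as pd

# Hypothetical interaction log: one row per interaction.
interactions = pd.DataFrame({
    "Interaction ID": range(1, 13),
    "Genre": ["Fiction"] * 4 + ["Horror"] * 3 + ["Drama"] * 2
             + ["Thriller", "Comedy", "Romance"],
})

# Count interactions per genre and keep the five largest counts.
top_genres = (
    interactions.groupby("Genre")["Interaction ID"].count().nlargest(5)
)

print(top_genres)
```

When several genres tie for the last places, which of them appear in the top five depends on their order in the data, so ties are worth checking before reading too much into the fifth bar.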
User Density By Location: Map Visualization
• The "User Density By Location" visualization presents a map that displays
user demographics and density based on their geographical locations. This
interactive map allows users to explore detailed information about user
distribution, including user IDs, gender, and age, at various locations. When
users hover their cursor over any point on the map, a tooltip appears, providing
all relevant details for that location.
• Location: Location
• Tooltips:
1. User ID: The unique identifier for each user.
2. Gender: The gender of users at that location.
3. Age: The age range or specific ages of users at that location.
Result: The map offers a visual representation of user density, making it easy to
identify regions with high or low user concentrations. Hovering over any location
displays the number of users, their IDs, and their gender.
Total Items by Release Year: Area Chart Visualization
• The "Total Items by Release Year" visualization is an area chart that shows
the number of items released each year, focusing on books, movies, and TV
shows. The x-axis represents the release years, while the y-axis displays the
count of item IDs. The chart includes a legend for item types, providing a
breakdown of the number of each type released annually.
Details:
Result: For example, in 2021 TV show releases peaked at 27, while 4 books and
72 movies were released in the same year.
2. USER ANALYSIS
• The section involves a filter for selecting different types of content, including
books, games, movies, music, and TV shows. It allows users to filter the
analysis based on their interest in these specific types. This filter can help in
understanding user preferences and engagement with different content types,
which is crucial for targeted recommendations.
Users by Gender: Pie Chart
• A pie chart is ideal for displaying parts of a whole, which makes it well suited
to showing how the user base is divided between genders (Female and Male)
and lets viewers quickly read the percentage split. The chart reveals that 51%
of users are Male (1.1K) and 49% are Female (1.06K).
Details:
• Legend: Gender (Female in blue, Male in orange), shown along with the
corresponding percentages.
• Values: Count of User ID
• Insights: It provides an understanding of the number of users by gender.
Result: The pie chart conveys a near-even split between genders, with a slight
majority of Male users (51%) over Female users (49%).
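The percentages can be checked from the counts shown on the chart. The counts below are the rounded values quoted in this report (1.1K and 1.06K), so the exact figures in the dataset may differ slightly.

```python
import pandas as pd

# Rounded user counts as displayed on the pie chart.
counts = pd.Series({"Male": 1100, "Female": 1060})

# Share of each gender as a percentage of all users, rounded to whole percent.
share = (counts / counts.sum() * 100).round(0)

print(share)
```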
Why map is needed:
Details:
• Location: Location
• Legend: Gender (Male: orange color and Female: blue color)
• Tooltips: When hovering over a dot on the map, a tooltip provides detailed
information about that specific data point. This typically includes:
✓ Location: The geographic location of the users.
✓ Gender: The gender of the users (male or female).
✓ Count of User ID: The number of users at that specific location.
✓ Average of Age: The average age of the users at that location.
Users by Age Group: Stacked Bar Chart
• Stacked bar charts are useful for comparing quantities across different
categories. Here, it shows the number of users in different age groups,
separated by gender. The bars make it easy to compare the sizes of these
groups and see the distribution of users by age.
Details:
• X-axis: Number of Users (representing the number of users in each age group)
• Y-axis: Age group, with a label for each range: Young Adult: 18-24,
Middle-aged: 25-53, Elderly: 54-71, Old Age: 72 and above.
• Legend: Gender (Female in blue, Male in orange)
• Insights: The chart provides a clear overview of number of users falling under
each age group.
Filter applied: Clicking on male or female users will enable the chart to show the
number of users by gender falling under each Age group.
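The age grouping used on this chart can be reproduced in pandas with pd.cut, as sketched below. The group boundaries come from the Y-axis description; the sample ages are assumptions, and the dashboard may define the groups with its own grouping feature rather than code.

```python
import numpy as np
import pandas as pd

# Hypothetical user ages, chosen to sit on the group boundaries.
ages = pd.Series([18, 24, 25, 53, 54, 71, 72, 90], name="Age")

# Right-closed bins: (17, 24], (24, 53], (53, 71], (71, inf).
age_group = pd.cut(
    ages,
    bins=[17, 24, 53, 71, np.inf],
    labels=["Young Adult", "Middle-aged", "Elderly", "Old Age"],
)

print(age_group.value_counts(sort=False))
```

Ages below 18 fall outside the first bin and become missing values, which matches the groups listed above starting at 18.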
Result: The largest user group is Elderly (678; 329 females and 349 males), followed
by Middle-aged (611), Young Adult (374), and Old Age (294; 156 females and 138
males), which is the smallest. Male users outnumber female users only in the
Elderly group; in the other age groups female users are the majority.
• Bar charts are effective for ranking and comparing the sizes of different
groups. This chart displays the top five countries with the most users, making
it easy to see which countries have the largest user bases and compare the
gender distribution within those countries.
Details:
• Y-axis: Country
• Legend: Gender (Female in blue, Male in orange)
• Insights: The chart provides a clear visual representation of gender
distribution within each country.
Button feature: A “More details” button navigates to the “Detailed Analysis” page,
which gives more insights into the number of users in other countries as well.
Filter Applied: Clicking on a segment, such as the male users in the United States,
will highlight the specific segment within the chart. For instance, clicking on the
orange part representing males will show the 393 males out of the total 796 users in
the USA.
Result: The United States has the highest user base, suggesting a strong market
presence. It leads with 796 users, followed by Colombia (75 users), Mongolia
(48 users), and American Samoa and the Northern Mariana Islands (47 users each).
Female users outnumber male users in these countries.
Why stacked column chart is needed:
• It allows for easy comparison of the total number of young adults between
countries. By stacking the columns, it shows how the total is divided between
males and females within each country. It consolidates data into a single
column per country, making efficient use of space and allowing for more
countries to be displayed without cluttering the chart.
Details:
• X-axis: Country
• Y-axis: Number of Users
• Legend: Gender (Male: Orange color and Female: Blue color)
• Insights: The chart highlights the concentration of young adult users within
each country.
Result: The United States has the highest count of Young Adult users (210), so it
leads not only in users overall but also in Young Adults. It is followed by several
countries with much smaller numbers: Azerbaijan with 16 users, and American
Samoa, Indonesia, Lao People’s Democratic Republic, and Venezuela with 15 users
each.
3. Creator wise & Interaction Analysis
Button feature: The Creator wise analysis contains a button that navigates to the
“Detailed Analysis” page. This button is needed to get more details about the
“Release Masters”, as the current visual only includes those
Directors/Authors/Artists/Publishers with a count of titles greater than or equal
to 5.
Release Masters: Stacked Bar Chart
• The stacked bar chart above visualizes the Number of Titles released by
various Directors/Authors/Artists/Publishers across different genres. It shows
only those Directors/Authors/Artists/Publishers whose Number of Titles is
greater than or equal to 5.
• A stacked bar chart is ideal here because it allows for easy comparison of the
number of titles across creators. It accommodates long names and lets the
viewer quickly see which creators have the most titles.
Details:
Results: Agatha Christie leads with 7 titles in Detective and Mystery Stories. Janet
Evanovich follows with 6 titles in Fiction. Christopher Paolini, Mary Stewart, and
Neal Stephenson each have 5 titles in Fiction. Peggy Parish has 5 titles, mainly in
Juvenile Fiction.
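The "greater than or equal to 5 titles" filter behind this visual can be sketched in pandas as follows. The column names and the placeholder creators are assumptions for illustration; the dashboard applies the equivalent filter on the visual itself.

```python
import pandas as pd

# Hypothetical catalogue: one row per title, with placeholder creator names.
titles = pd.DataFrame({
    "Creator": ["Creator A"] * 5 + ["Creator B"] * 2,
    "Genre": ["Fiction"] * 5 + ["Horror"] * 2,
    "Title": [f"Title {i}" for i in range(1, 8)],
})

# Count distinct titles per creator and genre.
title_counts = (
    titles.groupby(["Creator", "Genre"])["Title"]
    .nunique()
    .reset_index(name="Count of Title")
)

# Keep only creators with at least 5 titles, as in the visual.
release_masters = title_counts[title_counts["Count of Title"] >= 5]

print(release_masters)
```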
Top Rated Creators
• The chart above is a stacked column chart titled “Top Rated Creators”, which
displays the Count of Rating received by the top creators.
• The stacked column chart is effective for showing the magnitude of ratings
for each creator, making it easy to compare their popularity.
Details:
• X-axis: Director/Authors/Artist/Publisher
• Y-axis: Count of Rating
• Insights: This visualization provides a clear understanding of Top Rated
Creators.
Results: Agatha Christie is the top-rated creator with 23 ratings. C.S. Lewis follows
with 9 ratings, and Janet Evanovich has a lower count of 6 ratings.
Interaction Analysis
• The Interaction Analysis page provides a detailed look into user engagement
with various content types across different demographics. This analysis helps
in understanding user behavior and preferences, allowing for better content
targeting and personalized recommendations.
Slicer: This section has a slicer for selecting different Types of content, including
Books, Games, Movies, Music, and TV shows. It allows users to filter the analysis
based on their interest in these specific types.
Why Interaction Analysis was necessary:
• To understand how different age groups interact with content and the types of
interactions they engage in.
• To identify which age groups view content the most, which helps in tailoring
content offerings.
• To reveal the preferences for different types of interactions (e.g., adding to
playlists, bookmarking, commenting, and many more) among various age
groups, providing insights into user behavior and engagement strategies.
• To analyze interactions by Genre and Type, which helps in understanding which
Genres are most engaging and the preferred ways users interact with them.
Views by Age Groups: Stacked column chart
• The chart above, titled “Views by Age Groups”, is a stacked column chart that
shows the Count of Interaction ID (Views) for each Age group.
Why stacked column chart is needed:
• This chart illustrates the number of interactions (views) across different age
groups. Stacked column charts are effective for categorical comparisons, and
they make it simple to compare the engagement levels between age groups.
Details:
Result: The count of Interaction IDs is highest for Elderly (109), followed by Young
Adults (97) and Middle-aged (85), and lowest for Old Age (49).
• The stacked column chart shows the various types of interactions (Add to
Playlist, Bookmark, Comment, Dislike, Like, Pause, Purchase, Rate, Rent,
Return, Share, Skip, View) across the different age groups, along with the
count of interactions.
• A stacked chart allows for the comparison of total interactions per age
group while also breaking down the contribution of each interaction type
within those groups. This provides a comprehensive view of both overall
and specific interaction trends.
Details:
Result:
Why Ribbon chart is needed:
• A ribbon chart shows how interactions vary by genre over time or across
different types. It highlights changes and trends, making it easy to see which
genres are gaining or losing popularity and how they rank relative to each
other over the period. The ribbons show the transitions smoothly, helping to
compare interactions between genres in a visually appealing way.
Details:
4. TITLE ANALYSIS
Cards:
glance. Card visualizations are an essential component of data dashboards,
providing a quick, clear, and user-friendly way to present key metrics.
• Type: Movie.
• Type: Game.
• Description: Brief synopsis or notable details about the game.
• Type: Book.
• Description: Brief synopsis or notable details about the book.
Selection Criteria:
• Filtering: The dataset is filtered to find the highest-rated item for each type.
• Max Rating: The item with the maximum rating within each category is
selected as the top-rated item.
• Specific Type: Each card is focused on a specific type of content, ensuring
clarity and relevance.
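The selection criteria above can be sketched in pandas as follows. The column names and sample items are assumptions for illustration; on the dashboard the same logic is applied through card filters rather than code.

```python
import pandas as pd

# Hypothetical catalogue with placeholder titles and ratings.
items = pd.DataFrame({
    "Title": ["Movie 1", "Movie 2", "Game 1", "Game 2", "Book 1"],
    "Type": ["Movie", "Movie", "Game", "Game", "Book"],
    "Rating": [8.5, 9.9, 7.0, 6.0, 9.1],
})

# For each type, find the row index of the maximum rating.
best_rows = items.groupby("Type")["Rating"].idxmax()

# One top-rated item per type, ready to feed one card each.
top_rated = items.loc[best_rows, ["Type", "Title", "Rating"]].set_index("Type")

print(top_rated)
```

If two items of the same type share the maximum rating, idxmax keeps the first one it encounters, so a tie-breaking rule may be needed for the real data.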
users quickly identify the most popular and highly rated items, aiding in
content discovery and decision-making.
Details:
• Filters: Interaction Type (View), Top 5 by Max Rating, All Titles, All Types
• Columns: Titles, Ratings
• Insights: The table provides a clear and concise list of top-rated titles, helping
users identify which items are most appreciated by the audience. This insight
is useful for understanding user preferences and highlighting standout content.
Result: Only one title has a rating of 10: "The Paper Tigers." There are 7 titles
with a rating of 9.90, 2 titles with a rating of 9.80, 2 titles with a rating of 9.70,
and 3 titles with a rating of 9.60.
Fresh Flicks of 2022-2023: Area Chart Visualization
• The "Fresh Flicks of 2022-2023" visualization is an area chart that displays
all titles released between the years 2022 and 2023. The x-axis shows the
titles, while the y-axis displays the release year, highlighting the distribution
of new releases over this period.
and popular new releases, which is valuable for users looking to discover
current and relevant content.
Details:
• X-axis: Titles
• Y-axis: Release Year
• Insights: The chart provides a visual representation of titles released in 2022
and 2023, helping users track the latest additions to the content library. This
is particularly useful for users interested in exploring recent releases.
Result: The titles released in the specified period include The Last of Us (2023),
Halo (2022), and House of Cards (2022).
Title Count by Release Year (Groups): Stacked Area Chart
Visualization
• The "Title Count by Release Year (Groups)" visualization is a stacked area
chart that displays the count of titles released over different periods, divided
into 10-year groups from 1950 to 2023. This chart helps visualize the
distribution and trends of title releases over these grouped time periods.
Details:
Result: The highest number of titles, 575, was released during the period 2000-2009,
while the lowest number of titles, 190, was released during the period 2020-2023.
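The 10-year grouping of release years can be sketched in pandas as follows. The sample years are assumptions; capping the last group at 2023 follows the 1950 to 2023 range stated for this chart.

```python
import pandas as pd

# Hypothetical release years for a handful of titles.
release_years = pd.Series([1955, 2003, 2007, 2021, 2023], name="Release Year")

# Start of the decade each year belongs to, e.g. 2007 -> 2000.
decade_start = (release_years // 10) * 10
# End of the decade, capped at the last year covered by the chart.
decade_end = (decade_start + 9).clip(upper=2023)

# Group labels such as "2000-2009" and "2020-2023".
year_group = decade_start.astype(str) + "-" + decade_end.astype(str)

# Number of titles per group, in chronological order.
title_counts = year_group.value_counts().sort_index()

print(title_counts)
```

Because the final group covers only four years, its lower count partly reflects the shorter period rather than fewer releases per year.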
Type: Slicer Visualization
• The slicer is an interactive filter that allows users to select a specific type of
content (e.g., book, movie, TV show, game, music). By choosing a type in the
slicer, all visualizations on the page update dynamically to reflect data related
only to the selected type.
5. RATING ANALYSIS
• This visualization analyses the distribution of ratings for fiction books using
a clustered column chart. The chart was selected to clearly represent the count
of items within each rating category, highlighting user preferences within the
fiction genre.
Graph Type with Overview:
• Graph Type: Clustered Column Chart
• Overview: This visualization shows the ratings for the Fiction genre. The
chart illustrates the count of items based on their ratings.
Details:
• X-axis: Rating
• Y-axis: Count of Items
• Legend: Type (All items are books in this genre)
• Insights: Fiction contains only book items, so the chart shows the count of
books at each rating.
• The result shows the distribution of user ratings, helping to determine the
most and least favoured ratings within the fiction genre.
• The result shows that the rating of 8.00 has the highest count, with 3 books
receiving this rating. All other ratings (1.00, 2.00, 3.00, 5.00, 7.00, 10.00)
each have only 1 book.
• This indicates that 8.00 is a relatively common rating for fiction books in this
dataset.
Graph Type with Overview:
• Graph Type: Clustered Column Chart
• Overview: This visualization shows the ratings for the Horror genre,
displaying the count of items of different types.
Details:
• X-axis: Rating
• Y-axis: Count of Items
• Legend: Type (Book, Movie, TV Show)
• Insights: The chart shows the ratings distribution for three different types of
items in the Horror genre.
• Overview: This visualization shows the ratings for the Thriller genre,
displaying the count of items of different types.
Details:
• X-axis: Rating
• Y-axis: Count of Items
• Legend: Type (Book, Movie, TV Show)
• Insights: The chart displays the ratings distribution for books, movies, and
TV shows in the Thriller genre.
• Overview: This visualization shows the ratings for the Drama genre,
displaying the count of items of different types.
Details:
• X-axis: Rating
• Y-axis: Count of Items
• Legend: Type (Movie, TV Show)
• Insights: The chart shows the ratings distribution for movies and TV shows
in the Drama genre.
• This visualization studies the ratings for adventure items using a clustered
column chart. The chart was selected to depict the count of items within the
adventure genre across different rating categories.
• Overview: This visualization shows the ratings for the Adventure genre,
displaying the count of items of different types.
Details:
• X-axis: Rating
• Y-axis: Count of Items
• Legend: Type (Game, Movie, TV Show)
• Insights: The chart illustrates the ratings distribution for games, movies, and
TV shows in the Adventure genre.
6. Age-Wise Analysis
Why this Chart is needed:
Details:
• This visualization explores movie viewing patterns across age groups using
a funnel chart. It was selected to show the count of movies viewed within
each age group.
Why this Chart is needed:
Details:
Filter Applied:
• Type: Movie
• Interaction Type: View
Result of the Visualization: The age group 67-73 has the highest movie viewership
(89), while the age group 25-31 has the lowest (58).
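The filtering and grouping behind this funnel chart can be sketched in pandas as follows. The column names and sample rows are assumptions, and the 7-year bands starting at age 18 are inferred from the group labels 25-31 and 67-73 quoted above rather than confirmed by the report.

```python
import pandas as pd

# Hypothetical interaction log joined with user ages.
log = pd.DataFrame({
    "Age": [26, 68, 70, 69, 30],
    "Type": ["Movie", "Movie", "Movie", "Movie", "Game"],
    "Interaction Type": ["View", "View", "View", "Like", "View"],
})

# Apply the two filters listed above: Type = Movie, Interaction Type = View.
views = log[(log["Type"] == "Movie") & (log["Interaction Type"] == "View")]

# Assumed 7-year bands: 18-24, 25-31, ..., 67-73, 74-80.
edges = list(range(18, 82, 7))
labels = [f"{low}-{low + 6}" for low in edges[:-1]]
band = pd.cut(views["Age"], bins=edges, right=False, labels=labels)

# Number of movie views per age band.
view_counts = band.value_counts(sort=False)

print(view_counts)
```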
Age Groups' Interaction across Entertainment Types
• This chart provides insights into how different age groups interact with
various types of content. It helps identify trends in user behaviour, informing
content strategy and user engagement efforts.
Details:
Result of the Visualization:
• The highest number of users is for the type "Movie" (667) and the lowest
number of users is for "Game" (335).
• For example, among 2500 users, 667 interacted with movies: 117 purchased,
103 viewed, 94 disliked, etc.
• This visual is needed to provide users with a tool for detailed analysis, making
the dashboard more interactive and user-friendly. It helps in understanding
specific user behaviours and preferences for different content types.
Filter Applied:
Result of the Feature: This feature enhances the flexibility of the dashboard,
allowing for targeted analysis of user interactions with specific item types.
7. Detailed Analysis:
• Each section presents data that is vital for making informed decisions
regarding content preference, marketing strategies, and user engagement.
No. of Users by Country and Gender:
• This chart presents the distribution of users by country and gender. It helps
in understanding the geographic and gender demographics of the user base.
Details:
o Other countries like Colombia, Mongolia, and American Samoa have
significantly fewer users.
Filter Applied:
• Clicking on a segment, such as the female users in the United States, will
highlight the specific segment within the chart.
• For instance, clicking on the blue part representing females will show the
403 females out of the total 796 users in the USA.
No. of Young Adults by Country:
o Other countries show a more balanced or different distribution but with
much smaller numbers.
Filter Applied:
• Selecting a segment, such as young males in the United States, filters the
"Release Masters" table accordingly.
• For example, selecting the orange part representing 104 young males out of
210 total young adults in the USA filters the release data to show titles based
on the preferences and data of young male users from the selected country.
Release Masters Table:
Details:
• Columns:
o Director/Authors/Artist/Publisher: The name of the content creator.
o Count of Title: Number of titles released by the creator.
o Genre: Genre of the titles.
• Insights:
o The table shows a diverse range of genres, indicating a wide variety of
content available to users.
o Agatha Christie is a prominent figure with the highest count of titles (7)
in the "Detective and mystery stories" genre.
o Other notable creators include Janet Evanovich, Christopher Moore, and
Mary Stewart, each with multiple titles primarily in fiction.
o The total count of titles listed is 2496, reflecting a substantial library.