w9b Netflix Prize
In 2006 Netflix was just a mail-based DVD rental company (they weren’t streaming videos
yet). Customers with a subscription could rent as many DVDs as they liked, and Netflix
wanted to keep posting DVDs to their customers. People who weren’t using the service
would realize, and cancel their subscription. Netflix kept up demand by recommending
movies their customers would like, and they had data showing that better personalized
recommendations led to higher customer retention. To improve recommendations further,
Netflix launched a challenge — with a million dollar prize — to improve the root mean
square error of their recommendation engine by 10%.
The rules of the competition and FAQ are still online if you want more detail. Much of
how the competition was set up was well thought out. The leaderboard, and the limits on
the number of submissions, have since been adopted in much the same form by Kaggle's
competitions. As we'll see though, Netflix didn't give enough thought to privacy.
Wikipedia has a good, short summary: https://en.wikipedia.org/wiki/Netflix_Prize
1 SVD approach
One of the most significant approaches to the competition was referred to as “SVD”. Simon
Funk, one of the competitors, beat Netflix’s existing system early on with a short and simple
C program, which performed stochastic gradient descent on a simple model.
The model stated that the C × M matrix of movie ratings for the C customers and M
movies can be approximately decomposed into the product of a tall thin C × K matrix and
a short wide K × M matrix. This low-rank approximation is like a standard truncated SVD
approximation, but without an intermediate diagonal matrix, which can be absorbed into
the other matrices. A conventional SVD routine finds an approximation with the minimum
possible square error, summed over every element of the matrix. However, the Netflix ratings
matrix isn't fully observed (no customer rates every movie), so instead we minimize the sum
of the square differences between the ratings matrix and its approximation, only at the
observed elements. A conventional SVD routine can't minimize this cost function. However,
we can apply stochastic gradient descent to it, where the thin rectangular customer and
movie matrices contain the parameters of the model. The resulting approximate matrix
can then be evaluated at any cell, giving predictions for ratings that haven't been observed.
One of the fitted matrices contains K learned features about each customer. The other
contains K learned features about each movie. The inner product of these features is used
to predict the customer’s rating of a movie. One of the K indexes might correspond to
“romance”: the corresponding customer feature will take on large values if the customer likes
romantic movies, and the movie feature will take on large values if it is a romantic movie. In
this model customers can like multiple genres of movie, and no other customer has to have
the same combination of tastes for the system to work. No genre labels are required to fit
the model though: the fitting procedure learns the features for itself.
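The fitting procedure described above can be sketched in a few lines of NumPy. Everything below is a toy illustration: the ratings, matrix sizes, learning rate, and regularization constant are invented for demonstration, not the settings used in the competition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed ratings as (customer, movie, rating) triples; invented data.
C, M, K = 4, 5, 2                        # customers, movies, latent features
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 2, 2.0), (2, 3, 5.0), (3, 3, 4.0)]

U = 0.1 * rng.standard_normal((C, K))    # customer features, C x K
V = 0.1 * rng.standard_normal((K, M))    # movie features, K x M

lr, lam = 0.05, 0.01                     # step size, L2 regularization
for epoch in range(500):
    for c, m, r in ratings:
        err = r - U[c] @ V[:, m]         # residual at one observed cell
        uc = U[c].copy()                 # use old values for both gradients
        U[c]    += lr * (err * V[:, m] - lam * uc)
        V[:, m] += lr * (err * uc      - lam * V[:, m])

# The approximate matrix U @ V can be evaluated at any cell,
# including cells that were never observed:
print(U[0] @ V[:, 2])                    # predicted rating: customer 0, movie 2
```

Each update only touches one customer's and one movie's feature vectors, which is why this scales to the full dataset of around 100 million ratings.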
Rather than expanding further here, I'll point to an accessible article written by some of
the successful Netflix competitors, which fleshes out the details: Matrix factorization
techniques for recommender systems, Koren et al., IEEE Computer 42(8):30–37, 2009.
2 Privacy
"Is there any customer information in the dataset that should be kept private?
No, all customer identifying information has been removed; all that remains are ratings
and dates. This follows our privacy policy, which you can review here. Even if, for
example, you knew all your own ratings and their dates you probably couldn’t identify
them reliably in the data because only a small sample was included (less than one-tenth
of our complete dataset) and that data was subject to perturbation. Of course, since you
know all your own ratings that really isn’t a privacy problem is it?"
— https://www.netflixprize.com/faq.html
I’ll also admit that (in 2006) I didn’t understand why they were bothering to “perturb the
dataset”, which meant they randomly changed some of the ratings from their true values.
How could this data possibly be deemed sensitive?
Firstly, anonymizing data doesn’t work if an adversary might have access to other data that
they can correlate with your data. In this case the publicly-available Internet Movie Database
(IMDB) contains movie ratings. It was possible to identify some people in the Netflix prize
data, by comparing patterns of ratings to those found in IMDB.
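To make the linkage idea concrete, here is a minimal sketch with invented data and a hypothetical scoring function. The actual attack (by Narayanan and Shmatikov) also used rating dates and a more careful statistical score that tolerates the perturbation in the released data.

```python
# Anonymized ratings: id -> {movie: stars}. All data here is invented.
netflix = {
    "user_17": {"A": 5, "B": 1, "C": 4, "D": 2},
    "user_42": {"A": 2, "B": 5, "E": 4},
}
# Public IMDB ratings for a known, named individual:
imdb_profile = {"A": 5, "C": 4, "D": 2}

def overlap_score(anon, public, tol=1):
    """Count shared movies whose ratings agree to within `tol` stars."""
    shared = set(anon) & set(public)
    return sum(abs(anon[m] - public[m]) <= tol for m in shared)

# The anonymous user whose rating pattern best matches the public profile:
best = max(netflix, key=lambda u: overlap_score(netflix[u], imdb_profile))
print(best)  # -> "user_17"
```

Because most people rate only a tiny fraction of all movies, even a handful of matching (movie, rating) pairs can be enough to single out one user in the whole dataset.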
The IMDB ratings are public already though, so how has any harm been done? Well,
users of IMDB know that it is public, and so may choose not to rate movies with certain
political or sexual leanings. However, the Netflix dataset did contain ratings of political and
pornographic movies. By matching users to IMDB, some of these sensitive ratings were
attached to identifiable individuals. Not surprisingly, the relevant people were upset, and
some of them took legal action. The Wikipedia article on the Netflix Prize has more details
and references.
1. Starting points: do you use the web, a credit/debit or loyalty card, a smart-phone, or any other services?