Data Mining, Fall 2020
Office hours: Thursday mornings, New York time. Sign up for a slot at
https://calendar.google.com/calendar/selfsched?sstoken=UUM0UUpEc0pMRjlWfGRlZmF1bHR8ZmE4YzUzYmQ4NmQyYjk0ZWM3MmM2ZmYwODZhNjgzNzM
Course Description
The class is roughly divided into two parts:
1. programming best practices, exploratory data analysis (EDA), and unsupervised learning
Prerequisites
Any QMSS student is presumed to have sufficient background. Any non-QMSS students interested in taking this
course should have sufficient background in quantitative methods.
Grading
• 20% homeworks (done in pairs)
Books
• Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, 2013, Introduction to Statistical Learning
with Applications in R, Springer-Verlag. Available from
http://www.columbia.edu/cgi-bin/cul/resolve?clio10415714.
• Garrett Grolemund and Hadley Wickham, 2016, R for Data Science, O’Reilly. Available from
http://r4ds.had.co.nz/.
• Max Kuhn and Kjell Johnson, 2013, Applied Predictive Modeling, Springer. Available from
http://www.columbia.edu/cgi-bin/cul/resolve?clio10413027.
CampusWire
CampusWire is a beta version of a tool that is available at https://campuswire.com/p/G8030F3A2 using code 2982. Make
sure to sign up for the 2020 version of the course. Rather than emailing questions directly to the professor or TAs,
you should post on CampusWire. That way, other students can answer your question, benefit from an answer that the
professor or TA provides, ask follow-up questions, etc. There is also Reddit-style upvoting, and the statistics collected
by CampusWire go into the participation portion of your grade. Students should not ask questions in office hours that
have not first been posted on CampusWire.
If your question pertains to an ongoing homework assignment, your grades, or similar, then you should click on
the option to make your post only visible to “Instructors and TAs”. Otherwise, you should post to “Everyone in the
class” and avoid direct messaging the instructor and TAs. There is an option to post in Stealth Mode, in which case no
one will know that you asked the question, but such posts cannot count toward the class participation
component of your course grade.
There are Notification options under User Settings (click on your picture in the bottom left) where you can control
how often you receive emails about activity on CampusWire. You can turn some or all of those off but are still
responsible for reading posts by other students.
Outline
The following outline describes the topics that will be covered along with anticipated associated readings.
Week 2: Introduction to R
• Grolemund and Wickham, chapters 1, 2, 4, 26, 27, 29, 30
Week 3: Intermediate R
• Grolemund and Wickham, chapters 5, 6, 9, 10, 11, 12, 15
• APM chapter 3 (excluding section 3.3)
• ISLAR, chapter 10
• Shira Mitchell, Eric Potash, Solon Barocas (2018) “Prediction-Based Decisions and Fairness: A Catalogue of
Choices, Assumptions, and Definitions”, arXiv:1811.07867