Fall 2023 DS-GA 1002 Probability and Statistics (Website)
This required course for the MS in Data Science should be taken in the first year of
study. It covers fundamental concepts in probability and statistics from a data-science
perspective.
Syllabus
Grading policy
Exams
Midterm
The midterm will take place on Friday, October 27, at 11:30 am (instead of noon) in the
same room as the lecture. It covers the material up to and including the first part of
Multiple discrete variables, covered on October 13; the second part of Multiple discrete
variables is not included. You may bring 2 sheets of A4 paper with formulas or whatever
else you want to write on them. We recommend you hold on to them after the midterm
because you can also bring them to the final (together with 2 extra sheets). No other
material is allowed. Pocket calculators are allowed, but no other electronic devices
(including cellphones). Practice problems with solutions are posted on Brightspace
(Schedule section). Homework 6 will be due on November 5, so you can do it after the
midterm.
Homework
Homework will be posted each Friday and will be due on Gradescope 10 days later on
Sunday at 11 pm. Late homework will not be accepted. The assignment with the lowest
grade will be dropped to account for unforeseen circumstances. As a matter of policy, we
do not give individual extensions except for a justified medical reason.
We encourage you to discuss the homework with your peers, but make sure that you
write your assignment yourself. Always explain your thought process. If you use results
from the notes or a book, please reference them adequately.
Lecture
Lecture material is provided through videos and notes posted in the Resources section,
as well as through an in-person lecture that will be recorded. Please read the notes
before the lecture!
In-person lecture:
Friday 12:00pm-1:40pm
Location: Meyer, Room 121
Recitations
Monday 5:55pm-6:45pm
Location: Meyer, Room 121
Office hours
Instructor:
Tuesday 2pm
Location: Room 206 at CDS, 60 5th Ave
Section Leaders:
(Taro taro@nyu.edu)
Wednesday 9am
Location: Virtual (resuming in-person next week)
(Jack jackzhu@nyu.edu)
Wednesday 4pm
Location: Room 244 at CDS, 60 5th Ave
Ed Discussions
Ask all your questions on Ed Discussions instead of through email, and try to answer
other students' questions. We will reward active participation!
References
We will follow the lecture notes posted in the Resources section. Videos covering the
material are also posted there.
Sep 8
Topics: Probability
Notes: Probability (Sections 1.1 and 1.2)
Materials: Overview; Probability spaces; Lecture
Sep 15
Topics: Probability: conditional probability, estimating probabilities from data, independence, conditional independence, the Monte Carlo method
Notes: Probability (Remaining sections)
Materials: Conditional probability; Estimating probabilities from data; Independence; Conditional independence; Lecture
Sep 22
Topics: Discrete variables: discrete random variables, the probability mass function, the Bernoulli, binomial, geometric and Poisson distributions, maximum likelihood, parametric vs nonparametric models
Notes: Discrete variables
Materials: Overview; Mathematical definition; Probability mass function; Binomial, geometric, Poisson; Maximum likelihood; Parametric vs nonparametric models; Lecture; Lecture (last year's because I forgot to record, sorry!)
Sep 29
Topics: Continuous variables: continuous random variables, the cumulative distribution function and quantiles, probability density, functions of random variables, nonparametric density estimation
Notes: Continuous variables (Sections 3.1-3.5)
Materials: Overview; Mathematical definition of continuous random variables; The cumulative distribution function; The probability density function; Functions of continuous random variables; Nonparametric density estimation; Lecture
Oct 6
Topics: Continuous variables: the exponential distribution, the Gaussian distribution, ML estimation, inverse-transform sampling
Notes: Continuous variables (Sections 3.6-3.8)
Materials: The exponential distribution; The Gaussian distribution; Maximum-likelihood estimation; Inverse-transform sampling; Lecture
Lecture
Oct 20
Topics: Multiple discrete variables: the curse of dimensionality, naive Bayes, Markov chains
Notes: Multiple discrete variables (Sections 4.7-4.9)
Materials: The curse of dimensionality and naive Bayes; Markov chains; Convergence of Markov chains; Lecture
Oct 27
Midterm
Lecture
Regression
Dec 8
Topics: Additional topics (not included in final): classification via Gaussian discriminant analysis, clustering via Gaussian mixture models
Notes: Discrete and continuous variables (Sections 6.5, 6.6)
Materials: Classification via Gaussian discriminant analysis; Clustering via Gaussian mixture models; Lecture
Dec 15
Final