Fall 2023 DS-GA 1002 Probability and Statistics for Data Science


Course Information

This required course for the MS in Data Science should be taken in the first year of
study. It covers fundamental concepts in probability and statistics from a data-science
perspective.

Syllabus

Probability, conditional probability, independence, conditional independence, the Monte
Carlo method, discrete random variables, the probability mass function, the Bernoulli,
binomial, geometric and Poisson distributions, maximum likelihood, parametric vs
nonparametric models, continuous random variables, the cumulative distribution
function and quantiles, probability density, nonparametric density estimation, the
exponential distribution, the Gaussian distribution, inverse-transform sampling,
multivariate discrete random variables, causal inference (confounding factors and
Simpson's paradox), the curse of dimensionality, naive Bayes, Markov chains,
multivariate continuous random variables, Gaussian random vectors, discrete and
continuous random variables, Bayesian parametric modeling, the mean, the mean
square, the variance, the conditional mean, the average treatment effect.

Grading policy

Homework (40%) + Midterm (30%) + Final (30%)

Up to 5% extra credit may be awarded for participation on Ed Discussion at the
discretion of the instructors. This extra credit will only be applied at the end of the
semester if it makes a difference in the final letter grade.

Exams

Midterm

The midterm will take place on Friday, October 27, at 11:30 am (instead of noon) in the
same room as the lecture. It covers the material up to and including the first part
of Multiple discrete variables, covered on October 13; the second part of Multiple
discrete variables is not included. You may bring 2 sheets of A4 paper with formulas or
whatever else you want to write on them. We recommend you hold on to them after the
midterm because you can also bring them to the final (together with 2 extra sheets).
No other material is allowed. Pocket calculators are allowed; no other electronic
devices (including cellphones) are permitted. Practice problems with solutions are
posted on Brightspace (Schedule section). Homework 6 will be due on November 5, so you
can do it after the midterm.

Homework

Homework will be posted each Friday and will be due on Gradescope 10 days later on
Sunday at 11 pm. Late homework will not be accepted. The assignment with the worst
grade will be dropped to account for unforeseen circumstances. As a matter of policy,
we do not give personal extensions except for a justified medical reason.
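
As a rough illustration of how the grading weights and the dropped-assignment rule above might combine, here is a minimal sketch. The function name, the 0-100 scale, and the equal weighting of homework assignments are assumptions made for the example, not part of the course policy. The discretionary extra credit is left out.

```python
def course_score(homework, midterm, final):
    """Weighted course score on a 0-100 scale.

    Assumptions (not stated in the syllabus): every score is a percentage,
    all homework assignments count equally, and at least two were assigned.
    """
    kept = sorted(homework)[1:]  # drop the single lowest homework score
    homework_avg = sum(kept) / len(kept)
    # Homework 40%, midterm 30%, final 30%
    return 0.40 * homework_avg + 0.30 * midterm + 0.30 * final


# Hypothetical scores, for illustration only
score = course_score(homework=[90, 80, 100, 40], midterm=70, final=80)
print(round(score, 2))
```

In this example the homework score of 40 is dropped, so the homework average is taken over the remaining three assignments.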

We encourage you to discuss the homework with your peers, but make sure that you
write your assignment yourself. Always explain your thought process. If you use results
from the notes or a book, please reference them adequately.

Lecture

Lecture material is provided through videos and notes posted in the Resources section,
as well as through an in-person lecture that will be recorded. Please read the notes
before the lecture!

In-person lecture:

Friday 12:00pm-1:40pm
Location: Meyer, Room 121

Recitations

Monday 5:55pm-6:45pm
Location: Meyer, Room 121

Office hours

Instructor:
Tuesday 2pm
Location: Room 206 at CDS, 60 5th Ave

Section Leaders:
Taro (taro@nyu.edu)
Wednesday 9am
Location: Virtual (resuming in-person next week)

Jack (jackzhu@nyu.edu)
Wednesday 4pm
Location: Room 244 at CDS, 60 5th Ave

Ed Discussions

Ask all your questions on Ed Discussions instead of through email, and try to answer
other students' questions. We will reward active participation!

We won't be monitoring Ed Discussions on Sunday, so please make sure you ask
any questions you have about the homework in advance.

References

We will follow the lecture notes posted in the Resources section. Videos covering the
material are also posted there.

Additional References (not required)

A First Course in Probability by Ross
Introduction to Probability by Bertsekas and Tsitsiklis
Introduction to Probability by Blitzstein and Hwang
Introduction to Mathematical Statistics by Hogg, McKean and Craig
Statistical Inference by Casella and Berger
All of Statistics by Wasserman
Probability and Statistics by DeGroot and Schervish

Schedule, Resources and Assignments


Please read the relevant material before the lecture.

Datasets and references

Sep 8
Topic: Probability
Notes: Probability (Sections 1.1 and 1.2)
Videos: Overview; Probability spaces; Lecture
Slides: Overview; Probability spaces; Lecture

Sep 15
Topic: Probability: conditional probability, estimating probabilities from data, independence, conditional independence, the Monte Carlo method
Notes: Probability (Remaining sections)
Videos: Conditional probability; Estimating probabilities from data; Independence; Conditional independence; The Monte Carlo method; Lecture
Slides: Conditional probability; Estimating probabilities from data; Independence; Conditional independence; The Monte Carlo method; Lecture

Sep 22
Topic: Discrete variables: discrete random variables, the probability mass function, the Bernoulli, binomial, geometric and Poisson distributions, maximum likelihood, parametric vs nonparametric models
Notes: Discrete variables
Videos: Overview; Mathematical definition; Probability mass function; Binomial, geometric, Poisson; Maximum likelihood; Parametric vs nonparametric models; Lecture (last year's because I forgot to record, sorry!)
Slides: Overview; Mathematical definition; Probability mass function; Binomial, geometric, Poisson; Maximum likelihood; Parametric vs nonparametric models; Lecture

Sep 29
Topic: Continuous variables: continuous random variables, the cumulative distribution function and quantiles, probability density, functions of random variables, nonparametric density estimation
Notes: Continuous variables (Sections 3.1-3.5)
Videos: Overview; Mathematical definition of continuous random variables; The cumulative distribution function; The probability density function; Functions of continuous random variables; Nonparametric density estimation; Lecture
Slides: Overview; Mathematical definition of continuous random variables; The cumulative distribution function; The probability density function; Functions of continuous random variables; Nonparametric density estimation; Lecture

Oct 6
Topic: Continuous variables: the exponential distribution, the Gaussian distribution, ML estimation, inverse-transform sampling
Notes: Continuous variables (Sections 3.6-3.8)
Videos: The exponential distribution; The Gaussian distribution; Maximum-likelihood estimation; Inverse-transform sampling; Lecture
Slides: The exponential distribution; The Gaussian distribution; Maximum-likelihood estimation; Inverse-transform sampling; Lecture

Oct 13
Topic: Multiple discrete variables: multivariate discrete random variables, marginal and conditional distributions, independence and conditional independence, causal inference
Notes: Multiple discrete variables (Sections 4.1-4.6)
Videos: Overview; Multivariate discrete random variables; Marginal distributions; Conditional distributions; Causal inference: potential outcomes, confounders, Simpson's paradox; Lecture
Slides: Multivariate discrete random variables; Marginal distributions; Conditional distributions; Causal inference: potential outcomes, confounders, Simpson's paradox; Lecture

Oct 20
Topic: Multiple discrete variables: the curse of dimensionality, naive Bayes, Markov chains
Notes: Multiple discrete variables (Sections 4.7-4.9)
Videos: The curse of dimensionality and naive Bayes; Markov chains; Convergence of Markov chains
Slides: The curse of dimensionality and naive Bayes; Markov chains; Convergence of Markov chains
Lecture

Oct 27
Midterm

Nov 3
Topic: Multiple continuous variables: multivariate continuous random variables, joint cdf and pdf, marginal and conditional distributions, independence and conditional independence, simulating a joint distribution, Gaussian random vectors
Notes: Multiple continuous variables
Videos: Multiple continuous variables (overview); Multivariate continuous random variables; Joint pdf; Marginal distributions; Conditional distributions; Simulating multiple random variables; Gaussian random vectors, marginal and conditional distributions
Slides: Multiple continuous variables (overview); Multivariate continuous random variables; Joint pdf; Marginal distributions; Conditional distributions; Simulating multiple random variables; Gaussian random vectors, marginal and conditional distributions
Lecture

Nov 10
Topic: Discrete and continuous variables: discrete and continuous random variables, Bayesian parametric modeling
Notes: Discrete and continuous variables (Sections 6.1, 6.2, 6.3, 6.4, 6.7)
Videos: Discrete and continuous random variables; Dependence and independence; Bayesian models; How not to predict an election
Slides: Discrete and continuous random variables; Dependence and independence; Bayesian models; How not to predict an election
Lecture

Nov 17
Topic: Averaging: the mean, properties of expectation, the mean and the median, the mean square, the variance
Notes: Averaging (Sections 7.1-7.7)
Videos: The mean; Sensitivity to extreme values; Properties; Mean of parametric distributions; The variance, standard deviation and mean square
Slides: The mean; Sensitivity to extreme values; Properties; Mean of parametric distributions; The variance, standard deviation and mean square
Lecture

Dec 1
Topic: Averaging: the conditional mean, iterated expectation, regression, estimation of causal effects
Notes: Averaging (Sections 7.8 and 7.9)
Videos: The conditional mean function; The conditional mean and iterated expectation; Regression; Average treatment effect
Slides: The conditional mean function; The conditional mean and iterated expectation; Regression; Average treatment effect
Lecture

Dec 8
Topic: Additional topics (not included in final): classification via Gaussian discriminant analysis, clustering via Gaussian mixture models
Notes: Discrete and continuous variables (Sections 6.5, 6.6)
Videos: Classification via Gaussian discriminant analysis; Clustering via Gaussian mixture models
Slides: Classification via Gaussian discriminant analysis; Clustering via Gaussian mixture models
Lecture

Dec 15
Final
