DSE - Course Outline
WEEK – 1
ITP
DAY-1
Intro to Python –
Basic Commands - Hello World
Variables
Basic Arithmetic & logical operators (int, float)
Data Types - int, float, strings
String operations - concatenation, subsetting (slicing), position, length, etc.
Appreciation of programming using pseudocode (introduction) - if-else, loops (Deck)
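The Day-1 topics can be tried out in a few lines of plain Python. The sketch below is illustrative only: the variable names and values are made up, and the two small pseudocode statements in the comments are hypothetical exercises, not ones taken from the course deck.

```python
# Variables holding int and float values
count = 7
price = 2.5

# Basic arithmetic
total = count * price        # int * float gives a float: 17.5
whole_div = count // 2       # floor division: 3
remainder = count % 2        # modulus: 1

# String operations: concatenation, subsetting, position, length
first, last = "Data", "Science"
full = first + " " + last    # "Data Science"
subset = full[0:4]           # first four characters: "Data"
position = full.find("S")    # index of the first "S": 5
length = len(full)           # number of characters: 12

# Pseudocode: IF total > 10 THEN label = "high" ELSE label = "low"
if total > 10:
    label = "high"
else:
    label = "low"

# Pseudocode: FOR n FROM 1 TO 5, add n to running_sum
running_sum = 0
for n in range(1, 6):
    running_sum += n         # 1 + 2 + 3 + 4 + 5 = 15

print("Hello World")
print(total, full, subset, position, length, label, running_sum)
```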
DAY-2
Lists, Tuples, Dictionaries
Indexing
Arithmetic operators
Logical operators
Comparison operators
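A minimal sketch of the Day-2 data structures and operators, using hypothetical values: a list is ordered and mutable, a tuple is ordered but immutable, and a dictionary maps keys to values.

```python
# List: ordered and mutable
scores = [72, 85, 90, 64]
scores.append(78)                 # lists can grow
first = scores[0]                 # index from the start: 72
last = scores[-1]                 # negative index counts from the end: 78
middle = scores[1:3]              # slice (end index excluded): [85, 90]

# Tuple: ordered but immutable
point = (3, 4)
x, y = point                      # tuple unpacking

# Dictionary: key -> value mapping
student = {"name": "Asha", "score": 85}
student["passed"] = student["score"] >= 70   # comparison operator: True

# Arithmetic, logical and comparison operators
average = sum(scores) / len(scores)          # 389 / 5 = 77.8
is_valid = x > 0 and y > 0                   # logical "and": True
needs_review = not student["passed"]         # logical "not": False
distance_sq = x ** 2 + y ** 2                # exponentiation: 25
```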
DAY-3
Converting pseudocode into programs using loops and if-else
List Comprehension
Use cases
vs Loops
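A small illustration, on a made-up list of numbers, of the same filter-and-transform written first as a loop and then as a list comprehension, followed by a typical cleaning use case.

```python
numbers = [3, 8, 1, 6, 10, 7]

# Loop version: square every even number
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# List comprehension version: the same logic as one expression
squares_comp = [n * n for n in numbers if n % 2 == 0]

# Typical use case: cleaning a list of strings
raw = ["  Apple", "banana ", " CHERRY "]
clean = [s.strip().lower() for s in raw]
```

A comprehension usually reads best for a single transform or filter; a plain loop stays clearer when the body needs several statements or side effects.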
DAY-4
Understanding the concept of functions
Exploring commonly used built-in functions (min, max, sorted, etc.)
Programming user defined functions
Working with functions with and without arguments
Functions with return values
Understanding lambda functions
Overview of map, reduce and filter functions
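The sketch below, using hypothetical values, touches each Day-4 topic: built-in functions, user-defined functions with and without arguments, multiple return values, and lambda functions passed to `map`, `filter` and `reduce`. In Python 3, `reduce` lives in the `functools` module rather than being a built-in, and `sorted()` returns a new list while the list method `.sort()` sorts in place.

```python
from functools import reduce

values = [4, 9, 2, 7]

# Built-in functions
smallest, largest = min(values), max(values)   # 2, 9
ordered = sorted(values)                        # new sorted list; values is unchanged

# User-defined function without arguments
def greet():
    return "Hello World"

# User-defined function with arguments, a default value and several return items
def describe(nums, precision=2):
    mean = round(sum(nums) / len(nums), precision)
    return min(nums), max(nums), mean           # returned together as a tuple

low, high, mean = describe(values)

# Lambda functions with map, filter and reduce
doubled = list(map(lambda v: v * 2, values))            # [8, 18, 4, 14]
odds = list(filter(lambda v: v % 2 == 1, values))       # [9, 7]
product = reduce(lambda acc, v: acc * v, values, 1)     # 4 * 9 * 2 * 7 = 504
```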
DAY - 2
DAY - 3
DAY-4
DAY - 2
Univariate
Visualising and understanding the data at hand: summary statistics, moments, distributions
Data transformation – z-score, normalisation
Label encoding, one-hot encoding, replacing data.
Correlation coefficients, skewness and kurtosis.
Scaling vs normalization
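The univariate transformations above can be sketched with only the standard library. In class they would more likely be done with pandas or NumPy; the standard-library version is an assumption made to keep the example self-contained, and the data values are hypothetical. The z-score standardises to mean 0 and standard deviation 1, while min-max normalisation rescales to [0, 1]. Skewness and excess kurtosis here use the population moment formulas, which can differ slightly from the bias-corrected versions some libraries report by default.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)        # 5.0
sd = statistics.pstdev(data)        # population standard deviation: 2.0

# z-score (standardisation): distance from the mean in standard deviations
z_scores = [(x - mean) / sd for x in data]

# Min-max normalisation: rescale to the range [0, 1]
lo, hi = min(data), max(data)
normalised = [(x - lo) / (hi - lo) for x in data]

# Moment-based skewness and excess kurtosis (population formulas)
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

# Label encoding: map each category to an integer code
colours = ["red", "green", "blue", "green"]
codes = {c: i for i, c in enumerate(sorted(set(colours)))}   # blue=0, green=1, red=2
labels = [codes[c] for c in colours]

# One-hot encoding: one 0/1 column per category, in sorted order
one_hot = [[1 if c == k else 0 for k in sorted(codes)] for c in colours]
```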
DAY - 3
Bivariate
Feature to feature relationships
Correlation and Frequency tables
Seasonality and looking at trended data
Multivariate analysis
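A standard-library sketch of two bivariate tools listed above, on hypothetical data: the Pearson correlation coefficient for two numeric features, and a frequency (contingency) table for two categorical features built with `collections.Counter`. In practice `DataFrame.corr` and `pandas.crosstab` would typically be used instead.

```python
import math
from collections import Counter

# Two numeric features
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(hours, marks)

# Two categorical features: count how often each (gender, bought) pair occurs
gender = ["F", "M", "F", "F", "M", "M"]
bought = ["yes", "no", "yes", "no", "no", "yes"]
crosstab = Counter(zip(gender, bought))
```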
DAY - 4
Wrangling
Missing value treatment: various approaches
Outlier treatment: various approaches
Data imbalance treatment
Feature engineering, introduction to train and test sets
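One possible wrangling pipeline, sketched with the standard library on a hypothetical `ages` column: median imputation for missing values, the 1.5 * IQR rule for outliers (capping rather than dropping, which is only one of several choices), and a seeded random 80/20 train/test split. Quartiles come from `statistics.quantiles`, whose default method is "exclusive", so the exact cut-offs may differ from other tools.

```python
import random
import statistics

ages = [23, 25, None, 31, 29, None, 27, 120, 30, 26]

# Missing value treatment: impute with the median of the observed values
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
imputed = [median_age if a is None else a for a in ages]

# Outlier treatment: cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
capped = [min(max(a, lower), upper) for a in imputed]

# Train/test split: shuffle the row indices with a fixed seed, hold out 20%
rng = random.Random(42)
idx = list(range(len(capped)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
train_set = [capped[i] for i in idx[:cut]]
test_set = [capped[i] for i in idx[cut:]]
```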
- 2nd Normal Form
- 3rd Normal Form
Types of Constraints.
DAY - 2
Functions in SQL.
Types of Single Row Functions
Explain how to use single-row functions in SQL
Explain how to use group functions in SQL
In-class lab exercise: SQL single-row functions, group functions, expressions.
Take-home lab exercise: SQL single-row functions, group functions, expressions.
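Single-row and group functions can be tried without a database server through Python's built-in `sqlite3` module. The `emp` table and its rows are hypothetical, and the function names follow SQLite; other databases may spell some of them differently (for example, SQL Server uses `LEN` rather than `LENGTH`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("asha", "HR", 50000), ("ravi", "IT", 72000.5),
     ("meena", "IT", 68000), ("john", "HR", 45000)],
)

# Single-row functions: one output row per input row
single_row = conn.execute(
    "SELECT UPPER(name), LENGTH(name), ROUND(salary / 12, 2) FROM emp ORDER BY name"
).fetchall()

# Group functions: one output row per group
grouped = conn.execute(
    "SELECT dept, COUNT(*), AVG(salary), MAX(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
```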
DAY-3
DAY-4
Subqueries within subqueries (nested subqueries) using the ANY, ALL, IN and NOT IN operators.
Joining sub-queries.
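A sketch of nested and joined subqueries on hypothetical `emp` and `dept` tables, again in SQLite through Python's `sqlite3` module. SQLite does not support the `ANY`/`ALL` comparison operators, so the `> ALL (...)` pattern is written as `> (SELECT MAX(...))`, which matches `> ALL` when the subquery returns at least one non-NULL row; on databases that support `ALL`, the original form can be used directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INTEGER, name TEXT);
CREATE TABLE emp (name TEXT, dept_id INTEGER, salary REAL);
INSERT INTO dept VALUES (1, 'HR'), (2, 'IT'), (3, 'Sales');
INSERT INTO emp VALUES ('asha', 1, 50000), ('ravi', 2, 72000),
                       ('meena', 2, 68000), ('john', 1, 45000);
""")

# IN: employees in the department named 'IT'
in_it = conn.execute("""
    SELECT name FROM emp
    WHERE dept_id IN (SELECT id FROM dept WHERE name = 'IT')
    ORDER BY name
""").fetchall()

# NOT IN: departments that have no employees
empty_depts = conn.execute("""
    SELECT name FROM dept
    WHERE id NOT IN (SELECT dept_id FROM emp)
""").fetchall()

# Subquery within a subquery: employees earning more than everyone in HR
# (the "> ALL" idea, written with MAX because SQLite lacks ALL)
above_all_hr = conn.execute("""
    SELECT name FROM emp
    WHERE salary > (SELECT MAX(salary) FROM emp
                    WHERE dept_id IN (SELECT id FROM dept WHERE name = 'HR'))
    ORDER BY name
""").fetchall()

# Joining a subquery (derived table): salary minus the department average
vs_avg = conn.execute("""
    SELECT e.name, e.salary - d.avg_sal
    FROM emp AS e
    JOIN (SELECT dept_id, AVG(salary) AS avg_sal FROM emp GROUP BY dept_id) AS d
      ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()
```

One pitfall worth knowing: if the subquery behind `NOT IN` returns a NULL, the condition is never true and the query returns no rows, which is one reason `NOT EXISTS` is often preferred.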
DAY - 2:
Introduction to hypothesis testing
Defining the null and alternative hypotheses
Types of alternative hypothesis - one-tailed vs two-tailed tests
Type I and Type II errors
Hypothesis testing applications using the z test
Interpreting test results
P-value vs confidence interval approach
Testing hypotheses on samples - the t-distribution
Testing joint hypotheses - one-way ANOVA and the F-test
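A worked one-sample, two-tailed z-test, assuming a known population standard deviation and hypothetical numbers (H0: mu = 50, sigma = 10, n = 64, sample mean = 53). It computes both the p-value and the 95% confidence interval to show that the two approaches reach the same decision. Only `statistics.NormalDist` from the standard library is used; `scipy.stats` would be the more common choice in practice.

```python
import math
from statistics import NormalDist

# Hypothetical test: H0: mu = 50 vs H1: mu != 50, known sigma
mu0, sigma = 50, 10
sample_mean, n = 53, 64
alpha = 0.05

z = (sample_mean - mu0) / (sigma / math.sqrt(n))    # 3 / 1.25 = 2.4
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-tailed p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96

# Confidence-interval approach: reject H0 if mu0 lies outside the 95% CI
half_width = z_crit * sigma / math.sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)

reject_by_p = p_value < alpha
reject_by_ci = not (ci[0] <= mu0 <= ci[1])
```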
DAY - 3:
Examining causal relationship between variables
Introduction to the concept of regression
Review - Equation of a straight line
Visualizing regression line as an average line
Error variance and minimization
The methodology behind OLS estimation
Linear regression with a single independent variable
Fitting a regression line to a dataset
Sample vs Population regression function
Hypothesis testing in the context of regression
Interpreting ANOVA for a regression model
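The OLS estimates for a single independent variable have a closed form: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The sketch below applies it to hypothetical data, then splits the total sum of squares into regression and error parts, which is the basis of the regression ANOVA table, the F statistic and R^2.

```python
def ols_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

# ANOVA decomposition: total SS = regression SS + error SS
my = sum(y) / len(y)
sst = sum((b - my) ** 2 for b in y)
sse = sum(e ** 2 for e in residuals)
ssr = sst - sse
r_squared = ssr / sst
f_stat = (ssr / 1) / (sse / (len(x) - 2))   # 1 regressor, n - 2 error degrees of freedom
```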
DAY - 4:
Extending linear regression to more than one independent variable
Assumptions in linear regression –
- Multicollinearity - effects and detection, VIF
- Autocorrelation - Why is it a problem, Durbin-Watson test
- Heteroscedasticity - What to do when error terms are unequally distributed
Hypothesis testing in Multiple regression
Testing individual coefficients vs joint hypothesis
ANOVA for multiple regression - F test
The coefficient of determination – R^2
The adjusted R^2
Interpreting summary results after fitting a regression model
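Three of the diagnostics above reduce to short formulas, sketched here as standalone helpers (the function names are illustrative, not a library API): adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); the Durbin-Watson statistic, which is near 2 when residuals show no first-order autocorrelation, near 0 for positive and near 4 for negative autocorrelation; and the VIF, which for the special case of exactly two predictors equals 1 / (1 - r^2), with r their correlation. With more predictors the VIF needs an auxiliary regression of each predictor on all the others (for example statsmodels' `variance_inflation_factor`).

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def durbin_watson(residuals):
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def vif_two_predictors(x1, x2):
    """VIF when there are exactly two predictors: 1 / (1 - r^2), r = corr(x1, x2)."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r = cov / math.sqrt(sum((a - m1) ** 2 for a in x1) * sum((b - m2) ** 2 for b in x2))
    return 1 / (1 - r ** 2)

# Hypothetical inputs
adj = adjusted_r2(0.80, n=50, k=3)
dw = durbin_watson([1, -1, 1, -1])                   # alternating signs: strong negative autocorrelation
vif = vif_two_predictors([1, 2, 3, 4], [2, 1, 4, 3])
```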
Regression + Feature Engineering
Day 1: -
Introduction to machine learning
Supervised vs unsupervised learning
Looking at regression through the perspective of machine learning
Accuracy scores as a metric of model performance
Measuring the importance of individual variables in a regression model
Review - testing for individual significance vs joint significance
Using the adjusted R^2 to compare models with different numbers of independent variables
Approaches to feature selection
Forward and backward selection
Parameter tuning and Model evaluation
Day 2: -
Extending linear regression
Data transformations and normalization
Log transformation of dependent and independent variables
Case study: -
Dealing with categorical independent variables
One hot encoding vs dummy variable regression
Case study on linear regression
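A sketch of both Day-2 ideas on hypothetical data: a log transformation that compresses a right-skewed variable (it requires strictly positive values), and the difference between one-hot encoding (one column per level) and dummy-variable encoding (one level dropped as the reference). Because one-hot columns always sum to 1, including all of them alongside an intercept makes the design matrix perfectly collinear (the dummy variable trap), which is why regression usually drops one level.

```python
import math

# Log transformation of a right-skewed variable (values must be > 0)
income = [20000, 35000, 50000, 120000, 900000]
log_income = [math.log(v) for v in income]

# Categorical variable with three levels
city = ["Delhi", "Mumbai", "Pune", "Mumbai"]
levels = sorted(set(city))                       # ['Delhi', 'Mumbai', 'Pune']

# One-hot encoding: one column per level (each row sums to 1)
one_hot = [[int(c == lvl) for lvl in levels] for c in city]

# Dummy-variable encoding: drop the first level ('Delhi') as the reference
dummies = [[int(c == lvl) for lvl in levels[1:]] for c in city]
```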
Day 3: -
Modelling probabilistic dependent variables
The sigmoid function and odds ratio
The concept of logit
The failure of OLS in estimating parameters for a logistic regression
Introduction to the concept of Maximum likelihood estimation
Advantages of the maximum likelihood approach
Modelling a logistic regression problem with a case study
Making predictions and evaluating parameters
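The pieces above can be sketched in plain Python: the sigmoid, its inverse the logit (log-odds), and a maximum likelihood fit of a one-variable logistic regression by gradient ascent on the log-likelihood. Gradient ascent with a fixed learning rate and step count is an assumption made for simplicity; libraries such as statsmodels and scikit-learn use more robust solvers. The hours/passed data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def logit(p):
    """Log-odds log(p / (1 - p)): the inverse of the sigmoid."""
    return math.log(p / (1 - p))

def fit_logistic(x, y, lr=0.5, steps=5000):
    """Maximise the log-likelihood of P(y=1) = sigmoid(b0 + b1*x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)   # derivative of the log-likelihood w.r.t. the linear predictor
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: hours studied vs pass (1) / fail (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p = sigmoid(b0 + b1 * 2.75)      # predicted probability of passing after 2.75 hours
odds_ratio = math.exp(b1)        # factor by which the odds of passing change per extra hour
```

OLS is not used for this kind of model because it can produce predicted "probabilities" outside [0, 1], and its error assumptions do not hold for a 0/1 outcome.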
Day 4: -
Extending the logistic model to multi-class predictions
The one vs all approach
Multiclass classification
Case study on Logistic regression –
Binary classification and Multiclass one vs all
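A sketch of one-vs-all on a hypothetical three-class, two-feature dataset: one binary logistic model is fitted per class (that class against everything else) by gradient ascent, and a new point is assigned to the class whose model gives the highest probability. The helper names are illustrative; scikit-learn provides the same strategy through `OneVsRestClassifier`.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def score(w, x):
    """P(class) under a binary logistic model; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_binary(X, y, lr=0.1, steps=1000):
    """Binary logistic regression fitted by gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - score(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def fit_one_vs_all(X, labels):
    """One binary model per class: that class (1) against all others (0)."""
    return {c: fit_binary(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Assign x to the class whose model gives the highest probability."""
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical three-class data with two features
X = [[0, 4], [1, 5], [-1, 5],
     [-4, -3], [-5, -3], [-4, -4],
     [4, -3], [5, -3], [4, -4]]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = fit_one_vs_all(X, labels)
print(predict(models, [0, 4.5]), predict(models, [-4.5, -3.3]), predict(models, [4.5, -3.3]))
```

These toy classes are linearly separable, so the unregularised weights would keep growing with more steps; the fixed step count keeps them finite.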
SUPERVISED LEARNING CLASSIFICATION
DAY - 1:
DAY - 2:
CART - Extending decision trees to regression problems.
Advantages of using CART.
Multiple decision trees - Introduction to the Random forests algorithm.
The KNN Algorithm and its workings
Choosing the optimum number of neighbours: iterating over various values of K.
Measures of distance used in KNN.
Evaluating a KNN Model - Accuracy metrics and k-fold cross validation.
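A minimal KNN sketch with two common distance measures (Euclidean and Manhattan) and k-fold cross-validation used to compare several values of K. The data and function names are hypothetical, and each fold here takes every third row instead of a shuffled split, an assumption that keeps the example deterministic.

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_predict(train_X, train_y, x, k, dist=euclidean):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def k_fold_accuracy(X, y, k, folds=3, dist=euclidean):
    """Mean accuracy over `folds` folds; fold f holds out rows f, f + folds, f + 2*folds, ..."""
    n = len(X)
    scores = []
    for f in range(folds):
        held_out = set(range(f, n, folds))
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k, dist) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / folds

# Hypothetical two-class data
X = [[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5], [2, 1.5],
     [6, 6], [6, 7], [7, 6], [7, 7], [6.5, 6.5], [6.5, 7]]
y = ["A"] * 6 + ["B"] * 6

# Iterate over candidate values of K and keep the most accurate
results = {k: k_fold_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
```

Odd values of K avoid tied votes in a two-class problem.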
DAY - 3:
DAY - 4:
Using the 2 classification case studies and working through all the classification algorithms, i.e.,
CART, KNN and Naive Bayes Algorithms sequentially.
Comparing and evaluating various classification algorithms.
Feature selection for classification algorithms.
Advantages and pitfalls of each algorithm. When to use which type of model?
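For the CART part of this comparison, the sketch below computes Gini impurity and searches one numeric feature for the threshold that minimises the weighted impurity of the two child nodes. This is the core step CART repeats when growing a tree; a full tree would also recurse into each child and apply stopping rules. The data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature that minimises weighted child impurity."""
    best = (None, gini(labels))              # no split: impurity of the parent node
    pairs = sorted(zip(values, labels))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                         # no threshold fits between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

ages = [22, 25, 30, 35, 40, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
```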
UNSUPERVISED LEARNING
DAY - 1:
DAY - 2:
DAY - 3:
DAY - 4: