Specialization in Data Science
SDS301 INFORMATION MANAGEMENT
L T P C: 3 0 0 3
Contents Hours
Unit 1 DATABASE MODELLING, MANAGEMENT AND DEVELOPMENT (8 hours)
Database design and modelling: business rules and relationships; Java
Database Connectivity (JDBC), database connection manager, stored
procedures. Trends in Big Data systems including NoSQL: Hadoop
HDFS, MapReduce, Hive, and enhancements.
Unit 2 DATA SECURITY AND PRIVACY (8 hours)
Program security, malicious code and controls against threats; OS-level
protection; Security: firewalls, network security, intrusion detection
systems. Data privacy principles. Data privacy laws and compliance.
Unit 3 INFORMATION GOVERNANCE (8 hours)
Master Data Management (MDM): overview, need for MDM, privacy,
regulatory requirements and compliance. Data Governance:
synchronization and data quality management.
Unit 4 INFORMATION ARCHITECTURE (8 hours)
Principles of information architecture and framework, organizing
information, navigation systems and labelling systems, conceptual
design, granularity of content.
Unit 5 INFORMATION LIFECYCLE MANAGEMENT
Data retention policies; confidential and sensitive data handling; lifecycle
management costs. Archiving data using Hadoop; testing and delivering
big data applications for performance and functionality; challenges with
data administration.
Suggested Readings:
1. Data Science for Cyber-Security, by Niall M. Adams, Nicholas A. Heard, Patrick Rubin-Delanchy,
Melissa Turcotte
2. Research Methods for Cyber Security, by Thomas W. Edgar, David O. Manz
3. Cybersecurity: The Beginner's Guide: A comprehensive guide to getting. by Erdal Ozkaya
SDS401 SCALABLE DATA SCIENCE
L T P C: 3 0 0 3
Contents Hours
Unit 1 (8 hours)
Background: Introduction. Probability: concentration inequalities. Linear algebra:
PCA, SVD. Optimization: basics, convex, GD. Machine learning: supervised,
generalization, feature learning, clustering.
Unit 2 (8 hours)
Memory-efficient data structures: hash functions, universal / perfect hash
families, Bloom filters, sketches for distinct count, Misra-Gries sketch. Statistical
Mechanics: an overview.
Unit 3 (8 hours)
Memory-efficient data structures (contd.): Count Sketch, Count-Min Sketch.
Approximate near neighbors search: introduction, kd-trees etc., LSH families,
MinHash for Jaccard, SimHash for L2.
Unit 4 (8 hours)
Randomized Numerical Linear Algebra: CUR decomposition, sparse RP, subspace
RP, Kitchen Sink.
Unit 5 Map-reduce and related paradigms
MapReduce programming examples (PageRank, k-means, matrix
multiplication). Big data: computation goes to data. Hadoop ecosystem.
Suggested Readings:
1. Data Science from Scratch: First Principles with Python, by Joel Grus.
2. Python for Data Science For Dummies, by John Paul Mueller, Luca Massaron
3. Data Analytics, by Anil Maheshwari
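As an illustration of the Bloom filters listed in Unit 2 of SDS401, the following is a minimal Python sketch rather than course material. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 in place of a universal hash family are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str):
        # Assumption: k "independent" hashes are simulated by salting SHA-256
        # with the hash index; a real design might use a universal family.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position chosen for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["hadoop", "hive", "sketch"]:
    seen.add(word)

print(seen.might_contain("hive"))   # added items are always reported present
print(seen.might_contain("spark"))  # normally False; True would be a false positive
```

The memory saving comes from storing only bits, never the items themselves; the price is that a membership query can answer "possibly present" for an item that was never added.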
SDS501 DATA SCIENCE FOR ENGINEERS
L T P C: 3 0 0 3
Contents Hours
Unit 1 (8 hours)
Linear algebra for data science (algebraic view: vectors, matrices, product of
matrix & vector, rank, null space, solution of over-determined set of
equations and pseudo-inverse).
Unit 2 (10 hours)
Linear algebra for data science (geometric view: vectors, distance, projections,
eigenvalue decomposition).
Unit 3 (8 hours)
Statistics (descriptive statistics, notion of probability, distributions,
mean, variance, covariance, covariance matrix).
Unit 4 (10 hours)
Optimization; typology of data science problems and a solution
framework; univariate and multivariate linear regression; model assessment
(including cross validation).
Unit 5 (9 hours)
Verifying assumptions used in linear regression, assessing importance of
different variables, subset selection, introduction to classification and
classification using logistic regression, classification using various clustering
techniques.
Suggested Readings:
1. Data Science and Big Data Analytics: ACM-WIR 2018 (Lecture Notes on Data Engineering and
Communications Technologies), by Durgesh Kumar Mishra, Xin-She Yang, et al.
2. Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools, by Davy
Cielen, Arno D.B. Meysman,
3. Data Science and Big Data Analytics: ACM-WIR, by Durgesh Kumar Mishra, Xin-She Yang.
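To make the "solution of over-determined set of equations and pseudo-inverse" topic in Unit 1 of SDS501 concrete, here is a small Python sketch, offered as an illustration only. It assumes the matrix A has full column rank, in which case the pseudo-inverse solution is x = (A^T A)^(-1) A^T b. The helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the three data points are hypothetical; exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(m, rhs):
    """Solve the square system m x = rhs by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular matrix: columns of A are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(a, b):
    """Return x minimizing ||a x - b||^2, assuming a has full column rank."""
    at = transpose(a)
    ata = matmul(at, a)                                    # normal matrix A^T A
    atb = [sum(x * y for x, y in zip(row, b)) for row in at]  # A^T b
    return solve(ata, atb)


# Three equations, two unknowns: fit y = c0 + c1 * t through (0,1), (1,2), (2,2).
A = [[Fraction(1), Fraction(t)] for t in (0, 1, 2)]
b = [Fraction(1), Fraction(2), Fraction(2)]
x = least_squares(A, b)
print(x)  # [Fraction(7, 6), Fraction(1, 2)]
```

No single line passes through all three points, so the system has no exact solution; the result is the line whose residual is orthogonal to the columns of A.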
Contents Hours
Unit 1 (8 hours)
General Overview of Data Mining and its Components. Introduction and Data
Mining Process. Introduction to R. Basic Statistical Techniques. Data Preparation
and Exploration. Visualization Techniques.
Unit 2 (12 hours)
Data Preparation and Exploration: Visualization Techniques. Dimension
Reduction Techniques: Principal Component Analysis. Performance Metrics and
Assessment: Performance Metrics for Prediction and Classification.
Unit 3 (8 hours)
Supervised Learning Methods: Multiple Linear Regression, Naïve Bayes,
Classification & Regression Trees, Logistic Regression.
Unit 4 (8 hours)
Supervised Learning Methods: Logistic Regression, Artificial Neural Networks.
Supervised Learning Methods and Wrap-Up: Artificial Neural Networks,
Discriminant Analysis, Conclusion.
Suggested Readings:
1. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, by EMC
Education Services.
2. Practical Data Science with R, by Nina Zumel
3. Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools, by Davy Cielen, Arno
D.B. Meysman.
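The "Performance Metrics for Prediction and Classification" item in Unit 2 above can be illustrated with a short Python sketch. It is an example under stated assumptions, not part of the syllabus: the function names, the choice of accuracy, precision, recall and RMSE as representative metrics, and the toy labels are all hypothetical.

```python
import math


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision and recall for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # correctly flagged positive
            else:
                fp += 1   # flagged positive, actually negative
        else:
            if a == positive:
                fn += 1   # missed positive
            else:
                tn += 1   # correctly left negative
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def rmse(actual, predicted):
    """Root mean squared error for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
print(rmse([2.0, 4.0], [2.0, 4.0]))  # 0.0
```

Classification metrics are computed from counts in the confusion matrix, whereas prediction metrics such as RMSE measure the size of numeric errors; the two are not interchangeable.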
SDS701 DATA VISUALIZATION
L T P C: 3 0 0 3
Contents Hours
Unit 1 Overview of Data Visualization, Introduction to Web Technologies (10 hours)
Why Visualize Data, Introduction to SVG and CSS, Introduction to
JavaScript, Introduction to VizHub, Making a Face with D3.
Unit 2 The Shapes of Data, Marks and Channels (12 hours)
Input for Visualization: Data and Tasks, Loading and Parsing Data with
D3.js, Encoding Data with Marks and Channels, Rendering Marks and
Channels with D3.js and SVG, Introduction to D3 Scales, Creating a
Scatter Plot with D3.
Unit 3 Common Visualization Idioms and Visualization of Spatial Data,
Networks, and Trees (10 hours)
Reusable Dynamic Components using the General Update Pattern:
Reusable Scatter Plot, Common Visualization Idioms with D3.js, Bar
Chart (Vertical & Horizontal), Pie Chart and Coxcomb Plot, Line Chart,
Area Chart.
Unit 4 Using Color and Size in Visualization (8 hours)
Encoding Data using Color, Encoding Data using Size, Stacked &
Grouped Bar Chart, Stacked Area Chart & Streamgraph, Line Chart with
Multiple Lines.
Unit 5 Interaction Techniques and Multiple Linked Views
Adding interaction with Unidirectional Data Flow, Using UI elements to
control a scatter plot, Panning and Zooming on a Globe, Adding tooltips,
Small Multiples, Linked Highlighting with Brushing, Linked Navigation:
Bird's Eye Map.
Suggested Readings:
1. Data Science and Big Data Analytics: ACM-WIR, by Durgesh Kumar Mishra, Xin-She Yang.
2. Practical Data Science with R, by Nina Zumel
3. Data Science from Scratch: First Principles with Python, by Joel Grus.