CU MSDS All Semesters Syllabus

The Master's degree program in Data Science at Chandigarh University is a two-year online program that provides theoretical and practical skills in data engineering and analysis. The curriculum covers a wide range of topics, including programming, statistics, machine learning, and data visualization, across four semesters with a total of 80 credits. Students will engage with industry professionals and subject matter experts to prepare for careers in data science.


COURSE GRID – MSC DATA SCIENCE

The Master's degree program in Data Science offered by Chandigarh University is a two-year online higher
education program, uniquely positioned to provide you with theoretical knowledge and practical
exposure to the fast-transforming field of Data Science.

The courses are designed to deliver in-depth and highly relevant skills in conventional areas of data engineering
and analysis, such as business, finance, and industrial and scientific research, as well as in relatively newer
areas such as social data analysis and big data. So, whether you want to be a data engineer who
handles data in its rawest form, a data scientist who works with different types of data models, or a data
analyst who derives and interprets new insights from data, this is the program for you. Our subject matter
experts and industry professionals will keep you engaged with their domain knowledge, relatable examples,
simulations, and practical projects. This will guide you through the next two years and groom you into a well-
trained data science professional. Are you ready to get started?

Semester I (18 credits)
● Python Programming (4)
● Applied Probability and Statistics (4)
● Communication and Soft Skills (3)
● SQL Programming (4)
● Advanced Database Management Systems (3)

Semester II (20 credits)
● Calculus and Linear Algebra for Data Scientists (4)
● Data Analysis and Visualization (4)
● Machine Learning (4)
● Advanced Machine Learning (4)
● Deep Learning (4)

Semester III (22 credits)
● Optimization (4)
● Java Programming (3)
● Data Structures and Algorithms (5)
● Web Technologies (4)
● Cloud Native Development (4)
● Minor Project (Software Dev) (2)

Semester IV (20 credits)
● Natural Language Processing (4)
● Data Engineering (4)
● Data Mining and Warehousing (4)
● Applied Business Analytics (4)
● Major Project (4)

Total No. of credits for the Program: 80


Outline for Semester I Courses
● Python Programming (4 Credits)
o Introduction to Python: Basics, Data Structures, Control Structures, Functions
o Object Oriented Programming (OOP) in Python: Understand the basics of OOP, Main
features of OOP, Concept of Exception Handling, Intro to regular expressions, Intro to
lambda functions, Intro to generator functions and file handling
o Python for Data Science: Numpy, Pandas
o Programming in Python-I: Basic Coding, Lists, Strings, Other Data Structures
o Programming in Python-II: Time Complexity, Searching, Sorting, Two pointers, Recursion
Significance:
o Understand the syntax, Data Structures, Functions, OOP features used in Python
Programming
o Implement file handling, various data structures and algorithms with Python
o Learn about the different libraries associated with Python
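The generator functions, lambda functions and list handling named in the outline above can be tied together in a minimal sketch; the numbers and names here are illustrative, not from the syllabus:

```python
def squares(limit):
    """Generator that lazily yields perfect squares below `limit`."""
    n = 1
    while n * n < limit:
        yield n * n
        n += 1

# A lambda used as a filter predicate over the generator's output
evens = list(filter(lambda x: x % 2 == 0, squares(100)))
# evens -> [4, 16, 36, 64]
```

Because `squares` is a generator, values are produced one at a time rather than materialized up front, which matters when working with large data.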

● Applied Probability and Statistics (4 Credits)

The following topics will be covered:


○ Data in Python as data tables and data frames
○ Deriving statistics from data and interpreting the meaning of various statistical measures
○ Basic visualizations such as bar plots and histograms for the given data and meaningful
insights derived from these charts
○ Categorical, discrete and continuous features
○ Calculating important statistics for the given data such as mean, median, mode, standard
deviation and various quantiles
○ Distributions of data and their basic properties such as skewness and centering
○ Probability and various properties associated with it
○ Various common types of probability distributions
○ Sampling and the various types of sampling methods
○ Creating hypotheses for a given problem statement in order to test them
○ Various metrics used to judge the validity of a hypothesis such as p-values
○ Calculating confidence intervals for various metrics derived from the given data
○ Various types of hypothesis tests such as t-test and A/B tests
Tools used:
○ Python
Significance:
○ As a data scientist, you will be required to work with large amounts of data and derive
meaningful conclusions from them. Statistics help to understand which types of statistical
measures to use in various situations. These measures will allow you to test certain claims
through the process of statistical inference, in which you will frame hypotheses and then
test them.
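The basic measures listed above (mean, median, standard deviation) can be computed directly with Python's standard library; the data below is made up for illustration:

```python
import statistics

data = [12, 15, 11, 14, 13, 15, 10, 14]   # a small invented sample
mean = statistics.mean(data)               # 13.0
median = statistics.median(data)           # 13.5 (average of the two middle values)
stdev = statistics.stdev(data)             # sample standard deviation, ~1.85
```

In the course, the same measures are typically derived over data frames rather than plain lists, but the definitions are identical.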

● Communication and Soft Skills (3 Credits)
○ Effective Communication Skills: Basics of Communication, Effective Listening, Reading,
Writing and Speaking, Interpersonal Skills and EQ
○ Corporate Communication: Basics of Business Communication, Business Meetings, Basic and
Complex Interpersonal Communication, Adapting to Changing Scenarios
○ Building Your Personal Brand: Networking, Dos and Don'ts of Networking, Making a Lasting
First Impression, Presentation Skills
Significance:
○ Developing logical thinking and analytical skills to critically analyse problems
○ Choosing the appropriate ‘quality’, ‘quantity’, ‘relevance’, and ‘manner’ of communication
○ Internalise the impact of cultural differences on the communication process
○ Develop skills of communication in writing effective reports and emails
○ Employ verbal and non-verbal aspects of communication during calls, meetings and
presentations
○ Deliver ‘high-impact’ communication with ‘internal’ and ‘external’ stakeholders during crisis
situations

● SQL Programming (4 Credits)


The following topics will be covered:
○ Why databases and types of databases
○ RDBMS and its features
○ Basics of SQL
○ Aggregate and inbuilt functions
○ Joins
○ Three level architecture and ER diagrams
○ Keys and subqueries
○ Normalization
○ PL/SQL basics
○ Procedures and functions
○ Cursors
○ Exceptions
○ Triggers
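Since Python is the tool used elsewhere in the program, the joins and aggregate functions listed above can be tried out with Python's built-in sqlite3 module; the tables and values below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary REAL)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Data"), (2, "Ops")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)",
                [(1, "Asha", 1, 90.0), (2, "Ravi", 1, 70.0), (3, "Meera", 2, 60.0)])

# JOIN plus an aggregate: average salary per department
rows = cur.execute("""
    SELECT d.name, AVG(e.salary)
    FROM emp e JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name ORDER BY d.name
""").fetchall()
# rows -> [('Data', 80.0), ('Ops', 60.0)]
```

PL/SQL-specific features (procedures, cursors, triggers) need an Oracle-style database, but the relational core carries over.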

● Advanced Database Management Systems (3 Credits)


The following topics will be covered:
○ Big Data
○ NoSQL
○ MongoDB
○ HBase
○ Cassandra
○ Neo4j

Outline for Semester II Courses


● Calculus and Linear Algebra for Data Scientists (4 Credits)
The following topics will be covered:
○ Data tables in terms of vectors and matrices and various mathematical operations
performed on them to derive meaningful insights
○ Linear transformations and their impact on vectors
○ The need for eigendecomposition and how it helps to simplify the process of applying
multiple linear transformations on vectors
○ Various types of functions that are used to model different types of real world data
○ Finding derivatives of functions and interpreting the meaning of the value of the derivatives
○ Calculating maxima and minima of various functions using derivatives
○ Calculating the area bounded by curves using integration
○ Multivariable functions and the common ways of visualising their graphs
○ Calculating the partial derivatives of a multivariable function and interpreting the meaning
of their values
○ Understanding how the gradient descent algorithm works using multivariable calculus
○ Understanding how principal components analysis works using linear algebra
Tools used:
○ Python
Significance:
○ As a data scientist, you will be required to work with large amounts of data, usually in
tabular formats. Hence, it is easier to treat the data as linear algebraic objects and work with
them as such. Techniques such as dot products, norms and transformations become more
interpretable in this way and can help in deriving insights from the data. Also, various
machine learning models have linear algebraic systems as their backbone. Hence, it will aid
your understanding of the models.
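The gradient descent algorithm mentioned above can be sketched in a few lines; this toy version minimizes the single-variable function f(x) = (x - 3)^2, an example chosen here for illustration:

```python
def grad_descent(df, x0, lr=0.1, steps=100):
    """Repeatedly step against the derivative `df` to approach a minimum."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3
x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min is very close to 3.0
```

The multivariable version used for machine learning follows the same update rule, with the derivative replaced by the gradient vector of partial derivatives.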

● Data Analysis and Visualization (4 Credits)


The following topics will be covered:
○ Various types of data visualizations such as bar charts, histograms, line charts, pie charts,
scatter plots, box plots, donut charts, pareto charts, control charts, level-of-detail expression
charts, motion charts, bullet charts, Gantt charts, hexbin charts and other commonly used
visualizations
○ Working with data using Excel and performing basic analyses and visualizations on it
○ Working with Python’s data visualization libraries to perform analyses on data
○ Working with Tableau and conducting data analysis and visualization
○ Working with Power BI and conducting data analysis and visualization
○ Interpreting meaningful insights derived from these visualizations
○ Advantages and disadvantages of these data visualization tools
○ Dashboards and other visualizations using these tools
○ Report generation using these tools
○ Best practices when using these tools
Tools used:
○ Excel, Python, Tableau, Power BI
Significance:
○ As a data scientist, you will be required to derive insights from large and varied datasets.
This requires analyzing the data, both statistically and visually. In many cases,
visualizations help summarize a good portion of the data analysis. In general, one of the
tasks of a data scientist is to visualize complex data tables and derive meaningful conclusions
from the analysis, which then can be shared with colleagues and clients. You will also be
required to perform these tasks on a daily basis, sometimes with a wide range of
visualization tools. Hence, a strong working knowledge of data visualization tools is required.
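Before reaching for Excel, Tableau or Power BI, the binning idea behind histograms can be illustrated in plain Python; this text-based rendering is a toy sketch, not one of the course tools:

```python
from collections import Counter

def text_histogram(values, bin_width=10):
    """Bin values into fixed-width buckets and render each bin as a bar of '#' marks."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        lines.append(f"{start:>3}-{start + bin_width - 1:<3} {'#' * bins[start]}")
    return "\n".join(lines)

print(text_histogram([3, 7, 12, 15, 18, 24, 25, 27, 29]))
#   0-9   ##
#  10-19  ###
#  20-29  ####
```

The visualization libraries taught in the course automate exactly this binning step before drawing the bars.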

● Machine Learning (4 Credits)


The following topics will be covered:
○ Fundamental concepts behind machine learning
○ Supervised and unsupervised machine learning models
○ Feature engineering on given data to prepare it for machine learning
○ Creating linear regression models for the given data and interpreting the results
○ Creating logistic regression models for the given data and interpreting the results
○ Various clustering techniques such as K-Means, DBSCAN and Hierarchical Clustering
○ Choosing the appropriate machine learning model for the given data
○ Assessing the performance of these machine learning models in terms of various metrics
such as precision, accuracy, recall, mean absolute error and root mean squared error
○ Understanding how to approach a real-world problem based on insights gained from the
data analysis and the machine learning tasks
Tools used:
○ Python
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. Machine learning is the most commonly used technique in
today’s world to perform such automated tasks of drawing inferences and relationships in
the data. These techniques help in predicting critical values beforehand and also help in
classifying entities into similar groups, which can then be used to perform further analysis.
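The evaluation metrics named above (accuracy, precision, recall) reduce to simple counts over a confusion matrix; a minimal sketch on invented labels:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    return accuracy, precision, recall

acc, prec, rec = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
# each comes out to 2/3 on this toy data
```

Libraries such as scikit-learn provide these metrics ready-made, but knowing the underlying counts makes the trade-off between precision and recall concrete.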

● Advanced Machine Learning (4 Credits)


The following topics will be covered:
○ Creating decision tree models for the given data and interpreting the results
○ Creating random forest models for the given data and interpreting the results
○ Differentiating between model parameters and model hyperparameters
○ Model selection and model evaluation
○ Variations of linear regression models such as LASSO and Ridge models
○ Generalized linear regression models
○ Model enhancement techniques such as bagging and boosting
○ Creating boosted decision tree models for the given data and interpreting the results
○ Creating bootstrap aggregated random forest models for the given data and interpreting the
results
○ Common variations of boosting methods such as adaptive boosting and gradient boosting
○ Time series analysis
Tools used:
○ Python
Significance:
○ As a Data Scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. Machine learning is the most commonly used technique in
today’s world to perform such automated tasks of drawing inferences and relationships in
the data. Furthermore, it is important for a data scientist to be able to differentiate between
various machine learning models and also to evaluate them fairly. Hence, advanced machine
learning concepts are critical to being a successful data scientist.
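The effect of L2 regularization in Ridge models can be seen in the one-feature, no-intercept case, where the least-squares slope has the closed form Sxy / (Sxx + lambda); the data below is illustrative:

```python
def ridge_slope(xs, ys, lam=0.0):
    """Closed-form slope for y ~ w*x (no intercept) with L2 penalty `lam`."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]      # exact relation y = 2x
w_ols = ridge_slope(xs, ys, lam=0.0)     # 2.0, the ordinary least-squares fit
w_ridge = ridge_slope(xs, ys, lam=3.0)   # shrunk below 2.0 by the penalty
```

The shrinkage toward zero is the point: it trades a little bias for lower variance, which is why Ridge and LASSO help when models overfit.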

● Deep Learning (4 Credits)


The following topics will be covered:
○ Fundamental concepts behind artificial neural networks
○ Structure of artificial neural networks
○ Working methodology behind artificial neural networks including feed-forward and
backpropagation
○ Hyperparameter tuning for artificial neural networks
○ Common modifications of artificial neural networks such as convolutional neural networks
and recurrent neural networks
○ Creating CNNs for the given data and interpreting the results
○ Style transfer and object detection
○ Variations of RNNs such as bidirectional RNNs and long short-term memory (LSTM) networks
○ Creating RNNs for the given data and interpreting the results
Tools used:
○ Python
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. Machine learning is the most commonly used technique in
today’s world to perform such automated tasks of drawing inferences and relationships in
the data. Artificial neural networks are a special type of machine learning model that draw
inspiration from the functioning of the human brain. In many cases, the traditional machine
learning methods fall short and artificial neural networks tend to perform better. Hence, in
today’s world, it is important for a data scientist to have a strong working knowledge of
artificial neural networks.
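The feed-forward and backpropagation steps described above can be sketched as a tiny two-layer network trained on the XOR function; this assumes NumPy is available and is a toy illustration, not course material:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # hidden layer, 4 units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse():
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(np.mean((out - y) ** 2))

initial_loss = mse()
for _ in range(3000):
    h = sigmoid(X @ W1 + b1)                       # feed-forward
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)            # backpropagation through MSE + sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)
final_loss = mse()                                 # should end well below initial_loss
```

XOR is the classic example a single-layer model cannot learn, which is why the hidden layer is essential here.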

Outline for Semester III Courses


● Optimization (4 Credits)
The following topics will be covered:
○ Fundamental concepts behind optimization such as objective functions and decision
variables
○ Various types of optimization problems such as the warehouse problem, the assignment
problem and the knapsack problem
○ Understanding the various constraints in optimization problems
○ Fundamental concepts behind linear programming
○ Creating solutions for linear programming problems using the simplex method and its
variants
○ Creating network analyses for optimization problems and interpreting the results
Tools used:
○ Excel, Python
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. One of the most commonly used techniques in business analysis
is operations research. It is important for a data scientist to be able to understand the
business framework behind many common real-world problems such as the warehouse
problem and the assignment problem. Hence, it is important for a data scientist to
understand the various optimization techniques used on business data and solve the various
business problems.
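The knapsack problem named above has a classic dynamic-programming solution; a compact sketch (the item values and weights are a textbook example, not from the course):

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack via dynamic programming; returns the best achievable total value."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacities backwards so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

knapsack([60, 100, 120], [10, 20, 30], 50)   # 220 (take the second and third items)
```

Linear programming problems from the same course, by contrast, have continuous decision variables and are solved with the simplex method rather than dynamic programming.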

● Java Programming (3 Credits)

● Data Structures and Algorithms (5 Credits)

● Web Technologies (4 Credits)


● Basics of Computer Networks - Socket Programming, LAN, IP, Transport Layer and other concepts
● HTML, CSS and Bootstrap: Basics and its applications
● Basic Javascript: Variables and Datatypes, Conditionals and Loops, Arrays, Objects and Functions
● ES-6 and DOM Manipulation: Intro to next-gen javascript, Intro to DOM, Making websites
responsive, Data Storage and Libraries, Graded Questions
● Advanced Javascript: Basics
● React.js-I: Concepts
● Fundamentals of Node.js: Concepts

Learning Objectives

● Understand the basic functionalities of Computer Networks


● Learn about basic front-end technologies such as HTML and CSS
● Understand the applications of HTML and CSS through a front-end project
● Learn the basics of JavaScript
● Understand the relationship between DOM and JavaScript
● Manipulate the DOM object of a webpage
● Understand various advanced concepts of JavaScript
● Understand the meaning of asynchronous programming
● Understand how JavaScript works asynchronously
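The socket programming topic listed under Basics of Computer Networks can be illustrated with a minimal echo server and client over localhost; the port is chosen by the OS and the message is arbitrary:

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and echo whatever bytes arrive back to the client."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_once, args=(server,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)       # the echoed bytes, b"hello"
client.close()
server.close()
```

The same accept/send/receive loop underlies the HTTP servers that the front-end topics in this course talk to.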

● Cloud Native Development (4 Credits)


● Introduction to Cloud Computing
● Cloud Platforms and related Services
● Enterprise Applications
● Evolution of Deployment: Inside the Enterprise, the Web, the Data Center / Cloud
● Characteristics and Structure of a Cloud-Native Application
● Services and Microservices
● Microservices Architecture
● Composability and Decomposing solutions into Microservices
● Building Microservices
● Orchestration / Choreography of Microservices
● Shared data and communication
● Patterns and Best Practices for Cloud-Native Development
● Unit Testing Microservices
● Porting Monolithic applications to Microservices
● Overview of DevOps
● Continuous Integration and Continuous Delivery / Deployment
● Deploying Microservices via Containers
● CI/CD Pipeline for microservices and automation
● Deployment platform: automated operation/management of microservices
● Exercises on building applications using microservices
● Exercises on Containerization using Dockers
● Exercises on Kubernetes for deploying and managing microservices
● CI/CD exercises

Learning Objectives

● Understand basics of Cloud Computing and related Services


● Understand how to develop Applications on Cloud
● Understand Microservices Architecture
● Understand best practices in CI/CD pipelining
● Understand how to use Container Technologies
● Learn how to deploy on Cloud
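The outline doesn't prescribe a container toolchain, but a typical first step toward deploying microservices via containers is a Dockerfile; everything below (base image, file names, port) is a hypothetical sketch, not a course artifact:

```dockerfile
# Hypothetical Dockerfile for a small Python microservice (names are illustrative)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "service.py"]
```

An image built from such a file is what Kubernetes then schedules and manages in the deployment exercises listed above.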

● Minor Project - Software Development (2 Credits)


Design an application using UML, implement it with JUnit and Mockito, and use the TDD methodology to drive its development.
Learning Objectives

● Design an application using UML


● Implement the application using JUnit and Mockito
● Implement additional features using TDD methodology

Outline for Semester IV Courses


● Data Engineering (4 Credits)
● Concepts related to distributed computing
● Hadoop Distributed File System
● MapReduce Programming in Python
● Enterprise Data Management
● Relational Database Modelling
● Normal Forms and ER Diagrams
● Concepts of NoSQL Databases
● Introduction to Apache HBase
● HBase Python API
● Comparison of NoSQL Databases
● Spark Architecture
● RDD, DataFrame API, Spark SQL
● Exploratory Data Analysis with PySpark
● Predictive Analysis with Spark MLlib

Learning Objectives

● Understand the concept of distributed data processing


● Understand the methods of distributed storage
● Understand how Hadoop achieves distributed computing and storage
● Write MapReduce jobs in Python
● Understand the concepts of Data Management
● Execute Data Modeling from a Relational Database
● Understand concepts of NoSQL databases
● Understand the working of Apache HBase
● Learn about Apache Spark and how it achieves data processing
● Execute EDA using PySpark
● Execute Predictive Analysis using Spark’s ML Library
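The MapReduce model covered in this course can be simulated in plain Python by writing the map, shuffle and reduce phases explicitly; the input documents below are invented:

```python
from collections import defaultdict

def map_phase(docs):
    """Mapper: emit a (word, 1) pair for every word occurrence."""
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups

def reduce_phase(groups):
    """Reducer: combine each key's values; for word count, just sum them."""
    return {key: sum(vals) for key, vals in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big fast data"])))
# counts -> {'big': 2, 'data': 2, 'fast': 1}
```

Hadoop runs the same three phases, but distributes the mappers and reducers across a cluster and handles the shuffle over the network.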

● Data Mining and Warehousing (4 Credits)


● Understanding Data: Data and Attributes – Nominal, Ordinal, Interval, and Ratio. Measures of
Similarity and Dissimilarity
● Understanding Datasets: Types of Datasets, Quality of Data in Datasets, Dimensions of data, Data
Pre-processing including dimensionality reduction; Visualization: Review of basic techniques (from
Probability & Statistics), Visualizing Spatial Temporal Data, Visualizing Higher Dimensional Data
● Association Rule Mining: Frequent Itemset Mining - The Apriori Principle and the Apriori Algorithm;
Rule Generation in the Apriori Algorithm; Compact Representation of the Frequent Itemsets;
FP-Growth Algorithm and Frequent Itemset Generation
● Clustering: Different types of clusters (resulting from clustering); K-means Algorithm,
Agglomerative Hierarchical Clustering, Density-based Clustering, Subspace Clustering,
Graph-based Clustering, Self-Organizing Maps, Evaluation of Clusters
● Anomaly Detection: Causes of and approaches to detection
● Text Mining: Similarity Computation for Text data; Clustering Methods for text
● 4Vs of Big Data
● Big Data: Industry Case Studies
● Introduction to Data Warehouse and Data Lakes
● Designing Data Warehousing for an ETL Data Pipeline
● Designing Data Lake for an ETL Data Pipeline
● Fundamentals of Apache Hive
● Writing HQL for Data Analysis
● Partitioning and Bucketing with Hive
● Data warehousing with Redshift
● Analyzing data with Redshift
● Running Spark on Multi Node cluster
● Spark Memory & Disk optimisation
● Optimising Spark Cluster environment
● Introduction to Apache Flink
● Batch Data Processing with Flink
● Stream Processing with Apache Flink
● SQL API
● Intro to real-time data processing architectures
● Fundamentals of Apache Kafka
● Setting up Kafka Producer and Consumer
● Kafka Connect API & Kafka Streams
● Spark Streaming Architecture
● Spark Streaming APIs
● Building Stream Processing Application with Spark
● Comparison between Spark Streaming and Flink
● Fundamentals of Airflow
● Workflow Management with Airflow
● Automating an entire Data Pipeline with Airflow

Learning Objectives

● Understand the meaning of big data and its various characteristics


● Classify data as big data based on its determining factors
● Learn about the various sources of big data
● Learn about the wide range of big data applications in different industries such as retail, healthcare,
and finance
● Understand the intricacies behind designing a data warehouse and a data lake for given use cases
● Manage and query a data warehouse with Apache Hive
● Create optimized HQL for large scale data analysis
● Deploy a Redshift cluster and use it for querying data
● Create large scale data processing applications using PySpark
● Create stream processing applications using DataStream API
● Learn about the real-time data processing architecture of Apache Spark
● Build Spark Streaming applications to process data in real-time
● Automate Data Pipelines with Airflow
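The Apriori algorithm from the Association Rule Mining unit can be sketched as a levelwise search; this toy version (invented transactions, support counting only, no rule generation) shows the core idea:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style levelwise search: keep itemsets occurring in >= min_support transactions."""
    items = sorted({i for t in transactions for i in t})
    frequent, level = [], [frozenset([i]) for i in items]
    while level:
        counts = {c: sum(1 for t in transactions if c <= set(t)) for c in level}
        kept = [c for c, n in counts.items() if n >= min_support]
        frequent.extend(kept)
        # candidate generation: union pairs of frequent k-itemsets into (k+1)-itemsets
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})
    return frequent

fs = frequent_itemsets(
    [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]],
    min_support=2)
# frequent: the three single items plus {milk, bread} and {milk, eggs}
```

The Apriori principle is what makes the pruning sound: a (k+1)-itemset can only be frequent if all of its k-item subsets are.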

● Natural Language Processing (4 Credits)


The following topics will be covered:
○ Basic concepts behind deriving insights from semi-structured data such as documents
○ Syntax and semantics with respect to linguistic data such as documents and articles
○ Creating basic N-gram models for the given textual data and interpreting the results
○ Part-of-speech tagging and sequence labeling for the given textual data
○ Syntactic parsing
○ Techniques used in lexical and compositional semantics
○ Named entity recognition and relation extraction
Tools used:
○ Python
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. One of the most commonly used types of data in today’s world
is linguistic data, in the form of speech or in the form of text. There are large amounts of
textual data that can be used to derive meaning from, which can be used to enhance
decision making processes. Hence, a working knowledge of natural language processing is
important to succeed as a data scientist in today’s world.
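The N-gram models mentioned above can be sketched as maximum-likelihood bigram probabilities; the corpus below is invented for illustration:

```python
from collections import Counter, defaultdict

def bigram_probs(corpus):
    """Estimate P(next_word | word) by counting adjacent pairs in tokenized sentences."""
    follow = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]   # sentence boundary markers
        for a, b in zip(tokens, tokens[1:]):
            follow[a][b] += 1
    probs = {}
    for word, counter in follow.items():
        total = sum(counter.values())
        probs[word] = {nxt: n / total for nxt, n in counter.items()}
    return probs

probs = bigram_probs(["the cat sat", "the dog sat", "the cat ran"])
# probs["the"]["cat"] -> 2/3 ; probs["cat"]["sat"] -> 0.5
```

Real N-gram models add smoothing so unseen word pairs don't get zero probability, but the counting step is exactly this.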

● Applied Business Analytics (4 Credits)


The following topics will be covered:
○ Approaching a business problem in a structured way and framing concrete problem
statements and project plans to solve it
○ The 5Ws and 5WHYs business framework
○ The SPIN business framework
○ Business model canvas and issue tree frameworks
○ Understanding certain specialized business frameworks such as 7Ps and 5Cs
○ Applying these learnings to solve a real-world business problem
○ Important components of data story-telling such as levels of detail and stakeholder empathy
○ Creating a compelling data story in order to convince colleagues and clients
Tools used:
○ Excel, Python
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. Additionally, you will be required to present your findings to a
larger audience including stakeholders such as colleagues and clients. Creating a measurable
business plan and presenting your project in a convincing manner is of utmost importance in
today’s world, as a data scientist. Hence, it is important to understand the various concepts
and terminologies used in businesses and align your data analyses in those terms.

● Major Project (4 Credits)


The following topics will be covered:
○ Building a machine learning pipeline for the given data, starting from data preprocessing up
to model deployment
○ Building an automated data pipeline and using it to build a real-time data processing
application
Tools used:
○ Python, Apache
Significance:
○ As a data scientist, you will be required to derive meaningful predictions and inferences
from large amounts of data. There are two major kinds of tasks required from a data
scientist. One is focused on conducting rigorous data analysis and creating resilient machine
learning models and deploying them so that others may benefit from them. The other is to
automate data pipelining so that manual effort is saved. Many tasks such as data
preprocessing can be integrated into the data pipelining process. Hence, successfully
completing any one of these projects will be critical to your success as a data scientist in the
current industry environment.
