HIT2203 Course Outline
Facilitators:
Mr. T. Butsa
Mr. S. Chaputsira
Ms. L. Amos
With the ongoing explosion in availability of large and complex business datasets ("Big Data"),
Machine Learning ("ML") algorithms are increasingly being used to automate the analytics process
and better manage the volume, velocity and variety of Big Data. This course teaches how to apply the
growing body of ML algorithms to various Big Data sources in a business context. By the end of this
course students will have a better understanding of processes, methodologies and tools used to
transform the large amount of business data available into useful information and support business
decision making by applying ML algorithms. The focus of the course is less on the technical aspects of
ML algorithms and more on their application to Big Data available in different
domains. The course will use R as the primary data analysis platform and Microsoft Azure as the cloud
platform for execution and deployment of ML projects. Prior experience with R or Microsoft Azure is
not required. Students are assumed to be familiar with basic statistics.
2.0 PREAMBLE
In this course, you will gain practical, foundation-level training that enables immediate and effective
participation in big data and other analytics projects. You will cover basic and advanced analytic
methods and big data analytics technology and tools, including MapReduce and Hadoop. Extensive
labs throughout the course provide you with the opportunity to apply these methods and tools to
real-world business challenges using a technology-neutral approach. In a final lab, you will address a big
data analytics challenge by applying the concepts taught in the course to the context of the Data
Analytics Lifecycle. You will prepare for the Data Scientist Associate (EMCDSA) certification exam
and establish a baseline of Data Science skills.
• Learn to apply Python, SQL, Spark, NLP, Supervised Machine Learning, Amazon Web
  Services (AWS) and more to data analytics and data science problems.
• Translate business objectives into data mining opportunities and implement predictive
  analytics solutions to address business objectives.
• Install, run and apply statistical machine learning tools on classification and regression
  problems involving structured and unstructured data.
• Acquire, process and analyze large data sets using cloud-based data mining methods for data
  exploration, pattern discovery, prediction and answering business questions.
• Develop the confidence to succeed as a data analyst/data scientist by building a professional
  portfolio that includes end-to-end project experience on practical data analytics/data science
  projects.
2 HIT2203 HIT
4.0 COURSE OUTLINE
Modules - SEMESTER II

Unit  Topic                                  Assignment (A)  Lab (L)  Test (T)
1     Introduction to Big Data analytics     -               -        -
2     Data Analytics Lifecycle               Assignment I    -        Test I
3     Advanced analytics theory and methods  Assignment II   Lab I    -
4     Model Performance Assessment           -               Lab II   -
5     Data Visualization                     -               Lab III  Test II
Total number of modules to be done for Big Data & Data Analytics:
Sl. No.   Category   Semester II
Unit  Course Title
2     Data Analytics Lifecycle
Total contact hours: 12
Prerequisite: Nil
PURPOSE
This course introduces the Data Analytics Lifecycle, a cyclic process that describes, in six stages, how
information is created, collected, processed, implemented, and analyzed for different objectives. The
lifecycle defines analytics process best practices from discovery to project completion.
INSTRUCTIONAL OBJECTIVES
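1. Data analytics lifecycle overview
2. Discovery phase
3. Data preparation phase
4. Model planning phase
5. Model building phase
6. Communicate results phase
7. Operationalize phase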
INSTRUCTIONAL OBJECTIVES
Understanding the basic concepts of Python:
Setting up Python Environment
Data Types and variables
Exploratory Data Analysis using Pandas
Machine Learning: Introduction, Supervised Learning, Unsupervised Learning,
Collaborative Filtering.
1. Introduction to advanced analytics: theory and methods
   a. Supervised Learning:
      Classification:
         Logistic regression
         Naïve Bayes
      Regression:
         Linear Regression
   b. Unsupervised Learning:
      Clustering:
         K-Means
3. Association rules:
   Apriori
6. Text analysis:
   Sentiment analysis
   Audio to text
INSTRUCTIONAL OBJECTIVES
1. Perception and Visualization
2. Visualization of Multivariate Data
Excel
Power BI
PRACTICALS:
REFERENCES:
1. Introduction to Data Science: A Python Approach to Concepts, Techniques and
Applications - Laura Igual, Santi Seguí
2. Multimedia Big Data Computing for IoT Applications: Concepts, Paradigms and
Solutions - Sudeep Tanwar, Sudhanshu Tyagi, Neeraj Kumar
3. Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence -
Sandro Skansi
ASSESSMENT STRATEGY
Assessment tasks are linked to the learning outcomes of each module and are completed before the end
of the module. Module assessments typically involve written coursework, project work, oral
assessments and portfolios. Formative assessment, which does not contribute to the final mark, is given
to help students improve their future work. Feedback on summative assessment, which does
contribute towards the final result, is normally given to the student in writing, with the opportunity
to receive a more detailed verbal explanation.
Assessment Methods
Written exams   40%
Coursework      60%:
   Theory Assignments
   Lab Assignments
   In-class tests
Total           100%