HIT2203 Course Outline

This document provides an outline for a course on Big Data and Data Analytics. The course aims to teach students how to apply machine learning algorithms to large and complex business datasets. It will cover topics like the data analytics lifecycle, advanced analytics theory and methods, model performance assessment, and data visualization. Students will learn to use tools like R, Python, SQL, Spark and AWS. The course assessments will involve coursework, projects, tests and portfolio work. The overall goal is for students to gain skills in applying data science techniques to address real-world business problems.

Uploaded by

sanyengere

Big Data & Data Analytics

Synopsis & Course Outline


HIT2203

Facilitators:
Mr. T. Butsa
Mr. S. Chaputsira
Ms. L. Amos

SCHOOL OF INDUSTRIAL SCIENCES AND TECHNOLOGY


SCHOOL OF ENGINEERING AND TECHNOLOGY

HARARE INSTITUTE OF TECHNOLOGY


1.0 COURSE SYNOPSIS

With the ongoing explosion in the availability of large and complex business datasets ("Big Data"),
Machine Learning ("ML") algorithms are increasingly being used to automate the analytics process
and better manage the volume, velocity and variety of Big Data. This course teaches how to apply the
growing body of ML algorithms to various Big Data sources in a business context. By the end of this
course, students will have a better understanding of the processes, methodologies and tools used to
transform the large amount of available business data into useful information and to support business
decision making by applying ML algorithms. The focus of the course is less on the technical aspects of
ML algorithms and more on their application to the Big Data available in different
domains. The course will use R as the primary data analysis platform and Microsoft Azure as the cloud
platform for execution and deployment of ML projects. Prior experience with R or Microsoft Azure is
not required. Students are assumed to be familiar with basic statistics.

2.0 PREAMBLE

In this course, you will gain practical, foundation-level training that enables immediate and effective
participation in big data and other analytics projects. You will cover basic and advanced analytic
methods and big data analytics technology and tools, including MapReduce and Hadoop. Extensive
labs throughout the course give you the opportunity to apply these methods and tools to real-world
business challenges using a technology-neutral approach. In a final lab, you will address a big
data analytics challenge by applying the concepts taught in the course within the context of the Data
Analytics Lifecycle. You will prepare for the Data Scientist Associate (EMCDSA) certification exam
and establish a baseline of Data Science skills.

3.0 LEARNING OUTCOMES

• Learn to apply Python, SQL, Spark, NLP, Supervised Machine Learning, Amazon Web
  Services (AWS) and more to data analytics and data science problems.
• Translate business objectives into data mining opportunities and implement predictive
  analytics solutions to address business objectives.
• Install, run and apply statistical machine learning tools on classification and regression
  problems involving structured and unstructured data.
• Acquire, process and analyze large data sets using cloud-based data mining methods for data
  exploration, pattern discovery, prediction and answering business questions.
• Develop the confidence to succeed as a data analyst/data scientist by building a professional
  portfolio that includes end-to-end project experience on practical data analytics/data science
  projects.

2 HIT2203 HIT
4.0 COURSE OUTLINE

Unit  Topic                                   Assignment     Lab       Test

Modules - SEMESTER II
1     Introduction to Big Data analytics
2     Data Analytics Lifecycle                Assignment I             Test I
3     Advanced analytics theory and methods   Assignment II  Lab I
4     Model Performance Assessment                           Lab II
5     Data Visualization                                     Lab III   Test II

Total number of modules to be done for Big Data & Data Analytics:

Sl. No.  Category                 Semester II
1        Core courses (5 Units)
2        Group Project work       3

Unit Course Title


1 Introduction to Big Data analytics
Total contact hours - 12
Prerequisite : Nil
PURPOSE
Modern scientific, engineering, and business applications are increasingly dependent on data, yet existing
traditional data analysis technologies were not designed for the complexity of the modern world. Data
analytics has emerged as a new, exciting, and fast-paced discipline that explores the novel statistical,
algorithmic, and implementation challenges that arise in processing, storing, and extracting
knowledge from Big Data.
INSTRUCTIONAL OBJECTIVES
1. Big Data and its characteristics
2. Business value from Big Data
3. Responsibilities of a Data scientist
4. Tools for Data Analytics:
   • Hadoop Distributed File System
   • The MapReduce Framework
   • Kafka and Spark
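
To give a feel for the MapReduce framework listed above, here is a minimal single-machine sketch of its map, shuffle and reduce steps applied to word counting. It is plain Python, not Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample input, not taken from the course material.
documents = ["big data needs big tools", "data tools for big data"]
counts = word_count(documents)
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes, with the shuffle moving data between them; the sketch only shows the data flow.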

Unit Course Title
2 Data Analytics Lifecycle
Total contact hours – 12

Prerequisite : Nil
PURPOSE
This course introduces the cyclic process which explains, in six stages, how information is made,
collected, processed, implemented, and analyzed for different objectives. The Data Analytics
Lifecycle defines best practices for the analytics process, from discovery to project completion.
INSTRUCTIONAL OBJECTIVES
1. Data analytics lifecycle overview
2. Discovery phase

3. Data preparation phase

4. Model planning phase

5. Model building phase

6. Communicate results phase

7. Operationalize phase

Unit Course Title


3 Advanced analytics theory and methods

Total contact hours - 12


Prerequisite : Nil
PURPOSE
This course gives comprehensive coverage of algorithms designed for analyzing data at an
in-depth level.

INSTRUCTIONAL OBJECTIVES
Understanding the basic concepts of Python:
• Setting up Python Environment
• Data Types and variables
• Exploratory Data Analysis using Pandas
Machine Learning: Introduction, Supervised Learning, Unsupervised Learning,
Collaborative Filtering.
1. Introduction to advanced analytics: theory and methods
   a. Supervised Learning:
      Classification
      • Logistic regression
      • Naïve Bayes
      Regression
      • Linear Regression
   b. Unsupervised Learning:
      Clustering
      • K-Means
2. Association rules:
   • Apriori
3. Text analysis:
   • Sentiment analysis
   • Audio to text
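
Of the methods listed above, K-Means is perhaps the easiest to sketch in a few lines. The following is a minimal illustration in plain Python, assuming fixed starting centroids and a fixed number of iterations; the helper names and the sample points are hypothetical, and in the labs a library implementation would normally be used instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

Starting centroids are passed in explicitly so that the result is reproducible; practical implementations usually pick them at random and repeat the run several times.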

Unit Course Title


4 Model Performance Assessment
Total contact hours - 12
Prerequisite: Programming knowledge in any OO language.
PURPOSE
Metrics such as accuracy, precision and recall are good ways to evaluate classification models on balanced
datasets, but if the data is imbalanced and there is a class disparity, then other measures such as ROC/AUC
and the Gini coefficient evaluate model performance better.
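
The point can be illustrated with a small sketch in plain Python. It assumes a made-up dataset of ten cases with only two positives, and compares a trivial model that always predicts the majority class with a model that outputs scores. The function names, labels and scores are all hypothetical; AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, and the Gini coefficient as 2 × AUC − 1.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Trivial model: always predicts the majority class.
always_negative = [0] * len(y_true)
trivial_accuracy = accuracy(y_true, always_negative)   # high, yet useless
trivial_recall = recall(y_true, always_negative)       # finds no positives
trivial_auc = roc_auc(y_true, [0.5] * len(y_true))     # no ranking ability

# Scoring model: hypothetical predicted probabilities.
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.6, 0.35, 0.7, 0.4]
model_auc = roc_auc(y_true, scores)
model_gini = 2 * model_auc - 1

print(trivial_accuracy, trivial_recall, trivial_auc)
print(model_auc, model_gini)
```

The trivial model reaches 80% accuracy without identifying a single positive case, while its AUC of 0.5 shows that it ranks cases no better than chance.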
INSTRUCTIONAL OBJECTIVES
1. Evaluation strategies for binary, categorical, and continuous outcomes
2. Confusion matrices quantifying classification and prediction accuracy
3. Visualization of algorithm performance and ROC

Unit Course Title


5 Data Visualization
Total contact hours - 12
Prerequisite: Programming knowledge in any OO language.
PURPOSE
Data visualization is the discipline of trying to understand data by placing it in a visual context so that
patterns, trends and correlations that might not otherwise be detected can be exposed. Python offers
multiple great graphing libraries that come packed with lots of different features.

INSTRUCTIONAL OBJECTIVES
1. Perception and Visualization
2. Visualization of Multivariate Data
   • Excel
   • Power BI

PRACTICALS:

1. Install Anaconda and libraries
2. Data exploration using Pandas
3. Spam Detection using Logistic Regression
4. Disease Identification using Naïve Bayes/Logistic Regression
5. K-Means clustering
6. Audio to Text conversion using Python
7. Data Visualization using Power BI and Excel

REFERENCES:
1. Laura Igual and Santi Seguí. Introduction to Data Science: A Python Approach to Concepts,
   Techniques and Applications.
2. Sudeep Tanwar, Sudhanshu Tyagi and Neeraj Kumar. Multimedia Big Data Computing for IoT
   Applications: Concepts, Paradigms and Solutions.
3. Sandro Skansi. Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence.

ASSESSMENT STRATEGY
Assessment tasks are linked to the learning outcomes of each module and are completed before the end
of the module. Module assessments typically involve written coursework, project work, oral
assessments and portfolios. Formative assessment, which does not contribute to the final mark, is given
to help students improve their future work. Feedback on summative assessment, which does
contribute towards the final result, is normally given to the student in writing, with the opportunity
to receive a more detailed verbal explanation.

Assessment Methods
Written exams 40%
Coursework 60%:
• Theory Assignments
• Lab Assignments
• In-class tests
Total 100%
