2019 Spring Syllabus ISM6562 Muma 1
Course Description:
With the advent of social media and the IoT (Internet of Things), data volumes in organizations
have increased rapidly in recent years. As data volumes grow to terabytes and even
petabytes, traditional database and analytical techniques are no longer sufficient. In this course,
students will learn various big data technologies and how they can be used for data
management and data analytics to handle such massive datasets. The first half of the
course will focus on big data storage technologies such as No-SQL databases and distributed file
systems. The second half of the course will focus on big data computational platforms such as
Hadoop Map-Reduce and Spark. The course will cover Spark programming on a big data
platform in depth.
Course Objectives:
Learning Outcomes:
Prerequisites:
Relational Database
Java or Python
Web application development
Willingness to work hard with the big data technology stack
Course Materials:
Students do not need to purchase any software or books for this class. However, the instructor
will direct students to various online materials, books, manuals, videos, and software to
supplement the class lectures.
Book
1. Frank Kane's Taming Big Data with Apache Spark and Python - eBook freely available
from the library
Cassandra Quiz 5%
Assignment (Individual) 20%
eCommerce Project (Group) 25%
In Class Quizzes 10%
Final Exam 40%
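As an illustration of how the component weights above combine, the following Python sketch computes a weighted course percentage. The function name and the sample scores are hypothetical, and the sketch assumes every component is scored on a 0-100 scale; the actual grade calculation remains at the instructor's discretion.

```python
# Component weights in percent, taken from the list above.
WEIGHTS = {
    "Cassandra Quiz": 5,
    "Assignment (Individual)": 20,
    "eCommerce Project (Group)": 25,
    "In Class Quizzes": 10,
    "Final Exam": 40,
}


def weighted_total(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a percentage score.
    Assumption: every component is graded on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each weight is a percent, so divide the weighted sum by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Cassandra Quiz": 80,
    "Assignment (Individual)": 90,
    "eCommerce Project (Group)": 85,
    "In Class Quizzes": 100,
    "Final Exam": 75,
}

print(weighted_total(example_scores))  # 83.25
```

Because the Final Exam carries 40% of the weight, a 10-point change on the exam moves the course percentage by 4 points in this sketch.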
An “Incomplete” grade will be given only in accordance with university policy, without
exception.
95% and above    A/A+
90-95%           A-
80-90%           B+
70-80%           B
60-70%           B-
Below 60%        F
Please note that this grading scale is tentative. The instructor reserves the right to
change it as he deems appropriate based on the overall performance of the class.
The purpose of this study is to evaluate the new design of a secured cloud storage
system. The experiment will take approximately 15-20 minutes of your time.
Contact Vivek Kumar Singh at vivek4@mail.usf.edu or call 813 580 9131.
Once you have participated (by 17th February), Vivek will assign the bonus point in
Canvas.
Attendance Policy:
Students are required to attend all classes. Failure to attend classes will result in a lower grade.
A student who is absent from more than 3 classes may fail the class irrespective of his/her
performance in the other graded components.
No-SQL Database Basics and concepts
    Cassandra Tutorial - https://academy.datastax.com/courses/ds201-cassandra-core-concepts
    NoSQL DB - http://arxiv.org/ftp/arxiv/papers/1307/1307.0191.pdf
2 Various No-SQL database systems
    DynamoDB - http://dl.acm.org/citation.cfm?id=1294281
    Cassandra
    Big Table – http://dl.acm.org/citation.cfm?id=1365816
4 Distributed File Systems – HDFS, Cloudera Platform
    HDFS Tutorials - http://hortonworks.com/products/hortonworks-sandbox/
5 Big Data Search System Distributed Cache – Solr, ElasticSearch
6 Distributed Computing – Map-Reduce, Hadoop Map-Reduce, MongoDB Map-Reduce, Hive, Kudu, Impala
    Spark - http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
    MapReduce – http://dl.acm.org/citation.cfm?id=1773922
    Ullman – Chapter 2
7 Spark Platform
    Spark - http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
The final project will be a group project, with each group comprising 4 members. There are
two options for the project.
1. Each group is expected to create a fully functional, end-to-end eCommerce application
using available hardware and software resources. The development and implementation
can be done on a single workstation or on multiple workstations connected in a distributed
manner.
The components required for this application include, but are not restricted to:
A suitable NoSQL database
A cache system
A CDN
A textual search system.
2. Each group is expected to take a publicly available dataset and perform analysis, visualization,
and model creation on it using Jupyter Notebook and HDFS on the Cloudera
platform.
COURSE POLICIES
There are no make-up opportunities for in-class exams. Topical assignments turned in
late will be assessed a penalty of 20% for each day the assignment is late.
Assignments will not be accepted if late by more than 5 days.
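The late-submission rule above can be sketched as a short Python calculation. This is only an illustration under stated assumptions: the function name is hypothetical, lateness is counted in whole days, the 20% per day is deducted from the score the assignment earned, and an assignment that is not accepted is treated as scoring zero. The syllabus does not spell out these details, so the instructor's interpretation governs.

```python
def late_adjusted_score(raw_score, days_late, penalty_pct_per_day=20, max_days_late=5):
    """Return the score after the late penalty is applied.

    Assumptions (not stated in the syllabus): whole days only, the
    penalty is a percentage of the earned score, and a submission
    that is not accepted counts as zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > max_days_late:
        # More than 5 days late: the assignment is not accepted.
        return 0.0
    # 20% is deducted per day late, never dropping below zero.
    remaining_pct = max(0, 100 - penalty_pct_per_day * days_late)
    return raw_score * remaining_pct / 100


print(late_adjusted_score(90, 0))  # 90.0
print(late_adjusted_score(90, 2))  # 54.0
print(late_adjusted_score(90, 6))  # 0.0
```

Under these assumptions an assignment submitted exactly 5 days late is still accepted but carries a 100% penalty, so it earns no credit either way.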
There are no opportunities for extra credit in this course. Students’ focus should be on
the primary work in the course.
Grades of "Incomplete":
An “I” grade may be awarded to a student when 1) arrangements are made prior to the
end of the semester, 2) in the judgment of the instructor a valid reason is offered for
granting an Incomplete, 3) a clear path to a standard grade is agreed to by the
instructor and the student which will result in successful completion of course
requirements by the end of the succeeding semester. “I” grades not removed by the end
of the next semester will be changed to “IF”.
Email:
The primary means of communication between instructor and students between live
class meetings will be email. “Blast emails” will occasionally be sent by the instructor to
all students via Canvas. Students can feel free to email their instructor with questions at
any time. Please anticipate a response time of 24 hours to email queries.
Canvas:
Canvas will be used in this course to disseminate materials, turn in weekly assignments,
and return graded assignments. If you need help learning how to perform various tasks
related to this course or other courses being offered in Canvas, please view the
following videos or consult the Canvas help guides. You may also contact USF's IT
department at (813) 974-1222 or help@usf.edu.
Laptop Usage:
Laptop/Tablet usage is encouraged in this course given the nature of the material.
Classroom Recording:
Audio and/or video recordings of lectures are prohibited, as is the live streaming of
lectures or dissemination of lectures via conference calling technologies. The instructor will
provide recordings from time to time, as necessary and when technically feasible.
Phone Usage:
Students are asked to place their mobile phones on “silent” and to step outside the
classroom to take any important calls.
Academic Integrity:
Disruptive students in the academic setting hinder the educational process. Disruption
of the academic process is defined as the act, words, or general conduct of a student
in a classroom or other academic environment which in the reasonable estimation of
the instructor: (a) directs attention away from the academic matters at hand, such as
noisy distractions, persistent, disrespectful or abusive interruption of lecture, exam,
academic discussion, or general University operations, or (b) presents a danger to the
health, safety, or well-being of self or other persons.
Disability Access:
Students with disabilities are responsible for registering with Students with Disabilities
Services (SDS) in order to receive academic accommodations. SDS encourages
students to notify instructors of accommodation needs at least 5 business days prior to
needing the accommodation. A letter from SDS must accompany this request.
Attendance Policy:
Students are expected to exhibit professionalism through regular attendance and on-
time arrivals to class lectures.
Religious Observances:
All students have a right to expect that the University will reasonably accommodate
their religious observances, practices and beliefs. If you observe religious holidays, you
should plan your allowed absences to include those dates.