Big Data-2

The Big Data (Theory) course consists of 40 lectures and covers fundamental concepts, tools, and technologies related to Big Data, including advanced processing techniques and distributed systems. The course is structured into five modules, addressing topics such as data storage, real-time analytics, and security, along with case studies and research trends. Assessments include mid-semester and end-semester exams, assignments, and class participation.

Uploaded by

Imon Nomi

Course Title: Big Data (Theory)

Course Duration: 40 Lectures (50-60 minutes each)


Credits: 4

Course Objectives:

1. Understand the fundamental concepts of Big Data, its tools, and technologies.

2. Explore advanced techniques for processing, analyzing, and managing Big Data.

3. Develop knowledge of distributed systems and scalable architectures.

4. Learn theoretical foundations and algorithmic approaches for Big Data applications.

Course Outline

Module 1: Introduction to Big Data, Architecture and Technologies (11 Lectures)

1. Evolution and Definition of Big Data

2. Characteristics of Big Data: Volume, Velocity, Variety, Veracity, and Value

3. Importance and Applications in Various Domains

4. Challenges in Big Data Management

5. Big Data Ecosystem: Hadoop, Spark, and Beyond

6. Distributed File Systems: HDFS Overview and Architecture

7. NoSQL Databases: Characteristics and Examples (MongoDB, Cassandra, etc.)

8. Comparison of Batch vs. Real-Time Processing

9. Lambda and Kappa Architectures

10. Scalability and Fault Tolerance

Module 2: Data Storage, Processing Frameworks, Analysis and Machine Learning on Big Data (11 Lectures)

1. MapReduce: Concepts, Workflow, and Optimization Techniques

2. Apache Spark: RDDs, DAGs, and Lazy Evaluation

3. Data Partitioning and Shuffling in Distributed Systems

4. Columnar Storage and Query Optimization (e.g., Hive, Impala)


5. Case Studies on Storage Frameworks

6. Statistical Analysis and Data Preprocessing at Scale

7. Machine Learning Algorithms for Big Data (using MLlib, Mahout)

➢ Clustering: K-Means, DBSCAN

➢ Classification: Logistic Regression, Decision Trees

➢ Recommendation Systems

8. Feature Engineering on Big Data Platforms

9. Large-Scale Model Training: Stochastic Gradient Descent, Parameter Servers

10. Distributed Graph Processing: Pregel, GraphX

Module 3: Big Data Streaming, Real-Time Analytics, and Security (8 Lectures)

1. Real-Time Data Processing: Apache Kafka and Apache Flink

2. Stream Processing Models: Micro-batching and Continuous Processing

3. Event Processing Systems

4. Applications of Real-Time Analytics

5. Security Challenges in Big Data Systems

6. Data Anonymization and Masking Techniques

7. GDPR and Compliance Issues

Module 4: Advanced Topics in Big Data (5 Lectures)

1. Big Data and IoT Integration

2. Edge Computing for Big Data

3. Cloud-Based Big Data Services (AWS, Azure, Google Cloud)

4. Emerging Trends: Quantum Computing for Big Data, Blockchain Integration

Module 5: Case Studies and Research Trends (5 Lectures)

1. Case Studies in Healthcare, Finance, Retail, and Smart Cities

2. Research Directions: Federated Learning, Heterogeneous Data Integration

3. Open Challenges in Big Data Research

4. Industry Insights and Future Prospects


Assessments

• Mid-Semester Exam: 20%

• End-Semester Exam: 40%

• Assignments and Projects: 30%

• Class Participation: 10%
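
As a rough illustration of how the weights above combine, the following sketch computes a final score from per-component scores. It assumes each component is marked on a 0-100 scale; the function name, the dictionary keys, and the sample scores are hypothetical and not part of the syllabus.

```python
# Weights taken from the Assessments list (percent of the final grade).
WEIGHTS = {
    "mid_semester_exam": 20,
    "end_semester_exam": 40,
    "assignments_and_projects": 30,
    "class_participation": 10,
}


def final_score(component_scores):
    """Combine per-component scores (assumed 0-100 each) into a weighted total."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Each weight is a percentage, so divide the weighted sum by 100.
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "mid_semester_exam": 70,
    "end_semester_exam": 80,
    "assignments_and_projects": 90,
    "class_participation": 100,
}
print(final_score(example))  # 83.0
```

The weights sum to 100, so a student scoring full marks in every component receives 100 overall.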

Textbooks and References

1. Big Data by Anil Maheshwari

2. Big Data: Principles and Best Practices of Scalable Real-Time Data Systems by Nathan Marz

3. Hadoop: The Definitive Guide by Tom White

4. Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman

5. Research Papers and Case Studies (to be provided by the instructor)
