IA Big Data Lab Works

The document outlines a series of lab works for a Master 1 course in Cloud Computing & Big Data at Mohamed Khider University, focusing on practical skills in database management, data analysis, and big data processing. Each lab work involves tasks such as creating relational databases, implementing intelligent query processing, exploring database indexing, building recommendation systems, and working with Hadoop and NoSQL databases like MongoDB and Cassandra. Students are expected to utilize various technologies and techniques to analyze and manage large datasets effectively.

Uploaded by

Anis Dab

Mohamed Khider University - Biskra 2024/2025

Department of Computer Science Level: Master 1

Module: Cloud Computing & Big Data Option : IA

Lab Work 1

The objective of this practical work is to design, build, and manage a large-scale relational
database using an open dataset from Kaggle. You will import data, establish relationships
between tables, and execute advanced SQL queries.

1. Dataset Selection

• Choose a large, structured dataset from Kaggle.com that can be organized into
multiple related tables.
• Examples of suitable datasets:
  o E-commerce transactions
  o Movie ratings and reviews
  o Financial transactions
  o Healthcare records
  o Social media interactions

2. Database Creation & Data Import

• Use PostgreSQL, MySQL, or SQLite to create your database.
• Write SQL scripts to define tables with appropriate data types, keys, and constraints.
• Import data from CSV files into the corresponding tables.
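
The table definition and CSV import steps can be sketched as follows. This is a minimal illustration, assuming SQLite through Python's built-in `sqlite3` module; the `movies` and `ratings` tables, their columns, and the inlined CSV text are hypothetical stand-ins for a real Kaggle dataset.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for files downloaded from Kaggle.
MOVIES_CSV = "movie_id,title,year\n1,Alpha,1999\n2,Beta,2005\n"
RATINGS_CSV = "user_id,movie_id,rating\n10,1,4.0\n11,1,5.0\n10,2,3.0\n"

# Illustrative schema: a primary key per table and a foreign key linking them.
SCHEMA = """
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    year     INTEGER
);
CREATE TABLE ratings (
    user_id  INTEGER NOT NULL,
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    rating   REAL NOT NULL,
    PRIMARY KEY (user_id, movie_id)
);
"""

def load_csv(conn, table, csv_text):
    """Insert every data row of a CSV (first row = column names) into a table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names come from our own trusted schema, not user input.
    sql = f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})"
    # SQLite's column affinity converts numeric-looking strings on storage.
    conn.executemany(sql, reader)

def build_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    load_csv(conn, "movies", MOVIES_CSV)
    load_csv(conn, "ratings", RATINGS_CSV)
    conn.commit()
    return conn
```

With real data, the inlined strings would be replaced by the contents of the downloaded CSV files, and parent tables would be loaded before the tables that reference them.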

3. Data Analysis

Execute SQL queries to analyze the data, including:

• Aggregations: SUM, AVG, COUNT, MAX, MIN.
• Implement indexing on large tables to improve query performance.
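
As a rough sketch of this step, the example below runs the five aggregate functions on a small table and then checks whether SQLite plans to use an index. It assumes SQLite via Python's `sqlite3` module; the `orders` table, its rows, and the index name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("alice", 30.0), ("bob", 5.0), ("bob", 15.0), ("bob", 10.0)],
)

def customer_summary(conn):
    """One row per customer: COUNT, SUM, AVG, MAX, MIN of the order amounts."""
    return conn.execute(
        """
        SELECT customer, COUNT(*), SUM(amount), AVG(amount), MAX(amount), MIN(amount)
        FROM orders
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()

def query_plan(conn, customer):
    """Return SQLite's textual plan for a lookup by customer."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", (customer,)
    ).fetchall()
    return " ".join(row[3] for row in rows)  # the fourth column holds the plan text

plan_before = query_plan(conn, "bob")   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = query_plan(conn, "bob")    # the plan should now mention the index
```

PostgreSQL and MySQL offer their own `EXPLAIN` output for the same purpose; the exact plan text differs between engines and versions.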

4. Web Interface Development

• Design a web-based interface using HTML, CSS, and JavaScript to interact with
the database.
• Implement basic CRUD operations (Create, Read, Update, Delete) to allow users to
manage records.
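
The lab asks for an HTML, CSS, and JavaScript front end; the sketch below only illustrates the four database operations such an interface would eventually trigger. It assumes SQLite via Python's `sqlite3` module, and the `products` table and function names are hypothetical.

```python
import sqlite3

def connect():
    """Open an in-memory database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
    )
    return conn

def create_product(conn, name, price):
    """Create: insert a row and return its generated id."""
    cur = conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", (name, price))
    conn.commit()
    return cur.lastrowid

def read_product(conn, product_id):
    """Read: return (id, name, price) or None if the id does not exist."""
    return conn.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()

def update_price(conn, product_id, price):
    """Update: change the price and return the number of rows affected."""
    cur = conn.execute("UPDATE products SET price = ? WHERE id = ?", (price, product_id))
    conn.commit()
    return cur.rowcount

def delete_product(conn, product_id):
    """Delete: remove the row and return the number of rows affected."""
    cur = conn.execute("DELETE FROM products WHERE id = ?", (product_id,))
    conn.commit()
    return cur.rowcount
```

Parameterized queries (the `?` placeholders) keep user-supplied values out of the SQL text, which matters once the functions are reachable from a web form.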

Lab Work 2

Intelligent Query Processing

The goal of this practical work is to implement intelligent query processing techniques to
enhance user interactions with databases. You will explore:

1. Levenshtein Distance for auto-correction of misspelled queries.
2. Autocomplete using a Trie (prefix tree) to suggest relevant queries based on user input.
3. BK-Tree (Burkhard-Keller Tree) for efficient fuzzy searching in large datasets.
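
To make the autocomplete idea concrete, here is a minimal sketch of a trie (prefix tree) that returns stored words starting with a given prefix. The class and method names are assumptions made for illustration, and the word list in the usage note is arbitrary.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the next node
        self.is_word = False  # True if a stored word ends at this node

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        """Walk down the tree, creating nodes as needed, and mark the end."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix, limit=10):
        """Return up to `limit` stored words starting with `prefix`, alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word has this prefix
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so they are popped alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results
```

For example, `Trie(["SELECT", "SELF", "SET", "UPDATE"]).suggest("SEL")` walks three nodes to reach the prefix and then collects the words below it. Lookup cost depends on the prefix length and the number of matches, not on the total number of stored words.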

Instructions:

1. Create an SQL database with large datasets.
2. Create a web-based interface using HTML, CSS, and JavaScript to submit SQL queries.
3. Use the Levenshtein algorithm to detect and correct misspelled queries.
4. Implement Autocomplete using a Trie (Prefix Tree). Example: If the user types "SEL",
the system suggests "SELECT", "SELF", etc.
5. Implement a BK-Tree to efficiently handle approximate matching in large datasets.
This structure is useful for quickly finding the closest matches to a given input.
Example: If searching for "Biksra", the system finds similar names like "Biskra".
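
The Levenshtein and BK-Tree steps can be sketched together, since a BK-Tree is built on top of a distance function. This is one possible implementation under stated assumptions: the class layout and names are illustrative, and nodes are plain `(word, children)` tuples.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # each node is (word, {edge_distance: child_node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_distance):
        """Return sorted (distance, word) pairs within max_distance of query."""
        if self.root is None:
            return []
        matches = []
        stack = [self.root]
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                matches.append((d, word))
            # Triangle inequality: only edges in [d - max, d + max] can hold matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(matches)
```

Swapping two adjacent letters counts as two edits under plain Levenshtein distance, so a query such as "Biksra" needs `max_distance=2` to reach "Biskra". The pruning step is what lets the tree skip most stored words instead of comparing the query against every one.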

Lab Work 3

Database Indexing and TF-IDF for Efficient Search

The goal of this practical work is to explore database indexing techniques to optimize query
performance and implement TF-IDF (Term Frequency - Inverse Document Frequency)
for text search relevance. You will:

• Create and use indexes to speed up SQL queries.
• Implement TF-IDF to rank search results based on relevance.
• Compare performance between indexed and non-indexed queries.
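
One way to compare indexed and non-indexed queries is to time the same lookup before and after creating the index. The sketch below assumes SQLite via Python's `sqlite3` module; the `reviews` table, the synthetic rows, and the repeat count are arbitrary choices, and absolute timings will vary by machine.

```python
import sqlite3
import time

def build(n_rows=20_000):
    """Create an in-memory table filled with synthetic review rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER, stars INTEGER)"
    )
    conn.executemany(
        "INSERT INTO reviews (product_id, stars) VALUES (?, ?)",
        ((i % 5000, i % 5 + 1) for i in range(n_rows)),
    )
    return conn

def timed_lookup(conn, product_id, repeats=50):
    """Run the same filtered COUNT several times; return (count, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        row = conn.execute(
            "SELECT COUNT(*) FROM reviews WHERE product_id = ?", (product_id,)
        ).fetchone()
    return row[0], time.perf_counter() - start

conn = build()
count_scan, t_scan = timed_lookup(conn, 42)    # no index: every row is examined
conn.execute("CREATE INDEX idx_reviews_product ON reviews (product_id)")
count_index, t_index = timed_lookup(conn, 42)  # same query, now served by the index
print(f"full scan: {t_scan:.4f}s, indexed: {t_index:.4f}s")
```

Both runs must return the same count; only the elapsed time should change. Repeating the query several times smooths out noise from a single measurement.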

1. Create a Database and Load Dataset (e.g., articles, product reviews, or
customer transactions).
2. Create Indexes for Faster Queries
3. Calculate Term Frequency (TF): Compute the frequency of a word in a document
4. Calculate Inverse Document Frequency (IDF): Compute the importance of a word across
all documents
5. Implement a Query using TF-IDF Ranking
6. Compare indexed vs. non-indexed queries and measure execution time.
7. Display ranked search results in the interface
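
The TF, IDF, and ranking steps can be sketched as below. The handout does not fix a formula, so this uses one common unsmoothed variant as an assumption: TF is the term count divided by document length, and IDF is the natural logarithm of (number of documents / number of documents containing the term). Function names and the whitespace tokenizer are illustrative.

```python
import math

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def tf(term, tokens):
    """Share of the document's tokens that equal the term."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

def idf(term, corpus_tokens):
    """log(N / df); terms that appear in no document score 0."""
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log(len(corpus_tokens) / df) if df else 0.0

def rank(query, documents):
    """Return (score, document_index) pairs, best match first."""
    corpus = [tokenize(doc) for doc in documents]
    scores = []
    for index, tokens in enumerate(corpus):
        score = sum(tf(term, tokens) * idf(term, corpus) for term in tokenize(query))
        scores.append((score, index))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # ties keep document order
    return scores
```

A term that occurs in every document gets an IDF of zero under this variant, so it contributes nothing to the ranking; smoothed variants avoid that at the cost of a slightly different scale.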

Lab Work 4

Recommendation System & Product Comparison

The aim of this practical work is to build a recommendation system using TF-IDF (Term
Frequency - Inverse Document Frequency) to compare product descriptions and suggest
similar items. You will:

1. Extract textual features from product descriptions.
2. Compute TF-IDF scores to measure word importance.
3. Use cosine similarity to compare and recommend similar products.
4. Evaluate the effectiveness of TF-IDF for recommendations.
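
A small sketch of steps 1 to 3, assuming whitespace tokenization, an unsmoothed TF-IDF weighting (count / length times log(N / df)), and sparse vectors stored as dictionaries. The function names and any product descriptions used with it are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(descriptions):
    """Turn each description into a sparse {term: tf-idf weight} dictionary."""
    docs = [Counter(text.lower().split()) for text in descriptions]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {term: (count / total) * math.log(n / df[term]) for term, count in doc.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(index, descriptions, top_k=2):
    """Indices of the top_k descriptions most similar to descriptions[index]."""
    vectors = tfidf_vectors(descriptions)
    scored = [
        (cosine(vectors[index], vec), i) for i, vec in enumerate(vectors) if i != index
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]
```

Because cosine similarity normalizes by vector length, a long description is not favored simply for containing more words. For step 4, the recommendations can be checked against items that a person would judge similar.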


Lab Work 5
Big Data Processing with Hadoop

The objective of this practical work is to introduce students to Hadoop, a powerful framework
for distributed storage and processing of large datasets. Students will set up a Hadoop
environment, process data using HDFS (Hadoop Distributed File System), and perform
MapReduce operations to analyze a dataset.

1. Download and install Hadoop (single-node). Configure core-site.xml, hdfs-site.xml, and
mapred-site.xml.
2. Download a dataset (e.g., a Kaggle dataset like movie reviews, stock market data, or
web logs).
3. Word Count Example in Java: Implement a MapReduce job that counts word
occurrences in a dataset.
4. Download and process a large dataset (e.g., customer reviews, social media posts).
   a. Use HDFS to store the dataset.
   b. Implement a MapReduce job to analyze trends (e.g., most common words, user
   activity).
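
The lab asks for a Java MapReduce job; the sketch below does not use the Hadoop API at all. It only mimics the three phases of a word count (map, shuffle, reduce) in plain Python within a single process, as an aid to understanding the data flow. All function names are illustrative.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    """Reduce phase: sum the grouped counts for one word."""
    return word, sum(counts)

def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(mapped).items())
```

In a real Hadoop job the framework performs the shuffle across machines, and the mapper and reducer run in parallel on separate blocks of the HDFS file; the per-record logic, however, has the same shape.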

Lab Work 6
NoSQL Database Management with MongoDB

The objective of this practical work is to introduce students to MongoDB, a NoSQL database
used for handling large amounts of unstructured and semi-structured data. Students will learn
how to:

• Install and configure MongoDB
• Create and manage collections and documents
• Perform CRUD (Create, Read, Update, Delete) operations
• Execute complex queries using MongoDB’s aggregation framework

1. Install MongoDB on your system (MongoDB Download)
2. Create a Database
3. Manage Collections & Documents:
   i. Insert Data into a Collection
   ii. Retrieve Data
   iii. Delete Documents
   iv. Update Documents
4. Integrate MongoDB with a Web Application

Lab Work 7
Big Data Storage and Processing with Cassandra

The goal of this practical work is to introduce students to Cassandra, a distributed,
scalable NoSQL database designed for handling large amounts of data across
multiple nodes with high availability. Students will learn how to:

• Set up a Cassandra environment
• Create and manage tables
• Perform CRUD (Create, Read, Update, Delete) operations
• Execute advanced queries using CQL (Cassandra Query Language) and the Java API

Instructions:

1. Download and Install Cassandra (standalone)
2. Create a Table in Cassandra Shell
   a. Insert Data into the Table
   b. Retrieve Data from the Table
   c. Update Data
   d. Delete Data
3. Set Up a Java Project with Cassandra

