Assignment - Big Data Management

This document outlines an assignment with two parts: 1) Analyze the use of a NoSQL database by a business and write a report critically analyzing their goals, methodology, outcomes, and recommendations. 2) Write pseudo-code and a Python program to implement a MapReduce algorithm to find association rules in customer transaction data, with a minimum support of 20%, and test it on a dataset of at least 300 transactions. Submissions should include the analysis report, pseudo-code, Python program, and input transaction data.

Uploaded by

Shwetank Pandey

Big Data Management

ePGD ABA 2020-21

Assignment: NoSQL Databases and Map Reduce


This individual assignment consists of two parts: an analysis of a NoSQL database application and a
map-reduce implementation. You are expected to answer both parts.

Part A – Application of NoSQL databases (35%)

Identify any ONE business that uses one or more NoSQL databases (simple key-value, column family,
document or graph databases). Critically analyse its use of NoSQL databases using secondary data
sources. Prepare a report outlining its business goals, the methodology adopted and the realized
outcomes, along with your insights and recommendations, in not more than 4 pages (about 1200-1400 words).

Part B – Map Reduce pseudo-code and Implementation (65%)

Suppose that you are given a set of customer purchase transactions. Each transaction contains a
basket identifier and a set of items. Assume that the items in an individual transaction are not
repeated and occur only once. Each data node of the Hadoop cluster stores a subset of the customer
transactions. You are expected to compute the support and confidence of rules of the form X => Y,
where X and Y are individual items in the transaction database. Generate all rules with a support
value greater than or equal to 20%. Assume that the total number of transactions (N) is known in
advance and is available to all the data nodes in the cluster. A sample input, a sample output and
the formulas are provided below. The samples are provided for illustrative purposes only; your
solution should handle any large-scale transactional database.

Sample transactions

Basket Id   Transactions
1           Bread, Diaper, Milk
2           Beer, Bread
3           Milk, Diaper, Beer, Coke
4           Bread, Milk, Diaper, Beer
5           Milk, Diaper, Coke
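
Formulas

$$\text{Support}(X) = \frac{\text{number of transactions that contain } X}{N}$$

$$\text{Support}(X \Rightarrow Y) = \frac{\text{number of transactions that contain both } X \text{ and } Y}{N}$$

$$\text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \Rightarrow Y)}{\text{Support}(X)}$$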
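
As a check on these definitions, the short sketch below evaluates them directly on the five sample transactions. It is not a map-reduce solution, only a plain Python illustration; the helper names `support` and `confidence` are chosen here for readability and are not part of the assignment.

```python
from fractions import Fraction

# Sample transactions from the assignment (basket id -> set of items).
transactions = {
    1: {"Bread", "Diaper", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Milk", "Diaper", "Coke"},
}
N = len(transactions)  # total number of transactions


def support(*items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in transactions.values() if wanted <= basket)
    return Fraction(hits, N)


def confidence(x, y):
    """Support(X => Y) divided by Support(X)."""
    return support(x, y) / support(x)


print(support("Bread", "Milk"), confidence("Bread", "Milk"))  # 2/5 2/3
```

`Fraction` reduces its results, so the confidence of Milk => Bread is shown as 1/2 rather than the unreduced 2/4 used in the sample output.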

Sample output

Rule               Support   Confidence
Bread => Milk      2/5       2/3
Milk => Bread      2/5       2/4
Diaper => Bread    2/5       2/4
Beer => Diaper     2/5       2/3
Note: Only sample rules are provided for illustration
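
One possible way to arrive at such rules is sketched below as a local, in-memory simulation of the map, shuffle and reduce phases. It is an illustration under stated assumptions, not a reference solution: it does not run on Hadoop, the function names (`mapper`, `shuffle`, `reducer`, `association_rules`) are hypothetical, and the final division of pair counts by item counts is done in a driver step after the reduce phase, which on a real cluster would likely need a second job or a join.

```python
from collections import defaultdict
from itertools import permutations

MIN_SUPPORT = 0.20  # threshold stated in the assignment


def mapper(line):
    """Emit ((item,), 1) for each item and ((x, y), 1) for each ordered pair."""
    items = sorted({item.strip() for item in line.split(",") if item.strip()})
    for item in items:
        yield (item,), 1
    for x, y in permutations(items, 2):
        yield (x, y), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


def association_rules(lines, n, min_support=MIN_SUPPORT):
    """Driver: run map, shuffle, reduce, then derive support and confidence."""
    emitted = (kv for line in lines for kv in mapper(line))
    counts = dict(reducer(k, v) for k, v in shuffle(emitted).items())
    rules = {}
    for key, count in counts.items():
        if len(key) == 2 and count / n >= min_support:
            x, y = key
            rules[(x, y)] = (count / n, count / counts[(x,)])
    return rules


sample = [
    "Bread, Diaper, Milk",
    "Beer, Bread",
    "Milk, Diaper, Beer, Coke",
    "Bread, Milk, Diaper, Beer",
    "Milk, Diaper, Coke",
]
rules = association_rules(sample, n=len(sample))
for (x, y), (sup, conf) in sorted(rules.items()):
    print(f"{x} => {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Emitting both single items and ordered pairs from the same mapper keeps the sketch to one pass over the data; the cost is that the number of emitted pairs grows quadratically with basket size.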

You are expected to answer the following questions:


1. Write map-reduce pseudo-code for the above problem, with an illustrative example.
2. Write a map-reduce program in the Python programming language. Evaluate/test your program
on a database with at least 300 transactions or records. You may synthetically generate
transactions or use any publicly available transactional database.
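
For the synthetic option in question 2, a test database could be generated along the following lines. This is a minimal sketch: the item catalogue, the basket-size range, the seed and the tab-separated file layout are all assumptions made for illustration, not requirements of the assignment.

```python
import random

# Hypothetical item catalogue; any list of distinct item names would do.
ITEMS = ["Bread", "Milk", "Diaper", "Beer", "Coke", "Eggs", "Butter", "Juice"]


def generate_transactions(n=300, min_items=1, max_items=5, seed=42):
    """Return n (basket_id, items) pairs with no repeated item in a basket."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    baskets = []
    for basket_id in range(1, n + 1):
        size = rng.randint(min_items, max_items)
        # sample() draws without replacement, so items are unique per basket.
        baskets.append((basket_id, rng.sample(ITEMS, size)))
    return baskets


def write_transactions(baskets, path):
    """Write one basket per line as '<basket id><TAB><comma-separated items>'."""
    with open(path, "w", encoding="utf-8") as handle:
        for basket_id, items in baskets:
            handle.write(f"{basket_id}\t{', '.join(items)}\n")


baskets = generate_transactions()
print(len(baskets), baskets[0])
```

Calling `write_transactions(baskets, "transactions.txt")` (the file name is only an example) would produce an input file that can be submitted alongside the program. Uniform random sampling gives every pair roughly the same support, so a skewed item distribution may be preferable if a mix of frequent and infrequent rules is wanted.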

Submission Instructions

Your submission should consist of the following components (in a single zip file):

Part A – Your secondary data analysis report.

Part B – (1) map-reduce pseudo-code, (2) map-reduce Python program (a Python notebook that
displays the execution results of the individual steps), and (3) the input transactional database
files used in your program evaluation.

Submit your individual assignment on Moodle on or before 10 Nov 2020, 23:59:59 hrs.
