B43 BDA Exp7

Uploaded by

Nikhil Aher


LAB MANUAL

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 07
A.1 Aim:
To implement the DGIM algorithm using Java/Python.

A.2 Prerequisite:
Java/Python setup

A.3 Outcome:
Students will be able to interpret business models and scientific computing paradigms, and
apply software tools for big data analytics.

A.4 Theory:

MINING DATA STREAMS

Data stream mining is the process of extracting knowledge structures from continuous, rapid
data records.
A data stream is an ordered sequence of instances that, in many applications of data stream
mining, can be read only once or a small number of times using limited computing and
storage capabilities.

TYPES OF QUERIES

Ad-hoc queries: you ask a query and get an immediate response. E.g., "What is the
maximum value seen so far in the stream S?"

Standing queries: you register a query with the system, saying "Whenever you have an answer
to this query, send me the response." The answer is not returned immediately; it is sent
whenever it becomes available or changes.
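To make the distinction concrete, here is a minimal Python sketch, assuming a stream of numbers; the class name `MaxTracker` and the `on_new_max` callback are hypothetical. The ad-hoc query reads the stored state on demand, while the standing query pushes a new answer every time it changes.

```python
from typing import Callable, List, Optional


class MaxTracker:
    """Keeps only the running maximum of a numeric stream (O(1) memory)."""

    def __init__(self, on_new_max: Optional[Callable[[float], None]] = None):
        self.current_max: Optional[float] = None
        # Standing query: this callback fires whenever the answer changes.
        self.on_new_max = on_new_max

    def push(self, value: float) -> None:
        if self.current_max is None or value > self.current_max:
            self.current_max = value
            if self.on_new_max is not None:
                self.on_new_max(value)

    def max_so_far(self) -> Optional[float]:
        """Ad-hoc query: answered immediately from the stored state."""
        return self.current_max


updates: List[float] = []
tracker = MaxTracker(on_new_max=updates.append)
for v in [3, 1, 4, 1, 5, 9, 2, 6]:
    tracker.push(v)

print(tracker.max_so_far())  # 9
print(updates)               # [3, 4, 5, 9]
```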

Now suppose we have a window of length N (say N = 24) on a binary stream. We want to be able,
at all times, to answer a query of the form "How many 1’s are there in the last k
bits?" for any k ≤ N.
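As a baseline, the query can be answered exactly by storing the last N bits, which costs N bits of memory per window; that cost is what DGIM avoids. A minimal sketch, assuming the stream is a sequence of 0/1 integers; `ExactWindow` is a hypothetical name.

```python
from collections import deque


class ExactWindow:
    """Stores the last n bits verbatim: exact answers, but O(n) memory."""

    def __init__(self, n: int):
        self.bits = deque(maxlen=n)  # the oldest bit is dropped automatically

    def push(self, bit: int) -> None:
        self.bits.append(bit)

    def count_ones(self, k: int) -> int:
        """Exact number of 1s among the last k bits (k <= n)."""
        if k > self.bits.maxlen:
            raise ValueError("k must not exceed the window size")
        recent = list(self.bits)[-k:] if k > 0 else []  # guard: [-0:] would be the whole list
        return sum(recent)


w = ExactWindow(24)
for bit in [1, 0, 1, 1, 0, 1] * 5:  # 30 bits; only the last 24 are kept
    w.push(bit)
print(w.count_ones(6))   # 4
print(w.count_ones(24))  # 16
```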

This is where the DGIM algorithm comes into the picture.

COUNTING THE NUMBER OF 1’s IN THE DATA STREAM

DGIM algorithm (Datar-Gionis-Indyk-Motwani algorithm)

The DGIM algorithm is designed to count the number of 1’s in a window of a binary stream. It
uses O(log² N) bits to represent a window of N bits and estimates the number of 1’s in the
window with an error of no more than 50%.

In other words, the answer is approximate: the estimate always lies within 50% of the true count.
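The two bounds quoted above follow from a short argument, sketched here in LaTeX. It assumes the standard DGIM invariant that every bucket size up to the largest occurs once or twice (the merge step described below never leaves three buckets of one size); the notation ĉ for the estimate and c for the true count is introduced only for this sketch.

```latex
% Space: at most two buckets of each size 2^j, and no bucket is larger than N
\text{number of buckets} \;\le\; 2\left(\lfloor \log_2 N \rfloor + 1\right) = O(\log N)
% each bucket stores a timestamp modulo N (O(\log N) bits) and the exponent j of its size (O(\log\log N) bits)
\text{memory} \;=\; O(\log N)\cdot O(\log N) \;=\; O(\log^2 N)\ \text{bits}

% Error: let the oldest bucket overlapping the query range have size 2^j.
% Only that bucket is uncertain, and the estimate counts half of it:
|\hat{c} - c| \;\le\; 2^{j-1}
% at least one 1 of that bucket is in range, plus at least one bucket of every smaller size:
c \;\ge\; 1 + \left(2^{j-1} + \dots + 2 + 1\right) \;=\; 2^{j}
\quad\Longrightarrow\quad
|\hat{c} - c| \;\le\; \tfrac{1}{2}\,c
```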

In the DGIM algorithm, each arriving bit has a timestamp: the position at which it arrives.
The first bit has timestamp 1, the second has timestamp 2, and so on. Positions only need to
be distinguished within the window of size N (window sizes are usually taken as a multiple of
2). The window is divided into buckets consisting of 1’s and 0’s.

RULES FOR FORMING THE BUCKETS:

1. The right end of a bucket always starts with a 1 (if it starts with a 0, that 0 is
ignored). E.g., 1001011 → a bucket of size 4, having four 1’s and starting with a 1 on its
right end.

2. Every bucket must contain at least one 1; otherwise no bucket can be formed.

3. The size of every bucket must be a power of 2.

4. Bucket sizes cannot decrease as we move to the left (sizes are non-decreasing
towards the left).
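Rules 2 to 4 above can be checked mechanically. A minimal sketch, assuming each bucket is a hypothetical `(timestamp_of_rightmost_1, size)` pair listed from left (oldest) to right (newest); rule 1 is built into the representation, since the timestamp always marks a 1.

```python
from typing import List, Tuple

# Hypothetical representation: (timestamp of the bucket's rightmost 1, size).
Bucket = Tuple[int, int]


def is_power_of_two(x: int) -> bool:
    return x > 0 and x & (x - 1) == 0


def satisfies_rules(buckets: List[Bucket]) -> bool:
    """Rules 2-4: at least one 1 per bucket, power-of-2 sizes,
    and sizes never decrease moving left (never increase moving right)."""
    sizes = [size for _, size in buckets]
    if not all(is_power_of_two(s) for s in sizes):  # also rejects size 0 (rule 2)
        return False
    return all(left >= right for left, right in zip(sizes, sizes[1:]))


print(satisfies_rules([(87, 4), (95, 2), (99, 2), (102, 1)]))  # True
print(satisfies_rules([(90, 1), (95, 2)]))                     # False: size grows to the right
print(satisfies_rules([(90, 3)]))                              # False: 3 is not a power of 2
```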

Let us take an example to understand the algorithm: estimating the number of 1’s and counting
the buckets in a given data stream. The accompanying picture shows how the buckets are formed
from the 1’s by following the rules above.

In the given data stream, assume that each new bit arrives from the right.

When the new bit is 0: after the new bit (0) arrives with timestamp 101, there is no change in
the buckets.

When the new bit is 1, the buckets must be updated:
- Create a new bucket with the current timestamp and size 1.

If there are now at most two buckets of size 1, nothing more needs to be done. However, if
there are now three buckets of size 1 (the buckets with timestamps 100, 102 and 103 in the
second step of the picture), we fix the problem by combining the leftmost (earliest) two
buckets of size 1 (purple box).

To combine two adjacent buckets of the same size, replace them with one bucket of twice
the size. The timestamp of the new bucket is the timestamp of the rightmost (later) of the two
buckets.

Combining two buckets of size 1 may in turn create a third bucket of size 2. If so, we
combine the leftmost two buckets of size 2 into a bucket of size 4, and so on: the process may
ripple through the bucket sizes.
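The update steps above can be sketched in Python as follows, assuming buckets are kept in a list ordered oldest (left) to newest (right); `Bucket` and `add_bit` are hypothetical names, not part of any library.

```python
from typing import List


class Bucket:
    """size = number of 1s; timestamp = position of the bucket's rightmost (latest) 1."""

    def __init__(self, timestamp: int, size: int):
        self.timestamp = timestamp
        self.size = size


def add_bit(buckets: List[Bucket], bit: int, timestamp: int) -> None:
    """Apply one arriving bit; buckets are ordered oldest (left) to newest (right)."""
    if bit == 0:
        return  # a 0 leaves the buckets unchanged
    buckets.append(Bucket(timestamp, 1))  # new bucket of size 1
    i = len(buckets) - 1  # index of the newest bucket of the size just created
    # While three buckets share a size, combine the two leftmost (earliest) of them.
    while i >= 2 and buckets[i - 2].size == buckets[i].size:
        older, newer = buckets[i - 2], buckets[i - 1]
        newer.size += older.size  # twice the size; keeps the rightmost timestamp
        del buckets[i - 2]
        i -= 2  # the merged bucket is now at i - 2 and may be a third bucket of its size


buckets: List[Bucket] = []
for t, bit in enumerate([1, 1, 1, 0, 1, 1, 1, 1], start=1):
    add_bit(buckets, bit, t)
print([(b.timestamp, b.size) for b in buckets])  # [(5, 4), (7, 2), (8, 1)]
```

The final step shows the ripple: the eighth bit creates a third bucket of size 1, whose merge creates a third bucket of size 2, which is merged into a bucket of size 4.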

How long is a bucket kept?

A bucket remains part of the window as long as current timestamp - bucket timestamp < N
(N = 24 here). E.g., 103 - 87 = 16 < 24, so the bucket with timestamp 87 is still counted;
once the difference is greater than or equal to N, the bucket has left the window and is dropped.
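The window test above is a one-line comparison. A minimal sketch, assuming buckets are hypothetical `(timestamp, size)` pairs ordered oldest first; it reuses the example numbers 103 and 87 with N = 24.

```python
from typing import List, Tuple


def in_window(current_time: int, bucket_timestamp: int, n: int) -> bool:
    """A bucket is kept while current_time - bucket_timestamp < n."""
    return current_time - bucket_timestamp < n


def drop_expired(buckets: List[Tuple[int, int]], current_time: int,
                 n: int) -> List[Tuple[int, int]]:
    """Return only the (timestamp, size) buckets that are still inside the window."""
    return [b for b in buckets if in_window(current_time, b[0], n)]


print(in_window(103, 87, 24))  # True: 103 - 87 = 16 < 24
print(drop_expired([(79, 8), (87, 4), (95, 2), (100, 1)], 103, 24))
# [(87, 4), (95, 2), (100, 1)]  (103 - 79 = 24 is not < 24)
```

Because buckets only ever expire from the left, a streaming implementation can simply pop from the front of the list while the oldest bucket fails this test.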

Finally, the answer to the query:

How many 1’s are there in the last 20 bits?

Adding up the sizes of the buckets that fall in the last 20 bits (counting only half of the
oldest one), we estimate that there are 11 ones.
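The query step can be sketched as follows, assuming hypothetical `(timestamp, size)` buckets ordered oldest first. The bucket values in the usage lines are made up for illustration and are not the buckets from the picture. Half of the oldest bucket is rounded up here, a choice of this sketch, so that a lone size-1 bucket (whose single 1 is known to be in range) still counts as 1.

```python
from typing import List, Tuple


def estimate_ones(buckets: List[Tuple[int, int]], current_time: int, k: int) -> int:
    """DGIM estimate of the number of 1s among the last k bits.

    Buckets whose rightmost 1 lies in the last k bits are counted in full,
    except the oldest such bucket, which may straddle the boundary and counts as half.
    """
    in_range = [size for ts, size in buckets if current_time - ts < k]
    if not in_range:
        return 0
    oldest = in_range[0]
    return sum(in_range[1:]) + (oldest + 1) // 2  # ceil(oldest / 2)


window = [(79, 8), (87, 4), (95, 2), (100, 1), (103, 1)]  # hypothetical buckets, oldest first
print(estimate_ones(window, 103, 20))  # 6 = 4/2 + 2 + 1 + 1
print(estimate_ones(window, 103, 1))   # 1
```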
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
Roll. No.: B43 Name: Nikhil Aher
Class: Fourth Year (B) Batch: B3
Date of Experiment: 05/09/24 Date of Submission: 12/09/24
Grade:

B.1 DGIM algorithm: Write a program in Java/Python, considering any stream, to implement the
DGIM algorithm.

# DGIM: estimate the number of 1's in the last 32 bits of a 0/1 stream.
# The stream is built from text: each letter with an even ASCII code gives 0,
# each letter with an odd ASCII code gives 1; all other characters are skipped.

WINDOW = 32  # sliding-window size N


# class Bucket stores the no. of 1's as its size and its rightmost 1 as its timestamp
class Bucket:
    def __init__(self, size, time_stamp):
        self.size = size
        self.time_stamp = time_stamp


def input_bin_stream(text):
    bin_stream = []
    # converting the letters to a 0/1 stream
    for ch in text:
        if "A" <= ch <= "Z" or "a" <= ch <= "z":
            bin_stream.append(ord(ch) % 2)  # even code -> 0, odd code -> 1
    # storing the binary stream in Binary_Input.txt, 32 bits per line
    with open("Binary_Input.txt", "w") as binary_file:
        for count, x in enumerate(bin_stream, start=1):
            binary_file.write("%i" % x)
            if count % WINDOW == 0:
                binary_file.write("\n")
    return bin_stream


# adds a new 1 as a bucket of size 1, then merges so that no size appears
# three times (bucket_list is ordered oldest -> newest)
def add_one(bucket_list, time_stamp):
    bucket_list.append(Bucket(1, time_stamp))
    i = len(bucket_list) - 1
    while i >= 2 and bucket_list[i - 2].size == bucket_list[i].size:
        # combine the two leftmost buckets of this size; the merged bucket
        # keeps the timestamp of the rightmost (later) of the two
        bucket_list[i - 1].size += bucket_list[i - 2].size
        del bucket_list[i - 2]
        i -= 2  # the merged bucket may now be the third of its (doubled) size


def initial_buckets(bin_stream):
    bucket_list = []
    with open("Final_Output.txt", "w") as output_file, \
            open("Actual_count.txt", "w") as a_c_file:
        # first 32 bits: build the buckets and write the exact running count of 1's
        c = 0  # one count
        for n, x in enumerate(bin_stream[:WINDOW], start=1):
            if x == 1:
                c += 1
                add_one(bucket_list, n)
            output_file.write("%i " % c)
            a_c_file.write("%i " % c)
        output_file.write("\n")
        a_c_file.write("\n")
        merge_and_estimate(bucket_list, bin_stream, output_file)
        actual_count(bin_stream, a_c_file)


# counts the actual no. of 1's in the last 32 bits for every new bit
def actual_count(bin_stream, a_c_file):
    ac = sum(bin_stream[:WINDOW])
    d = 0
    for x in range(WINDOW, len(bin_stream)):
        ac += bin_stream[x] - bin_stream[x - WINDOW]  # new bit in, oldest bit out
        a_c_file.write("%i " % ac)
        d += 1
        if d == WINDOW:
            a_c_file.write("\n")
            d = 0


# estimates the no. of 1's in the last 32 bits for every new bit, using buckets
def merge_and_estimate(bucket_list, bin_stream, output_file):
    d = 0
    for x in range(WINDOW, len(bin_stream)):
        now = x + 1  # timestamp of the arriving bit
        # drop buckets whose rightmost 1 has left the window
        while bucket_list and now - bucket_list[0].time_stamp >= WINDOW:
            del bucket_list[0]
        if bin_stream[x] == 1:
            add_one(bucket_list, now)
        # every bucket counts fully except the oldest, which counts as half
        # (rounded up, so a size-1 bucket still counts as 1)
        total = sum(b.size for b in bucket_list)
        if bucket_list:
            total -= bucket_list[0].size // 2
        output_file.write("%i " % total)
        d += 1
        if d == WINDOW:
            output_file.write("\n")
            d = 0


text = """In the 1990’s “data mining” was an exciting and popular new concept. Around 2010,
people instead started to speak of “big data.” Today, the popular term is “data science.”
However, during all this time, the concept remained the same: use the most powerful
hardware, the most powerful programming systems, and the most efficient algorithms to
solve problems in science, commerce, healthcare, government, the humanities, and many
other fields of human endeavor. To many, data mining is the process of creating a model
from data, often by the process of machine learning, which we mention in Section 1.1.3 and
discuss more fully in Chapter 12. However, more generally, the objective of data mining is
an algorithm. For instance, we discuss locality-sensitive hashing in Chapter 3 and a number
of stream-mining algorithms in Chapter 4, none of which involve a model. Yet in many
important applications, the hard part is creating the model, and once the model is available,
the algorithm to use the model is straightforward. Consider the problem of detecting emails
that are phishing attacks. The most common approach is to build a model of phishing emails,
perhaps by examining emails that people have recently reported as"""

bin_stream = input_bin_stream(text)
initial_buckets(bin_stream)
B.2 Input and Output:
Input:
Output:

B.3 Observations and learning:

The DGIM algorithm is designed for efficient approximate counting of 1’s over a sliding window
of a data stream. It groups the 1’s into buckets whose sizes are powers of 2, so a window of
N bits needs only O(log² N) bits of memory, while the estimate is never off by more than 50%.
This trade of exactness for memory makes it well suited to real-time analytics, such as
monitoring traffic or keyword trends in social media. Its main limitation is that the answer
is only approximate: applications that need exact counts, or an error tighter than 50%, must
store more information. Despite this, it remains a key tool for stream processing in
applications that demand fast, scalable insights.

B.4 Conclusion:
In conclusion, the DGIM algorithm provides an effective solution for
approximate counting in data streams, offering a memory-efficient way to handle real-time
data over sliding windows. Its power-of-2 bucketing strikes a balance between accuracy and
resource constraints, making it well suited to high-frequency applications like network
monitoring or social media analysis. However, it returns an estimate rather than an exact
count, with an error of up to 50%, so it is not appropriate where exact answers are required.
Overall, DGIM remains a powerful tool for scalable, real-time data processing where precision
can be traded for efficiency.
