Swe2011 Bda - III

Introduction to Stream Computing

1
4 Vs of Big Data

2
Infinite Data

[Figure: map of big-data techniques and applications]
• High-dimensional data: locality-sensitive hashing, clustering, dimensionality reduction
• Graph data: PageRank/SimRank, community detection, spam detection
• Infinite data: filtering data streams, queries on streams, web advertising
• Machine learning: SVM, decision trees, perceptron, kNN
• Apps: recommender systems, association rules, duplicate document detection
3
Data at Rest Vs. Data in Motion
• Data at Rest
– Data that is placed in storage rather than used
in real time
• Data in Motion
– Data that is moving across a network or in
memory for processing in real time

4
Data at rest vs Data in motion
• Data at Rest
– Data has been collected from various sources and is analyzed after the event occurs.
– The point where the data is analyzed and the point where action is taken on it occur at two separate times.
– Example: a retailer analyzes a previous month’s sales data and uses it to make strategic decisions about the present month’s business activities. The action takes place after the data-creating event has occurred.
– This data is meaningful to the retailer and allows them to create marketing campaigns and send customized coupons based on customer purchasing behavior and other variables.
– Batch processing method
• Data in Motion
– The collection process is similar to that of data at rest; however, the difference lies in the analytics, which occur in real time as the event happens.
– Example: a theme park uses wristbands to collect data about its guests. The wristbands constantly record data about each guest’s activities, and the park can use this information to personalize the guest visit with special surprises or suggested activities based on behavior.
– This allows the business to customize the guest experience during the visit.
– Real-time processing method

5
Data Streams - Terms
• Data Tsunami
• A data stream is a (potentially unbounded) sequence of tuples
• Each tuple consists of a set of attributes, similar to a row in a database table
– Continuous input, often in high-volume
– Does not end
– Impossible to process / analyze in real-time with traditional
relational database systems
• Transactional data streams: log interactions between entities
– Credit card: purchases by consumers from merchants
– Telecommunications: phone calls by callers to dialed parties
– Web: accesses by clients of resources at servers
• Measurement data streams: monitor evolution of entity states
– Sensor networks: physical phenomena, road traffic
– IP network: traffic at router interfaces
– Earth climate: temperature, moisture at weather stations
6
Data Streams: Characteristics
The characteristic of continually arriving data points introduces an important
property of data streams which also poses the greatest challenge: the size of a
data stream is potentially unbounded.

This leads to the following requirements for data stream processing algorithms:
• Bounded storage: The algorithm can only store a very limited amount of
data to summarize the data stream.
• Single pass: The incoming data points cannot be permanently stored and
need to be processed at once in the arriving order.
• Real-time: The algorithm has to process data points on average at least as
fast as the data is arriving.
• Concept drift: The algorithm has to be able to deal with a data generating
process which evolves over time (e.g., distributions change or new
structure in the data appears).

7
Why Data Stream Analysis?
• Must analyze the massive data:
– Scientific research (monitor environment, species)
– System management (spot faults, drops, failures)
– Business intelligence (marketing rules, new offers)
– Revenue protection (phone fraud, service abuse)

8
Data Stream Management System
[Figure: architecture of a data stream management system]
• Any number of streams enter the system; each stream is composed of elements/tuples arriving over time, e.g.:
. . . 1, 5, 2, 7, 0, 9, 3
. . . a, r, v, t, y, h, b
. . . 0, 0, 1, 0, 1, 1, 0
• A stream processor answers standing queries and ad-hoc queries, producing output.
• The system relies on a limited working storage, backed by an archival storage.
9
Analytics With Data-In-Motion
[Figure: raw bits are ingested as they arrive ("opportunity cost starts here"); the incoming data is used to bootstrap and enrich an adaptive analytics model, which then forecasts/nowcasts on the live stream]
10
DBMS Vs. DSMS
DBMS → DSMS
• Persistent relations → Transient streams
• One-time queries → Continuous queries
• Random access → Sequential access
• “Unbounded” disk store → Bounded main memory
• Only current state matters → History/arrival order is critical
• Passive repository → Active stores
• Relatively low update rate → Possibly multi-GB arrival rate
• No real-time services → Real-time requirements
• Assume precise data → Data stale/imprecise
• Access plan determined by query processor and physical DB design → Unpredictable/variable data arrival and characteristics

11
Real Time Data Streams
• Sensors gathering information, e.g. climate, traffic
• Social media: posts, pictures and videos
• Digital satellite images
• Purchase transaction records
• Mobile phone GPS signals
• High-volume administrative & transactional records

12
Real time Data Streams
• Sensor Data Streams
• Streaming trending topics on Twitter
• Share Market Streams
• Streaming video in TV Shows
• Blog Data
• Telecommunication calling records
• Credit card transaction flows
• Network monitoring and traffic engineering
• Web logs and Web page click streams
• Satellite data flow

13
Applications (1)
• Mining query streams
– Google wants to know what queries are
more frequent today than yesterday

• Mining click streams


– Yahoo wants to know which of its pages are getting
an unusual number of hits in the past hour

• Mining social network news feeds


– E.g., look for trending topics on Twitter, Facebook

Dr.SMK, AVCCE 14
Applications (2)
• Sensor Networks
– Many sensors feeding into a central controller
• Telephone call records
– Data feeds into customer bills as well as
settlements between telephone companies
• IP packets monitored at a switch
– Gather information for optimal routing
– Detect denial-of-service attacks

15
Issues in Stream Processing
• Streams often deliver elements very rapidly.
– Process elements in real time
– The stream-processing algorithm is executed in main
memory, without access to secondary storage or
with only rare accesses to secondary storage.
– Even when streams are “slow,” Process should be fast
– Even if each stream by itself can be processed using a
small amount of main memory, the requirements
of all the streams together can easily exceed the
amount of available main memory.

16
Points to Ponder….
1. Process an example at a time, and inspect it only once (at most)
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any point

17
Managing Data Streams

18
Kinds of Stream Processing Techniques
• Sampling data in a Stream


– To create a sample of a stream that is usable for a class of queries
• Filtering data streams
– To accept only a particular set of elements as the stream arrives
• Counting distinct elements in a Stream
– To estimate the number of different elements appearing in a stream
• Estimating moments
– Involves the distribution of frequencies of different elements in a
stream
• Counting Ones in a Window
– Counting the number of 1’s in the binary stream

19
Sampling Streams
• Stream sampling is the process of collecting a representative
sample of the elements of a data stream.

• Since we cannot store the entire stream, one obvious approach is to store a sample.

• The sample is usually much smaller than the entire stream, but can
be designed to retain many important characteristics of the stream.

• One can select the subset of the stream in such a way that answers to queries about the selected subset are statistically representative of the stream as a whole.

• Unlike sampling from a stored data set, stream sampling must be performed online, as the data arrives.

20
Sampling Streams
• Two different approaches
– (1) Sample a fixed proportion of elements in the stream (say 1 in 10)
– (2) Maintain a random sample of fixed size over a potentially infinite stream
• At any “time” k we would like a random sample of s elements
– What is the property of the sample we want to maintain?
For all time steps k, each of the k elements seen so far has equal probability of being sampled
21
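Approach (2), keeping a fixed-size uniform sample over an unbounded stream, is usually implemented as reservoir sampling. A minimal Python sketch (the function name and interface are illustrative):

```python
import random

def reservoir_sample(stream, s, rng=random):
    """Maintain a uniform random sample of s elements from a stream.

    After k elements have arrived (k >= s), every element seen so far
    is in the sample with probability s/k.
    """
    sample = []
    for k, element in enumerate(stream, start=1):
        if k <= s:
            sample.append(element)       # fill the reservoir first
        else:
            j = rng.randint(1, k)        # pick a slot in 1..k
            if j <= s:
                sample[j - 1] = element  # replace a random reservoir slot
    return sample
```

For example, `reservoir_sample(range(10**6), 10)` keeps only 10 elements in memory no matter how long the stream runs.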
Sampling Data Stream
• Inputs:
– Sample size k
– Window size n >> k (alternatively, time duration ‘m’)
– Stream of data elements that arrive online
• Output:
– k elements chosen uniformly at random from the last n elements (alternatively, from all elements that have arrived in the last m time units)
• Goal:
– maintain a data structure that can produce the
desired output at any time upon request
22
Two Types of Sliding Windows
• Sequence-Based
– The most recent n elements from the data stream
– Assumes a (possibly implicit) sequence number for
each element
• Timestamp-Based
– All elements from the data stream that arrived in the last t units of time (e.g. the last 1 week)
– Assumes a (possibly implicit) arrival timestamp for each element
• Sequence-based windows are the focus for most of this discussion
23
An example problem
• A search engine receives a stream of queries, and it would
like to study the behavior of a typical user.
• The stream consists of tuples (user, query, time). Suppose one would like to answer the query:
“What fraction of the typical user’s queries were repeated over the past month?”
• Suppose a user has issued s search queries exactly once in the past month, d queries exactly twice, and no queries more than twice.

• The correct answer: d/(s+d)

24
Sampling based Approach
• Suppose we sample 1/10th of the stream. Of the s queries issued once, an expected s/10 appear in the sample.
• Of the d queries issued twice, an expected d/100 appear twice in the sample:
d/100 = (1/10 ∙ 1/10) ∙ d
• An expected 18d/100 of them appear exactly once: one of the two occurrences falls in the selected 1/10th of the stream, while the other falls in the unselected 9/10th:
18d/100 = ((1/10 ∙ 9/10) + (9/10 ∙ 1/10)) ∙ d
• Hence the sample-based answer is
(d/100) / (s/10 + d/100 + 18d/100) = d / (10s + 19d)
which differs from the correct answer d/(s+d).
25
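The bias can be checked with plain arithmetic; the values s = 100 and d = 50 below are illustrative:

```python
def true_fraction(s, d):
    # Fraction of the user's distinct queries that were repeated.
    return d / (s + d)

def sampled_fraction(s, d):
    # Expected answer computed from a 1/10th sample:
    # d/100 doubletons among s/10 + 18d/100 singletons.
    doubletons = d / 100
    singletons = s / 10 + 18 * d / 100
    return doubletons / (singletons + doubletons)

# With s = 100 one-time queries and d = 50 twice-issued queries,
# the true answer is 1/3, but the sample-based answer is far smaller.
print(true_fraction(100, 50))     # 0.333...
print(sampled_fraction(100, 50))  # d/(10s+19d) = 50/1950 ≈ 0.0256
```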
Generalized Solution
• Stream of tuples with keys:
– Key is some subset of each tuple’s components
• e.g., tuple is (user, search, time); key is user
– Choice of key depends on application
• To get a sample of an a/b fraction of the stream:
– Hash each tuple’s key uniformly into b buckets (values 0 to b−1)
– Pick the tuple if its hash value falls in the first a buckets (i.e., is less than a)

How to generate a 30% sample?
Hash into b = 10 buckets, and take the tuple if it hashes to one of the first 3 buckets (values 0, 1, 2) 26
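A sketch of key-based sampling in Python; md5 is used here only as a convenient stand-in for a uniform hash function, and the names are illustrative:

```python
import hashlib

def in_sample(key, a, b):
    """Keep a tuple iff its key hashes into one of the first a of b buckets.

    Every tuple with the same key gets the same decision, so all of a
    user's queries are either in the sample or out of it together.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    bucket = int.from_bytes(digest, "big") % b
    return bucket < a

# A ~30% sample of users: hash into b = 10 buckets, keep buckets 0-2.
stream = [("alice", "q1"), ("bob", "q2"), ("alice", "q3")]
sampled = [t for t in stream if in_sample(t[0], a=3, b=10)]
```

Because the decision depends only on the key, the sample is suitable for per-user queries like the repeated-search fraction above.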
Filtering Data Streams
• A common process on streams is selection,
or filtering.
• We want to accept those tuples in the stream that
meet a criterion.
• Accepted tuples are passed to another process as a
stream, while other tuples are dropped.
• The problem becomes harder when the criterion involves a lookup for membership in a set.
• It is especially hard when the set is too large to store in main memory.

27
Applications
• Email spam filtering
– We know 1 billion “good” email addresses
– If an email comes from one of these, it is NOT spam

• Publish-subscribe systems
– You are collecting lots of messages (news articles)
– People express interest in certain sets of keywords
– Determine whether each message matches
user’s interest

28
Filtering Data Streams : Problem
• Suppose we have a set S of one billion allowed email addresses (valid senders, not considered spam).
• Each email address occupies at least 20 bytes, so storing S needs 20 GB or more.
• Assume that we have 1 GB of main memory.
• Hence we are unable to store the entire S in main memory, and would need disk accesses too.
• This prompts the need for a method that performs the filtering in main memory alone.
• The technique is Bloom filtering.

29
Bloom Filtering
• The underlying concept is to utilize main memory as a bit array.
• With 1 GB of main memory, we have room for 8 billion bits.
• Devise a hash function h and hash each member of S to a bit, setting that bit to 1. All the other bits of the array remain 0.
• Since there are 1 billion members of S, approximately 1/8th of the bits will be 1.
• The exact fraction of bits set to 1 will be slightly less than 1/8th, because two members of S may hash to the same bit.

30
First Cut Solution
• Given a set of keys S that we want to filter
• Create a bit array B of n bits, initially all 0s
• Choose a hash function h with range [0,n)
• Hash each member of s S to one of
n buckets, and set that bit to 1, i.e., B[h(s)]=1
• Hash each element a of the stream and output
only those that hash to bit that was set to 1
– Output a if B[h(a)] == 1

31
First Cut Solution
[Figure: an item is hashed by hash function h into bit array B, e.g. 0010001011000]
• If the item hashes to a bucket that at least one of the items in S hashed to (bit = 1), output the item, since it may be in S.
• If the item hashes to a bucket set to 0, drop it: it is surely not in S.

• Creates false positives but no false negatives
– If the item is in S we surely output it; if not, we may still output it. 32
First Cut Solution
◾ |S| = 1 billion email addresses, |B| = 1 GB = 8 billion bits
◾ If the email address is in S, then it surely hashes to a bucket that has the bit set to 1, so it always gets through (no false negatives)
◾ Approximately 1/8 of the bits are set to 1, so about 1/8th of the addresses not in S get through to the output (false positives)
 Actually, less than 1/8th, because more than one address might hash to the same bit
Analysis: Throwing Darts
• More accurate analysis for the number of
false positives

• Consider: If we throw m darts into n equally


likely targets, what is the probability that
a target gets at least one dart?

• In our case:
– Targets = bits/buckets
– Darts = hash values of items

34
Analysis: Throwing Darts
• We have m darts, n targets
• What is the probability that a given target gets at least one dart?

Probability a given target is not hit by any dart:
(1 − 1/n)^m = ((1 − 1/n)^n)^(m/n) → e^(−m/n) as n → ∞
(using (1 − 1/n)^n → 1/e)

Probability at least one dart hits the target:
1 − (1 − 1/n)^m ≈ 1 − e^(−m/n)
35
Analysis: Throwing Darts
• Fraction of 1s in the array B = probability of a false positive = 1 − e^(−m/n)

• Example: m = 10^9 darts, n = 8∙10^9 targets
– Fraction of 1s in B = 1 − e^(−1/8) ≈ 0.1175
• Compare with our earlier estimate: 1/8 = 0.125

36
Counting Distinct Problem
• Data stream consists of a universe
of elements chosen from a set of size N
– Maintain a count of the number of
distinct elements seen so far
• Maintain the set of elements seen so far
– That is, keep a hash table of all the
distinct elements seen so far
– Hashing and a variety of algorithms are to be used

37
Applications
• A Web site gathering statistics on how many
unique users it has seen in each given
month.
– The universal set is the set of logins for that site,
and a stream element is generated each time
someone logs in.
– This measure is appropriate for a site like
Amazon, where the typical user logs in with
their unique login name.

38
• Web site like Google that does not
require login to issue a search query
– may be able to identify users only by the IP
address from which they send the query.

– There are about 4 billion IP addresses; sequences of four 8-bit bytes will serve as the universal set in this case.

39
Solution
• The obvious way to solve the problem is to keep in main
memory a list of all the elements seen so far in the
stream.
• Adopt an efficient search structure such as a hash table or
search tree, so one can quickly add new elements
and check whether or not the element that just arrived
on the stream was already seen.
• As long as the number of distinct elements is not too great,
this structure can fit in main memory and there is
little problem obtaining an exact answer to the
question how many distinct elements appear in the
stream.
• Approach : Flajolet-Martin Algorithm

40
The Flajolet-Martin Algorithm
• Used to estimate the number of distinct elements by
hashing the elements of the universal set to a bit-
string
• Pick many different hash functions and hash each
element of the stream using these hash functions.
• The important property of a hash function is that
when applied to the same element, it always
produces the same result
• The length of the bit-string must be sufficient that
there are more possible results of the hash function
than there are elements of the universal set.

41
• Whenever we apply a hash function h to a
stream element a, the bit string h(a)
will end in some number of 0’s.
– Call this number the tail length for a and h.
• Let R be the maximum tail length of any a
seen so far in the stream.
• Estimate the number of distinct elements seen in the stream as 2^R.

42
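A compact Flajolet-Martin sketch in Python. Real implementations draw from a proper family of independent hash functions; xoring one base hash with random masks, as below, is a simplification for illustration, and the names are assumptions:

```python
import hashlib
import random
import statistics

def fm_estimate(stream, n_hashes=64, n_bits=32, seed=42):
    """Flajolet-Martin: estimate the number of distinct elements as the
    median of 2^R over several hash functions, where R is the longest
    tail of trailing zeros seen for that hash."""
    rng = random.Random(seed)
    # Stand-in for a hash family: one base hash xored with random masks.
    masks = [rng.getrandbits(n_bits) for _ in range(n_hashes)]
    max_tail = [0] * n_hashes
    for element in stream:
        base = int.from_bytes(
            hashlib.sha256(str(element).encode()).digest()[:n_bits // 8], "big")
        for i, mask in enumerate(masks):
            h = base ^ mask
            # Tail length: number of trailing 0 bits of h.
            tail = n_bits if h == 0 else (h & -h).bit_length() - 1
            max_tail[i] = max(max_tail[i], tail)
    return statistics.median(2 ** r for r in max_tail)
```

Because each 2^R is a power of 2, practical variants combine the estimates as a median of group means rather than a plain median, which allows values between powers of 2.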
Why It Works: Intuition
• Very very rough and heuristic intuition why
Flajolet-Martin works:
– h(a) hashes a with equal probability to any of N values
– Then h(a) is a sequence of log₂ N bits, where a 2^(−r) fraction of all a’s have a tail of r zeros
• About 50% of a’s hash to ***0
• About 25% of a’s hash to **00
• So, if the longest tail seen is r = 2 (i.e., some item hash ends in *100), then we have probably seen about 4 distinct items so far
– That is, it takes about 2^r items hashed before we see one with a zero-suffix of length r 43
Example
[Figure: worked example of the Flajolet-Martin estimate]
44
Estimating Moments
• A generalization of the problem of counting
distinct elements in a stream.
– The problem, called computing “moments,”
• Involves the distribution of frequencies of
different elements in the stream.
• We shall define moments of all orders and
concentrate on computing second
moments, from which the general
algorithm for all moments is a simple
extension.
45
Definition of Moments
• Suppose a stream consists of
elements chosen from a universal set.
• Assume the universal set is ordered so
we can speak of the ith element for any i.
• Let m_i be the number of occurrences of the ith element for any i.
• Then the kth-order moment (or just kth moment) of the stream is the sum over all i of (m_i)^k:

kth moment = Σᵢ (m_i)^k
46
Computing Different Moments
• 0th moment - Count the number of different
elements in the stream.
• 1st moment = sum of the numbers of
elements in the stream (length of the
stream)
• 2nd moment = surprise number (a measure
of how uneven the distribution is)

47
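When the counts fit in memory, these moments can be computed exactly; streaming algorithms such as AMS exist precisely for when they do not. An illustrative baseline:

```python
from collections import Counter

def kth_moment(stream, k):
    """kth moment = sum over all distinct elements i of (m_i)^k,
    where m_i is the number of occurrences of element i."""
    counts = Counter(stream)
    return sum(m ** k for m in counts.values())

stream = ["a", "b", "a", "c", "a", "b"]   # m_a = 3, m_b = 2, m_c = 1
print(kth_moment(stream, 0))  # 3  = number of distinct elements
print(kth_moment(stream, 1))  # 6  = length of the stream
print(kth_moment(stream, 2))  # 14 = surprise number: 3² + 2² + 1²
```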
Alon-Matias-Szegedy (AMS) Method
• The AMS method works for all moments
• Gives an unbiased estimate
• We will concentrate on the 2nd moment S
• We pick and keep track of many variables X:
– For each variable X we store X.el and X.val
• X.el corresponds to the item i
• X.val corresponds to the count of item i
– Note this requires a count in main memory, so the number of Xs is limited
• Our goal is to compute S = Σᵢ (mᵢ)²
48
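A sketch of the AMS second-moment estimate for a stream held in a list (in a real stream the position is chosen on the fly and X.val is updated incrementally; the names are illustrative). A useful identity for checking it: averaging the estimate over every position t gives exactly S.

```python
def ams_estimate_at(stream, t):
    """One AMS variable X placed at position t:
    X.el = stream[t], X.val = occurrences of X.el from position t onward.
    The estimate of the 2nd moment from this variable is n(2·X.val − 1)."""
    n = len(stream)
    x_el = stream[t]
    x_val = stream[t:].count(x_el)
    return n * (2 * x_val - 1)

def ams_estimate(stream, positions):
    # Average several variables; the expectation equals the true S.
    return sum(ams_estimate_at(stream, t) for t in positions) / len(positions)

stream = ["a", "a", "b", "b", "b", "a"]   # m_a = 3, m_b = 3, so S = 18
print(ams_estimate(stream, range(len(stream))))  # 18.0 over all positions
```

In practice the positions are drawn uniformly at random, and only a few variables are kept in memory.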
Example: Surprise Number

• Consider a stream of length 100 with 11 distinct elements whose frequencies mᵢ are: 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9

Surprise number S = (10)² + 10 × (9)²
= 100 + 810
= 910

Compute the surprise number for a stream whose element frequencies are:
90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
49
Counting Ones
• Consider a window of length N on a binary stream.
• We focus on the situation where we cannot afford to store the entire window.
• We want at all times to be able to answer queries of the form “how many 1’s are there in the last k bits?” for any k ≤ N.
• A solution is proposed through the Datar-Gionis-Indyk-Motwani (DGIM) algorithm.

50
DGIM Algorithm
• Each bit of the stream has a timestamp, the position in which it arrives.
• The first bit has timestamp 1, the second has timestamp 2, and so on.
• Divide the window into buckets, each consisting of:
1. The timestamp of its right (most recent) end.
2. The number of 1’s in the bucket. This number must be a power of 2, and we refer to the number of 1’s as the size of the bucket.

51
DGIM Algorithm
• There are five rules that must be followed when representing a stream by buckets:
– The right end of a bucket is always a position with a 1.
– No position is in more than one bucket.
– There are one or two buckets of any given size, up to some maximum size.
– All sizes must be a power of 2.
– Buckets cannot decrease in size as we move to the left (back in time).
52
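A sketch of DGIM bucket maintenance and querying in Python (class and method names are illustrative; a production version would also enforce the maximum-size cap and store buckets in O(log N) space explicitly):

```python
class DGIM:
    """Approximate count of 1's in the last k <= N bits of a binary stream.
    Buckets are (right_end_timestamp, size) pairs, newest first; sizes are
    powers of 2 with at most two buckets of each size."""

    def __init__(self, window_size):
        self.N = window_size
        self.t = 0                # timestamp of the most recent bit
        self.buckets = []         # newest bucket at index 0

    def add(self, bit):
        self.t += 1
        # Drop buckets whose right end has slid out of the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()
        if bit == 1:
            self.buckets.insert(0, (self.t, 1))
            # Whenever three buckets share a size, merge the two oldest;
            # the merged bucket keeps the more recent right end.
            i = 0
            while i + 2 < len(self.buckets):
                if (self.buckets[i][1] == self.buckets[i + 1][1]
                        == self.buckets[i + 2][1]):
                    ts, size = self.buckets[i + 1]
                    self.buckets[i + 1:i + 3] = [(ts, 2 * size)]
                else:
                    i += 1

    def count(self, k):
        """Estimate the 1's among the last k bits: sum the sizes of buckets
        whose right end is in range, counting only half of the oldest one."""
        total, oldest = 0, 0
        for ts, size in self.buckets:
            if ts > self.t - k:
                total += size
                oldest = size
            else:
                break
        return total - oldest // 2
```

Counting only half of the oldest qualifying bucket is what gives DGIM its guarantee that the estimate is off by at most 50%.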