2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Streaming Linear Regression on Spark MLlib and MOA

Barış Akgün
Computer Engineering Department
Istanbul Technical University
Istanbul, Turkey
barisakgun@itu.edu.tr

Şule Gündüz Öğüdücü
Computer Engineering Department
Istanbul Technical University
Istanbul, Turkey
gunduz@cs.itu.edu.tr

Abstract: In recent years, analyzing data streams has attracted considerable attention in different fields of computer science. In this paper, two different frameworks, namely MOA and Spark MLlib, are examined for linear regression on streaming data. The focus is placed on determining how well the linear regression techniques implemented in the frameworks can be used to model data streams. We also examine the challenges of massive data streams and how MOA and Spark Streaming address these challenges. The experiments show that, although MOA is easier to use than Spark MLlib, Spark MLlib's linear regression performs better on streaming data.

Keywords: Stream Mining; Spark Streaming; Spark MLlib; MOA; Streaming Linear Regression; Data Streams

I. INTRODUCTION

As the world becomes more digital, data are automatically generated by mobile applications, sensor applications, log records, email, Twitter posts, etc. at an increasing rate [1]. Much of this massive data is valuable only at the time it is received; such data are therefore called data streams and have to be analyzed in real time. Data mining can be used to analyze massive volumes of data. A data stream is a real-time, continuous, ordered sequence of items [2]; hence, stream data mining involves extracting knowledge from real-time actions. These real-time actions mostly produce massive, high-rate data. Stream data mining approaches must handle these massive (big), high-rate data in a very short time.

Linear regression analysis is one of the most widely used techniques in data mining. Implementing a linear regression model is not complex, and the algorithm is efficient; therefore, it is a good choice for modelling and predicting the behavior of massive stream data.

Several frameworks have been built for large-scale analysis of evolving data streams, such as Apache Storm [3], IBM InfoSphere Streams [4] and Apache Samza [5]. Two of the widely used frameworks are Massive Online Analysis (MOA) [6] and Spark MLlib [7]. MOA is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [6]. The Machine Learning Library (MLlib) is Spark's scalable machine learning library consisting of common learning algorithms [7].

This paper aims at comparing the performance of two widely used frameworks, namely MOA and Spark MLlib, for linear regression analysis on continuous data streams. Streaming linear regression is implemented on MLlib with Spark Streaming support. Several experiments are carried out, and the comparison is conducted in terms of CPU time cost, the data set types the frameworks support, usability, fault tolerance and coding standards.

The rest of the paper is organized as follows. In Section II, we present the concept of data streams in order to explain the challenges of analyzing stream data. In Section III, we briefly introduce data stream mining and the linear regression analysis technique used by the MOA and Spark MLlib frameworks. In Sections IV and V, we discuss the usage of MOA and Spark MLlib for linear regression and show the advantages and disadvantages of these frameworks for streaming linear regression. Section VI presents and analyzes the experimental results. Finally, Section VII concludes the paper.

II. DATA STREAMS

A data stream is a real-time, continuous, ordered sequence of items [2]. Data streams may be created by transactional systems. Such streams are generated when an interaction occurs between data attributes, for example commercial credit card purchases, market trades, online scoring and client requests to a web server [8]. The other type is the machine-generated data stream, which is produced automatically by computer systems without human intervention, for example GPS data records, sensor data and server performance logs. Such stream data may therefore be large in volume [8].

Data stream analysis techniques have several requirements; the most significant challenges for data stream analysis are the following [6]:

• Items are processed one at a time.

• Only a limited amount of memory is used.

• The streaming method has to be ready for data analysis at any time, and the arrival rate in the streams may be very fast, which may result in crashing if too many items arrive.

• The volume of the data stream may be very large; on the other hand, the arriving data must be processed in a limited amount of time.
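To make these constraints concrete, the following Python sketch summarizes a numeric stream in a single pass with constant memory, so that an answer is available at any point in the stream. It is only an illustration of the requirements listed above and is not taken from MOA or Spark; the class name `RunningStats`, the generator `sensor_readings` and the synthetic values are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each item is inspected exactly once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; can be queried at any point in the stream.
        return self.m2 / self.count if self.count else 0.0


def sensor_readings(n: int) -> Iterator[float]:
    """Hypothetical generator standing in for an unbounded sensor stream."""
    for i in range(n):
        yield float(i % 10)


stats = RunningStats()
for reading in sensor_readings(1000):
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

The state kept between items is three numbers regardless of how many items have arrived, which is the property that the memory and time constraints above ask of a stream algorithm.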


These requirements show that data stream algorithms must process data that arrive at high speed under very strict space and time constraints. Traditional data mining techniques cannot handle these kinds of requirements. New techniques, collectively called data stream mining, were developed to address these challenges.

III. DATA STREAM MINING & LINEAR REGRESSION

Data stream mining is the extraction of patterns and information from a sequence of elements that need to be analyzed online as they arrive. It has become an increasingly important research area in the last decade, since many real-world applications, such as web user behavior analysis, telecommunication connection analysis and sensor data analysis, produce continuous and large data streams. Therefore, many machine learning algorithms have been implemented for it, and many techniques have been designed to overcome the challenges of stream data analysis [9].

Linear regression is used for finding linear relationships between variables. Due to their simplicity and low complexity, linear regression models are among the most fundamental and widely used techniques for modelling and predicting the behavior of massive data streams. Linear regression is a statistical method in which a dependent variable $y$ (the target variable) is computed from $p$ independent variables that are assumed to have an influence on the target variable [10]. Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ data points, the formula for the regression of one data point $y_i$ (regressand) is as follows [10]:

$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n \tag{1}$$

$\beta_j$ is the regression coefficient, which can be calculated using the least squares approach, $x_{ij}$ (regressor) is the value of the $j$-th independent variable and $\varepsilon_i$ is the error term. Linear regression aims at fitting a straight line, called the regression line, through the set of $n$ data points such that the sum of squared residuals is minimized. The error of a prediction for a point is the difference between the value of the point $y_i$ and the predicted value $\hat{y}_i$ (the value on the line). The error function, which measures the deviation of the predicted values from the true values, can be calculated as follows:

$$E(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2}$$

The best values of the regression coefficients and the error terms can be found by minimizing the error function in Eq. 2. Gradient Descent (GD) is one of the approaches applied to minimize this error function. The GD algorithm starts its search from arbitrary values of the regression coefficients ($\beta$) and the error terms. At each iteration, the algorithm updates the regression coefficients and the error terms to values that yield a lower error than in the previous iteration. This is accomplished by moving in the negative direction of the gradient of the error function [17]. GD uses a learning rate ($\alpha$) parameter, which determines how fast or slow the algorithm moves towards the optimal regression coefficients. Although GD is one of the best algorithms for minimizing the error function, it solves the minimization problem using all of the data points:

$$\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} E(\beta) \tag{3}$$

Here, $\frac{\partial}{\partial \beta_j} E(\beta)$ denotes the derivative of the error function over all data points in the data set ($n$), and $\alpha$ is the learning rate parameter.

Stochastic Gradient Descent (SGD) also updates a set of parameters iteratively to minimize an error function, as GD does. On the other hand, SGD uses only one data point to update the parameters in a particular iteration. Since computation time and resource management are among the challenges of stream mining, SGD is more suitable for stream mining algorithms. Both Spark MLlib and MOA use the SGD algorithm for streaming linear regression.

IV. MOA STREAMING LINEAR REGRESSION

Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [6]. This section briefly introduces the usage of MOA for streaming linear regression. The advantages and disadvantages of MOA streaming linear regression are also discussed.

MOA has a graphical user interface (GUI) for configuring and running tasks. The configuration steps for streaming linear regression in MOA are given below:

• Choose the learner algorithm. Although MOA has different learning algorithms for linear regression, we use the SGD algorithm. We choose SGD because it is the only one that Spark MLlib supports and because it is one of the best learning methods for linear regression.

• Set the SGD parameters. Its parameters are lambda regularization, learning rate and loss function. The learning rate and the loss function are introduced in Section III. Like the learning rate, the lambda regularization parameter is used to protect against overfitting in regression.

• Define the data set. MOA has many predefined stream generators for training linear regression. MOA is an open source framework, which enables one to implement a new stream generator in it. It also supports bi-directional interaction with WEKA [11]; therefore, it accepts the Attribute-Relation File Format (ARFF) as input for streaming data.

When working with ARFF files in MOA, two parameters should be set, namely the number of passes (numPasses) and the maximum number of instances (maxInstances). The number of passes parameter indicates how many passes to make over the data set, whereas the maximum instances parameter sets the maximum number of instances to train on per pass over the data.

MOA is a Java-based framework with a simple coding structure. We added our own ARFF reader method that listens to the data set. Whenever a new instance arrives, the regression coefficients are automatically updated by the SGD algorithm in our implementation. The data set format for MOA streaming linear regression is the standard ARFF file format. The features and target labels of the instances in the given data set must be numeric, and the features must not be null. One important disadvantage of MOA is that it has no fault tolerance mechanism: in the event of a failure, the operation has to be restarted.
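As a rough illustration of the reader-plus-learner loop described in this section, the following Python sketch reads numeric instances from ARFF text one at a time and applies one SGD step per instance. It is not MOA code and does not use the MOA API. The names `read_arff_instances` and `SGDRegressor`, the toy data and the parameter values are all assumptions made for this example, and the parser handles only dense numeric data with the target in the last column.

```python
import io
from typing import Iterator, List, Tuple

# Hypothetical ARFF content; attribute names and values are illustrative only.
ARFF_TEXT = """@relation toy
@attribute x1 numeric
@attribute x2 numeric
@attribute y numeric
@data
1.0,2.0,5.0
2.0,0.0,2.0
0.0,1.0,2.0
3.0,1.0,5.0
"""


def read_arff_instances(handle) -> Iterator[Tuple[List[float], float]]:
    """Yield (features, target) pairs one at a time; the last column is the target."""
    in_data = False
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            in_data = line.lower() == "@data"
            continue
        values = [float(v) for v in line.split(",")]
        yield values[:-1], values[-1]


class SGDRegressor:
    """Per-instance SGD for squared loss with L2 (lambda) regularization."""

    def __init__(self, n_features, learning_rate=0.1, lambda_reg=0.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.lambda_reg = lambda_reg

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        # The gradient of 0.5 * (prediction - y)^2 w.r.t. weight j is error * x_j.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * (error * xi + self.lambda_reg * w)
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = SGDRegressor(n_features=2, learning_rate=0.05)
for _ in range(2000):  # replay the toy file to stand in for a long stream
    for features, target in read_arff_instances(io.StringIO(ARFF_TEXT)):
        model.update(features, target)

print(model.weights, model.bias)
```

Only the current weights are kept between instances, so memory use does not grow with the length of the stream. The regularization term is switched off here (`lambda_reg=0.0`) to keep the toy example easy to follow; this is a simplification and not the setting used in the experiments.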


V. SPARK MLLIB STREAMING LINEAR REGRESSION

Spark MLlib currently supports streaming linear regression using Spark Streaming. The regression algorithm runs on each batch of data, so that the model is continually updated with new data from the stream [12]. It uses SGD to update the regression coefficients. Spark MLlib lets one train and test a linear regression model on streaming data. The streaming data must be in the form of a Discretized Stream [13]. The configuration steps of Spark MLlib for streaming linear regression are given below:

• Determine the training and (optional) test folders. Spark MLlib streaming linear regression listens to the given training and test folders for streaming data; it detects streaming data when new files are added to these folders. Training and test data can be ingested from many sources. Due to Spark's data-parallel paradigm, Spark MLlib streaming linear regression requires a shared file system for the training and test folders; examples of shared file systems are S3, NFS and HDFS [14]. Training and test data instances have to be in the form of an RDD of labeled points. The number of data points per batch can vary, but the number of features must be constant [12]. The input format for Spark streaming linear regression is as follows: each line should be a data point formatted as (y,[x1,x2,x3]), where y is the label and x1, x2, x3 are the features [12]. Whenever a text file is placed in the training folder, the model is updated. The features and the labels have to be numeric. The algorithm checks the file creation time in the training and test folders; any file created before the streaming linear regression job started is not processed. This is one disadvantage of Spark MLlib streaming linear regression.

• Set the streaming linear regression parameters [7]. The Spark MLlib streaming linear regression algorithm has four parameters: step size (learning rate), number of iterations (for terminating the gradient descent), initial weights vector and mini-batch fraction. The first three parameters are required for linear regression as mentioned in Section III, and the last parameter sets the fraction of each batch that is used per iteration. The batch time, which sets the time window for Spark Streaming, is configured separately on the streaming context. Spark streaming linear regression has a latency of several seconds because of this batch time. On the other hand, the mini-batch model efficiently guarantees that each stream record is processed exactly once.

Although Spark Streaming supports the Java, Scala and Python languages, the streaming linear regression is implemented in Scala. The coding flow of Spark streaming linear regression is simple; therefore, users can easily add their own implementations to the streaming regression.

Spark MLlib streaming linear regression works in the memory of the distributed machines; this memory-based structure decreases the execution time of the linear regression algorithm. This is the most powerful side of Spark MLlib for streaming linear regression.

VI. EXPERIMENTAL RESULTS

In our experiments, the aim is to compare the performance of two widely used frameworks, MOA and Spark MLlib, on streaming linear regression. The experiments are applied to the Airfoil Self-Noise data set, which was processed by NASA in 1989 [15]. The Airfoil Self-Noise data set has 1503 instances and consists of five numeric features (frequency, angle of attack, chord length, free-stream velocity, suction side displacement thickness) and one output, the scaled sound pressure level. To produce massive data sets, we create new sets of instances by copying subsets of the existing data set.

The MOA and Spark MLlib streaming linear regression algorithms are run on a PC with an 8-core 2.2 GHz Intel i7 processor, 16 GB of memory and a 750 GB disk, running the Ubuntu 14.04 operating system.

The experiments require different settings for the two frameworks. First, we introduce the MOA streaming linear regression parameters. As mentioned in Section IV, the lambda regularization, learning rate, loss function, learner algorithm, maximum instances and number of passes parameters must be set for running MOA linear regression. Since we used our own ARFF reader, there is no need to set the maximum instances and number of passes parameters. Our ARFF reader shows the current model weights after every 250 records; on the other hand, this causes some latency. The parameters can be set through the user-friendly GUI of the MOA framework. All parameter values that we used in our tests are given in Table I.

To build Spark MLlib streaming linear regression, the paths of the folders where the training and test data sets reside and the number of features must be set by the user. The user can also change the other parameters explained in Section V. Spark MLlib has no GUI for setting parameters or running tasks; hence, all parameter assignments have to be made in code. The Spark MLlib parameters are set as shown in Table I.

TABLE I. Spark MLlib and MOA parameters for streaming linear regression

Spark MLlib Parameters                    MOA Parameters
stepSize: 0.1                             lambdaRegularization: 1
numIterations: 1                          learningRate: 0.1
miniBatchFraction: 1.0                    lossFunc: SQUAREDLOSS
initial weights: Vector with 0 values     learner: class.moa.classifier.functions.SGD

The experiments indicate the performance of the two frameworks in terms of CPU time costs, which are given in Figure 1. As can be seen from Figure 1, Spark MLlib streaming linear regression is much faster than MOA; the difference in CPU performance is especially clear on massive data sets. Several challenges also affected us at the development stage. Table II illustrates the comparison points of these two frameworks, as determined by these challenges.

TABLE II. Comparison of Streaming Linear Regression

Comparison Point          Spark MLlib                        MOA
Code Complexity           Low                                Low
Programming Languages     Scala                              Java
Noise Data Set Support    No                                 No
Fault Tolerance           High                               None
Usability                 Simple                             Easy (GUI)
Documentation             Not much                           Not much
Version Rate              Apache distribution (Very High)    Not High

[Figure 1: bar chart of execution time in minutes (axis from 0.00 to 9.00) against data size (1.2 GB, 2.4 GB, 4.8 GB) for Spark MLlib and MOA. Data labels surviving from the plot: 0.60, 1.93, 2.60, 3.88, 5.24, 7.72.]

Fig. 1. CPU time costs of streaming linear regression

As a result of the performance and development stage comparisons, streaming linear regression on Spark MLlib produces faster results than on MOA; nevertheless, development time with MOA is shorter than with Spark MLlib.

VII. CONCLUSION

We have presented linear regression on streaming data using the stream mining tools MOA and Spark MLlib with Spark Streaming support. This study also aims to guide users in implementing streaming linear regression models with MOA and Spark MLlib. Our empirical evaluation shows that the key idea of MOA is to provide simple, end-to-end solutions through its user-friendly GUI; in fact, a user can easily implement a streaming linear model without coding knowledge. On the other hand, the key idea of Spark streaming linear regression is to handle streaming data as a series of short batch jobs and to complete these batch jobs in as short a time as possible.

Both frameworks are open source and therefore offer users many advantages for easily implementing massive data stream applications. We used existing streaming linear regression libraries and also implemented additional strategies to improve the MOA framework in order to compare the two frameworks.

Although we used Spark MLlib in local mode and the non-distributed MOA framework for streaming linear regression, both handled massive data streams (for local mode) in reasonable CPU times. On the other hand, Spark Streaming is one of the best technologies for distributed stream computing, and SAMOA (Scalable Advanced Massive Online Analysis), the distributed form of MOA [16], provides a collection of distributed streaming algorithms. In future work, we plan to use Spark Streaming and SAMOA on clusters to make these comparisons on very large data.

REFERENCES

[1] A. Bifet, G. Holmes, B. Pfahringer, P. Kranen, H. Kremer, T. Jansen, T. Seidl, "MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering," JMLR: Workshop and Conference Proceedings, pp. 44-50, 2010.
[2] L. Golab and M. Ozsu, "Issues in data stream management," ACM SIGMOD Record, vol. 32, no. 2, pp. 5-14, 2003.
[3] Apache Storm Website. Available: https://storm.apache.org/
[4] Real Time Processing with IBM InfoSphere Streams, IBM Data Sheets. Available: http://www-03.ibm.com/software/products/en/infosphere-streams
[5] Apache Samza Website. Available: http://samza.apache.org/
[6] A. Bifet, R. Kirkby, G. Holmes, B. Pfahringer, "MOA: Massive Online Analysis," Journal of Machine Learning Research, pp. 1601-1604, 2010.
[7] Apache Spark MLlib Website. Available: https://spark.apache.org/mllib/
[8] N. Koudas, D. Srivastava, "Data Stream Query Processing: A Tutorial," Proceedings of the 29th VLDB Conference, pp. 1149-1149, 2003.
[9] G. Krempl, I. Zliobaite, D. Brzezinski, M. Last et al., "Open Challenges for Data Stream Mining Research," ACM SIGKDD Explorations Newsletter, July 2014.
[10] C.H. Nadungodage, Y. Xia, F. Li, J. Ge, "StreamFitter: A Real Time Linear Regression Analysis System for Continuous Data Streams," 16th International Conference, DASFAA, pp. 458-461, 2011.
[11] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, pp. 10-18, 2009.
[12] Spark MLlib Linear Methods Programming Guide.
[13] M. Zaharia, T. Das, H. Li, S. Shenker, I. Stoica, "Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters," Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, pp. 10-10, 2012.
[14] T.S. Morais, "Survey on Frameworks for Distributed Computing: Hadoop, Spark and Storm," Doctoral Symposium in Informatics Engineering, pp. 95-105, 2015.
[15] T.F. Brooks, D.S. Pope, and A.M. Marcolini, "Airfoil self-noise and prediction," Technical Report, NASA RP-1218, 1989.
[16] G.D.F. Morales and A. Bifet, "SAMOA: Scalable Advanced Massive Online Analysis," Journal of Machine Learning Research 16, pp. 149-153, 2015.
[17] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Inc., New York, NY, USA, 1995.

Richard Johnson
No ratings yet
Principles of Real-Time Data Streaming: Definitive Reference for Developers and Engineers
From Everand
Principles of Real-Time Data Streaming: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
AI-Driven Web Apps: Practical Machine Learning for Software Developers
From Everand
AI-Driven Web Apps: Practical Machine Learning for Software Developers
Sivaramarajalu Ramadurai Venkataraajalu
No ratings yet
NetFlow Protocols and Applications: Definitive Reference for Developers and Engineers
From Everand
NetFlow Protocols and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Model-Driven Online Capacity Management for Component-Based Software Systems
From Everand
Model-Driven Online Capacity Management for Component-Based Software Systems
André van Hoorn
No ratings yet
InfluxDB Essentials: Definitive Reference for Developers and Engineers
From Everand
InfluxDB Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
From Everand
Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data
Byron Ellis
No ratings yet
Graph Layout Support for Model-Driven Engineering
From Everand
Graph Layout Support for Model-Driven Engineering
Miro Spönemann
No ratings yet
Dataproc Administration and Engineering Solutions: Definitive Reference for Developers and Engineers
From Everand
Dataproc Administration and Engineering Solutions: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Essential Apache Beam: Definitive Reference for Developers and Engineers
From Everand
Essential Apache Beam: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
SystemTap Essentials: Definitive Reference for Developers and Engineers
From Everand
SystemTap Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Advanced Backend Code Optimization
From Everand
Advanced Backend Code Optimization
Sid Touati
No ratings yet
Data Lakes & Pipelines: A Modern Azure Guide
From Everand
Data Lakes & Pipelines: A Modern Azure Guide
Kameron Hussain
No ratings yet
Airflow for Data Workflow Automation
From Everand
Airflow for Data Workflow Automation
Richard Johnson
No ratings yet
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
From Everand
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
Wei Liu
No ratings yet
Azure Data Demystified: From SQL to Synapse
From Everand
Azure Data Demystified: From SQL to Synapse
Kameron Hussain
No ratings yet
The Ultimate Guide to Unlocking the Full Potential of Cloud Services: Tips, Recommendations, and Strategies for Success
From Everand
The Ultimate Guide to Unlocking the Full Potential of Cloud Services: Tips, Recommendations, and Strategies for Success
Rick Spair
No ratings yet
Mastering MATLAB: A Comprehensive Journey Through Coding and Analysis
From Everand
Mastering MATLAB: A Comprehensive Journey Through Coding and Analysis
Kameron Hussain
No ratings yet
Application Design: Key Principles For Data-Intensive App Systems
From Everand
Application Design: Key Principles For Data-Intensive App Systems
Rob Botwright
No ratings yet
Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
From Everand
Google Cloud Platform for Data Engineering: From Beginner to Data Engineer using Google Cloud Platform
alasdair gilchrist
5/5 (1)
