If it is executed successfully, you will find the output given below. The OK in the following output is for user identification and is the last line printed by the program itself. If you read the rest of the output carefully, you will see Spark shutting down, for example stopping the web UI, the DAGScheduler, and the SparkContext:
OK
15/07/08 13:56:13 INFO SparkContext: Invoking stop() from shutdown hook
15/07/08 13:56:13 INFO SparkUI: Stopped Spark web UI at http://192.168.1.217:4040
15/07/08 13:56:13 INFO DAGScheduler: Stopping DAGScheduler
15/07/08 13:56:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/08 13:56:14 INFO Utils: path = /tmp/spark-45a07b83-42ed-42b3-b2c2-823d8d99c5af/blockmgr-ccdda9e3-24f6-491b-b509-3d15a9e05818, already present as root for deletion.
15/07/08 13:56:14 INFO MemoryStore: MemoryStore cleared
15/07/08 13:56:14 INFO BlockManager: BlockManager stopped
15/07/08 13:56:14 INFO BlockManagerMaster: BlockManagerMaster stopped
15/07/08 13:56:14 INFO SparkContext: Successfully stopped SparkContext
15/07/08 13:56:14 INFO Utils: Shutdown hook called
15/07/08 13:56:14 INFO Utils: Deleting directory /tmp/spark-45a07b83-42ed-42b3-b2c2-823d8d99c5af
15/07/08 13:56:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
The following commands open the outfile directory and list the files in it:
$ cd outfile
$ ls
part-00000 part-00001 _SUCCESS
$ cat part-00000
(people,1)
(are,2)
(not,1)
(as,8)
(beautiful,2)
(they,7)
(look,1)
$ cat part-00001
(walk,1)
(or,1)
(talk,1)
(only,1)
(love,1)
(care,1)
(share,1)
Go through the following section to learn more about the spark-submit command.
Spark-submit Syntax
spark-submit [options] <app jar | python file> [app arguments]
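For example, a word-count application might be submitted to a local master as follows (SparkWordCount and wordcount.jar are hypothetical names used for illustration):
$ spark-submit --class SparkWordCount --master local wordcount.jar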
Options
The list given below describes a subset of the most commonly used options; run spark-submit --help for the complete, version-specific list.
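These descriptions are a sketch based on the help text of Spark 1.x-era spark-submit:
--master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local.
--deploy-mode DEPLOY_MODE    Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster"). Default: client.
--class CLASS_NAME           Your application's main class (for Java/Scala apps).
--name NAME                  A name for your application.
--jars JARS                  Comma-separated list of local jars to include on the driver and executor classpaths.
--files FILES                Comma-separated list of files to be placed in the working directory of each executor.
--conf PROP=VALUE            An arbitrary Spark configuration property.
--driver-memory MEM          Memory for the driver (e.g. 1000M, 2G).
--executor-memory MEM        Memory per executor (e.g. 1000M, 2G).
--verbose, -v                Print additional debug output.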
6. ADVANCED SPARK PROGRAMMING
Spark provides two types of shared variables: broadcast variables and accumulators.
Broadcast Variables
Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. They can be used, for example, to give every node a copy of a large input dataset in an efficient manner. Spark also
attempts to distribute broadcast variables using efficient broadcast algorithms to reduce
communication cost.
Spark actions are executed through a set of stages, separated by distributed “shuffle”
operations. Spark automatically broadcasts the common data needed by tasks within
each stage.
The data broadcast this way is cached in serialized form and is deserialized before running each task. This means that explicitly creating broadcast variables is only useful when tasks across multiple stages need the same data, or when caching the data in deserialized form is important.
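A broadcast variable is created from a variable v by calling SparkContext.broadcast(v), and its value is accessed through its value method. A minimal sketch in the Spark shell (the echoed result lines may vary by Spark version):
scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)

scala> broadcastVar.value
res0: Array[Int] = Array(1, 2, 3)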
After the broadcast variable is created, it should be used instead of the value v in any functions run on the cluster, so that v is not shipped to the nodes more than once. In addition, the object v should not be modified after it is broadcast, in order to ensure that all nodes get the same value of the broadcast variable.
Accumulators
Accumulators are variables that are only "added" to through an associative operation and can therefore be efficiently supported in parallel. They can be used to implement
counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric
types, and programmers can add support for new types. If accumulators are created
with a name, they will be displayed in Spark’s UI. This can be useful for understanding
the progress of running stages (NOTE: this is not yet supported in Python).
The code given below shows an accumulator being used to add up the elements of an
array:
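A minimal sketch in the Spark shell, using the Spark 1.x accumulator API (sc.accumulator); the accumulator starts at 0 and each task adds its elements to it:
scala> val accum = sc.accumulator(0, "My Accumulator")
accum: org.apache.spark.Accumulator[Int] = 0

scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)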
If you want to see the output of the above code, use the following command:
scala> accum.value
Output
res2: Int = 10
Numeric RDD Operations
Spark provides a number of predefined numeric operations on RDDs, some of which are listed in the following table:
S.No   Method     Meaning
1      count()    Number of elements in the RDD.
2      mean()     Average of the elements in the RDD.
3      sum()      Total value of the elements in the RDD.
4      max()      Maximum value among all elements in the RDD.
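A short sketch of these methods in the Spark shell; the RDD contents here are illustrative:
scala> val nums = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))

scala> nums.count()
res0: Long = 4

scala> nums.mean()
res1: Double = 2.5

scala> nums.sum()
res2: Double = 10.0

scala> nums.max()
res3: Double = 4.0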