spark-scala-gradle-bootstrap

A Spark bootstrap project written in Scala, with Gradle as the build tool.

Prerequisites

Libraries Included

  • JavaVersion=1.11
  • sparkVersion=3.4.1
  • scalaVersion=2.12
  • deltaVersion=2.4.0

Build

java -version

openjdk version "11.0.20" 2023-07-18
OpenJDK Runtime Environment Homebrew (build 11.0.20+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.20+0, mixed mode)

./gradlew clean build
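
The java -version output above shows Java 11. As a minimal sketch, assuming you want to confirm the installed major version before building, the check below parses the quoted version string. The function names java_major and installed_java_version are hypothetical helpers and are not part of this repository.

```shell
# Hypothetical helper: extract the major version from a Java version string.
# "11.0.20" gives 11; the legacy scheme "1.8.0_392" gives 8.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  # Keep everything before the first "." or "-".
  echo "${v%%[.-]*}"
}

# Hypothetical helper: read the quoted version from `java -version`,
# which writes its output to stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Warn (without failing) when the installed JDK is not Java 11.
if command -v java >/dev/null 2>&1; then
  major="$(java_major "$(installed_java_version)")"
  if [ "$major" != "11" ]; then
    echo "Warning: expected Java 11, found Java ${major}" >&2
  fi
fi
```

The check only warns, so it can be sourced before ./gradlew clean build without blocking the build.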

Test

./gradlew check

Run TestCoverage

./gradlew reportTestScoverage

Run

Run the sparkSubmit task

The Gradle sparkSubmit task is configured to run the class dev.template.spark.RddCollect.

./gradlew sparkSubmit

Running spark-submit in local mode
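
The spark-submit commands in this section rely on SPARK_HOME pointing at a Spark installation and on the assembled jar existing under build/libs. A small pre-flight check, sketched here under those assumptions, can catch both problems early. The function name preflight is hypothetical and is not part of this repository.

```shell
# Hypothetical helper: return 0 only when spark-submit and the jar are in place.
preflight() {
  jar="$1"
  # SPARK_HOME must be set and contain an executable bin/spark-submit.
  if [ -z "${SPARK_HOME:-}" ] || [ ! -x "${SPARK_HOME}/bin/spark-submit" ]; then
    echo "SPARK_HOME is unset or has no executable bin/spark-submit" >&2
    return 1
  fi
  # The application jar must already have been built.
  if [ ! -f "$jar" ]; then
    echo "Jar not found: $jar (build the project first)" >&2
    return 1
  fi
  return 0
}
```

For example, running preflight build/libs/spark-scala-gradle-bootstrap-2.12.0-all.jar before a spark-submit call reports a missing jar or Spark installation up front.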

Run the Wordcount App

${SPARK_HOME}/bin/spark-submit \
	--verbose  \
	--class dev.template.spark.Main \
	--packages io.delta:delta-core_2.12:2.4.0 \
	--master "local[2]" \
	--driver-memory 1g \
	--executor-memory 1g \
	--executor-cores 2 \
	build/libs/spark-scala-gradle-bootstrap-2.12.0-all.jar \
	src/main/resources/people-example.csv

Run the Main class, which reads people-example.csv and computes the average age

${SPARK_HOME}/bin/spark-submit \
	--verbose \
	--class dev.template.spark.Main \
	--packages io.delta:delta-core_2.12:2.4.0 \
	--master spark://localhost:7077 \
	--driver-memory 1g \
	--executor-memory 1g \
	--executor-cores 2 \
	build/libs/spark-scala-gradle-bootstrap-2.12.0-all.jar \
	src/main/resources/people-example.csv

Run a simple app, RddCollect, with a local Spark session

${SPARK_HOME}/bin/spark-submit \
	--class dev.template.spark.RddCollect \
	--master spark://localhost:7077 \
	build/libs/spark-scala-gradle-bootstrap-2.12.0-all.jar

Run the CovidDataPartitioner app, which reads COVID deaths in US counties and partitions them by reported date

${SPARK_HOME}/bin/spark-submit \
	--class dev.template.spark.CovidDataPartitioner \
	--packages io.delta:delta-core_2.12:2.4.0 \
	--master "local[2]" \
	--driver-memory 1g \
	--executor-memory 1g \
	--executor-cores 2 \
	build/libs/spark-scala-gradle-bootstrap-2.12.0-all.jar \
	src/main/resources/us-counties-recent.csv \
	/tmp/partitioned-covid-data
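
To inspect the result of the CovidDataPartitioner run, one option is to list the partition directories under the output path. This is a sketch that assumes the app writes Hive-style partition directories (name=value), which is how Spark lays out partitioned output. The partition column name is not stated here, so the helper matches any name=value directory. The function name list_partitions is hypothetical.

```shell
# Hypothetical helper: print the name=value partition directories found
# directly under an output path, one per line.
list_partitions() {
  out_dir="$1"
  if [ ! -d "$out_dir" ]; then
    echo "No such directory: $out_dir" >&2
    return 1
  fi
  for d in "$out_dir"/*=*/; do
    # When nothing matches, the glob stays literal; the -d test filters it out.
    [ -d "$d" ] && basename "$d"
  done
  return 0
}
```

For example, list_partitions /tmp/partitioned-covid-data prints one line per partition value and skips other entries such as log or metadata directories.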

Coverage

https://github.com/scoverage/gradle-scoverage

Functional Test Examples

https://github.com/scoverage/gradle-scoverage/blob/master/build.gradle#L59C1-L59C52

Useful Links

Issues or Suggestions

https://github.com/mahen-github/spark-scala-gradle-bootstrap/issues

Learn Spark

https://www.databricks.com/wp-content/uploads/2021/06/Ebook_8-Steps-V2.pdf

References

https://github.com/spark-examples/spark-scala-examples
