
R.A. Podar College of Commerce and Economics (Autonomous)

Name : Sumit Kothari


Roll number : 17
Class : TY
Division : A
Programme : BSc Data Science and Analytics
Semester : V
Course name : Apache Spark and Scala Practical
Course code : 60506
Academic year : 2024-25
Faculty in-charge : Sita Nadar
Certificate

This is to certify that the journal of course Apache Spark and Scala Practical has been satisfactorily completed by Sumit Kothari as a partial fulfilment of TY BSc (Data Science & Analytics) Semester V during the academic year 2024-2025.

Date: 27th September, 2024

_______________
Faculty in-charge
Apache Spark and Scala Practical
INDEX

Sr No  Title                                                      Date of submission  Signature
1      To setup and configure Scala                               16.07.2024
2      To setup and configure Apache Spark                        06.08.2024
3      To perform basic mathematical operations in Scala          13.08.2024
4      To perform basic operations on collections and loops       20.08.2024
5      To create functions and procedures in Scala                27.08.2024
6      To implement a basic word count program using Spark RDDs   24.09.2024
Name: Sumit Kothari
Roll number: 17
Practical 1
Aim: Write the installation steps for Scala on Windows.
Steps
* Scala is highly compatible with Windows and can be installed easily.
* The most basic requirement is Java 1.8 or a later version installed on the computer.
* Verifying the Java installation: We first need a Java Development Kit (JDK) installed on the computer, and we must verify it before proceeding. Open the command line (on Windows, press Win + R, type cmd, and press Enter).
* Now run the following command:
java -version
Once this command executes, the output shows the installed Java version, as follows:

If the JDK is not installed, download the latest version suited to your computer from oracle.com and proceed with the installation.

Downloading and Installing Scala:

Downloading Scala: Before starting the installation process, you need to download Scala. All versions of Scala for Windows are available on scala-lang.org.

Download Scala and follow the instructions below to complete the installation.
Beginning with the Installation: Run the installer and step through the Getting Started, Installation Process, and Finished Installation screens.
After completing the installation, any IDE or text editor can be used to write Scala code and run it from the IDE or the command prompt using the commands:
scalac file_name.scala
scala class_name
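
For example, a minimal program (the file and object names below are illustrative assumptions) can be compiled and run like this:

// HelloWorld.scala -- a minimal sketch to verify the Scala installation
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, Scala!") // prints a greeting to the console
  }
}

scalac HelloWorld.scala
scala HelloWorld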
Name: Sumit Kothari
Roll number: 17
Practical 2
Aim: Write steps to set up and configure Apache Spark.
Prerequisites:
• Java Development Kit (JDK): Ensure JDK 8 or later is installed.
• Apache Spark: Download the latest stable version from the official website
(https://spark.apache.org/downloads.html).
• Hadoop: If you're using Hadoop for distributed processing, download and install it.
• Scala (Optional): If you plan to write Spark applications in Scala, install Scala.
Setup and Configuration:
1. Unzip Spark: Extract the downloaded Spark distribution to a desired location.
2. Set Environment Variables:
o SPARK_HOME: Set this variable to the extracted Spark directory.
o JAVA_HOME: Set this variable to the directory where your JDK is installed.
o HADOOP_HOME: If using Hadoop, set this variable to the Hadoop
installation directory.
o PATH: Add the bin directory of Spark and Hadoop (if applicable) to your
system's PATH.
3. Run Spark:
o Local Mode: To run Spark locally on your machine, open a terminal, navigate to the bin directory of Spark, and execute the following command (on Windows, run spark-shell from the command prompt):
./spark-shell
o Standalone Mode: For a standalone cluster, configure spark-env.sh in the conf directory of Spark, set the necessary properties (e.g., spark.master, spark.executor.cores, spark.executor.memory in spark-defaults.conf), and start the master and worker nodes.
o YARN Mode: If using YARN, configure spark-defaults.conf and submit applications to the YARN cluster with spark-submit.
o Mesos Mode: If using Mesos, configure spark-defaults.conf and submit applications to the Mesos cluster with spark-submit.
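
As a quick sanity check after setup, a short spark-shell session can confirm that Spark is working (the values below are illustrative):

// Inside spark-shell, the SparkContext is predefined as sc
val nums = sc.parallelize(1 to 100) // distribute the numbers 1..100 as an RDD
println(nums.sum())                 // should print 5050.0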
Name: Sumit Kothari
Roll number: 17
Practical 3
Aim: Write a Scala program to perform basic mathematical operations.
Source code & Output:
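
A minimal sketch of such a program (the operand values are assumptions):

// Basic arithmetic operations in Scala
object MathOps {
  def main(args: Array[String]): Unit = {
    val a = 15
    val b = 4
    println(s"Sum: ${a + b}")        // 19
    println(s"Difference: ${a - b}") // 11
    println(s"Product: ${a * b}")    // 60
    println(s"Quotient: ${a / b}")   // 3 (integer division)
    println(s"Remainder: ${a % b}")  // 3
  }
}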
Name: Sumit Kothari
Roll number: 17
Practical 4
4.1
Aim: Write a Scala program to compute the sum of two given integer values. If the two values are the same, return triple their sum.
Source code & Output:
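
One way to sketch this (the function name sumOrTriple is an assumption):

// Return the sum of a and b; if they are equal, return triple the sum
object SumOrTriple {
  def sumOrTriple(a: Int, b: Int): Int = {
    val sum = a + b
    if (a == b) sum * 3 else sum
  }

  def main(args: Array[String]): Unit = {
    println(sumOrTriple(2, 3)) // 5
    println(sumOrTriple(4, 4)) // 24
  }
}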
4.2
Aim: Write a Scala program to compute the sum of two given integer values. If the two values are the same, return triple their sum.
Source code & Output:

4.3
Aim: Write a Scala program to print the multiplication table of a number.
Source code & Output:
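
A minimal sketch (the number 7 is an assumption):

// Print the multiplication table of n from 1 to 10
object Table {
  def main(args: Array[String]): Unit = {
    val n = 7
    for (i <- 1 to 10)
      println(s"$n x $i = ${n * i}")
  }
}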
Name: Sumit Kothari
Roll number: 17
Practical 5
5.1
Aim: Write a program to greet the user
Source code & Output:
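
A minimal sketch that reads the name from standard input:

import scala.io.StdIn

// Greet the user by the name they type in
object Greet {
  def main(args: Array[String]): Unit = {
    print("Enter your name: ")
    val name = StdIn.readLine()
    println(s"Hello, $name!")
  }
}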

5.2
Aim: Write a recursive function that calculates the factorial
Source code & Output:
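
A minimal sketch of the recursive definition:

// factorial(n) = n * factorial(n - 1), with factorial(0) = factorial(1) = 1
object Factorial {
  def factorial(n: Int): BigInt =
    if (n <= 1) BigInt(1) else factorial(n - 1) * n

  def main(args: Array[String]): Unit = {
    println(factorial(5)) // 120
  }
}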
5.3
Aim: Write a program to print a List
Source code & Output:
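
A minimal sketch (the list contents are assumptions):

// Print a List both element by element and as a single line
object PrintList {
  def main(args: Array[String]): Unit = {
    val fruits = List("apple", "banana", "cherry")
    fruits.foreach(println)
    println(fruits.mkString(", "))
  }
}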

5.4
Aim: Write a program to add two numbers
Source code & Output:
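
A minimal sketch:

// Add two numbers using a simple function
object AddTwo {
  def add(a: Int, b: Int): Int = a + b

  def main(args: Array[String]): Unit = {
    println(add(10, 20)) // 30
  }
}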
5.5
Aim: Write a higher-order function to apply functions to a list
Source code & Output:
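
One possible sketch (the function name applyToAll is an assumption):

// A higher-order function: takes a list and a function, applies the function to each element
object ApplyToList {
  def applyToAll(xs: List[Int], f: Int => Int): List[Int] = xs.map(f)

  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3, 4)
    println(applyToAll(nums, x => x * x)) // List(1, 4, 9, 16)
    println(applyToAll(nums, _ + 10))     // List(11, 12, 13, 14)
  }
}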

5.6
Aim: Write an anonymous function to filter even numbers
Source code & Output:
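
A minimal sketch using an anonymous function with filter:

// Keep only the even numbers from a list
object FilterEven {
  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3, 4, 5, 6)
    val evens = nums.filter(n => n % 2 == 0) // anonymous function as the predicate
    println(evens) // List(2, 4, 6)
  }
}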
Name: Sumit Kothari
Roll number: 17
Practical 6
Aim: To implement a basic word count program using Spark RDDs.
Steps:

1. Create a text file containing repeated words and save it in the C: drive. This file will
be used for the Word Count program.

2. Open Windows PowerShell, change the directory to the Spark bin folder using the cd
command, and then execute the command spark-shell to launch the Spark REPL.
3. Next, load your text file into Spark by typing the load command in the Spark shell (the commands for steps 3 through 11 are collected in the sketch after these steps).

4. Next, execute the command text.collect() to display the contents of the file loaded into
Spark.

5. Now, run the command given below.

6. Next, run counts.collect() to display the list of words after splitting each line of text.

7. Use the command given below to map each word to a key-value pair where the word
is the key, and the value is 1.

8. Type in the next command to retrieve and display the results of the mapped RDD
from Spark to the driver program as a list.
9. The command written below is used to aggregate the values of each key in the RDD
mapf by summing them up, creating a new RDD reducef with the total counts for each unique
key.

10. Use reducef.collect to retrieve and display the aggregated results from the reducef
RDD.

11. Enter the command mentioned below in order to save the aggregated results from the
reducef RDD to the "spark_output" folder in the C drive. This folder will be created
automatically. One does not need to create it manually.

12. The folder can be accessed on the C drive.


13. Navigate to the "spark_output" folder, open the file named "part-00000," and you'll
find the words along with their counts from the text file created in step 1.
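
A sketch of the spark-shell commands described in steps 3 through 11 (the input path C:/words.txt is an assumption; the RDD names text, counts, mapf, and reducef follow the names used in the steps above):

val text = sc.textFile("C:/words.txt")             // step 3: load the file into an RDD
text.collect()                                     // step 4: display the file contents
val counts = text.flatMap(line => line.split(" ")) // step 5: split each line into words
counts.collect()                                   // step 6: display the list of words
val mapf = counts.map(word => (word, 1))           // step 7: map each word to (word, 1)
mapf.collect()                                     // step 8: display the mapped pairs
val reducef = mapf.reduceByKey(_ + _)              // step 9: sum the counts per word
reducef.collect()                                  // step 10: display the aggregated counts
reducef.saveAsTextFile("C:/spark_output")          // step 11: save results to spark_output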

Output:
