Spark Installation

This document provides step-by-step instructions for installing and configuring Apache Spark on a Windows system. It includes details on installing Anaconda, setting up Java, downloading Spark, modifying configuration files, and creating a virtual environment. Additionally, it outlines how to run a simple Spark program using a text file.


1. Install Anaconda from https://www.anaconda.com/download
2. Install Java 17 (or 8 or 11). Do NOT install Java 21 or newer; Spark 3 does not support it.
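To confirm that a supported Java version is the one on your PATH, open a command prompt and run:

    java -version

The output should report version 17 (or 8 or 11), not 21 or newer.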
3. Download a prebuilt version of Apache Spark 3 from https://spark.apache.org/downloads.html and extract it to C:\spark (so that you end up with C:\spark\bin, C:\spark\conf, etc.)

4. Open the C:\spark\conf folder, and make sure “File Name Extensions” is checked in the “View” tab of Windows Explorer. Rename the log4j2.properties.template file to log4j2.properties. Edit this file (using WordPad or a similar text editor) and change the root logger’s level from “info” to “error”.
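In recent Spark 3.x templates the root logger line looks like the example below; the exact key name can vary between Spark versions, so match it against what is already in your copy of the file:

    # change this line in C:\spark\conf\log4j2.properties:
    # rootLogger.level = info
    rootLogger.level = error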
5. Right-click your Windows menu, select Control Panel, System and
Security, and then System. Click on “Advanced System Settings” and
then the “Environment Variables” button.
a. Add the following new USER variables:
i. SPARK_HOME = c:\spark
ii. PYSPARK_PYTHON = python
b. Add the following entry to your PATH user variable:
%SPARK_HOME%\bin
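After clicking OK, open a NEW command prompt (already-open windows do not pick up environment-variable changes) and verify that the settings took effect:

    echo %SPARK_HOME%
    where pyspark

The first command should print c:\spark, and the second should locate pyspark.cmd under c:\spark\bin.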
6. Open an Anaconda Prompt
7. Create and activate a virtual environment:
a. conda create -n py310 python=3.10
b. conda activate py310
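To confirm the environment is active and using the expected interpreter, run:

    python --version

The prompt should now start with (py310) and the command should report Python 3.10.x.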
8. Enter cd c:\spark and then dir to get a directory listing. Look for a
text file we can play with, like README.md or CHANGES.txt
9. Enter pyspark
10. Enter rdd = sc.textFile("README.md") (or whatever text file you’ve found), then enter rdd.count()
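For reference, a successful session looks roughly like this; count() prints the number of lines in the file (the exact number depends on which Spark version’s README.md you loaded):

    >>> rdd = sc.textFile("README.md")
    >>> rdd.count()

If you get a Java-related error instead, re-check steps 2 and 5 (Java version and environment variables).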
11. Enter quit() to exit the spark shell, and close the console window

Spark program
12. Create a working folder (C:\SparkPython) and cd to it
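The shell session from step 10 can also be written as a standalone script and run with spark-submit. The sketch below is illustrative only; the file name CountLines.py and the path to README.md are assumptions, not part of the original instructions:

    # CountLines.py - minimal standalone PySpark job (illustrative sketch)
    from pyspark import SparkConf, SparkContext

    # In a standalone script we create the SparkContext ourselves;
    # the pyspark shell created "sc" for us automatically.
    conf = SparkConf().setMaster("local").setAppName("CountLines")
    sc = SparkContext(conf=conf)

    # Same operation as step 10: load a text file and count its lines.
    rdd = sc.textFile("c:/spark/README.md")
    print(rdd.count())

    sc.stop()

Save the script in C:\SparkPython and run it from the Anaconda Prompt with spark-submit CountLines.py.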
