Spark Installation
1. Install Anaconda from https://www.anaconda.com/download
2. Install Java 17 (or 8 or 11). DO NOT INSTALL JAVA 21+.
3. Download a prebuilt version of Apache Spark 3 from https://spark.apache.org/downloads.html
and extract it to C:\spark (so that you end up with C:\spark\bin, C:\spark\conf, etc.)
4. Open the C:\spark\conf folder, and make sure "File Name
Extensions" is checked in the "View" tab of Windows Explorer. Rename
the log4j2.properties.template file to log4j2.properties. Edit this file
(using WordPad or something similar) and change the log level on the
rootLogger.level line from "info" to "error"
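After the edit, the relevant line in log4j2.properties should look roughly like this (the rest of the template is left as shipped):

    # Quiet the console: only errors (not INFO chatter) are logged
    rootLogger.level = error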
5. Right-click the Windows Start menu, select Control Panel, then System and
Security, and then System. Click "Advanced System Settings" and
then the "Environment Variables" button.
a. Add the following new USER variables:
i. SPARK_HOME c:\spark
ii. PYSPARK_PYTHON python
b. Add the following path to your PATH user variable:
%SPARK_HOME%\bin
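If you prefer the command line to the dialog, a rough equivalent from a Command Prompt is the following (note that setx only affects terminals opened afterwards, and that appending to PATH this way copies the combined machine and user PATH into your user variable, so the dialog is the tidier route):

    REM Create the two user variables
    setx SPARK_HOME C:\spark
    setx PYSPARK_PYTHON python
    REM Append Spark's bin folder to the user PATH
    setx PATH "%PATH%;C:\spark\bin"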
6. Open an Anaconda Prompt
7. Create and activate a conda environment:
a. conda create -n py310 python=3.10
b. conda activate py310
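Before launching Spark, a quick sanity check from this same prompt can save debugging later; each command below should reflect the choices made in the earlier steps:

    REM Expect Python 3.10.x, Java 17 (or 8/11), and C:\spark respectively
    python --version
    java -version
    echo %SPARK_HOME%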
8. Enter cd c:\spark and then dir to get a directory listing. Look for a
text file we can play with, like README.md or CHANGES.txt
9. Enter pyspark
10. Enter rdd = sc.textFile("README.md") (or whatever text file
you've found), then enter rdd.count()
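Put together, the shell session looks roughly like this; sc is the SparkContext that pyspark creates for you:

    >>> rdd = sc.textFile("README.md")   # lazily points the RDD at the file
    >>> rdd.count()                      # triggers the job; prints the line count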
11. Enter quit() to exit the PySpark shell, and close the console window.
Spark program
12. Create a working folder ("C:\SparkPython") and cd to it
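To preview where this is headed, here is a minimal standalone script (the name LineCount.py and the README.md path are just illustrative assumptions), saved in C:\SparkPython and run with spark-submit LineCount.py; unlike the interactive shell, a script must create its own SparkContext:

    # LineCount.py -- illustrative smoke test, not an official example
    from pyspark import SparkConf, SparkContext

    # The shell gave us sc for free; a standalone script builds it explicitly
    conf = SparkConf().setMaster("local").setAppName("LineCount")
    sc = SparkContext(conf=conf)

    # Same lazy read / count action as in the shell session above
    rdd = sc.textFile("c:/spark/README.md")
    print(rdd.count())

    sc.stop()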