Running PySpark on Jupyter Notebook

Prerequisites:

1. Anaconda installation
2. Java 1.8
3. Spark 2.3

1. Verify the installations


Log in to the EC2 instance as the root user.

a. Run the following command to verify the Anaconda installation


conda --version

b. Run the following command to verify Java 8 installation


java -version

c. Run the following command to verify Spark2 installation


spark2-shell --version
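
Optionally, you can script the same three checks. The sketch below is just a convenience (run it with the Anaconda python on the instance); it shells out to the exact commands shown above.

import subprocess

# run each version check; java and spark2-shell print their banner to stderr,
# so redirect stderr into stdout before decoding
for cmd in (["conda", "--version"], ["java", "-version"], ["spark2-shell", "--version"]):
    out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    print(" ".join(cmd))
    print(out.decode().strip())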
2. Configure Jupyter (one-time process)
Jupyter Notebook is already installed on our CDH instance along with Anaconda. However, we need to
configure it before we can actually run it.

a. Run the following command to generate the Jupyter configuration file.


jupyter notebook --generate-config

You can see that jupyter_notebook_config.py has been created inside the /root/.jupyter directory.

b. Allow access to your remote Jupyter server

Open the jupyter_notebook_config.py file


vi .jupyter/jupyter_notebook_config.py

Press 'i' to enter insert mode


Copy and paste the two lines below into the file.

c.NotebookApp.allow_origin = '*'  # allow all origins
c.NotebookApp.ip = '0.0.0.0'  # listen on all IPs

To save and exit
> Press 'Esc' > Type :wq! > Hit Enter
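
If you want to double-check the edit without reopening vi, a minimal Python sketch like the one below (assuming the config was generated under /root/.jupyter as shown above) prints the settings you just added:

# print the active (non-comment) NotebookApp settings from the generated config file
with open("/root/.jupyter/jupyter_notebook_config.py") as f:
    for line in f:
        line = line.strip()
        if line.startswith("c.NotebookApp."):
            print(line)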
3. Now, use the following command to run the Jupyter Notebook

jupyter notebook --port 7861 --allow-root

You will see that the Jupyter server has started.

Note: Don't kill this process; keep it running until you are done using your Jupyter Notebook.
To stop/close your Jupyter Notebook, open this terminal and press Ctrl + C

You can see that it has given you a URL where your Jupyter Notebook is running. In my case the
URL is

http://(ip-10-0-0-228.ec2.internal or 127.0.0.1):7861/?token=5935c2bdf10c2212013af0b4e9e7cd98cd22ca03eeaf4005

In order to open the Jupyter Notebook in your browser, replace the hostname part (ip-10-0-0-228.ec2.internal
or 127.0.0.1) with the Public IP of your EC2 instance.

In my case, the final URL will be:


http://3.89.129.54:7861/?token=5935c2bdf10c2212013af0b4e9e7cd98cd22ca03eeaf4005
Now, you can open your web browser and copy-paste your final URL to open the Jupyter
notebook. You should be able to see the Jupyter home page.
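
The replacement itself is just a string substitution. Purely as an illustration (using the example hostname, token, and public IP from this document; yours will differ):

# swap the internal hostname for the instance's public IP to build the final URL
internal_url = "http://ip-10-0-0-228.ec2.internal:7861/?token=5935c2bdf10c2212013af0b4e9e7cd98cd22ca03eeaf4005"
public_ip = "3.89.129.54"
final_url = internal_url.replace("ip-10-0-0-228.ec2.internal", public_ip)
print(final_url)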

4. PySpark on Jupyter

In order to run PySpark in a Jupyter notebook, you need to paste and run the following
lines of code in every PySpark notebook you create.

import os
import sys
os.environ["PYSPARK_PYTHON"] = "/opt/cloudera/parcels/Anaconda/bin/python"
os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_161/jre"
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")

Note: The values of the above environment variables may be different in your case; we suggest
you verify them before using them. Please follow the steps below to do so.
i. Anaconda and Python

os.environ["PYSPARK_PYTHON"] = "/opt/cloudera/parcels/Anaconda/bin/python"

Make sure Anaconda is installed in the /opt/cloudera/parcels/ directory. To verify, check
whether the Anaconda directory is present under the /opt/cloudera/parcels/ path.

ls /opt/cloudera/parcels/
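
You can also run this check from Python; a minimal sketch (the path is the one used in this document, adjust it if your parcel directory differs):

import os

# verify that the interpreter used for PYSPARK_PYTHON actually exists
pyspark_python = "/opt/cloudera/parcels/Anaconda/bin/python"
print(pyspark_python, "exists:", os.path.exists(pyspark_python))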

ii. Java Home

os.environ["JAVA_HOME"] = "/usr/java/jdk1.8.0_161/jre"

Run the following command to get your JAVA_HOME path


echo $JAVA_HOME

Hence, /usr/java/jdk1.8.0_161/jre will be the value of our JAVA_HOME variable.
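
As a quick sanity check from Python (the path below is the one from this document; use whatever echo $JAVA_HOME returned on your instance):

import os

# JAVA_HOME should contain a bin/java executable
java_home = "/usr/java/jdk1.8.0_161/jre"
print(os.path.exists(os.path.join(java_home, "bin", "java")))  # expect True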

iii. Spark Home


For the Spark home path in the following line:
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/"

The name of this Spark parcel directory (SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101 above) must be
exactly the same as the one present under your /opt/cloudera/parcels/ directory.

Run the following command to check it, and replace it in case you have a different
version/distribution installed.

ls /opt/cloudera/parcels/
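
If you prefer, the sketch below does the same lookup from Python and picks out the SPARK2 parcel name, so you can paste the exact directory name into SPARK_HOME (it only assumes the default /opt/cloudera/parcels/ location):

import os

# find the versioned SPARK2 parcel name instead of typing it by hand
parcels = os.listdir("/opt/cloudera/parcels/")
spark2_parcels = [p for p in parcels if p.startswith("SPARK2-")]
print(spark2_parcels)
# SPARK_HOME is then "/opt/cloudera/parcels/" + spark2_parcels[0] + "/lib/spark2/"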
iv. Version of py4j and pyspark.zip file

sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.6-src.zip")


sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")

After you have identified your correct Spark home path, this step verifies the version of the
py4j file.

For that, run the following command (make sure to modify this command according to the Spark
home you identified in point iii)

ls /opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib

You should be able to see the py4j and pyspark files. Make sure you modify the file names in
the above code according to your instance.
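
Rather than hard-coding the py4j version, you can also let Python discover it. This is an optional sketch; it assumes SPARK_HOME has already been set exactly as in point 4.

import glob
import os
import sys

# discover the py4j zip under $SPARK_HOME/python/lib instead of hard-coding "0.10.6"
pylib = os.path.join(os.environ["SPARK_HOME"], "python", "lib")
py4j_zip = glob.glob(os.path.join(pylib, "py4j-*-src.zip"))[0]
sys.path.insert(0, py4j_zip)
sys.path.insert(0, os.path.join(pylib, "pyspark.zip"))
print(py4j_zip)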
5. Test your PySpark setup

a. Open a new Jupyter Notebook


b. Copy-paste the environment variables that you finalized in point 4 into a cell.
(You need to do this for every PySpark notebook that you create.)

c. Run the cell; you should not see any errors.


d. Now, let's initialize the SparkContext object. Copy-paste the following code into a new cell
> Run the cell
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("jupyter_Spark").setMaster("yarn-client")
sc = SparkContext(conf=conf)
sc

You should be able to see the SparkContext details (Spark version, master, and app name) as output.

This means your PySpark setup is working fine.
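
Optionally, you can go one step further and run a tiny job to confirm that the cluster can actually execute work; a minimal sketch:

# distribute a small range across the cluster and sum it back
rdd = sc.parallelize(range(100))
print(rdd.sum())  # expect 4950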


6. Closing Jupyter Notebook

To stop the notebook, open the terminal/PuTTY window where the Jupyter process is running and
press Ctrl + C.
Enter 'y' to shut down the notebook server.
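
If the SparkContext from step 5 is still active, it is also good practice to release its cluster resources by running the following in a notebook cell before closing the server:

# stop the SparkContext so YARN releases the application's containers
sc.stop()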
