How to Deploy Spark Standalone in Oracle Cloud (OCI)

Author: Olivier Francois Xavier Perard

1 Introduction

This walk-through covers the steps needed to set up your environment to run Spark and Hadoop in Oracle Cloud Infrastructure.

2 Prerequisites

You have deployed a VM (a VM.Standard2.1 shape or larger) with Oracle Linux 7.9 (OEL7) in Oracle Cloud Infrastructure (OCI).

  • Oracle Linux 7.9 ships with a JVM by default.
  • You have root access, either directly or via sudo. By default in OCI, you connect as the “opc” user, which has sudo privileges.
    [opc@xxx ~]$ java -version
    java version "1.8.0_281"
    Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
    Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)

3 Java Installation

The install is quite simple: set up Java, then install the Spark and Hadoop components and libraries. Let's start by setting up Java.

Download the latest release of JDK 1.8, because Hadoop 2.x uses this Java version, and install the RPM:

    sudo rpm -ivh /home/opc/jdk-8u271-linux-x64.rpm

Check the Java version:

    java -version

4 Spark and Hadoop Setup

The next step is to install the Spark and Hadoop environment.

First, choose the Spark and Hadoop version you want to install, then download it:

Download Spark 2.4.5 for Hadoop 2.7

    cd /home/opc
    wget http://apache.uvigo.es/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

Download Spark 2.4.7 for Hadoop 2.7

    wget http://apache.uvigo.es/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz

Download Spark 3.1.1 for Hadoop 3.2

    wget http://apache.uvigo.es/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Install the Spark and Hadoop Version

Extract the Spark and Hadoop version you chose into the “/opt” directory.

    sudo -i
    cd /opt
    tar -zxvf /home/opc/spark-2.4.5-bin-hadoop2.7.tgz
    # or
    tar -zxvf /home/opc/spark-2.4.7-bin-hadoop2.7.tgz
    # or
    tar -zxvf /home/opc/spark-3.1.1-bin-hadoop3.2.tgz

5 Install PySpark in a Python 3 environment

Install a PySpark version that matches the Spark distribution you downloaded (note pip's “==” for an exact version pin):

    /opt/Python-3.7.6/bin/pip3 install 'pyspark==2.4.7'
    /opt/Python-3.7.6/bin/pip3 install findspark

Next, create a virtual environment and activate it, as sketched below.
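
The walk-through does not show this step, so here is a minimal sketch, assuming the Python 3.7.6 build under /opt; the environment name “sparkenv” is arbitrary:

    # create and activate a virtual environment, then install PySpark into it
    /opt/Python-3.7.6/bin/python3 -m venv /home/opc/sparkenv
    source /home/opc/sparkenv/bin/activate
    pip install 'pyspark==2.4.7' findspark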

Modify your environment to use this Spark and Hadoop Version

Add the following lines to “.bashrc” for the “opc” user:

    # Add by %OP%
    export PYTHONHOME=/opt/anaconda3
    export PATH=$PYTHONHOME/bin:$PYTHONHOME/condabin:$PATH

    # SPARK ENV
    #export JAVA_HOME=$(/usr/libexec/java_home)
    export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
    export PATH=$SPARK_HOME/bin:$PATH
    export PYSPARK_PYTHON=python3

    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

6 Test your Spark and Hadoop Environment

Because PYSPARK_DRIVER_PYTHON is set to jupyter, launching pyspark starts a Jupyter notebook server. If you are working directly on the virtual machine and have a browser installed, it takes you straight into the Jupyter environment; otherwise, connect to “http://xxx.xxx.xxx.xxx:8001/”.
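
Before opening any notebooks, you can smoke-test the installation from the shell with the SparkPi example that ships with the Spark binary distribution:

    # runs the bundled SparkPi example with 10 partitions; look for "Pi is roughly ..." in the output
    $SPARK_HOME/bin/run-example SparkPi 10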

Then upload the following notebooks: