Hi,
I am trying to run a few Spark commands using SparkR (from my local R GUI). To set up the Spark cluster on EC2, I followed most of the commands from https://edgarsdatalab.com/2016/08/25/setup-a-spark-2-0-cluster-r-on-aws/ with small modifications to install the latest versions. All I am trying to do is interact with remote Spark (on EC2 Ubuntu) from my local R GUI using the SparkR package.

**Here is my setup (step by step):**

1. I have Windows 8.1 on my PC with R 3.3.3 and the SparkR package.
2. I created an AWS EC2 instance (free-tier account) using an existing Ubuntu image from Amazon.
3. Installed PuTTY on my local PC. Used a PuTTY terminal to connect to Ubuntu 16 (on EC2) and used it for steps 4 to 10 below.
4. Installed Java and then spark-2.1.1-bin-hadoop2.7 on EC2.
5. Added the following to .bashrc (in /home/ubuntu):

export SPARK_HOME=~/server/spark-2.1.1-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH


6. Loaded the modified file:

. .bashrc

7. Installed R on EC2 Ubuntu.
8. I created another EC2 instance (with Ubuntu) and followed steps 4 to 6 above to set up the Spark worker node.
9. On the first EC2 instance (call it the master instance), I started the Spark master using start-master.sh and got the master's URL from Spark's web UI.
10. On the second EC2 instance (call it the slave instance), I started the Spark slave using start-slave.sh, passing the Spark master's URL.
11. Then I launched R (GUI) on my local PC.
12. Ran the following from R to connect and execute commands in Spark (in the following, xx.yy.zz.aa is the Spark master's public IP address):

library(SparkR)

sparkR.session(master = "spark://xx.yy.zz.aa:7077",
               sparkHome = "/home/ubuntu/server/spark-2.1.1-bin-hadoop2.7",
               enableHiveSupport = FALSE)

ds <- createDataFrame(mtcars)  ## R becomes unresponsive

13. When I killed the job from the Spark web UI after waiting long enough, I got the following error (see screenshot):
[Screenshot]
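For reference, steps 9 and 10 above boil down to roughly the following (a sketch; the host name in the master URL is a placeholder taken from the web UI, and the scripts live under $SPARK_HOME/sbin in spark-2.1.1):

```shell
# On the master instance: start the standalone master.
# Its web UI is served on port 8080 by default and shows the master URL.
$SPARK_HOME/sbin/start-master.sh
# Master URL shown in the UI looks like: spark://<master-host>:7077

# On the slave instance: start a worker and register it with the master,
# passing the master URL copied from the web UI.
$SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077
```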

Please help. What am I doing wrong, and how can I fix it? All I want is to use remote Spark from my local PC through the R interface.

Thanks,
SG

What I have tried:

- In sparkR.session(), I tried passing both the public and the private address of the first EC2 instance (the master).
- I also tried installing R on both EC2 instances; uninstalling R from both didn't work either.
- I also tried launching the Spark master and slave on the same EC2 Ubuntu instance (the first one).
- I ran R inside the EC2 Ubuntu instance that had both master and slave running on it. Nothing worked.
Posted
Updated 9-Jul-17 18:45pm
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
