Jonathan Jonathan - 2 months ago 39
R Question

How do I run R script for sparkR?

I am running sparkR 2.0.0 from the terminal, and I can run R commands. However, how do I create a .r script and be able to run in it within the spark session.

Answer

SparkR uses standard R interpreter so the same rules apply. If you want to execute external script inside current session use source function.

## Welcome to
##    ____              __ 
##   / __/__  ___ _____/ /__ 
##  _\ \/ _ \/ _ `/ __/  '_/ 
## /___/ .__/\_,_/_/ /_/\_\   version  2.1.0-SNAPSHOT 
##    /_/ 
##
##
## SparkSession available as 'spark'.
> sink("test.R")
> cat("print(head(createDataFrame(mtcars)))")
> sink()
> source("test.R")
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

If you want to submit a standalone script outside existing SparkR session you should initialize required context in the script itself. After that you can execute it using SPARK_HOME/bin/spark-submit (preferred option) or even Rscript.

Comments