Build and Package
For EclairJS Client to talk to Apache Spark, it needs a running instance of Apache Toree, and Toree must be able to connect to your Spark master.
Prerequisites
- Java 8 update 70 or higher
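One way to check the Java prerequisite from Node is to parse the version string that `java -version` prints (it writes to stderr). The sketch below only recognizes the Java 8 `1.8.0_NN` scheme; the function names are hypothetical, and other version schemes are deliberately left unevaluated rather than guessed at.

```javascript
// Sketch: check a `java -version` line against "Java 8 update 70 or higher".
// Only the Java 8 "1.8.0_NN" scheme is recognized; anything else yields null.

// Return the Java 8 update number, or null if the string is not a Java 8 version.
function javaEightUpdate(versionLine) {
  var match = /version "1\.8\.0_(\d+)/.exec(versionLine);
  return match ? Number(match[1]) : null;
}

// True only when the line reports Java 8 with an update number of at least 70.
function meetsJavaPrerequisite(versionLine) {
  var update = javaEightUpdate(versionLine);
  return update !== null && update >= 70;
}

// Sample strings for illustration; substitute the first line of `java -version`.
console.log(meetsJavaPrerequisite('java version "1.8.0_101"'));
console.log(meetsJavaPrerequisite('java version "1.8.0_45"'));
```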
Instructions
- Download Apache Spark 2.0.0 built with Hadoop 2.7 and extract it from the archive.
- Install Jupyter (pip install jupyter, for example) and the Jupyter Kernel Gateway (pip install jupyter-kernel-gateway).
- Download and build Apache Toree:
$ git clone https://github.com/apache/incubator-toree
$ cd incubator-toree
$ git checkout e8ecd0623c65ad104045b1797fb27f69b8dfc23f
$ make dist
$ make sbt-publishM2
This will create a dist directory containing dist/toree/bin/run.sh.
- Download the EclairJS Server JAR file from Maven (http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn/${ECLAIRJS_VERSION}/eclairjs-nashorn-${ECLAIRJS_VERSION}-jar-with-dependencies.jar), replacing ${ECLAIRJS_VERSION} with the version you are using.
- Download kernel.json and replace the following:
  - /usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh with the location of your installed Apache Toree (for example, /usr/local/incubator-toree/dist/toree/bin/run.sh on OSX; see the location from step 3, the Apache Toree build).
  - "SPARK_HOME" should point to the extracted Apache Spark directory (spark-2.0.0-bin-hadoop2.7).
  - /opt/nashorn/lib/eclairjs.jar should point at the JAR file downloaded in step 4.
  If you run into memory issues (such as out of memory errors), you can raise the memory limit in the kernel.json file by adding --driver-memory 8g to SPARK_OPT.
- Figure out your Jupyter data directory by running:
$ jupyter --data-dir
/Users/youruser/Library/Jupyter
Copy kernel.json to kernels/eclair/ in the directory you got above.
- Start Jupyter:
$ jupyter notebook --no-browser
If you get an error similar to '_xsrf' argument missing from POST, you are running a version of Jupyter that has Token Authentication enabled. Please read this guide on how to fix this.
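The JAR download URL, the kernel.json substitutions, and the copy destination described in the steps above can be sketched as a small Node script. This is only an illustration under stated assumptions: it assumes kernel.json follows the usual Jupyter kernel spec shape (an `argv` array and an `env` object), the function names are hypothetical, and the sample spec and local paths are made-up stand-ins, not the contents of the real file.

```javascript
// Sketch only: helper names, sample spec, and local paths are assumptions.
var path = require('path');

// Placeholder values that the instructions say to replace in kernel.json.
var DEFAULT_RUN_SH = '/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh';
var DEFAULT_JAR = '/opt/nashorn/lib/eclairjs.jar';

// Build the Maven download URL for a given EclairJS version.
function eclairJarUrl(version) {
  var base = 'http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn';
  return base + '/' + version + '/eclairjs-nashorn-' + version +
    '-jar-with-dependencies.jar';
}

// Return a patched copy of a parsed kernel.json; the input is not modified.
function patchKernelSpec(spec, opts) {
  // Literal (non-regex) replacement of both placeholder paths in a string.
  function swap(value) {
    if (typeof value !== 'string') return value;
    return value
      .split(DEFAULT_RUN_SH).join(opts.toreeRunSh)
      .split(DEFAULT_JAR).join(opts.eclairJar);
  }
  var env = {};
  Object.keys(spec.env || {}).forEach(function (key) {
    env[key] = swap(spec.env[key]);
  });
  env.SPARK_HOME = opts.sparkHome;
  return Object.assign({}, spec, { argv: (spec.argv || []).map(swap), env: env });
}

// Destination of kernel.json inside the Jupyter data directory.
function kernelSpecTarget(dataDir) {
  return path.join(dataDir, 'kernels', 'eclair', 'kernel.json');
}

// Made-up stand-in for the downloaded kernel.json, for illustration only.
var sample = {
  argv: [DEFAULT_RUN_SH, '--profile', '{connection_file}'],
  env: { SPARK_HOME: '/opt/spark', SPARK_OPT: '--jars file:' + DEFAULT_JAR }
};

var patched = patchKernelSpec(sample, {
  toreeRunSh: '/usr/local/incubator-toree/dist/toree/bin/run.sh',
  sparkHome: '/usr/local/spark-2.0.0-bin-hadoop2.7',        // hypothetical location
  eclairJar: '/path/to/eclairjs-nashorn-jar-with-dependencies.jar' // hypothetical
});

console.log(JSON.stringify(patched, null, 2));
console.log(kernelSpecTarget('/Users/youruser/Library/Jupyter'));
```

To apply this to a real file, read it with `JSON.parse(fs.readFileSync(...))`, pass the result to `patchKernelSpec`, and write the output to the path returned by `kernelSpecTarget`.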
Test EclairJS Client
To make sure everything is working, create a simple EclairJS Client example.
Create a file called package.json:
{
  "name": "eclairjs-test",
  "version": "0.1.0",
  "dependencies": {
    "eclairjs": "*"
  }
}
And a file called test.js:
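var eclairjs = require('eclairjs');

// Create a client and a local SparkContext that uses all available cores.
var spark = new eclairjs();
var sc = new spark.SparkContext("local[*]", "Simple Text");

// Distribute a small array as an RDD, then collect it back.
var data = sc.parallelize([1,2,3,4,5,6,7,8,9,0]);

data.collect().then(function(val) {
  console.log("Success:", val);
  // Stop the context, then exit cleanly.
  sc.stop().then(function() { process.exit(0); });
}).catch(function(err) {
  console.log("Error:", err);
  // Stop the context, then exit with a non-zero status.
  sc.stop().then(function() { process.exit(1); });
});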
Install the dependencies:
$ npm install
Now we are ready to run the example:
$ node --harmony test.js
Starting WebSocket: ws://127.0.0.1:8888/api/kernels/436e67e6-2605-4085-9c5d-ba43d828a038
got kernel
Success: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ]
If you get an error similar to API request failed or Failed to connect to Jupyter instance, please follow this guide on how to fix this.
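When debugging a connection error, it can help to confirm which host, port, and kernel the client is trying to reach. The following sketch pulls those details out of the "Starting WebSocket" line shown in the sample output; `parseKernelEndpoint` is a hypothetical helper, not part of EclairJS, and it assumes the log line keeps the format shown above.

```javascript
// Sketch: extract connection details from the "Starting WebSocket" log line.
var URL = require('url').URL;

// Returns { host, port, kernelId, httpBase }, or null if the line does not match.
function parseKernelEndpoint(logLine) {
  var match = /Starting WebSocket:\s*(\S+)/.exec(logLine);
  if (!match) return null;
  var url = new URL(match[1]);
  return {
    host: url.hostname,
    port: Number(url.port),
    // The kernel id is the last segment of /api/kernels/<id>.
    kernelId: url.pathname.split('/').pop(),
    // HTTP address of the same server, useful for checking it in a browser.
    httpBase: 'http://' + url.host
  };
}

var line = 'Starting WebSocket: ws://127.0.0.1:8888/api/kernels/' +
  '436e67e6-2605-4085-9c5d-ba43d828a038';
var endpoint = parseKernelEndpoint(line);
console.log(endpoint);
```

If the host or port printed here is not where your Jupyter instance is listening, that mismatch is a likely place to start looking.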
To run the test suite:
$ npm run integration-test