RedStorm provides the JRuby integration for the Storm distributed realtime computation system.
This has been tested on OS X 10.6.8 and Linux 10.04 using Storm 0.6.2 and JRuby 1.6.6.
$ gem install redstorm
- create a new empty project directory.
- install the RedStorm gem.
- create a subdirectory which will contain your sources.
- perform the initial setup as described below to install the dependencies in the target/ subdir of your project directory.
- run your topology in local mode and/or on a production cluster as described below.
- install RedStorm dependencies; from your project root directory execute:
$ redstorm install
The install command will install all Java jar dependencies using ruby-maven in target/dependency and generate & compile the Java bindings in target/classes.
DON'T PANIC, it's Maven. The first time you run $ redstorm install, Maven will take a few minutes resolving dependencies and in the end will download and install the dependency jar files.
- create a topology class. The underscored file name topology_class_file_name.rb MUST correspond to the CamelCase name of the class it defines.
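The naming rule can be illustrated with a small conversion helper. This is only a sketch: `class_name_for` is a hypothetical helper, not part of RedStorm, and it assumes the plain underscore-to-CamelCase convention stated above.

```ruby
# Hypothetical helper (not part of RedStorm): derive the CamelCase class name
# expected for a given underscored topology file name.
def class_name_for(path)
  File.basename(path, ".rb").split("_").map { |part| part.capitalize }.join
end

puts class_name_for("examples/simple/word_count_topology.rb")
```

Under that assumption, a file named word_count_topology.rb would be expected to define a class named WordCountTopology.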
Until this is better integrated, you can use gems in local mode and on a production cluster:
- local mode: simply install your gems the usual way; they will be picked up when run in local mode.
- production cluster: install your gem in the target/gems folder using:
$ gem install <the gem> --install-dir target/gems/ --no-ri --no-rdoc
- run your topology in local mode:
$ redstorm local <path/to/topology_class_file_name.rb>
See examples below to run examples in local mode or on a production cluster.
- generate target/cluster-topology.jar. This jar file will include your sources directory plus the required dependencies from the target/ directory:
$ redstorm jar <sources_directory>
- submit the cluster topology jar file to the cluster. Assuming you have the Storm distribution installed and the Storm bin/ directory in your path:
$ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster <path/to/topology_class_file_name.rb>
Basically you must follow the Storm instructions to set up a production cluster and submit your topology to the cluster.
Install the example files in your project. The examples/ dir will be created in your project root dir.
$ redstorm examples
All examples using the simple DSL are located in examples/simple. Examples using the standard Java interface are in examples/native.
$ redstorm local examples/simple/exclamation_topology.rb
$ redstorm local examples/simple/exclamation_topology2.rb
$ redstorm local examples/simple/word_count_topology.rb
This next example requires the use of the Redis gem and a Redis server running on localhost:6379.
$ redstorm local examples/simple/redis_word_count_topology.rb
Using redis-cli, push words into the test list and watch Storm pick them up.
All examples using the simple DSL can also run on a production cluster. The only native example compatible with a production cluster is the ClusterWordCountTopology.
- generate the target/cluster-topology.jar and include the examples/ directory.
$ redstorm jar examples
- submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm bin/ directory in your path:
$ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster examples/simple/word_count_topology.rb
- to run examples/simple/redis_word_count_topology.rb you need a Redis server running on localhost:6379 and the Redis gem in target/gems using:
$ gem install redis --install-dir target/gems/ --no-ri --no-rdoc
- generate jar and submit:
$ redstorm jar examples
$ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster examples/simple/redis_word_count_topology.rb
- using redis-cli, push words into the test list and watch Storm pick them up.
Basically you must follow the Storm instructions to set up a production cluster and submit your topology to the cluster.
Your project can be created in a single file containing all spout, bolt and topology classes, or each class can be in its own file; your choice. There are many examples for the simple DSL.
The DSL uses a callback metaphor to attach code to the topology/spout/bolt execution contexts using on_* DSL constructs (ex.: on_submit, on_send, ...). When using on_* you can attach your code in 3 different ways:
- using a code block
on_receive(:ack => true, :anchor => true) { |tuple| do_something_with(tuple) }
on_receive :ack => true, :anchor => true do |tuple|
  do_something_with(tuple)
end
- defining the corresponding method
on_receive :ack => true, :anchor => true
def on_receive(tuple)
  do_something_with(tuple)
end
- defining an arbitrary method
on_receive :my_method, :ack => true, :anchor => true
def my_method(tuple)
  do_something_with(tuple)
end
The example SplitSentenceBolt shows the 3 different coding styles.
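To make the three styles concrete without a Storm installation, here is a minimal pure-Ruby sketch of how such a callback registration could dispatch. `SketchBolt` and its `execute` method are hypothetical stand-ins written for this illustration; they are not RedStorm's implementation and ignore the :ack and :anchor behaviour.

```ruby
# Hypothetical stand-in for a DSL base class; it only mimics the three
# on_receive styles and is not RedStorm's implementation.
class SketchBolt
  class << self
    attr_reader :receive_handler, :receive_options

    # Accepts an optional method name, an optional options hash and an
    # optional block, matching the shapes of the three styles.
    def on_receive(*args, &block)
      @receive_options = args.last.is_a?(Hash) ? args.pop : {}
      method_name = args.first
      @receive_handler = block || (method_name || :on_receive).to_sym
    end
  end

  # Dispatch a tuple to whichever handler was registered.
  def execute(tuple)
    handler = self.class.receive_handler
    if handler.is_a?(Proc)
      instance_exec(tuple, &handler)
    else
      send(handler, tuple)
    end
  end
end

class BlockStyle < SketchBolt
  on_receive(:ack => true) { |tuple| "block: #{tuple}" }
end

class SameNameStyle < SketchBolt
  on_receive :ack => true
  def on_receive(tuple)
    "method: #{tuple}"
  end
end

class NamedStyle < SketchBolt
  on_receive :my_method, :ack => true
  def my_method(tuple)
    "named: #{tuple}"
  end
end

puts BlockStyle.new.execute("a")
puts SameNameStyle.new.execute("a")
puts NamedStyle.new.execute("a")
```

In this sketch the class-level on_receive and the instance-level on_receive do not clash, which is why the second style can reuse the same name for the registration and the handler.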
Normally Storm topology components are assigned and referenced using numeric ids. In the SimpleTopology DSL ids are optional. By default the DSL will use the component class name as an implicit symbolic id and bolt source ids can use these implicit ids. The DSL will automatically resolve and assign numeric ids upon topology submission. If two components are of the same class, creating a conflict, then the id can be explicitly defined using either a numeric value, a symbol or a string. Numeric values will be used as-is at topology submission while symbols and strings will be resolved and assigned a numeric id.
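The resolution rule described above can be sketched in plain Ruby. This is an illustration under assumptions, not RedStorm's code: `resolve_ids` is a hypothetical helper, and the allocation order (next free integer, skipping explicitly used numbers) is a guess at one reasonable strategy.

```ruby
# Hypothetical sketch (not RedStorm's code): resolve a list of component ids
# to numeric ids. Integers are kept as-is; symbols and strings are assigned
# the next free integer.
def resolve_ids(ids)
  taken = ids.select { |id| id.is_a?(Integer) }
  next_id = 0
  ids.each_with_object({}) do |id, resolved|
    if id.is_a?(Integer)
      resolved[id] = id
    else
      next_id += 1
      next_id += 1 while taken.include?(next_id)
      resolved[id.to_s] = next_id
    end
  end
end

p resolve_ids(["MySpout", :counter, 2, "MyBolt"])
```

Here "MySpout" and "MyBolt" stand for implicit ids taken from class names, :counter for an explicit symbolic id, and 2 for an explicit numeric id that is kept unchanged.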
require 'red_storm'
class MyTopology < RedStorm::SimpleTopology
  spout spout_class, options
  bolt bolt_class, options do
    source source_id, grouping
    ...
  end
  configure topology_name do |env|
    config_attribute value
    ...
  end
  on_submit do |env|
    ...
  end
end
spout spout_class, options
- spout_class - spout Ruby class
- options
  - :id - spout explicit id (default is spout class name)
  - :parallelism - spout parallelism (default is 1)
bolt bolt_class, options do
  source source_id, grouping
  ...
end
- bolt_class - bolt Ruby class
- options
  - :id - bolt explicit id (default is bolt class name)
  - :parallelism - bolt parallelism (default is 1)
- source_id - source id reference. Can be the source class name if unique or the explicit id if defined
- grouping
  - :fields => ["field", ...] - fieldsGrouping using fields on the source_id
  - :shuffle - shuffleGrouping on the source_id
  - :global - globalGrouping on the source_id
  - :none - noneGrouping on the source_id
  - :all - allGrouping on the source_id
  - :direct - directGrouping on the source_id
configure topology_name do |env|
  config_attribute value
  ...
end
The configure statement is required.
- topology_name - alternate topology name (default is topology class name)
- env - is set to :local or :cluster for you to set environment-specific configurations
- config_attribute - the Storm Config attribute name. See Storm for the complete list. The attribute name corresponds to the Java setter method, without the "set" prefix and with the rest converted from CamelCase to underscore. Ex.: setMaxTaskParallelism is :max_task_parallelism.
  - :debug
  - :max_task_parallelism
  - :num_workers
  - :max_spout_pending
  - ...
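The setter-to-attribute naming rule can be checked with a short helper. `config_attribute_for` is a hypothetical name used only for this sketch; it is not a RedStorm or Storm API, it just applies the stated rule (drop "set", convert CamelCase to underscore).

```ruby
# Hypothetical helper (not part of RedStorm): derive the DSL attribute name
# from a Storm Config Java setter name.
def config_attribute_for(setter)
  setter.sub(/\Aset/, "").gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym
end

p config_attribute_for("setMaxTaskParallelism")
p config_attribute_for("setNumWorkers")
```

Applied to the setters behind the attributes listed above, the helper yields :max_task_parallelism and :num_workers.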
on_submit do |env|
  ...
end
The on_submit statement is optional. Use it to execute code after the topology submission.
- env - is set to :local or :cluster
For example, you can use on_submit to shut down the LocalCluster after some time. The LocalCluster instance is available using the cluster method.
on_submit do |env|
  if env == :local
    sleep(5)
    cluster.shutdown
  end
end
require 'red_storm'
class MySpout < RedStorm::SimpleSpout
  set spout_attribute => value
  ...
  output_fields :field, ...
  on_send options do
    ...
  end
  on_init do
    ...
  end
  on_close do
    ...
  end
  on_ack do |msg_id|
    ...
  end
  on_fail do |msg_id|
    ...
  end
end
set spout_attribute => value
The set statement is optional. Use it to set spout-specific attributes.
- spout_attributes
  - :is_distributed - set to true for a distributed spout (default is false)
output_fields :field, ...
Define the output fields for this spout.
- :field - the field name, can be symbol or string.
on_send options do
  ...
end
on_send relates to the Java spout nextTuple method and is called periodically by Storm to allow the spout to output a tuple. When using auto-emit (default), the block return value will be auto-emitted. A single value return will be emitted as a single-field tuple. An array of values [a, b] will be emitted as a multiple-field tuple. Normally a spout should only output a single tuple per on_send invocation.
- options
  - :emit - set to false to disable auto-emit (default is true)
on_init do
  ...
end
on_init relates to the Java spout open method. When on_init is called, the config, context and collector are set to return the Java spout config Map, TopologyContext and SpoutOutputCollector.
on_close do
  ...
end
on_close relates to the Java spout close method.
on_ack do |msg_id|
  ...
end
on_ack relates to the Java spout ack method.
on_fail do |msg_id|
  ...
end
on_fail relates to the Java spout fail method.
require 'red_storm'
class MyBolt < RedStorm::SimpleBolt
  output_fields :field, ...
  on_receive options do
    ...
  end
  on_init do
    ...
  end
  on_close do
    ...
  end
end
on_receive options do
  ...
end
on_receive relates to the Java bolt execute method and is called upon tuple reception by Storm. When using auto-emit, the block return value will be auto-emitted. A single value return will be emitted as a single-field tuple. An array of values [a, b] will be emitted as a multiple-field tuple. An array of arrays [[a, b], [c, d]] will be emitted as multiple tuples of multiple fields each. When not using auto-emit, the unanchored_emit(value, ...) and anchored_emit(tuple, value, ...) methods can be used to emit a single tuple. When using auto-anchor (disabled by default) the sent tuples will be anchored to the received tuple. When using auto-ack (disabled by default) the received tuple will be ack'ed after emitting the return value. When not using auto-ack, the ack(tuple) method can be used to ack the tuple.
Note that setting auto-ack and auto-anchor is possible only when auto-emit is enabled.
- options
  - :emit - set to false to disable auto-emit (default is true)
  - :ack - set to true to enable auto-ack (default is false)
  - :anchor - set to true to enable auto-anchor (default is false)
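The auto-emit rules for return values can be summarized in a few lines of plain Ruby. This is a sketch of the mapping as described, not RedStorm's code: `tuples_for` is a hypothetical helper, and how a nil or empty return value is treated is left out because the text does not specify it.

```ruby
# Hypothetical sketch (not RedStorm's code): turn a block return value into
# the list of tuples that auto-emit would send, following the rules described
# for on_receive.
def tuples_for(value)
  if value.is_a?(Array) && !value.empty? && value.all? { |v| v.is_a?(Array) }
    value      # [[a, b], [c, d]] -> several multiple-field tuples
  elsif value.is_a?(Array)
    [value]    # [a, b] -> one multiple-field tuple
  else
    [[value]]  # a -> one single-field tuple
  end
end

p tuples_for("word")
p tuples_for(["word", 1])
p tuples_for([["a", 1], ["b", 2]])
```

Each inner array in the result stands for one emitted tuple, so the three calls produce one, one and two tuples respectively.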
on_init do
  ...
end
on_init relates to the Java bolt prepare method. When on_init is called, the config, context and collector are set to return the Java bolt config Map, TopologyContext and OutputCollector.
on_close do
  ...
end
on_close relates to the Java bolt cleanup method.
- JRuby 1.6.6
- rake gem ~> 0.9.2.2
- ruby-maven gem ~> 3.0.3.0.28.5
- rspec gem ~> 2.8.0
Fork the project, create a branch and submit a pull request.
Some ways you can contribute:
- by reporting bugs using the issue tracker
- by suggesting new features using the issue tracker
- by writing or editing documentation
- by writing specs
- by writing code
- by refactoring code
- ...
- fork project
- create branch
- install dependencies in target/dependencies
$ rake deps
- generate and build Java source into target/classes
$ rake build
- run topology in local dev cluster
$ bin/redstorm local path/to/topology_class.rb
- generate remote cluster topology jar into target/cluster-topology.jar, including the examples/ directory.
$ rake jar['examples']
Colin Surprenant, @colinsurprenant, colin.surprenant@needium.com, colin.surprenant@gmail.com, http://github.com/colinsurprenant
Apache License, Version 2.0. See the LICENSE.md file.
