# Declarative style design

## Requirements

1. Declarative programming style: the user declaratively defines the pipeline structure, e.g. via chaining or composition of the step classes.
2. Java Stream API-like: the framework handles parallelism, backpressure, and synchronization automatically.
3. Invisible framework: user code should not depend directly on the framework, if possible. The framework should wrap the user logic and add asynchronicity.
4. Maximum Freedom, Simplicity, Reusability: as a consequence, nodes should be easy to develop and usable outside the framework.
5. Encapsulation: the user defines "nodes of computation" that are plain Python classes unaware of the framework, of their asynchronous execution, and of other nodes.
6. Single entry point: a single class is responsible for linking the nodes together, orchestrating execution, and handling parallelism, backpressure, and synchronization.
7. Simple API: the API should be easy to understand and implement.

## Design

- Pipeline: the Pipeline class takes a sequence of source -> step -> sink classes as input. The order in which they are added defines the processing order.
- Source, Step & Sink: since the framework shouldn't impose any inheritance or interface to implement, we need a convention for how the framework interacts with these classes. A simple approach is to expect each node to expose a specific method that the framework calls to process an item: `produce` for sources, `process` for steps, and `consume` for sinks. This method should take one input and return one output (or raise an exception). See the node sketch after this list.
- Data Flow: the framework is responsible for passing data between the steps: it captures the source data, runs it through the steps, and hands the final output to the sink. This could involve using queues for internal communication to manage backpressure and asynchronicity (a sketch of one possible queue-based wiring follows the Defaults example below). We also need to expose some tuning of the backpressure strategy, and adapters to bridge data model mismatches between nodes.
- Boot sequence: the Pipeline class needs a method to start processing and block until completion (TBD). The source can be a class that naturally produces data when polled (blocking), or an iterable of data.
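
For illustration, nodes following this convention could look like the classes below. This is a minimal sketch: the class names match the hypothetical nodes used in the syntax examples further down, and the method bodies are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    temperature_c: float


class UserInputText:
    """Source: produces one item per call, blocking until data is available."""
    def produce(self) -> str:
        return input("city> ")


class WeatherForecast:
    """Step: takes one input, returns one output (or raises)."""
    def process(self, city: str) -> Forecast:
        return Forecast(temperature_c=21.0)  # placeholder for a real lookup


class DBStore:
    """Sink: consumes the final item, returns nothing."""
    def consume(self, record: dict) -> None:
        print("stored:", record)  # placeholder for a real write
```

None of these classes import or know about the framework, so they remain trivially testable and reusable on their own (requirements 3-5).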

## Example syntax

### Defaults
```python
pipe = Pipeline()
pipe.add_source(UserInputText())
pipe.add_step(WeatherForecast())
pipe.add_step(print)
pipe.add_sink(DBStore())
pipe.process()
```
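
Under the hood, `process()` could wire the nodes with one bounded queue per edge and one worker thread per node; bounded queues give backpressure for free because `put()` blocks when the consumer is slow. A minimal sketch of this wiring, assuming an iterable source and the `produce`/`process`/`consume` convention (adapters and error handling omitted):

```python
import queue
import threading

_END = object()  # sentinel marking end-of-stream


class Pipeline:
    def __init__(self, maxsize: int = 16):
        self._nodes = []
        self._maxsize = maxsize  # queue bound: smaller means tighter backpressure

    def add_source(self, node):
        self._nodes.append(("source", node))

    def add_step(self, node):
        self._nodes.append(("step", node))

    def add_sink(self, node):
        self._nodes.append(("sink", node))

    def process(self):
        # One bounded queue per edge, one worker thread per node.
        edges = [queue.Queue(self._maxsize) for _ in self._nodes[:-1]]
        workers = []
        for i, (kind, node) in enumerate(self._nodes):
            inq = edges[i - 1] if i > 0 else None
            outq = edges[i] if i < len(edges) else None
            w = threading.Thread(target=self._run, args=(kind, node, inq, outq))
            w.start()
            workers.append(w)
        for w in workers:  # block until the sink has drained everything
            w.join()

    @staticmethod
    def _run(kind, node, inq, outq):
        if kind == "source":
            # For brevity, only iterable sources are handled here; a polled
            # source would loop on node.produce() instead.
            for item in node:
                outq.put(item)  # blocks when the queue is full: backpressure
            outq.put(_END)
            return
        # Accept an object following the convention, or a bare callable.
        fn = getattr(node, "process" if kind == "step" else "consume", node)
        while True:
            item = inq.get()
            if item is _END:
                if outq is not None:
                    outq.put(_END)  # propagate shutdown downstream
                return
            result = fn(item)
            if outq is not None:
                outq.put(result)
```

With this sketch, `pipe.add_source(["London", "Paris"])` followed by steps and a sink runs every node concurrently, while `process()` blocks until the stream is drained.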

### Adapter functions
```python
from datetime import datetime

pipe = Pipeline()
pipe.add_source(UserInputText(), map=lambda x: x.strip())
pipe.add_step(WeatherForecast(), map=lambda x: x.temperature_c, max_rate="1/s")
pipe.add_step(print)
pipe.add_sink(DBStore(), map=lambda x: {"temperature_c": x, "time": datetime.now()})
pipe.process()
```
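
How the framework honors these per-node options is still open. One plausible reading (all names below are hypothetical, not part of the proposed API) is that `map` adapts a node's output (or, for a sink, its input) and `max_rate` throttles how often the node is called. A sketch, assuming `max_rate` strings of the form `"N/s"`:

```python
import time


def with_adapters(fn, kind, map=None, max_rate=None):
    """Hypothetical wrapper the framework could apply around a node's method.
    `map` is named after the proposed keyword, shadowing the builtin on purpose."""
    interval = 1.0 / float(max_rate.rstrip("/s")) if max_rate else 0.0
    last = [float("-inf")]  # time of the previous call

    def wrapped(*args):
        wait = last[0] + interval - time.monotonic()
        if wait > 0:
            # Sleeping blocks this worker, the upstream queue fills up,
            # and the rate limit propagates backpressure toward the source.
            time.sleep(wait)
        last[0] = time.monotonic()
        if map and kind == "sink":
            args = (map(args[0]),)  # adapt the sink's input
        out = fn(*args)
        return map(out) if map and kind != "sink" else out

    return wrapped
```

Because the throttle simply blocks the worker thread, it composes with the queue-based wiring above without any extra synchronization.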