Rain v0.3 released

tl;dr You can write your own tasks in C++ and Rust now, we have reworked metadata for tasks and objects and overall imroved the Python API.

Rain logo

We have just released version 0.3.0 of Rain, a framework for executing parallel pipelines. This release brings lots of changes, both internal and external. In this text, we would like to summarize them together with a motivation.

Custom tasks in Rust and C++

The big news is a possibility to create your own tasks in C++ and Rust. In version 0.3.0 you can create executors (formerly known as subworkers) that provide some predefined tasks. We have created libraries for C++ and Rust that allows you to simply create such executors. In other words, if you have some code that you want to expose as tasks in Rain, you simply wrap it by our library. The resulting program is a working executor that knows how to communicate with a governor (formerly worker). In the current version, Rust library is almost complete; the C++ library is working but provides only the basic functionality.

Public node attributes

The second large change is stabilization of “attributes” - the metadata for tasks and data objects. Each task and data object stores two sets of attributes: spec (specification of task/object, i.e. configuration, dependencies, scheduler hints, etc.) and info (attributes created at runtime, i.e. where was task executed, task duration, error messages). Both spec and info can contain also user-defined attributes that can contain any JSON-serializable data. This gives the user possibility to send and receive small (meta)data from tasks.

Python task API

Another visible change is refactoring of Python API for tasks. We have refactored API to be more “pythonic”. Among some smaller changes such as better attribute checking and consistent attributes orders, the big one is that each task type is now represented by a class. This is mainly relevant for people creating own tasks, but others may see the change as capitalization of “task factories” in tasks module, e.g. tasks.execute is now tasks. Execute, as they are now classes, not functions. Being aware that this change breaks compatibility, we believe it was necessary to make the usage of Rain more intuitive. Based on our experience, people are sometimes confused about the idea of building a “plan of computation” that is actually executed after submission. We hope that using the capital letters will help to explain that tasks. Execute only creates a node in a task graph but do not run the computation.

Distribution

We have also news in the distribution of packages. The Rain infrastructure and library for writing Rust packages are now installable by cargo install rain_server or as a Cargo.toml dependency. The Python client and task API is published at PyPI and installable by pip install rain-python (note that Rain needs Python 3).

Other changes

We have renamed some of the Rain components as follows: “worker” -> “governor” and “subworker” -> “executor”. If you are not familiar with the old names, good for you and you can skip ignore paragraph; the reason behind the rename is to avoid confusing worker with either the local manager (now called governor) or the actual task executor process (now executor). A governor is the local manager of the resources and data objects on a computational node (In the similar sense as the server is manager of the cluster). The tasks are performed by executor, that was renamed from “subworker” as the original name misleading.

We are now supporting the arrow format in load/encode, and lots of other minor bugs were fixed in many places. There were also some smaller tweaks in the dashboard. We know that the dashboard needs more care. This is a quite high priority in our TODO list for the next release - you can track the plan in the v0.4 milectone but it will be fleshed out only later in the summer (we like vacations too!). We plan major improvement of dashboard together with new functionality to propagate custom data from tasks to the dashboard (e.g. a chart with some internal values).

There are also two internal changes that are almost invisible to users, but had quite a large impact in the code. Both of them are on the long way of refactoring our RPC that probably ends by removing capnp from Rain. The first is refactoring of data fetching API, that was simplified and does not relly on capnp “capabilities”. As a side-effect of this, we have implemented better caching of serialized directories. The other change is a complete revamp of the protocol between governor and executor. In this place, capnp was completely replaced by a simple protocol based on exchanging CBOR messages. This allows to develop new executors in almost in any language.

Standa, Tomáš and Vojta from the Rain Team