Rain v0.3 released
tl;dr You can now write your own tasks in C++ and Rust, we have reworked the metadata for tasks and objects, and we have improved the Python API overall.
We have just released version 0.3.0 of Rain, a framework for executing parallel pipelines. This release brings many changes, both internal and external. In this post, we summarize them together with the motivation behind them.
Custom tasks in Rust and C++
The big news is the possibility to create your own tasks in C++ and Rust. In version 0.3.0 you can create executors (formerly known as subworkers) that provide a set of predefined tasks. We have created libraries for C++ and Rust that allow you to create such executors easily. In other words, if you have some code that you want to expose as tasks in Rain, you simply wrap it with our library. The resulting program is a working executor that knows how to communicate with a governor (formerly worker). In the current version, the Rust library is almost complete; the C++ library works but provides only basic functionality.
Public node attributes
The second large change is the stabilization of “attributes”, the metadata for tasks and data objects. Each task and data object stores two sets of attributes: spec (the specification of the task/object, e.g. configuration, dependencies, scheduler hints) and info (attributes created at runtime, e.g. where the task was executed, task duration, error messages). Both spec and info can also contain user-defined attributes holding any JSON-serializable data. This gives the user a way to send small pieces of (meta)data to tasks and to receive them back.
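To make the spec/info split more concrete, here is a minimal sketch that models the two attribute sets as plain Python dictionaries and round-trips them through JSON. The key names (task_type, inputs, user, governor, duration_ms) are hypothetical and chosen only for this example; they are not Rain's actual attribute schema.

```python
import json

# Hypothetical attribute sets for one task; all key names are illustrative only.
spec = {
    "task_type": "example/wordcount",  # what to run (assumed name)
    "inputs": ["object-1"],            # dependencies known before execution
    "user": {"threshold": 0.5},        # user-defined, JSON-serializable data
}
info = {
    "governor": "node-3",              # where the task ran (filled in at runtime)
    "duration_ms": 42,                 # how long it took
    "user": {"lines_seen": 1000},      # small result metadata set by the task
}


def roundtrip(attrs):
    """Serialize attributes to JSON and parse them back."""
    return json.loads(json.dumps(attrs))


# Anything stored in the attributes must survive a JSON round trip unchanged.
spec_copy = roundtrip(spec)
info_copy = roundtrip(info)
```

The point of the sketch is only the constraint: whatever a user puts into either set has to be expressible as JSON, which is why it suits small metadata rather than bulk data.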
Python task API
Another visible change is the refactoring of the Python API for tasks, which we have made more “pythonic”. Besides smaller changes such as better attribute checking and a consistent order of attributes, the big one is that each task type is now represented by a class. This is mainly relevant for people creating their own tasks, but others will see the change as a capitalization of the “task factories” in the tasks module, e.g. tasks.execute is now tasks.Execute, because they are now classes, not functions. We are aware that this change breaks compatibility, but we believe it was necessary to make the usage of Rain more intuitive. In our experience, people are sometimes confused by the idea of building a “plan of computation” that is only executed after submission. We hope that the capital letters will help to convey that tasks.Execute only creates a node in a task graph and does not run the computation.
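The difference between building a plan and running it can be shown with a toy model. The sketch below does not use Rain's API; the Task and Session classes are hypothetical stand-ins written for this example. It only illustrates that constructing a task object records a node in a graph, and that nothing is computed until submission.

```python
class Task:
    """Toy graph node: constructing it records work, it does not run it."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs
        self.result = None
        self.done = False


class Session:
    """Toy stand-in for a client session that collects and later runs tasks."""

    def __init__(self):
        self.tasks = []

    def add(self, task):
        self.tasks.append(task)
        return task

    def submit(self):
        # Tasks were added in dependency order, so a single pass is enough.
        for task in self.tasks:
            args = [i.result if isinstance(i, Task) else i for i in task.inputs]
            task.result = task.func(*args)
            task.done = True


session = Session()
a = session.add(Task(lambda: 2))
b = session.add(Task(lambda x: x * 21, a))

done_before_submit = b.done  # the graph exists, but nothing has run yet
session.submit()
done_after_submit = b.done   # now the whole plan has been executed
```

Seen this way, a capitalized class name reads naturally as "create an object describing the work", which is what we hope the new naming suggests.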
Distribution
We also have news about the distribution of packages. The Rain infrastructure and the library for writing Rust packages are now installable by cargo install rain_server or as a Cargo.toml dependency. The Python client and task API are published on PyPI and installable by pip install rain-python (note that Rain needs Python 3).
Other changes
We have renamed some of the Rain components as follows: “worker” -> “governor” and “subworker” -> “executor”. If you are not familiar with the old names, good for you, and you can skip this paragraph. The reason for the rename is to avoid confusing a worker with either the local manager (now called governor) or the actual task executor process (now executor). A governor is the local manager of the resources and data objects on a computational node (in a similar sense as the server is the manager of the cluster). The tasks are performed by an executor, which was renamed from “subworker” because the original name was misleading.
We now support the Arrow format in load/encode, and many minor bugs were fixed in various places. There were also some smaller tweaks in the dashboard. We know that the dashboard needs more care. It is quite high on our TODO list for the next release. You can track the plan in the v0.4 milestone, but it will be fleshed out only later in the summer (we like vacations too!). We plan a major improvement of the dashboard together with new functionality to propagate custom data from tasks to the dashboard (e.g. a chart with some internal values).
There are also two internal changes that are almost invisible to users but had quite a large impact on the code. Both are steps on the long road of refactoring our RPC, which will probably end with removing capnp from Rain. The first is a refactoring of the data fetching API, which was simplified and no longer relies on capnp “capabilities”. As a side effect, we have implemented better caching of serialized directories. The other change is a complete revamp of the protocol between governor and executor. Here, capnp was completely replaced by a simple protocol based on exchanging CBOR messages. This makes it possible to develop new executors in almost any language.
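To give a feeling for why CBOR keeps the barrier low, here is a minimal sketch of an encoder and decoder for a small CBOR subset (unsigned integers, text strings, arrays and maps, following RFC 8949) in plain Python. The message layout at the end (the message, task and inputs keys) is a hypothetical example and not the actual governor/executor protocol; a real executor would normally use an existing CBOR library instead of hand-written code.

```python
import struct


def _head(major, n):
    """Encode a CBOR initial byte plus its length/value argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                              (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    raise ValueError("integer too large for CBOR")


def encode(value):
    """Encode a subset of CBOR: unsigned ints, text, lists and dicts."""
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch's subset")
    if isinstance(value, int) and value >= 0:
        return _head(0, value)                      # major type 0
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw             # major type 3
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + body          # major type 5
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf, pos=0):
    """Decode one item of the same subset; returns (value, next_position)."""
    major, low = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if low < 24:
        n = low
    elif low <= 27:
        size = 1 << (low - 24)                      # 1, 2, 4 or 8 bytes
        n = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional information")
    if major == 0:
        return n, pos
    if major == 3:
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if major == 4:
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:
        result = {}
        for _ in range(n):
            key, pos = decode(buf, pos)
            val, pos = decode(buf, pos)
            result[key] = val
        return result, pos
    raise ValueError("unsupported major type")


# Hypothetical message shape, for illustration only.
message = {"message": "call", "task": "example/hello", "inputs": [1, 2]}
wire = encode(message)
decoded, consumed = decode(wire)
```

Both directions fit in a few dozen lines with no schema compiler involved, which is the main practical difference from the capnp-based protocol for someone writing an executor in a new language.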
Standa, Tomáš and Vojta from the Rain Team