Hyperparameter Search on a Cluster (Part 3)
- Part 1: Hyperparameter search pipeline
- Part 2: Hyperparameter search pipeline distributed on Exoscale cloud
- Part 3: Employing external programs
In our previous blog posts, we showed how to use Rain for a hyperparameter search over MNIST models and how to distribute the whole process in a cloud environment. This time, we will modify the example so that the actual training is performed by an external application instead of a Python function decorated with Rain's remote decorator.
Executing external programs
Before we start, let us take a brief detour and look at how Rain handles the execution of external programs. Rain provides the task type tasks.Execute. When such a task is executed on a governor, it creates a local temporary directory, maps all inputs of the program as files stored in this directory, and executes the program. When the program finishes, the declared output files are exported and the temporary directory is removed.
The executed program may do whatever it wants inside its temporary directory. Nonetheless, it should not touch files located outside of this directory, as it cannot assume on which governor (worker machine) it actually runs. If the program requires some files or directories, they should be properly mapped as inputs; Rain then distributes the data across the cluster as necessary.
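To make this concrete, here is a minimal sketch of the mechanism on a toy command. This is our own illustration, not part of the MNIST pipeline: a small data object created with blob is passed directly in the argument list, Rain maps it to a file in the task's temporary directory, and the captured standard output becomes the task's result.

from rain.client import Client, tasks, blob

client = Client("localhost", 7210)
session = client.new_session("sort example", default=True)

# `words` is mapped to a file inside the task's temporary directory;
# stdout=True turns the captured standard output into the task's result.
words = blob("banana\napple\ncherry\n")
sorted_words = tasks.Execute(["sort", words], stdout=True)
sorted_words.output.keep()

session.submit()
print(sorted_words.output.fetch().get_str())  # "apple\nbanana\ncherry\n"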
Simple external program - mnist.py
Our external application is a piece of TensorFlow code in a standalone Python file mnist.py with the following command line interface:
python3 mnist.py --data <PATH_TO_DATA> --dropout <DROPOUT> --size <SIZE>
The dropout and size parameters have the same meaning as in the previous examples. The --data argument expects a path to the MNIST dataset.
The program prints the prediction accuracy on the standard output. The full source code of mnist.py can be found at the end of this post.
Note that, in this tutorial, we use the application only through its CLI as we would use any other third-party black-box application.
Initialization
Now, let’s start with our Rain client code. Most of the initialization remains the same as explained in the previous blog posts. We only add a new variable MNIST_PROGRAM that contains the path to mnist.py. Do not forget to modify it so that it points to the correct path on your system.
MNIST_PROGRAM = "/path/to/mnist.py" # <-- The only difference
SIZES = [32, 64, 128, 256]
DROPOUTS = [0.0, 0.2, 0.4, 0.6, 0.8]
client = Client("localhost", 7210)
session = client.new_session("MNIST test", default=True)
Data download
Next, we need to download the data. In the previous version, we used TensorFlow's built-in function to download it and then distributed it as pickled data. This time, since mnist.py expects a path to the raw data file, we download it in a dedicated task, which also ensures that the dataset gets downloaded only once. In Rain, this can be written as follows:
mnist_data = tasks.Execute(
    "wget https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz",
    output_paths=["mnist.npz"])
The first argument is the actual command to be executed. The output_paths argument specifies which files represent the task's outputs; each path is relative to the working directory of the task.
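As a side note, the same task could also be written with an explicit argument list, which is the form we will use for the training tasks below; assuming tasks.Execute accepts both forms as this post suggests, this is a purely cosmetic variant:

mnist_data = tasks.Execute(
    ["wget", "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"],
    output_paths=["mnist.npz"])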
Training
As mentioned before, the main goal of this blog post is to show how to delegate the training process to an external program, and here we are. Previously, the model training was done by a decorated Python function train_mnist. This time, instead of calling this function, we parametrize and execute our external program mnist.py as follows:
mnist_tasks = [
    tasks.Execute(
        ["python3", MNIST_PROGRAM, "--data", mnist_data, "--size", str(size), "--dropout", str(dropout)],
        stdout=True,
        name="mnist size={} dropout={}".format(size, dropout))
    for size, dropout in itertools.product(SIZES, DROPOUTS)
]

for task in mnist_tasks:
    task.output.keep()
Note that we directly pass our data-download task mnist_data as the fourth element of the command argument list. In such a case, the data object produced by mnist_data is mapped to a randomly named file created in the working directory of the executed task. This random name is then substituted into the arguments of mnist.py at the position where mnist_data was placed. In other words, the command is actually executed like this:
python3 /path/to/mnist.py --data <RANDOM_NAME_WHERE_DATA_ARE_MAPPED> --size <SIZE> --dropout <DROPOUT>
The stdout argument indicates that we want to capture the standard output of the executed program and use it as the result of the task.
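If you ever need to debug a single configuration outside of Rain, the same program can be invoked by hand against a local copy of the data (an illustrative command; adjust the paths to your system):

python3 /path/to/mnist.py --data ./mnist.npz --size 64 --dropout 0.2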
Getting the result
Getting the results is again very similar to the previous version. The only difference is that this time we do not get pickled Python objects but raw data captured from the standard outputs of the executed programs. Hence, we have to convert them into floats explicitly:
session.submit()
accuracies = [float(task.output.fetch().get_str()) for task in mnist_tasks]
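Because the tasks were created in the order generated by itertools.product(SIZES, DROPOUTS), the accuracies come back in the same order, so picking the best configuration is straightforward (our addition, not part of the original script):

# Pair each accuracy with its (size, dropout) configuration and take the maximum.
best_acc, (best_size, best_dropout) = max(
    zip(accuracies, itertools.product(SIZES, DROPOUTS)))
print("best accuracy {:.4f} (size={}, dropout={})".format(
    best_acc, best_size, best_dropout))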
That is all. Now you can test the pipeline locally:
rain start --simple
python3 <our-rain-script.py>
Running on Exoscale
We have thoroughly described how to use Rain on Exoscale in our previous blog post. Therefore, here we only focus on the differences.
To run our example on the Exoscale cloud, we need to make the mnist.py program available on all nodes. For that, we can use our exoscale.py deployment script:
python3 exoscale.py scp /path/to/mnist.py mnist.py
This copies the program into the home directory (/home/ubuntu) on all nodes. The last thing we need to modify is the MNIST_PROGRAM variable:
MNIST_PROGRAM = "/home/ubuntu/mnist.py
Now, the script is ready to be executed on the Exoscale cluster.
Full source code of the example
from rain.client import Client, tasks
import itertools
import numpy as np
import matplotlib.pyplot as plt

MNIST_PROGRAM = "/path/to/mnist.py"
SIZES = [32, 64, 128, 256]
DROPOUTS = [0.0, 0.2, 0.4, 0.6, 0.8]

client = Client("localhost", 7210)
session = client.new_session("MNIST test", default=True)

# Download the dataset once; the declared output file becomes a data object.
mnist_data = tasks.Execute(
    "wget https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz",
    output_paths=["mnist.npz"])

# One training task per (size, dropout) combination; stdout carries the accuracy.
mnist_tasks = [
    tasks.Execute(
        ["python3", MNIST_PROGRAM, "--data", mnist_data, "--size", str(size), "--dropout", str(dropout)],
        stdout=True,
        name="mnist size={} dropout={}".format(size, dropout))
    for size, dropout in itertools.product(SIZES, DROPOUTS)
]

for task in mnist_tasks:
    task.output.keep()

session.submit()
accuracies = [float(task.output.fetch().get_str()) for task in mnist_tasks]
def make_heatmap(ax, data):
    data = np.array(data).reshape((len(SIZES), len(DROPOUTS)))
    ax.imshow(data)
    ax.set_yticks(np.arange(len(SIZES)))
    ax.set_xticks(np.arange(len(DROPOUTS)))
    ax.set_yticklabels(SIZES)
    ax.set_xticklabels(DROPOUTS)
    for j in range(len(DROPOUTS)):
        for i in range(len(SIZES)):
            ax.text(j, i, data[i, j], ha="center", va="center", color="w")

fig, ax1 = plt.subplots()
make_heatmap(ax1, accuracies)
fig.tight_layout()
plt.show()
File ‘mnist.py’
import tensorflow as tf
import argparse
import numpy as np

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', help='Path to data', default=None)
    parser.add_argument('--size', type=int, help='Size of hidden layer', default=256)
    parser.add_argument('--dropout', type=float, help='Dropout', default=0.2)
    args = parser.parse_args()

    with np.load(args.data) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(args.size, activation=tf.nn.relu),
        tf.keras.layers.Dropout(args.dropout),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=1, verbose=False)
    result = model.evaluate(x_test, y_test, verbose=False)
    print(result[1])  # prediction accuracy, captured by Rain via stdout

if __name__ == "__main__":
    main()