This is a simple package / script which can be used to run multiple experiments in parallel on the same machine or distributed across many different machines. This script makes minimal assumptions about the kind of experiments you want to run and focuses on flexibility. The core idea is that there are three functions pre -> fit -> post for each experiment:
- `pre(cfg)`: Called before the experiment defined by `cfg` is performed. Place anything here that you want to do before starting the experiment, e.g. loading the data or creating a model. Anything returned by this function is passed to the next call. If you don't return anything, just return `None`.
- `fit(cfg, returned_by_pre)`: Called after `pre` has been called. Whatever has been returned by `pre` is passed to this function through `returned_by_pre`. Place the code for the actual experiment in this function, e.g. fitting a model. Make sure that you return everything (including things computed by `pre`) which you might need in the next function. If you don't return anything, just return `None`.
- `post(cfg, returned_by_fit)`: Called after `fit` has been called. Whatever has been returned by `fit` is passed to this function through `returned_by_fit`. Usually you want to compute some statistics / metrics about your experiment, e.g. a test error or test loss. This function should return a dictionary with a key/value pair for each metric. If you do not compute any metrics, return an empty dictionary `{}`. Note that the `experiment_runner` will automatically add a field `fit_time` to this dictionary, which is the time spent in the `fit` function (measured via `time.time()`).
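A minimal, self-contained sketch of these three functions might look as follows (the random data and the majority-class "model" are placeholders for illustration, not part of the package):

import numpy as np

def pre(cfg):
    # Prepare everything needed before the experiment, e.g. load the data.
    # Whatever is returned here is passed on to fit().
    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=100)
    return (X, y)

def fit(cfg, returned_by_pre):
    X, y = returned_by_pre
    # The actual experiment goes here. As a placeholder we "fit" a
    # majority-class predictor and return everything post() needs later.
    majority_class = int(np.bincount(y).argmax())
    return (majority_class, y)

def post(cfg, returned_by_fit):
    majority_class, y = returned_by_fit
    # Compute metrics and return them as a dictionary.
    accuracy = float((y == majority_class).mean())
    return {"accuracy": accuracy}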
Each experiment is defined by a base config `basecfg` and a list of individual configs `cfgs`. The `basecfg` is a dictionary containing the base configuration of the experiment with the following fields:
- `out_path`: The path to which results are written. All scores returned by `post` are gathered and stored under `${out_path}/results.jsonl`. Moreover, each individual configuration is stored under `${out_path}/$id/$rep/config.json`, where `$id` is a unique identifier for the experiment (starting at 0) and `$rep` is the current repetition of the experiment (see below). If `n_repetitions` is 1, then `$rep` is omitted and the path becomes `${out_path}/$id/config.json`. Note that `pre` / `fit` / `post` will receive the adjusted `out_path` which contains `$id` and `$rep` (if any). Also note that the individual configs `cfgs` should be serializable to JSON files so that the config can be properly stored. The `experiment_runner` will attempt to convert lists / numpy arrays accordingly, but I would not rely on this code for reproducibility. See the "best practices" section below for more information on that.
- `pre` (required): The pre function.
- `fit` (required): The fit function.
- `post` (required): The post function.
- `backend` (optional, defaults to `single`): The backend used for running the experiments. Currently supported are {`multiprocessing`, `ray`, `single`, `malocher`}. Any string which is not `multiprocessing`, `malocher` or `ray` is interpreted as `single`. As the names suggest, `multiprocessing` runs experiments on the same machine using multiple processes (via a `multiprocessing.Pool`), `ray` uses Ray to distribute experiments across multiple machines, `malocher` uses malocher to run the experiments on multiple SSH machines, and `single` simply runs one experiment after another on the current machine without multi-threading.
- `num_cpus` (optional, only used by {`multiprocessing`, `ray`}, defaults to 1): The number of CPUs / threads used by each experiment.
- `num_gpus` (optional, only used by {`ray`}, defaults to 0): The number of GPUs required by each experiment.
- `max_memory` (optional, only used by {`ray`}, defaults to 1GB): The maximum amount of memory required by each experiment.
- `address` (optional, only used by {`ray`}, defaults to `auto`): Address of the ray head.
- `redis_password` (optional, only used by {`ray`}, defaults to `None`): Redis password of the ray head.
- `verbose` (optional {`True`, `False`}, defaults to `True`): Displays a TQDM progress bar over all experiments.
- `repetitions` (optional, defaults to 1): How often the experiment should be repeated (e.g. due to randomness). Note that the `experiment_runner` does not take care of random seeds; you have to handle them in `fit`.
- `timeout` (optional, defaults to 0): Sets an optional timeout in seconds. A single experiment is stopped after `timeout` seconds. No statistics are kept and an exception is printed. Execution of the other experiments resumes as usual.
An example `basecfg` could be:
basecfg = {
    "out_path":"results/",
    "pre": pre,
    "post": post,
    "fit": fit,
    "backend": "multiprocessing",
    "num_cpus":8,
    "verbose":True
}
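For the distributed backends, the ray-specific fields listed above come into play. A hedged sketch of a ray configuration (the concrete values here are placeholders; adjust them to your cluster):

basecfg = {
    "out_path": "results/",
    "pre": pre,
    "post": post,
    "fit": fit,
    "backend": "ray",
    "num_cpus": 4,           # CPUs / threads reserved per experiment
    "num_gpus": 1,           # GPUs reserved per experiment
    "address": "auto",       # address of the ray head
    "redis_password": None,  # set this if your ray head requires one
    "verbose": True
}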
The list of individual configurations `cfgs` can be pretty much whatever you want. The following keywords are reserved, however:
- `experiment_id`: The unique id of the experiment, starting at 0.
- `run_id`: The current repetition of the experiment, starting at 0.
- `out_path`: The corresponding file path as detailed above.
An example `experiment_cfg` could be:
cfg = {
    # Whatever you need for your experiment
}
experiment_cfg = {
    **cfg, 
    'experiment_id':experiment_id,
    'out_path':rep_out_path, 
    'run_id':i
}
Similarly, you can return whatever scores you need for your experiment. The following keywords are reserved, however:
- `fit_time`: The amount of time spent in the `fit` function, measured via `time.time()`. This field is automatically added by the `experiment_runner`.
- `mean_$M`: If `n_repetitions` is greater than 1, the mean over all runs is added for each metric `$M`.
- `std_$M`: If `n_repetitions` is greater than 1, the standard deviation over all runs is added for each metric `$M`.
The experiment_runner has some basic support for hyperparameter search. We follow the philosophy discussed in "Random Search for Hyper-Parameter Optimization" by Bergstra and Bengio, JMLR 13, 2012 (https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf), which states that randomly chosen hyperparameters often outperform a classic grid search. In order to perform hyperparameter search you can tag a parameter as `Variation`, which receives a list of possible variations for the hyperparameter, and then call `generate_configs` with the desired number of configurations you want to check. For example, you can generate `n_configs = 3` different hyperparameter configurations for two parameters:
generate_configs(
    cfg = {
        "param_1": Variation([1,2,3,4]),
        "param_2": Variation([1e-1,1e-2,1e-3]),
    },
    n_configs = 3
)
generate_configs will recursively check all dictionaries for `Variation` entries and return a list with `n_configs` configurations. Every occurrence of `Variation` is replaced by a randomly chosen entry from the supplied list. The function takes care of duplicate entries and makes sure that only unique hyperparameter configurations are returned. It also checks whether at least `n_configs` unique configurations are possible and adapts `n_configs` accordingly. The result might look like:
[
    {
        "param_1": 1,
        "param_2": 1e-1,
    },
    {
        "param_1": 2,
        "param_2": 1e-3,
    },
    {
        "param_1": 4,
        "param_2": 1e-1,
    }
]
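The generated configurations can then be passed to `run_experiments` together with a `basecfg`; a short sketch under the assumption that the returned dictionaries are used as-is:

cfgs = generate_configs(
    cfg = {
        "param_1": Variation([1, 2, 3, 4]),
        "param_2": Variation([1e-1, 1e-2, 1e-3]),
    },
    n_configs = 3
)

# basecfg as defined above; each generated configuration becomes one experiment
run_experiments(basecfg, cfgs)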
The results are written to a JSON Lines file which can be read via the following snippet:
import json 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
def read_jsonl(path):
    data = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            data.append(json.loads(line))
    return json_normalize(data)
The json_normalize call returns a pandas data frame which is flattened, and thereby column names change a bit. For example, you have to access `fit_time` via `scores.fit_time`, and so on.
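For example, reading the results and looking at the flattened fit time might look like this (assuming the metrics end up under the `scores.` prefix as described above):

df = read_jsonl("results/results.jsonl")

# Nested entries are flattened into dotted column names
print(df.columns)
print(df["scores.fit_time"].mean())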
- One of the central ideas of the `experiment_runner` is that a dictionary defines a single experiment and that this dictionary can be stored in a JSON file which can be read by a human and reproduced by a machine. Python objects and functions are difficult to store, and thus it is best practice to configure experiments with values which can easily be stored in a JSON file, e.g. strings, lists or numbers. The main purpose of the `pre` function is then to extract these strings and create the corresponding objects. This can be a bit tedious from time to time, but the following snippets might help:

import sys

def str_to_class(classname):
    return getattr(sys.modules[__name__], classname)

import foo
method_to_call = getattr(foo, 'bar')
result = method_to_call()

eval("Foo")
- When running many experiments which differ in runtime, it is a good idea to shuffle them beforehand so that they get randomly distributed across ray / multiprocessing:

import random

basecfg = {...}
experiments = [...]
random.shuffle(experiments)
run_experiments(basecfg, experiments)
- You should somehow organize the results, e.g. by using the current date + time for the `out_path`:

from datetime import datetime

basecfg = {
    "out_path": "results/" + datetime.now().strftime('%d-%m-%Y-%H:%M:%S'),
    "pre": pre,
    "post": post,
    "fit": fit,
    "backend": "single",
    "verbose": True
}
- The `results.jsonl` contains a lot of information which is usually not required for displaying results. Moreover, entries are often closer to programming than to what you would write in a paper. The following snippets can be helpful:

import os
from IPython.display import display, HTML

def nice_name(row):
    # Build a nice looking string
    return "{} {}".format(row["model"], row["param_1"])

# latest_folder points to the output folder of the run you want to inspect
df = read_jsonl(os.path.join(latest_folder, "results.jsonl"))
df["nice_name"] = df.apply(nice_name, axis=1)
df = df.round(decimals = 3)

tabledf = df[["nice_name", "mean_accuracy", "mean_params", "scores.mean_fit_time"]]
tabledf = tabledf.sort_values(by=['mean_accuracy'], ascending = False)
display(HTML(tabledf.to_html()))

idx = tabledf.groupby(['nice_name'])['mean_accuracy'].transform(max) == tabledf['mean_accuracy']
shortdf = tabledf[idx]
display(HTML(shortdf.to_html()))