Config Serialization and Management

In a prior tutorial, we saw how coma’s config serialization and persistence management behavior is not baked in but rather depends on which PersistenceManager gets passed to @command when declaring a command.

In this extended example, we’ll explore the functionality of the PersistenceManager and its implications for config serialization.

YAML over JSON

Let’s start with a simple example:

from coma import command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command
def greet(cfg: Config):
    print(cfg.message)

if __name__ == "__main__":
    wake()

The PersistenceManager interacts with two of the default hooks. Through the default parser_hook, it adds an argparse flag for setting the serialization file path of each config (cfg in this example). Then, in the default config_hook, the (possibly user-supplied) file path is retrieved and used to perform the serialization. See here for details.

When instantiating a PersistenceManager, a file type is chosen for all configs to fall back to. By default, YAML is chosen because omegaconf only supports YAML. However, coma does natively support JSON as well via a JSON-YAML translation.

Since the above example uses a default PersistenceManager, cfg will fall back to a YAML serialization (cfg.yaml) by default:

$ python main.py greet
Hello World!
$ ls
cfg.yaml
main.py
$ cat cfg.yaml
message: Hello World!

Even with a default PersistenceManager, we can force serialization to JSON by specifying an explicit file path for cfg with a .json file extension:

$ python main.py greet --cfg-path cfg.json
Hello World!
$ ls
cfg.json
cfg.yaml
main.py
$ cat cfg.json
{
    "message": "Hello World!"
}

Note

By default, the PersistenceManager automatically adds the --cfg-path flag through the default parser_hook. We’ll explore non-default options later.
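
The runs above suggest a load-or-create pattern: when no file exists at the target path, the config’s default values are written there, and (as the next steps show) an existing file is read back instead. Here is a minimal sketch of that pattern using only the standard library. It is not coma’s implementation: load_or_create is a hypothetical helper, and it handles JSON only so that no YAML dependency is needed.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Config:
    message: str = "Hello World!"


def load_or_create(path: Path) -> Config:
    """Hypothetical helper: load Config from `path`, or write defaults there."""
    if path.exists():
        # An existing file takes precedence over the dataclass defaults.
        return Config(**json.loads(path.read_text()))
    cfg = Config()
    # No file yet: persist the defaults so they can be edited later.
    path.write_text(json.dumps(asdict(cfg), indent=4))
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cfg.json"
        print(load_or_create(path).message)  # first call creates the file
        path.write_text('{"message": "Hello JSON!"}')
        print(load_or_create(path).message)  # second call loads the edit
```

Editing the file between two calls changes what the second call returns, which matches what we observe when editing cfg.yaml or cfg.json by hand.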

We now have two competing config files! Let’s modify each one to distinguish them.

Update cfg.yaml to:

message: Hello YAML!

and cfg.json to:

{
    "message": "Hello JSON!"
}

Now, if we run the program, we see that YAML is favored by default:

$ python main.py greet
Hello YAML!

But, as before, we can force JSON to be used instead:

$ python main.py greet --cfg-path cfg.json
Hello JSON!

If we specify a file path without an extension, YAML will again be favored:

$ python main.py greet --cfg-path cfg
Hello YAML!

Finally, if we delete the YAML file while keeping the JSON file, the PersistenceManager will ignore the existing JSON file (and cfg will be serialized to a new YAML file instead) unless explicitly given a JSON file extension:

$ rm cfg.yaml
$ python main.py greet --cfg-path cfg
Hello World!
$ ls
cfg.json
cfg.yaml
main.py
$ python main.py greet --cfg-path cfg.json
Hello JSON!
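
The runs above are consistent with a simple rule: a path with an explicit extension is used as given, while a path without one gets the manager’s default extension appended. The sketch below illustrates that rule. It is an assumption inferred from the observed output, not coma’s actual code, and resolve_path is a hypothetical name.

```python
from pathlib import Path


def resolve_path(user_path: str, default_extension: str = "yaml") -> Path:
    """Hypothetical helper mirroring the behavior observed above."""
    path = Path(user_path)
    if path.suffix:
        # An explicit extension (e.g. ".json") is kept as given.
        return path
    # No extension: fall back to the manager-wide default.
    return path.with_suffix(f".{default_extension}")


if __name__ == "__main__":
    for arg in ("cfg", "cfg.json", "cfg.yaml"):
        print(arg, "->", resolve_path(arg))
```

Under this rule, an existing cfg.json is never considered when the path is just cfg and the default extension is YAML, which is why a fresh cfg.yaml was written after we deleted the old one.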

Summary:

Because omegaconf only supports YAML, the default PersistenceManager always favors YAML, while still supporting JSON. In the next section, we’ll see how to reverse this.

Favoring JSON

We can reverse the default preference of YAML over JSON by setting JSON as the default file extension when instantiating a PersistenceManager. Let’s modify the previous example to achieve this:

from coma import Extension, PersistenceManager, command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command(persistence_manager=PersistenceManager(Extension.JSON))
def greet(cfg: Config):
    print(cfg.message)

if __name__ == "__main__":
    wake()

First, let’s ensure that both YAML and JSON config files exist and are differentiated.

Update cfg.yaml (or create a file if none exists) to read:

message: Hello YAML!

And likewise for cfg.json:

{
    "message": "Hello JSON!"
}

Now, when running the program, we see that JSON is favored in all cases unless a YAML file extension is explicitly provided:

$ python main.py greet
Hello JSON!
$ python main.py greet --cfg-path cfg
Hello JSON!
$ python main.py greet --cfg-path cfg.json
Hello JSON!
$ python main.py greet --cfg-path cfg.yaml
Hello YAML!

Registering a Config with the Persistence Manager

An individual config can be explicitly registered with a PersistenceManager. When a config is not registered, sensible defaults are used. This default functionality is what we’ve seen so far. Registering a specific config means providing new values to override one or more of these defaults. Specifically:

  • The file extension can optionally be set. If None, the extension falls back to the PersistenceManager’s default, which is YAML unless JSON was chosen when instantiating the manager.

  • argparse flag arguments can optionally be set. These are meant to provide a way for the user to set an explicit config file path to override the default. Specifically, provide any desired *names_or_flags and other keyword arguments to pass to add_argument(). For any of the following that are not provided, a sensible default is derived from the config’s parameter name in the command signature (cfg in the previous example):

    *names_or_flags:

    Defaults to --{config_name}-path (i.e., --cfg-path in the previous example). Every _ in config_name is replaced with -.

    type:

    Defaults to str.

    metavar:

    Defaults to "FILE".

    dest:

    Defaults to {config_name}_path.

    default:

    Defaults to {config_name}.{extension}. If extension is None, it falls back to the PersistenceManager’s default.

    help:

    Defaults to "{config_name} file path".

    Additional parameters beyond those listed above can also be provided via registration. These are just the parameters that have sensible defaults if omitted during registration.

These argparse flag arguments get added (via ArgumentParser.add_argument()) during the default parser_hook. Then, in the default config_hook, for each config, the corresponding dest attribute of InvocationData.known_args is queried to retrieve the user-supplied file path (when the corresponding --{config_name}-path flag is explicitly provided as a command line argument). If the corresponding flag is omitted, we instead fall back to the corresponding default attribute of InvocationData.known_args. For details, see get_file_path().
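
To make the defaults and the lookup concrete, here is a minimal sketch built directly on argparse. It assumes only the default values listed above. The helpers flag_arguments and file_path are hypothetical names for illustration and are not part of coma.

```python
import argparse


def flag_arguments(config_name, extension="yaml", **overrides):
    """Hypothetical helper: documented defaults, overridden by registration."""
    flag = f"--{config_name.replace('_', '-')}-path"
    kwargs = {
        "type": str,
        "metavar": "FILE",
        "dest": f"{config_name}_path",
        "default": f"{config_name}.{extension}",
        "help": f"{config_name} file path",
    }
    kwargs.update(overrides)  # registered values win over the defaults
    return flag, kwargs


def file_path(config_name, argv, **registration):
    """Add the flag, parse argv, then read the path back from `dest`."""
    flag, kwargs = flag_arguments(config_name, **registration)
    parser = argparse.ArgumentParser()
    parser.add_argument(flag, **kwargs)
    known_args, _ = parser.parse_known_args(argv)
    return getattr(known_args, kwargs["dest"])


if __name__ == "__main__":
    print(file_path("cfg", []))                          # flag omitted
    print(file_path("cfg", ["--cfg-path", "cfg.json"]))  # flag supplied
    print(file_path("my_cfg", ["--my-cfg-path", "x.yaml"]))
```

Because argparse stores the default under the same dest as a user-supplied value, a single attribute lookup covers both the explicit and the fallback case.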

Warning

Registering a particular config with a persistence manager does not guarantee that the config will be serialized. Registration only determines which parameters get passed to add_argument().

In particular, it is the responsibility of the default config_hook to perform the serialization. This default hook always skips non-serializable configs regardless of whether they have been registered.

Let’s expand the previous example with a second config that represents additional command data. Suppose we want this data to be serialized in JSON format to a specific data directory under a specific (non-default) file name. To do so, we register it with a JSON extension and a specific default that points to the new data directory and file name. Since we don’t register the existing cfg, its management will fall back to the sensible defaults:

from coma import Extension, PersistenceManager, command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command(
    persistence_manager=PersistenceManager().register(
        "data", Extension.JSON, default="path/to/data/dir/greet.json"
    )
)
def greet(cfg: Config, data: dict):
    print(cfg.message)
    print("data is:", data)

if __name__ == "__main__":
    wake()

Before running the program, let’s create the data directory with mkdir:

$ mkdir -p path/to/data/dir

and add a greet.json file to that directory with the following content:

{
    "some": "data",
    "for": "greet"
}

Then, when running the program without specifying a file path for data, we see that path/to/data/dir/greet.json gets loaded by default because of its registration:

$ python main.py greet
Hello World!
data is: {'some': 'data', 'for': 'greet'}

But, as in the previous example, we can still force data to serialize elsewhere and in an alternative (in this case YAML) format if desired:

$ python main.py greet --data-path data.yaml
Hello World!
data is: {}
$ ls
data.yaml
main.py
path/
$ cat data.yaml
{}
$ cat path/to/data/dir/greet.json
{
    "some": "data",
    "for": "greet"
}