Config Serialization and Management
In a prior tutorial, we saw how coma’s
config serialization and persistence management behavior is not baked in but rather
depends on which PersistenceManager gets passed to
@command when declaring a command.
In this extended example, we’ll explore the functionality of the PersistenceManager
and its implications for config serialization.
YAML over JSON
Let’s start with a simple example:
from coma import command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command
def greet(cfg: Config):
    print(cfg.message)

if __name__ == "__main__":
    wake()
The PersistenceManager interacts with the default
parser_hook to add an argparse flag for setting the serialization file path
of configs (cfg in this example) and interacts with the
default config_hook where the (possibly user-supplied)
file path is retrieved to perform serialization.
See here for details.
When instantiating a PersistenceManager, a file type is chosen for all configs to
fall back to. By default, YAML is chosen because omegaconf only supports YAML.
However, coma does natively support JSON as well
via a JSON-YAML translation.
Since the above example uses a default PersistenceManager, cfg will
fall back to a YAML serialization (cfg.yaml) by default:
$ python main.py greet
Hello World!
$ ls
cfg.yaml
main.py
$ cat cfg.yaml
message: Hello World!
Even with a default PersistenceManager, we can force serialization to JSON
by specifying an explicit file path for cfg with a .json file extension:
$ python main.py greet --cfg-path cfg.json
Hello World!
$ ls
cfg.json
cfg.yaml
main.py
$ cat cfg.json
{
"message": "Hello World!"
}
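The sessions above suggest a load-or-create pattern: when the config file is missing, the defaults are written to disk, and when it exists, its contents take precedence. The following is a minimal sketch of that pattern using only the standard library. It is not coma's implementation: `load_or_create` is a hypothetical helper, and it handles only JSON, whereas coma relies on omegaconf for YAML.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Config:
    message: str = "Hello World!"


def load_or_create(path: Path, config_cls=Config):
    """Hypothetical helper: load a config from `path`, or write defaults there."""
    if path.exists():
        # An existing file takes precedence over the dataclass defaults.
        return config_cls(**json.loads(path.read_text()))
    cfg = config_cls()
    # First run: persist the defaults so they can be edited later.
    path.write_text(json.dumps(asdict(cfg), indent=4))
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cfg.json"
        print(load_or_create(path).message)  # file is missing, so defaults are used
        path.write_text('{"message": "Hello JSON!"}')
        print(load_or_create(path).message)  # the edited file is picked up
```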
Note
By default, the PersistenceManager automatically adds the --cfg-path
flag through the default parser_hook. We’ll
explore non-default options later.
We now have two competing config files! Let’s modify each one to tell them apart.
First, update cfg.yaml to:
message: Hello YAML!
and cfg.json to:
{
"message": "Hello JSON!"
}
Now, if we run the program, we see that YAML is favored by default:
$ python main.py greet
Hello YAML!
But, as before, we can force JSON to be used instead:
$ python main.py greet --cfg-path cfg.json
Hello JSON!
If we specify a file path without an extension, YAML will again be favored:
$ python main.py greet --cfg-path cfg
Hello YAML!
Finally, if we delete the YAML file while keeping the JSON file, the
PersistenceManager will ignore the existing JSON file (and cfg will be
serialized to a new YAML file instead) unless explicitly given a JSON file extension:
$ rm cfg.yaml
$ python main.py greet --cfg-path cfg
Hello World!
$ ls
cfg.json
cfg.yaml
main.py
$ python main.py greet --cfg-path cfg.json
Hello JSON!
Summary:
Because omegaconf only supports YAML, the default PersistenceManager
always favors YAML, while still supporting JSON. In the next section, we’ll
see how to reverse this.
Favoring JSON
We can reverse the default preference of YAML over JSON by setting JSON as the
default file extension when instantiating a PersistenceManager. Let’s modify
the previous example to achieve this:
from coma import Extension, PersistenceManager, command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command(persistence_manager=PersistenceManager(Extension.JSON))
def greet(cfg: Config):
    print(cfg.message)

if __name__ == "__main__":
    wake()
First, let’s ensure that both YAML and JSON config files exist and are differentiated.
Update cfg.yaml (or create a file if none exists) to read:
message: Hello YAML!
And likewise for cfg.json:
{
"message": "Hello JSON!"
}
Now, when running the program, we see that JSON is favored in all cases unless a YAML file extension is explicitly provided:
$ python main.py greet
Hello JSON!
$ python main.py greet --cfg-path cfg
Hello JSON!
$ python main.py greet --cfg-path cfg.json
Hello JSON!
$ python main.py greet --cfg-path cfg.yaml
Hello YAML!
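The extension rule observed in the two sections above can be summarized in a few lines: an explicit .yaml or .json extension is respected, and anything else gets the default extension appended. This is a sketch of the observed behavior only. `resolve_path` and `KNOWN_EXTENSIONS` are hypothetical names, not part of coma, and the set of recognized extensions is an assumption based on the two used in this tutorial.

```python
from pathlib import Path

# Assumption: only the two extensions that appear in this tutorial are recognized.
KNOWN_EXTENSIONS = {".yaml", ".json"}


def resolve_path(file_path: str, default_extension: str = "yaml") -> Path:
    """Hypothetical helper: apply the default extension only when none is given."""
    path = Path(file_path)
    if path.suffix.lower() in KNOWN_EXTENSIONS:
        # An explicit extension always overrides the default.
        return path
    # No recognized extension: append the default one.
    return path.with_name(f"{path.name}.{default_extension}")


if __name__ == "__main__":
    for default in ("yaml", "json"):
        for given in ("cfg", "cfg.json", "cfg.yaml"):
            print(default, given, resolve_path(given, default))
```

Note that the rule never checks which files exist on disk, which matches the earlier observation that an existing cfg.json is ignored when the path has no extension and YAML is the default.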
Registering a Config with the Persistence Manager
A PersistenceManager enables an individual config to be explicitly
registered with it. When no
registration is given, a sensible default is used. This default functionality
is what we’ve seen so far. Registering a specific config requires providing
new values to override one or more of these sensible defaults. Specifically:
- The file extension can optionally be set. If None, the extension falls back to the PersistenceManager’s default, which is YAML by default, but can be set to JSON.
- argparse flag arguments can optionally be set. These are meant to provide a way for the user to set an explicit config file path to override the default. Specifically, provide any desired *names_or_flags and other keyword arguments to pass to add_argument(). For any of the following that are not provided, a sensible default is derived from the config’s parameter name in the command signature (cfg in the previous example):
    - *names_or_flags: Defaults to --{config_name}-path (i.e., --cfg-path in the previous example). Any _ in config_name are replaced with -.
    - type: Defaults to str.
    - metavar: Defaults to "FILE".
    - dest: Defaults to {config_name}_path.
    - default: Defaults to {config_name}.{extension}. If extension is None, it falls back to the PersistenceManager’s default.
    - help: Defaults to "{config_name} file path".
  Additional parameters beyond those listed above can also be provided via registration. These are just the parameters that have sensible defaults if omitted during registration.
These argparse flag arguments get added (via ArgumentParser.add_argument())
during the default parser_hook. Then, in the
default config_hook, for each config, the
corresponding dest attribute of
InvocationData.known_args
is queried to retrieve the user-supplied file path (when the corresponding
--{config_name}-path flag is explicitly provided as a command line argument).
If the corresponding flag is omitted, we instead fall back to the corresponding
default attribute of InvocationData.known_args. For details, see
get_file_path().
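To make the defaults and the lookup concrete, here is a minimal sketch that uses only the standard argparse module. It is not coma's code: `default_flag` is a hypothetical helper that mirrors the defaults described above, and the `extra_data` config name is invented to show the underscore-to-hyphen replacement.

```python
import argparse


def default_flag(config_name: str, extension: str = "yaml"):
    """Hypothetical helper mirroring the registration defaults described above."""
    # Underscores in the config name become hyphens in the flag.
    flag = f"--{config_name.replace('_', '-')}-path"
    kwargs = dict(
        type=str,
        metavar="FILE",
        dest=f"{config_name}_path",
        default=f"{config_name}.{extension}",
        help=f"{config_name} file path",
    )
    return flag, kwargs


parser = argparse.ArgumentParser()
for name, ext in [("cfg", "yaml"), ("extra_data", "json")]:
    flag, kwargs = default_flag(name, ext)
    parser.add_argument(flag, **kwargs)

# parse_known_args() leaves unrelated arguments untouched in the second value.
known_args, unknown_args = parser.parse_known_args(
    ["--cfg-path", "cfg.json", "--other", "1"]
)

# Look up each config's file path through its dest attribute.
print(getattr(known_args, "cfg_path"))         # flag given: the user-supplied path
print(getattr(known_args, "extra_data_path"))  # flag omitted: the default path
```

Because argparse stores the default under the same dest, a single attribute lookup covers both the explicit and the omitted case.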
Warning
Registering a particular
config with a persistence manager does not guarantee/force that the config
will be serialized, but rather only explicitly determines which parameters get
passed to add_argument().
In particular, it is the responsibility of the default
config_hook to perform the serialization. This default hook always skips
non-serializable configs regardless of whether
they have been registered.
Let’s expand the previous example with a second config that
represents additional command data. Suppose we want this data to be serialized in
JSON format to a specific data directory under a specific (non-default) file name.
To do so, we register it with a JSON extension and a specific default that
points to the new data directory and file name. Since we don’t register the existing
cfg, its management will fall back to the sensible defaults:
from coma import Extension, PersistenceManager, command, wake
from dataclasses import dataclass

@dataclass
class Config:
    message: str = "Hello World!"

@command(
    persistence_manager=PersistenceManager().register(
        "data", Extension.JSON, default="path/to/data/dir/greet.json"
    )
)
def greet(cfg: Config, data: dict):
    print(cfg.message)
    print("data is:", data)

if __name__ == "__main__":
    wake()
Before running the program, let’s create the data directory with mkdir:
$ mkdir -p path/to/data/dir
and add a greet.json file to that directory with the following content:
{
"some": "data",
"for": "greet"
}
Then, when running the program without specifying a file path for data, we see that
path/to/data/dir/greet.json gets loaded by default because of its registration:
$ python main.py greet
Hello World!
data is: {'some': 'data', 'for': 'greet'}
But, as in the previous example, we can still force data
to serialize elsewhere and in an alternative (in this case YAML) format if desired:
$ python main.py greet --data-path data.yaml
Hello World!
data is: {}
$ ls
data.yaml
main.py
path/
$ cat data.yaml
{}
$ cat path/to/data/dir/greet.json
{
"some": "data",
"for": "greet"
}