Defining Hooks
coma has a template-based architecture. Hooks not only implement coma’s
default behavior, but also make it easy to tweak, replace, or extend that behavior.
Hook Semantics
coma has very few baked in assumptions. All of coma’s default behavior
results from pre-defined hooks that have been chosen to fill its template slots.
Nearly all behavior can be drastically changed with user-defined hooks.
coma has 10 total hook slots in its template. To help users decide which hooks to
define, each of these slots has semantics (the type of functionality that coma
expects that particular hook slot will have). Most of these semantics are not hard
requirements, and hook implementations are free to vary wildly from their semantics.
That said, semantics provide a solid base from which to explore this space.
At the highest level, hooks belong to one of two semantic types: parser and invocation.
Parser Hooks
Parser hooks are the only type of hook that gets executed at command registration
time (prior to command invocation). The parser_hook semantics are to add command
line flags via calls to add_argument() on the underlying ArgumentParser
bound to the command that will later be invoked.
Parser hooks for all declared commands are executed (at registration time).
This is needed so that argparse has all the necessary information to invoke
the correct command based on the provided command line arguments. This means that
parser hooks with side effects will always execute those side effects, even
if the command they are bound to isn’t the one that ultimately gets invoked.
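The registration-time behavior can be illustrated with plain argparse. This is only a minimal sketch, not coma’s implementation: register, train_parser_hook, and eval_parser_hook are hypothetical names, and the hooks here receive the sub-parser directly rather than a ParserData object.

```python
import argparse

executed = []  # records which (hypothetical) parser hooks have run


def train_parser_hook(parser: argparse.ArgumentParser) -> None:
    executed.append("train")  # a side effect that happens at registration
    parser.add_argument("--epochs", type=int, default=1)


def eval_parser_hook(parser: argparse.ArgumentParser) -> None:
    executed.append("eval")
    parser.add_argument("--split", default="test")


def register(subparsers, name, parser_hook) -> None:
    """Create the sub-parser for a command and run its parser hook right away."""
    sub = subparsers.add_parser(name)
    sub.set_defaults(command=name)
    parser_hook(sub)


root = argparse.ArgumentParser(prog="main.py")
subparsers = root.add_subparsers()
register(subparsers, "train", train_parser_hook)
register(subparsers, "eval", eval_parser_hook)

# Only "train" is invoked, yet both parser hooks have already executed.
args = root.parse_args(["train", "--epochs", "3"])
print(args.command, args.epochs, executed)
```

Because both hooks ran before any arguments were parsed, expensive work or I/O in a parser hook is paid for on every invocation of the program, whichever command is chosen.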
Invocation Hooks
Invocation hooks are stored during command registration, but only get executed
if the command to which they are bound is invoked (i.e., is chosen by argparse
based on the provided command line arguments). Invocation hooks, as their name
suggests, are responsible for completing all necessary steps involved in successfully
invoking the command to which they are bound.
Invocation hook semantics further belong to one of three sub-types:
Config:
Config hooks are meant to initialize or affect the config objects that are bound to a particular command.
Init:
Init hooks are meant to instantiate or affect the command object (either a function or a class).
Warning
Function-based command objects are internally wrapped in a programmatically-generated class, and it is this wrapper class that an init hook receives, not the raw function object. This unifies the interface, since (from the perspective of an init hook) all command objects are classes that ought to be instantiated.
Run:
Run hooks are meant to execute or surround the execution of the command object after it has been instantiated (presumably by an init hook).
Each of the three invocation hook sub-types (config, init, and run) is further split into three flavors:
Pre:
Pre hooks are executed immediately before the main hook of the same type as a way to add additional behavior.
Main:
Main hooks are generally meant to perform the bulk of the work for that semantic category (config, init, or run).
Post:
Post hooks are executed immediately after the main hook of the same type as a way to add additional behavior.
Altogether, there are 9 invocation hooks. The following keywords are used in
@command and wake() to define command and shared hooks, respectively:
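| Type | Sub-Type | Flavor | Keyword |
|---|---|---|---|
| parser | N/A | N/A | parser_hook |
| invocation | config | pre | pre_config_hook |
| | | main | config_hook |
| | | post | post_config_hook |
| | init | pre | pre_init_hook |
| | | main | init_hook |
| | | post | post_init_hook |
| | run | pre | pre_run_hook |
| | | main | run_hook |
| | | post | post_run_hook |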
Hook Pipeline
As stated above, parser hooks are executed when a command is registered,
whereas the invocation hooks are executed if, and only if, the command to
which they are bound is invoked by argparse. The invocation hook pipeline
consists of executing all the invocation hooks (in order) one immediately
following the other, with no other code in between. In other words, the invocation
hooks make up the entirety of the code responsible for completing all necessary
steps involved in successfully invoking the command to which they are bound.
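As a rough illustration of the pipeline, the sketch below runs the nine invocation hooks back to back. It assumes the order is config, then init, then run, each with its pre, main, and post flavors; run_pipeline and the plain dict used as hook data are hypothetical stand-ins, not coma’s internals.

```python
from typing import Callable, Dict, List, Optional

# Hook keywords in the assumed pipeline order:
# config, then init, then run, each as pre / main / post.
INVOCATION_ORDER = [
    f"{flavor}{sub_type}_hook"
    for sub_type in ("config", "init", "run")
    for flavor in ("pre_", "", "post_")
]


def run_pipeline(
    hooks: Dict[str, Optional[Callable[[dict], None]]], data: dict
) -> dict:
    """Call each defined hook in order, with no other code in between."""
    for keyword in INVOCATION_ORDER:
        hook = hooks.get(keyword)
        if hook is not None:  # an undefined slot is simply skipped
            hook(data)
    return data


trace: List[str] = []
# Each hypothetical hook just records its own keyword when called.
hooks = {kw: (lambda data, kw=kw: trace.append(kw)) for kw in INVOCATION_ORDER}
hooks["post_init_hook"] = None  # leave one slot empty

run_pipeline(hooks, {})
print(trace)
```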
Hook Protocol
To enable interoperability between hooks (especially in the hook pipeline), all
hooks must follow a specific protocol (i.e., function signature). All hooks,
regardless of semantics, must take exactly one parameter. For parser hooks,
this parameter is a ParserData object, whereas it is an
InvocationData object for invocation hooks. Both of
these inherit from HookData, and it is perfectly acceptable
to subclass any of these to add additional attributes needed in custom hooks.
Hooks typically modify their input parameter in place and return None. However,
a hook can also return a new object (of the same type as its input parameter) derived
from the input parameter instead of making in-place modifications. Each subsequent hook
in the pipeline receives the most recent non-None object returned by a preceding hook
(or the original input if every preceding hook returned None).
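The return-value rule can be sketched as follows. The Data class is a hypothetical stand-in for a hook data object, and run_hooks is an illustrative driver rather than coma’s actual code.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class Data:
    """Hypothetical stand-in for a hook data object."""
    name: str
    notes: List[str] = field(default_factory=list)


Hook = Callable[[Data], Optional[Data]]


def mutate_in_place(data: Data) -> None:
    data.notes.append("mutated in place")  # implicitly returns None


def return_new_object(data: Data) -> Data:
    # Build a derived object instead of modifying the input.
    return replace(data, notes=data.notes + ["replaced"])


def run_hooks(hooks: List[Hook], data: Data) -> Data:
    for hook in hooks:
        result = hook(data)
        if result is not None:
            data = result  # later hooks see the latest non-None return value
    return data


original = Data(name="cmd")
final = run_hooks([mutate_in_place, return_new_object, mutate_in_place], original)
print(final.notes)
print(original.notes)
```

The third hook modifies the object returned by the second hook, so the original object is left with only the first modification.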
Default Hooks
Rather than being hardcoded, coma’s default behavior is, almost entirely, a
result of having specific pre-defined hooks as default values in the definition of
wake(). These defaults propagate to all command
declarations unless explicitly redefined. The upshot is
that there is almost no part of coma’s default behavior that cannot be tweaked,
replaced, or extended through hooks.
That being said, coma’s default hooks already provide extensive functionality.
Of coma’s 10 total hooks, only 4 have pre-defined defaults: the parser_hook,
the main config_hook, the main init_hook, and the main run_hook. All
default hooks are generated from factory functions with default parameters.
Note
Factories enable behavioral tweaks as one-liners: redefine a default
hook using its factory with a single changed parameter. For example,
run_hook.default_factory()
can be used to change the command execution method name from the default
run() to something else. See here.
Browse the hooks’ package reference to
explore factory options. Factory function names always end with *_factory.
All the default factories are named default_factory and can be found in
their respective hook-semantic module. For example, the default factory for
run_hook is found in coma.hooks.run_hook.default_factory().
If you are finding that the factory functions are insufficient, consider making use of the many config-related utilities found here to help you in writing your own custom hooks.
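The factory idea itself is easy to sketch in isolation. Everything below is hypothetical: make_run_hook and its method_name parameter are illustrative names and are not taken from coma’s API, so consult the package reference for the real factory signatures.

```python
from types import SimpleNamespace
from typing import Any, Callable


def make_run_hook(method_name: str = "run") -> Callable[[Any], None]:
    """Hypothetical factory: build a hook that calls `method_name` on the command."""
    def hook(data: Any) -> None:
        getattr(data.command, method_name)()
    return hook


class Job:
    def __init__(self) -> None:
        self.calls = []

    def run(self) -> None:
        self.calls.append("run")

    def start(self) -> None:
        self.calls.append("start")


data = SimpleNamespace(command=Job())  # stand-in for invocation data

default_hook = make_run_hook()                     # default parameters
custom_hook = make_run_hook(method_name="start")   # one changed parameter

default_hook(data)
custom_hook(data)
print(data.command.calls)
```

A single changed keyword argument yields a hook variant, which is why a behavioral tweak rarely requires writing a whole hook from scratch.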
In the explanations below, data refers to the input parameter of the hook
(ParserData for parser hooks and
InvocationData for invocation hooks).
Default Parser Hook:
The default parser_hook uses data.persistence_manager to add, for each serializable config, a parser path argument. This enables an explicit file path to the config file to be specified on the command line via a flag. By default, the flag is --{config_name}-path, where config_name is the name of the corresponding config parameter in the command signature.
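The effect of such a path flag can be approximated with plain argparse. In this sketch, add_path_flags and the config names model and data are made up for illustration, and no persistence manager is involved.

```python
import argparse
from typing import Iterable


def add_path_flags(parser: argparse.ArgumentParser, config_names: Iterable[str]) -> None:
    """Add one --{config_name}-path flag per (hypothetical) serializable config."""
    for name in config_names:
        parser.add_argument(
            f"--{name}-path",
            dest=f"{name}_path",
            default=None,
            help=f"explicit file path for the '{name}' config",
        )


parser = argparse.ArgumentParser(prog="main.py")
add_path_flags(parser, ["model", "data"])

args = parser.parse_args(["--model-path", "conf/model.yaml"])
print(args.model_path, args.data_path)
```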
Default Main Config Hook:
The default config_hook does all the heavy lifting for manifesting coma’s default behavior regarding configs. It makes the following assumptions:

- Configs are declarative. They should always follow the declarative hierarchy.
- Declared configs are required. This means that declared configs (both in the command’s signature and any supplemental configs) are loaded (based on the declarative hierarchy) by default.
- Persistence of configs is typically desirable. This means that, by default, all serializable configs are serialized (to enable the middle step of the declarative hierarchy), but skipping serialization for a particular config is easy.

In short, for each config, this hook initializes the config based on the declarative hierarchy protocol:

1. At minimum, each config is initialized from its base declaration.
2. Serializable configs are then loaded from file (if one exists) or written to file (otherwise) unless serialization has been explicitly toggled off for that particular config. Serialization interacts with the default parser_hook since it queries the same data.persistence_manager to get the file path of each config based on its path declaration in the default parser_hook. See here for more details on config files.
3. For each config, an attempt is made to override its config attribute values with any command line arguments that fit omegaconf’s dot-list notation.

Note
Each config variant in the declarative hierarchy is stored so that later hooks can access any variant (if needed). This is particularly helpful in cases where some configs need to be preloaded before others.

The config_hook’s default factory includes many flags for tweaking the default behavior. For example, you can skip the override or the serialization of some configs but not others. Or you can raise a FileNotFoundError if a particular config file cannot be found. Or even force the serialization of the override values rather than the base config declaration.
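The three-step hierarchy (base declaration, then file, then command line overrides) can be sketched with the standard library alone. This is a simplified assumption-laden illustration: resolve_config is a hypothetical helper, JSON stands in for whatever file format is actually used, and only flat key=value overrides are handled, whereas omegaconf’s dot-list notation also supports nested keys.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def resolve_config(base: Dict[str, Any], path: Path, overrides: List[str]) -> Dict[str, Any]:
    """Resolve a config as: base declaration -> file -> command line overrides."""
    config = dict(base)  # 1. start from the base declaration
    if path.exists():
        config.update(json.loads(path.read_text()))  # 2a. load from file
    else:
        path.write_text(json.dumps(config))  # 2b. no file yet: write the base
    for item in overrides:  # 3. apply flat key=value overrides
        key, _, raw = item.partition("=")
        if key in config:
            try:
                config[key] = json.loads(raw)  # numbers, booleans, ...
            except ValueError:
                config[key] = raw  # fall back to a plain string
    return config


base = {"lr": 0.1, "epochs": 5}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.json"
    first = resolve_config(base, path, [])  # the file is created from the base
    path.write_text(json.dumps({"lr": 0.1, "epochs": 20}))  # simulate a user edit
    second = resolve_config(base, path, ["lr=0.01"])

print(first)
print(second)
```

In the second call, epochs comes from the edited file while lr comes from the override, which mirrors the precedence described above.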
Default Main Init Hook:
The default init_hook instantiates the data.command class by calling its __init__() method with all declared parameters (config, inline, and regular) filled in through the call_on() method of data.parameters. Then, the value of data.command (a class type) gets replaced in place with the value of the instantiated object.

Warning
In user-defined hooks, be sure to never make decisions based on directly inspecting the data.command object. Not only are function-based commands implicitly wrapped in a class, but also the value of data.command changes from a class type to an instance of that class as part of this default init hook.

Instead, use data.name if you need to determine which command is being invoked, since the command name is guaranteed to be unique across all declared commands.
Default Main Run Hook:
The default run_hook calls the data.command object’s run() method with no parameters (the method name is run() by default, though this can be changed). This assumes that the init_hook has already replaced data.command (a class type) with an instance of that class.
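The hand-off between the default init and run hooks can be sketched as below. All names are hypothetical: wrap_function imitates the idea of wrapping a function-based command in a generated class, and a plain dict stands in for data.parameters (whose real call_on() method is not reproduced here).

```python
from types import SimpleNamespace

outputs = []


def greet(who: str) -> None:
    outputs.append(f"hello {who}")


def wrap_function(fn):
    """Wrap a function in a generated class whose run() calls the function."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def run(self):
        return fn(**self.kwargs)

    return type(fn.__name__, (), {"__init__": __init__, "run": run})


def init_hook(data) -> None:
    # Replace the class type with an instance of that class.
    data.command = data.command(**data.parameters)


def run_hook(data) -> None:
    data.command.run()  # assumes init_hook has already instantiated the command


data = SimpleNamespace(
    name="greet",
    command=wrap_function(greet),
    parameters={"who": "world"},
)

was_class = isinstance(data.command, type)
init_hook(data)
is_instance = not isinstance(data.command, type)
run_hook(data)
print(was_class, is_instance, outputs, data.name)
```

The value of data.command changes kind halfway through the pipeline while data.name stays constant, which is why the name is the safer thing to branch on.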
Hooks as Sequences
Typically, a hook is a function with a signature based on the
hook protocol. However, there are three additional
(non-function) sentinel objects (SHARED, DEFAULT, and None) that have
special meaning as command and/or shared hook values. A valid “plain” hook can be any
single function adhering to the hook protocol or any one of these three sentinels.
In addition, any (recursively) nested sequence of these singular/plain values
is also a valid hook. Each item in these sequences is recursively inspected for the
presence of any of the three sentinels. These are replaced at runtime with their
semantically equivalent function. This is particularly
useful for extending coma’s default behavior,
rather than outright replacing it. To
emphasize the recursive potential of nested hook sequences, consider this toy example:

    from coma import command, wake, DEFAULT


    @command(
        run_hook=(
            (
                None,
                lambda _: print("First"),
            ),
            lambda _: print("Second"),
            (
                (
                    (
                        (
                            # Replaced at runtime with the default run_hook.
                            DEFAULT,
                            lambda _: print("Fourth"),
                        ),
                    ),
                ),
            ),
            None,
            (),
            lambda _: print("Last"),
        ),
    )
    def nested():
        print("Third")


    if __name__ == "__main__":
        wake()

Let’s see how coma resolves the nested sequences:

    $ python main.py nested
    First
    Second
    Third
    Fourth
    Last

Notice that DEFAULT gets replaced at runtime with the default run_hook which
runs the command and prints Third at that position in the nested sequences.
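One way to picture the resolution is as a recursive flattening step. The sketch below is an illustration under stated assumptions, not coma’s implementation: DEFAULT here is a local stand-in object, None is treated as a no-op that contributes nothing, and SHARED is omitted for brevity.

```python
from typing import Any, Callable, List

DEFAULT = object()  # local stand-in for the sentinel, not coma's object


def flatten(hook: Any, default: Callable) -> List[Callable]:
    """Recursively flatten nested hook sequences into an ordered list of functions."""
    if hook is None:
        return []  # treated as a no-op in this sketch
    if hook is DEFAULT:
        return [default]  # swap the sentinel for the default hook
    if callable(hook):
        return [hook]
    flat: List[Callable] = []
    for item in hook:  # any remaining value is assumed to be a sequence
        flat.extend(flatten(item, default))
    return flat


out: List[str] = []


def say(word: str) -> Callable:
    return lambda _data: out.append(word)


nested = (
    (None, say("First")),
    say("Second"),
    ((((DEFAULT, say("Fourth")),),),),
    None,
    (),
    say("Last"),
)

for fn in flatten(nested, default=say("Third")):
    fn(None)
print(out)
```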
Beyond this toy example, sequences are helpful in practice for decomposing a complex hook function into a series of smaller ones. Often these component functions will be hook variants created using factories. Hook sequences essentially wrap each component function into a higher-order function that executes the components in order following the rules of the hook protocol.
As an extreme example, we could redefine the pre_config_hook of a command to
stuff the entire default invocation pipeline into it
while setting the standard hooks to None:

    from coma import command, wake, config_hook, init_hook, run_hook


    @command(
        # The whole default pipeline runs inside the pre-config slot.
        pre_config_hook=(
            config_hook.default_factory(),
            init_hook.default_factory(),
            run_hook.default_factory(),
        ),
        config_hook=None,
        init_hook=None,
        run_hook=None,
    )
    def cmd():
        print("No problem!")


    if __name__ == "__main__":
        wake()

This example also highlights the utility of pre and post hooks. They are really
just conceptual convenience functions. All functionality could in principle be placed
in a single hook sequence as shown here. The benefit of multiple hook types and
sub-types with differing semantics is to help conceptually separate concerns. Consider
that, in this example, we defined a pre_run_hook that
exits the program before running the command. In principle, we could have implemented
this same functionality by redefining the run_hook as (pre_run_hook, SHARED).
However, because the new functionality is an early exit (before running the command),
it feels conceptually cleaner to implement the exit as a separate pre_run_hook, rather
than as an initial component of the run_hook in the invocation pipeline. This distinction
is purely conceptual. The resulting behavior is essentially equivalent.