Directives

Variables

Suppose you have a very long and complex configuration file, and you often need to change some values in it, depending on external factors. You can:

  • Manually edit it each time, waste some time looking for the specific value to edit, keep a history of the changes or otherwise you won’t be able to go back.

  • Keep the files immutable but instead create a duplicate for every possible value, and when you eventually realize something was wrong with the original file, you have to propagate the changes in all the 20.000 copies you created.

  • Replace the values you need to change with Variables, and let Choixe fill in the values for you, keeping only one, immutable version of the original file.

The following example consists in a toy configuration file for a deep learning training task, involving a moderate amount of parameters. Your configuration file looks like this:

model:
  architecture:
    backbone: resnet18
    use_batch_norm: false
    heads:
      - type: classification
        num_classes: 10
      - type: classification
        num_classes: 7
training:
  device: cuda
  epochs: 100
  optimizer:
    type: Adam
    params:
      learning_rate: 0.001
      betas: [0.9, 0.99]

In this toy configuration file there are some parameters entirely dependant from the task at hand. Take for instance the number of classes, whenever you decide to perform a training on a different dataset, the number of classes is inevitably going to change.

To avoid the pitfalls described earlier, you can use variables. Think of a variable as a placeholder for something that will be defined later. Variables values are picked at runtime from a structure referred to as “context”, that can be passed to Choixe python API.

To use variables, simply replace a literal value with a var directive:

$var(identifier: str, default: Optional[Any] = None, env: bool = False)

Where:

  • identifier is the pydash path (dot notation only) where the value is looked up in the context.

  • default is a value to use when the context lookup fails - essentially making the variable entirely optional. Defaults to None.

  • env is a bool that, if set to True, will also look at the system environment variables in case the context lookup fails. Defaults to False.

  • help is a string containing an optional help message. Choixe uses this message when performing inspection to provide the user with a brief description. It serves no other purpose.

Here is what the deep learning toy configuration looks like after replacing some values with variables:

model:
  architecture:
    backbone: resnet18
    use_batch_norm: $var(hparams.normalize, default=True)
    heads:
      - type: classification
        num_classes: $var(data.num_classes1) # No default: entirely task dependant
      - type: classification
        num_classes: $var(data.num_classes2) # No default: entirely task dependant
training:
  device: $var(TRAINING_DEVICE, default=cpu, env=True) # Choose device based on env vars.
  epochs: $var(hparams.num_epochs, default=100)
  optimizer:
    type: Adam
    params:
      learning_rate: $var(hparams.lr, default=0.001)
      betas: [0.9, 0.99]

The minimal context needed to use this configuration will look something like this:

data:
  num_classes1: 10
  num_classes2: 7

The full context can contain all of these options:

data:
  num_classes1: 10
  num_classes2: 7
hparams:
  normalize: true # Optional
  num_epochs: 100 # Optional
  lr: 0.001 # Optional
TRAINING_DEVICE: cuda # Optional, env

Contexts can also be seen as a “meta-configuration” providing an easier and cleaner access to a subset of “public” parameters of a templatized “private” configuration file with lots of details to keep hidden.

Imports

Imagine having a configuration file in which some parts could be reused in other configuration files. It’s not the best idea to duplicate them, instead, you can move those parts in a separate configuration file and dynamically import it using the import directive.

To use an import directive, replace any node of the configuration with the following directive:

$import(path: str)

Where:

  • path can be an absolute or relative path to another configuration file. If the path is relative, it will be resolved relatively from the parent folder of the importing configuration file, or, in case there is no importing file, the system current working directive.

Let’s build on top of the previous “deep learning” example:

model:
  architecture:
    backbone: resnet18
    use_batch_norm: $var(hparams.normalize, default=True)
    heads:
      - type: classification
        num_classes: $var(data.num_classes1)
      - type: classification
        num_classes: $var(data.num_classes2)
training:
  device: $var(TRAINING_DEVICE, default=cpu, env=True)
  epochs: $var(hparams.num_epochs, default=100)
  optimizer:
    type: Adam
    params:
      learning_rate: $var(hparams.lr, default=0.001)
      betas: [0.9, 0.99]

Here, one could choose to factor out the optimizer node and move it into a separate file called “adam.yml”.

# neural_network.yml
model:
  architecture:
    backbone: resnet18
    use_batch_norm: $var(hparams.normalize, default=True)
    heads:
      - type: classification
        num_classes: $var(data.num_classes1)
      - type: classification
        num_classes: $var(data.num_classes2)
training:
  device: $var(TRAINING_DEVICE, default=cpu, env=True)
  epochs: $var(hparams.num_epochs, default=100)
  optimizer: $import(adam.yml)
# adam.yml
type: Adam
params:
  learning_rate: $var(hparams.lr, default=0.001)
  betas: [0.9, 0.99]

Note that “adam.yml” contains some directives. This is not a problem and it is handled automatically by Choixe. There is also no restriction on using imports in imported files, you can nest them as you please.

Sweeps

Sweeps allow to perform an exhaustive search over a parametrization space, in a grid-like fashion, without having to manually duplicate the configuration file.

To use a sweep directive, replace any node of the configuration with the following directive:

$sweep(*args: Any)

Where:

  • args is an arbitrary set of parameters.

All the directives introduced so far are “non-branching”, i.e. they only have one possible outcome. Sweeps instead, are currently the only “branching” Choixe directives, as they produce multiple configurations as their output.

Example:

foo:
  alpha: $sweep(a, b) # Sweep 1
  beta: $sweep(10, 20, hello) # Sweep 2

Will produce the following six outputs, the cartesian product of {a, b} and {10, 20, hello}:

  1. foo:
      alpha: a
      beta: 10
    
  2. foo:
      alpha: b
      beta: 10
    
  3. foo:
      alpha: a
      beta: 20
    
  4. foo:
      alpha: b
      beta: 20
    
  5. foo:
      alpha: a
      beta: hello
    
  6. foo:
      alpha: b
      beta: hello
    
    flowchart TD
    root -->|Sweep 1| a
    root -->|Sweep 1| b
    a -->|Sweep 2| id1[10]
    a -->|Sweep 2| id2[20]
    a -->|Sweep 2| id3[hello]
    b -->|Sweep 2| id4[10]
    b -->|Sweep 2| id5[20]
    b -->|Sweep 2| id6[hello]

By default, all sweeps are global, each of them adds a new axis to the parameter space, regardless of the depth at which they appear in the structure. There is only one exception to this rule: if a sweep appears inside a branch of another sweep; in this case, the new axis is added locally.

Example:

foo:
  $directive: sweep # Sweep 1 (global)
  $args:
    - alpha: $sweep(foo, bar) # Sweep 2 (local)
      beta: 10
    - gamma: hello
  $kwargs: {}

Will produce the following three outputs:

  1. foo:
      alpha: foo
      beta: 10
    
  2. foo:
      alpha: bar
      beta: 10
    
  3. foo:
      gamma: hello
    
    flowchart TD
    root -->|Sweep 1| a["{alpha: $sweep(foo, bar), beta: 10}"]
    root -->|Sweep 1| b["gamma: hello"]
    a -->|Sweep 2| foo
    a -->|Sweep 2| bar

Instances

Instances allow to dynamically replace configuration nodes with real python objects. This can happen in two ways:

  • With the call directive - dynamically import a python function, invoke it and replace the node content with the function result.

  • With the model directive - dynamically import a pydantic BaseModel and use it to deserialize the content of the current node.

Note: these directives can only be used with the special form.

Call

To invoke a python Callable, use the following directive.

$call: SYMBOL [str]
$args: ARGUMENTS [dict]

Where:

  • SYMBOL is a string containing either:

    • A filesystem path to a .py file, followed by : and the name of a callable.

    • A python module path followed by . and the name of a callable.

  • ARGUMENTS is a dictionary containing the arguments to pass to the callable.

Example:

foo:
  $call: path/to/my_file.py:MyClass
  $args:
    a: 10
    b: 20

Will import MyClass from path/to/my_file.py and produce the dictionary:

{"foo": SomeClass(a=10, b=20)}

Another example:

foo:
  $call: numpy.zeros
  $args:
    shape: [2, 3]

Will import numpy.zeros and produce the dictionary:

{"foo": numpy.zeros(shape=[2, 3])}

Model

Similar to call, the model directive will provide an easier interface to deserialize pydantic models.

The syntax is essentially the same as call:

$model: SYMBOL [str]
$args: ARGUMENTS [dict]

In this case, there is the constraint that the imported class must be a BaseModel subtype.

Symbol

If you want to import an already instanced python object, you can use the symbol directive. This directive is also available with a simple call form, instead of the more complex special forms of call and model.

$symbol(symbol: str)

Where symbol is a string in the same format used in call or model.

Example:

pi: $symbol(numpy.pi)

Loops

You want to repeat a configuration node, iterating over a list of values? You can do this with the for directive, available only with the following key-value form:

$for(ITERABLE[, ID]): BODY

Where:

  • ITERABLE can be either:

    • an integer: the loop will iterate from 0 to ITERABLE - 1.

    • a context key (just like a var) that points to an integer, as above.

    • a context key that points to a list: the loop will iterate over the items of this list.

  • ID is an optional identifier for this for-loop, used to distinguish this specific loop from all the others, in case multiple loops are nested. Think of it like the x in for x in my_list:. When not specified, Choixe will use a random uuid behind the scenes.

  • BODY can be either:

    • A dictionary - the for loop will perform dictionary union over all the iterations.

    • A list - the for loop will perform list concatenation over all the iterations.

    • A string - the for loop will perform string concatenation over all the iterations.

For-loops alone are not that powerful, but they are meant to be used along two other directives:

  • $index(identifier: Optional[str] = None) or $index

  • $item(identifier: Optional[str] = None) or $item

They, respectively, return the integer index and the item of the current loop iteration. If no identifier is specified (you can use the compact form), they will refer to the first for loop encountered in the stack. Otherwise, they will refer to the loop whose identifier matches the one specified.

Optionally, the item directive can contain a pydash key starting with the loop id, to refer to a specific item inside the structure.

Example:

alice:
    # For loop that merges the resulting dictionaries
    "$for(params.cats, x)":
        cat_$index(x):
            index: I am cat number $index
            name: My name is $item(x.name)
            age: My age is $item(x.age)
bob:
    # For loop that extends the resulting list
    "$for(params.cats, x)":
        - I am cat number $index
        - My name is $item(x.name)
charlie:
    # For loop that concatenates the resulting strings
    "$for(params.cats, x)": "Cat_$index(x)=$item(x.age) "

Given the context:

params:
  cats:
    - name: Luna
      age: 5
    - name: Milo
      age: 6
    - name: Oliver
      age: 14

Will result in:

alice:
  cat_0:
    age: My age is 5
    index: I am cat number 0
    name: My name is Luna
  cat_1:
    age: My age is 6
    index: I am cat number 1
    name: My name is Milo
  cat_2:
    age: My age is 14
    index: I am cat number 2
    name: My name is Oliver
bob:
  - I am cat number 0
  - My name is Luna
  - I am cat number 1
  - My name is Milo
  - I am cat number 2
  - My name is Oliver
charlie: "Cat_0=5 Cat_1=6 Cat_2=14 "

Switch Case

Choixe also supports switch-case-like control statements to change a configuration node based on the value of a context variable. This is especially useful for conditioned workflows and data pipelines.

Switch-case is only available with a key-value form.

greeting_action:
  $switch(nation):
    # If the value is one of the following ...
    - $case: [UK, USA, Australia]
      $then: "Say 'hello'."

    # If the nation exactly matches ...
    - $case: Italy
      $then: "Say 'ciao'."

    # Optional default case
    - $default: "Just wave."

Based on different values from the context, the possible outcomes are the following:

# For UK, USA and Australia
greeting_action: "Say 'hello'."
# For Italy
greeting_action: "Say 'ciao'."
# For everything else
greeting_action: "Just wave."

Utilities

Along other more structural directives, Choixe provides some utilities commonly used when writing a configuration file.

UUID

You can generate a random uuid with the uuid directive.

$uuid

Simple as that.

Date

You can get the current datetime with the date directive.

$date(format: str = None)

Where format is the format string (see python strftime format codes for more info). Defaults to isoformat.

Command

You can get the output of a system command with the cmd directive.

$cmd(command: str)

Where command is a string with the command to run.

Temporary Directories

You can get the path to a temporary directory with a custom name, on Linux, this results in the creation of a subfolder of /tmp.

$tmp(name: str = None)

Where name is the name of the subfolder. If no name is specified, a random name unique for this specific configuration will be generated.

Random number generator

Choixe provides the rand directive to generate random numbers. You can choose whether to generate single numbers, lists or numpy arrays; you can switch between float/int type and also control the distribution of random numbers using a PDF of your choice. Check out the “rand” examples section for a brief tutorial on how to use it.