Overview¶
Pipelime is equipped with a complete framework to ease the creation of any command line interface. To get the most out of it, first you should feel comfortable with the basic concepts. So, just run pipelime in your shell and dive into the documentation!
Basic Usage¶
The following options applies to main pipelime command. You recognize them because they start with -- or -:
--help,-h: show the help message and exit.--version: show pipelime version number and exit.--dry-run,-d: load the configuration, create the command object, but skip the actual execution.--verbose,-v: increase verbosity level, really useful for debugging, especially when used in combination with--dry-run.--output,-o: output file path (yaml/json) where to save the effective configuration.--run-all,--no-run-all: in case of multiple configurations, e.g., when a$sweepis present, run them all; otherwise, run only the first one. If not specified, user will be notified if multiple configurations are found.--module,-m: additional module and packages where user-defined commands, sequence generators, piped operations and stages are defined. This option can be specified multiple times.--config,-c: path to a yaml/json file with all the parameters required by the command.--context: path to a yaml/json file with the context needed by Choixe to resolve variables, for loops etc. It can be automatically loaded if namedcontext*.[yaml|yml|json]and placed in the same folder of the configuration file.
As we will see in a moment, the configuration file is in fact merged with command line arguments starting with ++ or +. Likewise, context file is merged with command line arguments starting with @@ or @. Also, after a double slash //, both ++/+ and @@/@ can be used to specify context arguments.
Beside this bunch of options, there is also a list of CLI subcommands:
help,h: same as--help, but can be used also to get help on a pipelime command, a sequence operator or a stage (see Get Help).list,ll,l,ls: list all the available pipelime commands, sequence operators and stages. If--moduleis specified, only the symbols defined in the specified module(s) will be listed.list-commands,list-cmds,list-cmd,lc,ls-cmds,ls-cmd,lsc: same aslist, but printing only pipelime commands.list-operators,list-ops,list-op,lo,ls-ops,ls-op,lso: same aslist, but printing only sequence generators and piped operators.list-stages,list-stgs,list-stg,lst,ls-stgs,ls-stg,lss: same aslist, but printing only stages.audit,a: inspect the given configuration and context, if any, printing the effective configuration and missing definitions. A wizard to write a new valid context is started afterwards.wizard,w: start a wizard to write a configuration file for a given pipelime command.exec,exe,x,e: execute a configuration where the command is the top-level key, useful when you want to ship a configuration for a single command to run.
Now we are ready to explore some common scenarios.
Get Help¶
All the list* commands can be used to retrieve the available pipelime interfaces, i.e., commands, sequence operators and stages, limiting the search to specific modules with -m. For example:
$ pipelime list-stg
>>>
━━━━━ Sample Stages
albumentations pipelime.stages.augmentations.StageAlbumentations Sample augmentation via Albumentations.
compose pipelime.stages.base.StageCompose Applies a sequence of stages.
duplicate-key pipelime.stages.key_transformations.StageDuplicateKey Duplicate an item.
forget-source pipelime.stages.item_sources.StageForgetSource Removes data sources, ie, file paths or remotes, from items.
identity pipelime.stages.base.StageIdentity Returns the input sample.
item-info pipelime.stages.item_info.StageItemInfo Collects item infos from samples.
WARNING: this stage CANNOT be combined with MULTIPROCESSING.
format-key pipelime.stages.key_transformations.StageKeyFormat Changes key names following a format string.
filter-keys pipelime.stages.key_transformations.StageKeysFilter Filters sample keys.
lambda pipelime.stages.base.StageLambda Applies a callable to the sample.
remap-key pipelime.stages.key_transformations.StageRemap Remaps keys in sample preserving internal values.
replace-item pipelime.stages.item_replacement.StageReplaceItem Replaces items in sample preserving internal values.
remote-upload pipelime.stages.item_sources.StageUploadToRemote Uploads the sample to one or more remote servers.
Where each line shows:
the title of the stage you can use in your configuration file.
the full name of the class implementing the stage, if needed.
a short description of the stage, i.e., the docstring of the class.
To get help on a specific command, operator or stage, just type help:
$ pipelime help filter-keys
>>>
━━━━━ Sample Stage
filter-keys
(*, key_list: Sequence[str], negate: bool = False)
Filters sample keys.
Fields Description Type Default
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
key_list ▶ List of keys to preserve. Sequence[str] ✗
negate ▶ TRUE to delete `key_list`, FALSE delete all but keys in `key_list`. bool False
pipelime.stages.key_transformations.StageKeysFilter
Tip
You can autogenerate similar help messages for any class derived from pydantic.BaseModel!
Just print your class with pipelime.cli.pl_print.
Create A New Configuration¶
To create a new configuration file, just run pipelime wizard [command] and follow the instructions:
$ pipelime wizard clone
First, an help message is printed:
enclose values in
"or'quotes to enforce string values.use
[to start a sequence, then]to end it.likewise,
{to start a mapping and}to end it. Each key-value pair must be separated by a colon:.< [model.class.path]to insert a pydantic model (should be explicitly listed in the type list).? [class.path]to begin a wizard configuration for a Choixe$calldirective, e.g., an object to be instantiated.! [class.path]to add a Choixe$symboldirective.# [name]to begin a wizard configuration for a pipelime command, stage or operation.c# [name],s# [name],o# [name]as above, but specifying the type.
Warning
Since you may use Choixe directives and other fancy stuffs, no validation nor parsing is performed on the data you provide!
See next section to see how to do it.
Tip
Anytime you have to insert a class path, you can either use the usual python dot notation, or provide a path to a python file, e.g., path/to/mymodule.py:MyClass. Though, the latter should be use with caution, since multiprocessing execution is not supported.
Note
The wizard is intended to be used with pipelime commands, however, you may find it works also with stages and operations.
Moreover, you can run it on any class derived from pydantic.BaseModel! Just give the class type to pipelime.cli.wizard.model_cfg_wizard.
Validate A Configuration And Write A Context¶
Once you get your new configuration file, it’s time to validate it and write a context, if needed. Run pipelime audit on your configuration:
$ pipelime audit -c config.yaml
If the configuration is valid, you will see the list of internal imports, variables and symbols.
Tip
pipelime audit works with any yaml/json file using Choixe, so you can use it to validate and parse any configuration file! To save the final processed configuration, use the --output/-o option.
Merge Options From File And Command Line¶
If you run pipelime help on a command, you often see the options in a tree-like structure:
$ pipelime help clone
>>>
━━━━━ Pipelime Command
clone
(*, i: pipelime.commands.interfaces.InputDatasetInterface, o:
pipelime.commands.interfaces.OutputDatasetInterface, g: pipelime.commands.interfaces.GrabberInterface = None)
Clone a dataset. You can use this command to create a local copy of a dataset
hosted on a remote data lake by disabling the `REMOTE_FILE` serialization option.
Fields Description Type Piper Port Default
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
input / i ▶ The input dataset. 📥 INPUT ✗
━━━━━ Compact form:
`<folder>[,<skip_emp
ty>]`
folder ▶ Dataset root Path 📐 PARAMETER ✗
folder.
merge_root_items ▶ Adds root items as bool 📐 PARAMETER True
shared items to each
sample (sample
values take
precedence).
...
output / o ▶ The output 📦 OUTPUT ✗
dataset.
━━━━━ Compact form:
`<folder>[,<exists_o
k>[,<force_new_files
>]]`
folder ▶ Dataset root Path 📐 PARAMETER ✗
folder.
...
serialization ▶ Serialization 📐 PARAMETER override={}
modes for items and disable={} keys={}
keys.
override ▶ Serialization Mapping[str, 📐 PARAMETER {}
modes overridden for Union[str,
specific item types, Sequence[str],
eg, NoneType]]
`{CREATE_NEW_FILE:
[ImageItem,
my.package.MyItem,
my/module.py:OtherIt
em]}`. A Null value
applies to all
items.
The same structure is what you should follow both when writing a configuration file and when providing options from the command line. To do so, you can adopt a pydash-like notation:
.<key>to access a mapped field.[<idx>]to index a list entry.
As for the values:
trueandfalseare converted to booleans (case insensitive).none,nullandnulare interpreted asNone(case insensitive).numbers are converted to integers or floats, depending on the presence of a decimal point.
Also, options declared with no value are interpreted as True boolean flags.
Executing A Command¶
Once you have a valid configuration file, you can run the command as pipelime <command> followed by the configuration and context (NB: context file is usually auto-loaded):
$ pipelime clone -c config.yaml +i input @the_answer 42
In the example above we are running clone using the parameters in config.yaml and the context in context.yaml. We are also overriding the input dataset with the folder input and the Choixe variable the_answer with the value 42.
Though it works, there is a serious drawback: if you distribute config.yaml to other users, you also have to say them to run the clone command, since it might not be easy to infer it from the configuration file. Instead, add clone as a top-level key in config.yaml:
clone:
input:
folder: any_input
output:
folder: output_$var(the_answer)
And now run again with pipelime exec -c config.yaml --context context.yaml +i input @the_answer 42.