Renku Command Line¶
The base command for interacting with the Renku platform.
renku (base command)¶
To list the available commands, either run renku with no parameters or execute renku help:
$ renku help
Usage: renku [OPTIONS] COMMAND [ARGS]...
  Check common Renku commands used in various situations.
Options:
  --version                       Print version number.
  --config PATH                   Location of client config files.
  --config-path                   Print application config path.
  --install-completion            Install completion for the current shell.
  --path <path>                   Location of a Renku repository.
                                  [default: (dynamic)]
  --renku-home <path>             Location of the Renku directory.
                                  [default: .renku]
  --external-storage / -S, --no-external-storage
                                  Use an external file storage service.
  -h, --help                      Show this message and exit.
Commands:
  # [...]
Configuration files¶
Depending on your system, you may find the configuration files used by the Renku command line in a different folder. By default, the following rules are used:
- macOS: ~/Library/Application Support/Renku
- Unix: ~/.config/renku
- Windows: C:\Users\<user>\AppData\Roaming\Renku
If in doubt about where to look for the configuration file, you can display its path by running renku --config-path.
You can specify a different location via the RENKU_CONFIG environment variable or the --config command line option. If both are specified, the --config option value is used. For example:
$ renku --config ~/renku/config/ init
instructs Renku to store the configuration files in your ~/renku/config/ directory when running the init command.
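The lookup order described above can be sketched as a small shell function. This is only an illustration of the documented precedence, not Renku's actual implementation: the function name resolve_config_dir and its two arguments are hypothetical, and the Windows default is left out for brevity.

```shell
# Hypothetical helper illustrating the documented precedence; not part of Renku.
# $1: value passed to --config (may be empty)
# $2: operating system name as printed by `uname -s`
resolve_config_dir() {
  if [ -n "$1" ]; then
    # The --config option wins over everything else.
    printf '%s\n' "$1"
  elif [ -n "${RENKU_CONFIG:-}" ]; then
    # Next comes the RENKU_CONFIG environment variable.
    printf '%s\n' "$RENKU_CONFIG"
  else
    # Otherwise fall back to the platform default listed above.
    case "$2" in
      Darwin) printf '%s\n' "$HOME/Library/Application Support/Renku" ;;
      *)      printf '%s\n' "$HOME/.config/renku" ;;
    esac
  fi
}

RENKU_CONFIG=/tmp/from-env
resolve_config_dir /tmp/from-option Linux   # prints /tmp/from-option
resolve_config_dir "" Linux                 # prints /tmp/from-env
```

When in doubt about what Renku itself resolved, renku --config-path remains the authoritative check.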
renku init¶
Create an empty Renku project or reinitialize an existing one.
Starting a Renku project¶
If you have an existing directory which you want to turn into a Renku project, you can type:
$ cd ~/my_project
$ renku init
or:
$ renku init ~/my_project
This creates a new subdirectory named .renku that contains all the necessary files for managing the project configuration.
If the provided directory does not exist, it will be created.
Updating an existing project¶
There are situations when the required structure of a Renku project needs to be recreated or you have an existing Git repository. You can handle these situations by adding the --force option:
$ git init .
$ printf '# Example\nThis is a README.\n' > README.md
$ git add README.md
$ git commit -m 'Example readme file'
# renku init would fail because there is a git repository
$ renku init --force
You can also enable the external storage system for output files, if it was not installed previously.
$ renku init --force --external-storage
renku config¶
Get and set Renku repository or global options.
Set values¶
You can set various Renku configuration options, for example the image registry URL, with a command like:
$ renku config registry https://registry.gitlab.com/demo/demo
Query values¶
You can display a previously set value with:
$ renku config registry
https://registry.gitlab.com/demo/demo
renku datasets¶
Work with datasets in the current repository.
Manipulating datasets¶
Creating an empty dataset inside a Renku project:
$ renku dataset create my-dataset
Adding data to the dataset:
$ renku dataset add my-dataset http://data-url
This will copy the contents of data-url to the dataset and add it to the dataset metadata.
To add data from a git repository, you can specify it via https or git+ssh URL schemes. For example,
$ renku dataset add my-dataset git+ssh://host.io/namespace/project.git
Sometimes you want to import just a specific path within the parent project. In this case, use the --target flag:
$ renku dataset add my-dataset --target relative-path/datafile \
git+ssh://host.io/namespace/project.git
To trim part of the path from the parent directory, use the --relative-to option. For example, the command above will result in a structure like:
data/
  my-dataset/
    relative-path/
      datafile
Using instead
$ renku dataset add my-dataset \
--target relative-path/datafile \
--relative-to relative-path \
git+ssh://host.io/namespace/project.git
will yield:
data/
  my-dataset/
    datafile
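The effect of --relative-to on the resulting layout can be sketched with plain shell parameter expansion. The helper dataset_destination below is hypothetical and only mirrors the two directory listings above; it does not call Renku and makes no claim about how Renku computes paths internally.

```shell
# Hypothetical helper, not a Renku command: compute where a --target path
# would land inside data/<dataset>/, optionally trimming a --relative-to prefix.
dataset_destination() {
  dataset=$1
  target=$2
  relative_to=${3:-}
  if [ -n "$relative_to" ]; then
    # Strip the leading "<relative_to>/" from the target path, if present.
    target=${target#"$relative_to"/}
  fi
  printf 'data/%s/%s\n' "$dataset" "$target"
}

dataset_destination my-dataset relative-path/datafile
# prints data/my-dataset/relative-path/datafile
dataset_destination my-dataset relative-path/datafile relative-path
# prints data/my-dataset/datafile
```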
renku run¶
Track provenance of data created by executing programs.
Capture command line execution¶
Tracking execution of your command line script is done by simply adding the renku run command before the actual command. This will enable detection of:
- arguments (flags),
- string and integer options,
- input files or directories if linked to existing paths in the repository,
- output files or directories if modified or created while running the command.
Note
If there are uncommitted changes in the repository, the renku run command fails. See git status for details.
Warning
Input and output paths can only be detected if they are passed as arguments to renku run.
Detecting input paths¶
Any path passed as an argument to renku run, which was not changed during the execution, is identified as an input path. The identification only works if the path associated with the argument matches an existing file or directory in the repository.
The detection might not work as expected if:
- a file is modified during the execution. In this case it will be stored as an output;
- a path is not passed as an argument to renku run.
Detecting output paths¶
Any path modified or created during the execution will be added as an output.
Because the output path detection is based on the Git repository state after the execution of the renku run command, it is good to have a basic understanding of the underlying principles and limitations of tracking files in Git.
Git tracks not only the paths in a repository, but also the content stored in those paths. Therefore:
- a recreated file with the same content is not considered an output file, but instead is kept as an input;
- file moves are detected based on their content and can cause problems;
- directories cannot be empty.
Note
When in doubt whether the outputs will be detected, remove all outputs using git rm <path> followed by git commit before running the renku run command.
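The first limitation listed above (a recreated file with the same content is not detected as an output) follows from content-based tracking. The sketch below illustrates the idea without Git or Renku: cksum is used here purely as a stand-in for Git's content hashing, and the file name output.txt is made up for the example.

```shell
# Illustration only: cksum stands in for Git's content hashing.
workdir=$(mktemp -d)
printf 'result: 42\n' > "$workdir/output.txt"
before=$(cksum < "$workdir/output.txt")

# Delete the file and recreate it with exactly the same content.
rm "$workdir/output.txt"
printf 'result: 42\n' > "$workdir/output.txt"
after=$(cksum < "$workdir/output.txt")

if [ "$before" = "$after" ]; then
  state="unchanged"   # same content, so a content-based check sees no change
else
  state="modified"
fi
echo "$state"
rm -r "$workdir"
```

Because the checksum is identical before and after, a content-based comparison has nothing to report, which is why removing and committing the old outputs first makes detection reliable.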
Command does not produce any files (--no-output)
If the program does not produce any outputs, the execution ends with an error:
Error: There are not any detected outputs in the repository.
You can specify the --no-output option to force tracking of such an execution.
Detecting standard streams¶
Often a program expects its input on the standard input stream. This is detected and recorded in the tool specification when the program is invoked as renku run cat < A.
Similarly, both standard output and standard error can be redirected when invoking a command:
$ renku run grep "test" B > C 2> D
Warning
Detecting inputs and outputs from pipes | is not supported.
Exit codes¶
All Unix commands return a number between 0 and 255, which is called the “exit code”. If another number is returned, it is treated modulo 256 (-10 is equivalent to 246, 257 is equivalent to 1). Exit code 0 represents success and a non-zero exit code indicates failure.
Therefore the command specified after renku run is expected to return exit code 0. If the command returns a different exit code on success, you can specify it with the --success-code=<INT> parameter.
$ renku run --success-code=1 --no-output fail
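The modulo 256 rule can be checked with shell arithmetic. The helper normalize_exit_code below is a hypothetical name used only for this sketch; it reproduces the two equivalences quoted above and is independent of Renku.

```shell
# Map any integer onto the 0-255 range used for exit codes.
# Shell arithmetic keeps the sign of the dividend, hence the extra "+ 256".
normalize_exit_code() {
  echo $(( (($1 % 256) + 256) % 256 ))
}

normalize_exit_code -10   # prints 246
normalize_exit_code 257   # prints 1
normalize_exit_code 0     # prints 0
```

The normalized value is the one to pass to --success-code when a program signals success with an unusual return value.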
renku log¶
Show provenance of data created by executing programs.
File provenance¶
Unlike the traditional file history format, which shows previous revisions of the file, this format presents tool inputs together with their revision identifiers.
A * character shows which lineage the specific file belongs to. A @ character in the graph lineage means that the corresponding file does not have any inputs and the history starts there.
When called without file names, renku log shows the history of the most recently created files. With the --revision <refname> option the output is shown as it was in the specified revision.
Provenance examples¶
renku log B
- Show the history of file B since its last creation or modification.
renku log --revision HEAD~5
- Show the history of files that have been created or modified 5 commits ago.
renku log --revision e3f0bd5a D E
- Show the history of files D and E as it looked in the commit e3f0bd5a.
Output formats¶
The following formats are supported via the --format option:
- ascii
- dot
You can generate a PNG of the full history of all files in the repository using the dot program.
$ FILES=$(git ls-files --no-empty-directory --recurse-submodules)
$ renku log --format dot $FILES | dot -Tpng > /tmp/graph.png
$ open /tmp/graph.png
renku status¶
Show status of data files created in the repository.
Inspecting a repository¶
Displays paths of outputs which were generated from newer input files and paths of files that have been used in different versions.
The first group of paths needs to be recreated by running renku update. See the section about renku update for more.
The paths mentioned in the output are made relative to the current directory if you are working in a subdirectory (this is on purpose, to help cutting and pasting to other commands). They also contain the first 8 characters of the corresponding commit identifier after the # (hash). If the file was imported from another repository, the short name of that repository is shown together with the filename before @.
renku update¶
Update outdated files created by the “run” command.
Recreating outdated files¶
The information about dependencies for each file in the repository is generated from information stored in the underlying Git repository.
A minimal dependency graph is generated for each outdated file stored in the repository. It means that only the necessary steps will be executed and the workflow used to orchestrate these steps is stored in the repository.
Assume that the following history for the file H exists.
      C---D---E
     /         \
A---B---F---G---H
The first example shows a situation in which D is modified and files E and H become outdated.
      C--*D*--(E)
     /          \
A---B---F---G---(H)
** - modified
() - needs update
In this situation, you can effectively do two things:
Recreate a single file by running
$ renku update E
Update all files by simply running
$ renku update
Note
If there are uncommitted changes, the command fails. Check git status to see details.
Pre-update checks¶
In the next example, files A or B are modified, hence the majority of dependent files must be recreated.
         (C)--(D)--(E)
        /             \
*A*--*B*--(F)--(G)--(H)
To avoid excessive recreation of a large portion of files which could have been affected by a simple change of an input file, consider specifying a single file (e.g. renku update G). See also renku status.
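How a modification propagates through the example history can be sketched as a walk over the dependency edges. The edge list below is an assumption read off the diagram (B feeds both C and F, and H depends on both E and G), and outdated_after is a hypothetical helper, not something Renku provides.

```shell
# Edges "input:output" for the example history (read off the diagram above;
# this is illustrative data, not something produced by Renku).
EDGES="A:B B:C C:D D:E E:H B:F F:G G:H"

# Hypothetical helper: list every file downstream of the modified files.
outdated_after() {
  frontier="$*"
  seen=" $* "          # modified files are never reported as outdated
  outdated=""
  while [ -n "$frontier" ]; do
    next=""
    for node in $frontier; do
      for edge in $EDGES; do
        src=${edge%%:*}
        dst=${edge#*:}
        [ "$src" = "$node" ] || continue
        case "$seen" in
          *" $dst "*) ;;   # modified itself or already recorded
          *) seen="$seen$dst "; outdated="$outdated $dst"; next="$next $dst" ;;
        esac
      done
    done
    frontier=$next
  done
  # One file per line, sorted for stable output.
  printf '%s\n' $outdated | sort
}

outdated_after D     # prints E and H, one per line
outdated_after A B   # prints C, D, E, F, G and H, one per line
```

The second call shows why a change near the root of the history is expensive: nearly every file ends up in the outdated set.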
Update siblings¶
If a tool produces multiple output files, these outputs always need to be updated together.
               (B)
              /
*A*--[step 1]--(C)
              \
               (D)
An attempt to update a single file would fail with the following error.
$ renku update C
Error: There are missing output siblings:
B
D
Include the files above in the command or use --with-siblings option.
The following commands will produce the same result.
$ renku update --with-siblings C
$ renku update B C D
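The sibling check behind this error can be sketched as a comparison between the requested files and the outputs of the step. The list STEP_OUTPUTS and the helper missing_siblings are hypothetical and only model the example above, assuming that at least one requested file was produced by step 1.

```shell
# Outputs of "step 1" in the example (an assumption read off the diagram).
STEP_OUTPUTS="B C D"

# Hypothetical helper: report which siblings are missing from a request.
missing_siblings() {
  missing=""
  for output in $STEP_OUTPUTS; do
    case " $* " in
      *" $output "*) ;;                  # requested explicitly
      *) missing="$missing $output" ;;   # sibling that was left out
    esac
  done
  echo $missing
}

missing_siblings C       # prints B D
missing_siblings B C D   # prints an empty line
```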
renku rerun¶
Recreate files created by the “run” command.
Recreating files¶
Assume you have run a step 2 that uses a stochastic algorithm, so each run will be slightly different. The goal is to regenerate output C several times to compare the output. In this situation it is not possible to simply call renku update since the input file A has not been modified after the execution of step 2.
A-[step 1]-B-[step 2*]-C
Recreate a specific output file by running:
$ renku rerun C
If you would like to recreate a file which was one of several produced by a tool, then its sibling files must be recreated as well. See the explanation in updating siblings.
renku mv¶
Move or rename a file, a directory, or a symlink.
Moving a file that belongs to a dataset will update its metadata. It will also attempt to update tracking information for files stored in an external storage (using Git LFS). Finally, it makes sure that all relative symlinks work after the move.
renku workflow¶
Manage the set of CWL files created by renku commands.
With no arguments, shows a list of captured CWL files. Several subcommands are available to perform operations on CWL files.
Reference tools and workflows¶
Managing a large number of tools and workflows with automatically generated names may be cumbersome. A name can be added to the last executed run, rerun or update command by running renku workflow set-name <name>. The name can be added to an arbitrary file in .renku/workflow/*.cwl anytime later.
renku show¶
Show information about objects in current repository.
Siblings¶
In situations when multiple outputs have been generated by a single renku run command, the siblings can be discovered by running the renku show siblings PATH command.
Assume that the following graph represents relations in the repository.
      D---E---G
     /     \
A---B---C   F
Then the following outputs would be shown.
$ renku show siblings C
C
D
$ renku show siblings G
F
G
$ renku show siblings A
A
Input and output files¶
You can list input and output files generated in the repository by running the renku show inputs and renku show outputs commands. Alternatively, you can check if all paths specified as arguments are input or output files respectively.
$ renku run wc < source.txt > result.wc
$ renku show inputs
source.txt
$ renku show outputs
result.wc
$ renku show outputs source.txt
$ echo $? # last command finished with an error code
1
renku storage¶
Manage an external storage.
renku image¶
Manipulate images related to the Renku project.
Configure the image registry¶
First, obtain an access token for the registry from GitLab by going to <gitlab-URL>/profile/personal_access_tokens. Select only the read_registry scope and copy the access token.
$ open https://<gitlab-URL>/profile/personal_access_tokens
$ export ACCESS_TOKEN=<copy-from-browser>
Find your project’s registry path by going to <gitlab-url>/<namespace>/<project>/container_registry. The string following the docker push command is the registry-path for the project.
$ open https://<gitlab-url>/<namespace>/<project>/container_registry
$ renku config registry https://oauth2:$ACCESS_TOKEN@<registry-path>
You can use any registry after a manual authentication step with the Docker command line.
$ docker login docker.io
$ renku config registry https://docker.io
Pull image¶
If the image has indeed been built and pushed to the registry, you should be able to fetch it with:
$ renku image pull
This pulls an image that was built for the current commit. You can also fetch an image built for a specific commit with:
# renku image pull --revision <ref-name>
$ renku image pull --revision HEAD~1
renku githooks¶
Install and uninstall Git hooks.
Prevent modifications of output files¶
The commit hooks are enabled by default to prevent output files from being modified manually.
$ renku init
$ renku run echo hello > greeting.txt
$ edit greeting.txt
$ git commit greeting.txt
You are trying to update some output files.
Modified outputs:
greeting.txt
If you are sure, use "git commit --no-verify".
Error Tracking¶
Renku is not bug-free and you can help us find bugs.
GitHub¶
You can quickly open an issue on GitHub with a traceback and minimal system information when you hit an unhandled exception in the CLI.
Ahhhhhhhh! You have found a bug. 🐞
1. Open an issue by typing "open";
2. Print human-readable information by typing "print";
3. See the full traceback without submitting details (default: "ignore").
Please select an action by typing its name (open, print, ignore) [ignore]:
Sentry¶
When using renku as a hosted service the Sentry integration can be enabled to help developers iterate faster by showing them where bugs happen, how often, and who is affected.
- Install Sentry-SDK with python -m pip install sentry-sdk;
- Set environment variable SENTRY_DSN=https://<key>@sentry.<domain>/<project>.
Warning
User information might be sent to help resolve the problem. If you are not using your own Sentry instance you should inform users that you are sending possibly sensitive information to a 3rd-party service.