To add a module to ngless there are two options: external or internal modules. External modules are the simplest option.
External modules can perform two tasks:
- Add new references to ngless
- Add functions to ngless
Adding references makes them available to the
map() call using the
reference argument and (optionally) allows for calls to
specifying any annotation file.
Like everything else in ngless, these are versioned for reproducibility so that the resulting script implicitly encodes the exact version of the databases used.
Functions in external modules map to command line calls to a script you provide.
How to define an external module¶
You can use the example module in the ngless source for inspiration. That is a complete, functional module.
A module is defined by an
Every module has a name and a version:
name: 'module' version: '0.0.0'
Everything else is optional.
References are added with a references section, which is a list of
references. A reference contains a
fasta-file and (optionally) a
gtf-file. For example:
references: - name: 'ref' fasta-file: 'data/reference.fna' gtf-file: 'data/reference.gtf.gz'
Note that the paths are relative to the module directory. The GTF file may be gzipped.
init section defines an initialization command. This will be run
before anything else in any script which imports this module. The intention
is that the module can check for any dependencies and provide the user with an
early error message instead of failing later after. For example:
init: init_cmd: './init.sh' init_args: - "Hello" - "World"
will cause ngless to run the command
./init.sh Hello World whenever a user
imports the module.
A note about paths: paths you define in the module.yaml file are relative
to the Yaml file itself. Thus you put all the necessary scripts and data in
the module directory. However, the scripts are run with the current working
directory of wherever the user is running the ngless protocol (so that any
relative paths that the user specifies work as expected). To find your data
files inside your module, ngless sets the environmental variable
NGLESS_MODULE_DIR as the path to the module directory.
To add new functions, use a
functions section, which should contain a list of
functions encoded in YaML format. Each function has a few required arguments:
the name by which the function will be called inside of an ngless
the script to call for this function. Note that the user will never see
functions: - nglName: "test" arg0: "./run-test.sh"
will enable the user to call a function
test() which will translate into a
call to the
run-test.sh script (see the note above about paths).
You can also add arguments to your function, naturally. Remember that ngless
functions can have only one unnamed argument and any number of named arguments.
To specify the unnamed argument add a
arg1 section, with the key
arg1: atype: <one of 'readset'/'mappedreadset'/'counts'/'str'/'flag'/'int'/'option'>
The arguments of type readset, mappedreadset, and counts are passed as paths to a file on disk. Your command is assumed to not change these, but make a copy if necessary. Bad things will happen if you change the files. You can specify more details on which kind of file you expect with the following optional arguments:
filetype: <one of "tsv"/"fq1"/"fq2"/"fq3"/"sam"/"bam"/"sam_or_bam"/"tsv"> can_gzip: true/false can_bzip2: true/false can_stream: true/false
can_bzip2 indicate whether your script can accept
compressed files (default: False).
can_stream indicates whether the input
can be a pipe (default: False, which means that an intermediate file will
always be used).
For example, if your tool wants a SAM file (and never a BAM file), you can write:
arg1: atype: mappedreadset filetype: sam
ngless will ensure that your tool does receive a SAM file (including
converting BAM to SAM if necessary).
Finally, additional argument are specified by a list called
Entries in this list have exactly the same format as the
arg1 entry, except
that they have a few extra fields. The extra field
name is mandatory, while
everything else is optional:
additional: - name: <name> atype: <as for arg1: 'readset'/'mappedreadset'/...> def: <default value> required: true/false
Arguments of type
flag have an optional extra argument,
is a list of strings which will be passed as extra arguments when the flag is
true. You can also just specify a single string. If
when-true is missing,
ngless will pass an option of the form
--name (i.e., a double-dash then the
name used). For example:
additional: - name: verbose atype: flag def: false when-true: "-v" - name: complete atype: flag def: false when-true: - "--output=complete" - "--no-filter"
All other argument types are passed to your script using the syntax
--name=value if they are present or if a default has been provided.
Arguments of type
option map to symbols in ngless and require you to add an
allowed specifying the universe of allowed symbols. Ngless
will check that the user specifies arguments from the allowable universe. For
additional: - atype: 'option' name: 'verbosity' def: 'quiet' allowed: - 'quiet' - 'normal' - 'loud'
If you do not have a fixed universe for your argument, then it should be a
required flag determines whether the argument is required. Note that
arguments with a default argument are automatically optional (ngless may
trigger a warning if you mark an argument with a default as required).
To return a value, you must request that ngless generate a new temporary file
for the script to generate output to. Therefore, you need to specify a
return section, with three parameters:
rtype (return type, see below),
name the name of the argument to use, and
extension the file extension
of the output type.
return: rtype: "counts" name: "ofile" extension: "sam"
rtype must be one of
readset isn’t currently supported.
If you plan to make use of search path expansion, in order
for NGLess to expand the argument prior to passing it to the external module
you need to set
atype: "str" and
additional: - atype: 'str' name: 'reference' expand_searchpath: true
Finally, if you wish to, you can add one or more citations:
citation: "A paper which you want to be listed when users import your module"
This will be printed out whenever users use your module and thus will help you get exposure.
If you have more than one citation, you can use the
citations key and
provide a list:
citations: - "Paper 1" - "Paper 2"