YAML-based sample lists
Since NGLess 1.5
Specifying a list of samples in a YAML file
You can specify a list of samples in YAML format. For example:
    basedir: /share/data/metagenomes
    samples:
      sample1:
        - paired:
            - data/Sample1a.1.fq.gz
            - data/Sample1a.2.fq.gz
        - paired:
            - data/Sample1b.1.fq.gz
            - data/Sample1b.2.fq.gz
      sample2:
        - paired:
            - data/Sample2.1.fq.gz
            - data/Sample2.2.fq.gz
        - single:
            - data/Sample2.extra.fq.gz
The format is as follows:
basedir (optional): if specified, all relative paths are interpreted relative to this directory. Otherwise, paths are relative to the current directory where NGLess is executing (not where the YAML file is located).
samples: a dictionary mapping each sample name to a list of files. As in the example above, each entry in the list is either a paired entry (two files) or a single entry (one file).
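The path rule described above can be sketched in Python. This is only an illustration of the rule, not NGLess code: the dictionary mirrors part of the sample list shown earlier as a YAML parser would produce it, and resolve_paths and its cwd parameter are hypothetical names introduced for this example.

```python
import posixpath

# Hypothetical in-memory form of the sample list shown earlier
# (plain dicts and lists, as a YAML parser would produce).
sample_list = {
    "basedir": "/share/data/metagenomes",
    "samples": {
        "sample2": [
            {"paired": ["data/Sample2.1.fq.gz", "data/Sample2.2.fq.gz"]},
            {"single": ["data/Sample2.extra.fq.gz"]},
        ],
    },
}


def resolve_paths(sample_list, cwd):
    """Map each sample name to a list of (kind, [resolved paths]).

    Illustrative helper, not part of NGLess. `cwd` stands in for the
    directory where NGLess is executing.
    """
    # Relative paths are anchored at basedir if present, otherwise at cwd.
    base = sample_list.get("basedir", cwd)
    resolved = {}
    for name, entries in sample_list["samples"].items():
        resolved[name] = [
            # posixpath.join leaves absolute paths untouched
            (kind, [posixpath.join(base, p) for p in paths])
            for entry in entries
            for kind, paths in entry.items()
        ]
    return resolved


for kind, paths in resolve_paths(sample_list, cwd="/home/user/run")["sample2"]:
    print(kind, paths)
```

Removing the basedir key from the dictionary makes the same call resolve every path against cwd instead, which corresponds to the default behaviour described above.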
Using the YAML format in NGLess
You can load a sample list with the load_sample_list function:
    ngless "1.5"
    samples = load_sample_list('list.yaml')
    input = samples[0]
    input = preprocess(input) using |read|:
        read = substrim(read, min_quality=25)
        if len(read) < 45:
            discard
    ...
It can also be used with the parallel module’s run_for_all_samples function. For example:
    ngless "1.5"
    import "parallel" version "1.1"
    input = run_for_all_samples(load_sample_list('list.yaml'))
    input = preprocess(input) using |read|:
        read = substrim(read, min_quality=25)
        if len(read) < 45:
            discard
    write(input, ofile='outputs' </> input.name() + '.fq.xz')
    ...
Note how we used the .name() method of the readset object to get the name of the selected sample.
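As a rough illustration of the file name that the write call produces for each sample, the expression can be mirrored in Python. This sketch assumes that the </> operator joins path components with a directory separator and that the suffix is appended to the sample name before joining. The function output_path is a hypothetical helper, not part of NGLess.

```python
import posixpath


def output_path(sample_name, outdir="outputs", suffix=".fq.xz"):
    # Mirrors 'outputs' </> input.name() + '.fq.xz', assuming </> joins
    # path components with a directory separator.
    return posixpath.join(outdir, sample_name + suffix)


# Sample names taken from the example sample list
for name in ["sample1", "sample2"]:
    print(output_path(name))
```

Under these assumptions, each sample is written to its own file inside the outputs directory, so samples processed separately do not overwrite each other.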
Loading a single sample from a YAML file
The function load_sample_from_yaml takes a YAML file and a mandatory sample argument, and returns the single sample identified by that argument:
    ngless "1.5"
    input = load_sample_from_yaml('list.yaml', sample='sample-id')
    ...