Advanced options

Subsample mode

Subsample mode discards more than 90% of the input data. This lets you quickly check that your pipeline works as expected and that the output files have the expected format. Never use subsample mode in production. To enable it, pass the --subsample option on the command line:

ngless --subsample script.ngl

will run script.ngl in subsample mode, which will probably run much faster than the full pipeline, allowing you to quickly spot any issues with your code. A 10-hour pipeline will finish in a few minutes (sometimes in just seconds) when run in subsample mode.

Note

Subsample mode is also a way to make sure that all indices exist. Any map() call will check that the necessary indices are present: if a fafile argument is used, the index is built if necessary; if a reference argument is used, the necessary datasets are downloaded if they have not been obtained previously.

Subsample mode also changes all your write() calls so that the output file names include the .subsample extension. That is, a call such as:

write(output, ofile='results.txt')

is automatically rewritten to:

write(output, ofile='results.txt.subsample')

This ensures that you do not confuse subsampled results with the real thing.
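The naming rule above can be checked outside NGLess with a small helper. The following is a minimal Python sketch, not part of NGLess: the names subsample_name and leftover_subsample_files are hypothetical, and the only behaviour assumed is the one described above, namely that .subsample is appended to the ofile name. It may help to spot subsampled outputs left over from a test run before starting the full pipeline.

```python
# Illustrative helper, not part of NGLess. It assumes only the rule stated
# in the text: subsample mode appends ".subsample" to each ofile name.
SUBSAMPLE_SUFFIX = ".subsample"


def subsample_name(ofile):
    """Return the file name a write() call would produce in subsample mode."""
    return ofile + SUBSAMPLE_SUFFIX


def leftover_subsample_files(names):
    """Return the names that look like subsample outputs, in input order."""
    return [name for name in names if name.endswith(SUBSAMPLE_SUFFIX)]


if __name__ == "__main__":
    import os

    # Show the expected name for the example used in the text.
    print(subsample_name("results.txt"))
    # List subsample outputs in the current directory, if any.
    print(leftover_subsample_files(sorted(os.listdir("."))))
```

For example, subsample_name('results.txt') returns 'results.txt.subsample', matching the rewritten write() call shown above.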