Getting started
Introduction
Nextflow is a workflow manager for developing and reusing bioinformatics analysis pipelines. It supports many execution platforms, such as a local machine, HPC clusters and the cloud. The aim is to develop a pipeline which works locally on your laptop and can later run in HPC environments or on any other resource that scales with your data. This is achieved by writing pipelines in the Nextflow scripting language (DSL2) and managing your software requirements with conda, singularity or docker.
Learning Nextflow
There is a series of online resources and tutorials about nextflow. The first is this youtube playlist (here are the tutorial notes to code along and a git repository adapted to work in a local environment). Next there is the nextflow reference documentation, which explains things in detail. Community-built pipelines can be found in the pipeline section of the nf-core community site, while DSL2 pipeline modules can be found in the nf-core/modules GitHub repository.
Installing Nextflow
The official nextflow installation page is located at https://www.nextflow.io. To install nextflow in your local environment, first ensure you have java installed:
java -version
Warning
You don’t need a full release of java: the openjdk version is enough. Starting
from nextflow 24.10.3, support for java versions lower than 11 has been dropped,
even for the
vscode nextflow extensions.
If you don’t have java installed, you can install it
as a user using SdkMan. See
nextflow installation requirements
for more information.
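The version requirement above can be checked from the shell. The following is a minimal sketch, not an official check: `java_major_version` is a hypothetical helper name, and the threshold of 11 is taken from the warning above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from a version string.
# Handles both the legacy "1.8.0_392" scheme and the modern "17.0.9" scheme.
java_major_version() {
    local version="$1"
    local major="${version%%.*}"      # text before the first dot
    if [ "$major" = "1" ]; then
        # legacy scheme: 1.<major>.<minor>
        version="${version#1.}"
        major="${version%%.*}"
    fi
    # drop suffixes such as "-ea" in "22-ea"
    echo "${major%%[!0-9]*}"
}

if command -v java >/dev/null 2>&1; then
    # `java -version` prints to stderr; the version is the first quoted string
    raw="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
    major="$(java_major_version "$raw")"
    if [ "${major:-0}" -ge 11 ]; then
        echo "Java $raw found (major version $major): OK"
    else
        echo "Java $raw found, but version 11 or later is needed" >&2
    fi
else
    echo "java not found in PATH" >&2
fi
```

The parsing lives in a function so it can be tried on a plain version string without having Java installed.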
Next, download nextflow into your current directory and make it executable with:
curl -s https://get.nextflow.io | bash
chmod +x nextflow
If you install nextflow in a directory listed in your $PATH environment variable, you
don’t need to specify the relative or the full path when calling nextflow.
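As an illustration of the $PATH approach, here is a minimal sketch. It assumes `$HOME/.local/bin` as the per-user directory for executables; `NXF_BIN_DIR` is a hypothetical variable introduced only for this example and is not read by nextflow itself.

```shell
#!/usr/bin/env bash
# Assumption: $HOME/.local/bin is the per-user directory for executables;
# set NXF_BIN_DIR (a hypothetical variable) to use another location.
NXF_BIN_DIR="${NXF_BIN_DIR:-$HOME/.local/bin}"
mkdir -p "$NXF_BIN_DIR"

# Move the launcher only if it was downloaded into the current directory
if [ -f ./nextflow ]; then
    mv ./nextflow "$NXF_BIN_DIR/"
fi

# Prepend the directory to PATH for this session, unless already listed
case ":$PATH:" in
    *":$NXF_BIN_DIR:"*) ;;
    *) PATH="$NXF_BIN_DIR:$PATH"; export PATH ;;
esac

# Show which executable the shell resolves, if any
command -v nextflow || echo "nextflow is not in PATH yet" >&2
```

The PATH change only lasts for the current shell session. To make it permanent, the same `export` line can be added to a shell startup file such as `$HOME/.profile`.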
Hint
Nextflow is already installed in our shared core environment, and can be called
like any other command since it is available via the $PATH environment variable.
Finally verify your installation with:
mkdir nf-hello
cd nf-hello
nextflow run hello
Install nf-core/tools
nf-core/tools is a python package which
integrates with nextflow and provides helper tools for the nextflow community. With
the nf-core software you can manage nextflow pipelines and modules. You can install
nf-core in many ways,
but the recommended way is using pip:
pip install nf-core
nf-core --help
To update the package you can use:
pip install nf-core --upgrade
Note
You can install nf-core in a conda environment. Even though there is an nf-core
conda package, it is better to install the pypi package since it is the
most up-to-date release and avoids some dependency issues with bioconda:
conda create --name nf-core pip
conda activate nf-core
pip install nf-core
It is best to do this in a fresh conda environment used only for nextflow. Please see our considerations on channels.
Tip
You can add autocompletion for nf-core within a conda environment. Simply
add the activation instruction eval "$(_NF_CORE_COMPLETE=bash_source nf-core)"
to your $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh, and the deactivation
instruction complete -r nf-core to your
$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh. Test the nf-core
autocompletion with:
complete -p nf-core
See Setting environment variables for more information.
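The two hook files described in this tip can also be written from the shell. This is a minimal sketch rather than an official installer: `append_once` and `install_nf_core_completion` are hypothetical helper names, while the two instructions and the file paths are the ones given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file unless it is already there,
# so running the script twice does not duplicate the hook.
append_once() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: write the activation and deactivation hooks for
# nf-core autocompletion into the conda environment rooted at the prefix.
install_nf_core_completion() {
    local prefix="$1"
    mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"
    # Runs on `conda activate`: register the completion
    append_once "$prefix/etc/conda/activate.d/env_vars.sh" \
        'eval "$(_NF_CORE_COMPLETE=bash_source nf-core)"'
    # Runs on `conda deactivate`: remove the completion again
    append_once "$prefix/etc/conda/deactivate.d/env_vars.sh" \
        'complete -r nf-core'
}

# Only act when a conda environment is active
if [ -n "${CONDA_PREFIX:-}" ]; then
    install_nf_core_completion "$CONDA_PREFIX"
else
    echo "No active conda environment: activate one first" >&2
fi
```

The hooks take effect the next time the environment is activated, so deactivate and reactivate it before testing the completion.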
Hint
nf-core with autocompletion is already installed in an nf-core
environment in our shared core VM:
conda activate nf-core
nf-core --help
Configuring nextflow
Nextflow can be customized in different ways: there are configuration files,
which can be used to customize a single pipeline execution, and environment
variables, which can be used to customize the nextflow runtime and the underlying
Java virtual machine. Configuration files can be stored in multiple locations,
for example in your home directory, in the pipeline directory and in the directory
where you are running the pipeline. They are loaded in a specific
order, and settings from a file loaded later override those loaded earlier: the
lowest priority configuration file is the one in the home directory, while the
highest priority configuration file is the one in the directory where you are
running the pipeline. This means that you can have a default configuration file in
$HOME/.nextflow/config and a pipeline specific configuration file in the
pipeline directory, and the latter will override the former.
More information on configuration files can be found in
the Configuration file
section of nextflow documentation.
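The override order can be observed with a throwaway launch directory. The sketch below is only an illustration: `demo_dir` is a hypothetical name and the `queueSize` value is arbitrary. It relies on the `nextflow config` command, which prints the configuration as resolved from the current directory.

```shell
#!/usr/bin/env bash
# Assumption: a temporary directory stands in for the directory where
# you launch the pipeline; `demo_dir` is a hypothetical name.
demo_dir="$(mktemp -d)"

# A launch-directory nextflow.config: it has higher priority than
# $HOME/.nextflow/config, so this queueSize wins if both files define one.
cat > "$demo_dir/nextflow.config" <<'EOF'
executor {
    queueSize = 10
}
EOF

# Print the configuration as nextflow resolves it from this directory
if command -v nextflow >/dev/null 2>&1; then
    (cd "$demo_dir" && nextflow config) || echo "nextflow config failed" >&2
else
    echo "nextflow not found in PATH; see $demo_dir/nextflow.config" >&2
fi
```

Settings defined only in the home directory file should still appear in the output, since files are merged rather than replaced as a whole.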
Default configuration file
The default configuration has the lowest priority and can be used to define options
that apply to all your pipelines. This file is
located in $HOME/.nextflow/config and can be used, for example, to limit
resource usage:
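executor {
    name = 'slurm'
    // number of tasks the executor handles in parallel
    queueSize = 50
    // maximum job submission rate: 10 jobs per second
    submitRateLimit = '10 sec'
}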
This sets up a default configuration for all your pipelines and limits job submission so that the cluster scheduler is not overloaded. For more tips for HPC users, take a look at the 5 Nextflow Tips for HPC Users and Five more tips for Nextflow user on HPC articles on the nextflow forum.
Environment variables
Setting NXF_SINGULARITY_CACHEDIR
When using nextflow with singularity, you can define a directory where remote Singularity
images are stored. This can considerably speed up pipeline execution, since images
are downloaded once and then reused when needed. You can define the location of this
directory by setting the NXF_SINGULARITY_CACHEDIR environment variable. Nextflow
will create the directory for you and will place every downloaded singularity image
inside it.
Hint
NXF_SINGULARITY_CACHEDIR is already defined for every user in our shared core
infrastructure, and points by default at your ${HOME}/nxf_singularity_cache/ directory.
If you want to change this value (for example, by setting a shared cache folder),
you have to define this variable in your $HOME/.profile configuration file,
for example:
# override nextflow singularity cache dir
export NXF_SINGULARITY_CACHEDIR=/home/core/nxf_singularity_cache/
You can also define the SINGULARITY_CACHEDIR environment variable, which
will be used by singularity itself to cache layers and temporary files: this
could help in managing singularity cache in a more efficient way. See
Set SINGULARITY_CACHEDIR for more information.
Warning
When using a computing cluster, the cache directory must be a shared folder accessible from all computing nodes.
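A quick way to check the cache directory from a given machine is sketched below. `check_cache_dir` is a hypothetical helper name. The check only covers the machine it runs on, so it does not by itself prove that the folder is shared across nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given path is an existing,
# writable directory; print the reason otherwise.
check_cache_dir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "NXF_SINGULARITY_CACHEDIR is not set" >&2
        return 1
    elif [ ! -d "$dir" ]; then
        echo "$dir does not exist yet (nextflow creates it on first use)" >&2
        return 1
    elif [ ! -w "$dir" ]; then
        echo "$dir is not writable by $(id -un)" >&2
        return 1
    fi
    echo "$dir is ready"
}

# Report on the directory configured in this shell, without aborting
check_cache_dir "${NXF_SINGULARITY_CACHEDIR:-}" || true
```

Running the same check from a compute node, for example inside an interactive job, gives some evidence that the folder is visible there too.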
Other nextflow environment variables
There are other environment variables which can be useful to customize your nextflow experience. You can find a list of them in the Environment variables section of the nextflow documentation. Here is a selection of them:
| Name | Description | Example |
|---|---|---|
| NXF_EXECUTOR | Defines the default process executor | |
| NXF_OPTS | Provides extra options for the Java and Nextflow runtime. It must be a blank separated list of -Dkey[=value] properties | |
| NXF_SINGULARITY_CACHEDIR | Directory where remote Singularity images are stored. When using a computing cluster it must be a shared folder accessible from all compute nodes. | |
| NXF_WORK | Directory where working files are stored (usually your scratch directory) | |
| NXF_OFFLINE | When true disables the project automatic download and update from remote repositories (default: false). | |
| NXF_ANSI_LOG | Enables/disables ANSI console output (default true when ANSI terminal is detected). | |
These environment variables can be set in your $HOME/.profile (Debian) or
$HOME/.bash_profile (Red-Hat) configuration files, for example:
# Nextflow custom environment variables
export NXF_EXECUTOR=slurm
export NXF_OPTS="-Xms500M -Xmx4G"
export NXF_SINGULARITY_CACHEDIR="$WORK/nxf_singularity_cache"
export NXF_WORK="$CINECA_SCRATCH/nxf_work"
export NXF_OFFLINE='true'
export NXF_ANSI_LOG='false'
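After editing the profile file and opening a new login shell, it can be useful to confirm which variables were actually exported. A minimal sketch, where `list_nxf_vars` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the NXF_* variables exported in this shell,
# or return a non-zero status when none is set.
list_nxf_vars() {
    local vars
    vars="$(env | grep '^NXF_' | sort)"
    if [ -z "$vars" ]; then
        echo "no NXF_* variables are set in this shell" >&2
        return 1
    fi
    printf '%s\n' "$vars"
}

list_nxf_vars || true
```

If a variable is missing from the list, the profile file was probably not read by the current shell. Logging out and back in usually fixes it.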
Access to private repositories
The file $HOME/.nextflow/scm can store the configuration required to access
private repositories on GitHub, for example:
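providers {
    github {
        user = '<your GitHub user>'
        // GitHub no longer accepts account passwords for Git operations:
        // use a personal access token as the value here
        password = '<your GitHub password>'
    }
}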
You can find more information in the Git configuration section of the nextflow documentation and in the Configure Git private repositories with Nextflow blog post.
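Since the scm file holds credentials in plain text, it may be worth restricting its permissions to your own user. This is a suggestion rather than a nextflow requirement, and `restrict_to_owner` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a credentials file readable and writable by
# its owner only. Does nothing if the file does not exist.
restrict_to_owner() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "$file not found, nothing to do" >&2
        return 0
    fi
    chmod 600 "$file"
    ls -l "$file"    # show the resulting permissions
}

restrict_to_owner "$HOME/.nextflow/scm"
```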
Access to private nextflow modules
Warning
This section is quite old and could be outdated. Please check whether the following information is still valid.
In order to get access to the private
nextflow-modules, you need to
configure GitHub CLI in order to create the
~/.config/gh/hosts.yml file, which is required to
work with private modules through nf-core modules.
The easiest way to create this configuration is through GitHub CLI:
gh auth login
See the documentation on gh auth login for more information.