Getting started

Introduction

Nextflow is a workflow manager, widely used in bioinformatics, with which we can develop new analysis pipelines or reuse existing ones. It supports many execution platforms, such as a local machine, HPC clusters, and cloud environments. The aim is to develop a pipeline that runs locally on your laptop and can later scale to HPC environments, or to any other resource suited to the size of your data. This is achieved by writing pipelines in the Nextflow scripting language (DSL2) and managing software requirements with conda, Singularity, or Docker.

Learning Nextflow

Many resources and tutorials about Nextflow are available online. The first is this YouTube playlist (here are the tutorial notes to code along with, and a git repository adapted to work in a local environment). Next there is the Nextflow reference documentation, which explains things in detail. Community-built pipelines can be found in the pipelines section of the nf-core community site, while DSL2 pipeline modules can be found in the nf-core/modules GitHub repository.

Installing Nextflow

The official Nextflow installation instructions are available at https://www.nextflow.io. To install Nextflow in your local environment, first make sure Java is installed:

java -version

Warning

You don’t need Oracle's Java distribution: an OpenJDK build is enough. Starting from Nextflow 24.10.3, support for Java versions lower than 11 has been dropped, including for the VS Code Nextflow extension. If you don’t have Java installed, you can install it as a regular user with SDKMAN!. See the Nextflow installation requirements for more information.
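
A minimal sketch for checking the Java version from a script, assuming a bash shell with sed available. The helper name java_major_version and the MIN_JAVA threshold are illustrative assumptions: 11 is taken from the warning above, so adjust it to what the installation requirements page states for the Nextflow release you install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the Java major version from the first line of
# `java -version`, e.g. 'openjdk version "17.0.9" 2023-10-17' -> 17
java_major_version() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -E 's/.*version "([^"]+)".*/\1/')
  case "$ver" in
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # legacy scheme: "1.8.0_392" -> 8
    *)   echo "${ver%%[.+-]*}" ;;             # modern scheme: "17.0.9" -> 17, "21" -> 21
  esac
}

MIN_JAVA=11  # assumption: minimum taken from the warning above

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, hence the 2>&1 redirection
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
  if [ -n "$major" ] && [ "$major" -ge "$MIN_JAVA" ]; then
    echo "Java $major found: OK"
  else
    echo "Java ${major:-unknown} found: version $MIN_JAVA or later is required"
  fi
else
  echo "java not found in PATH"
fi
```

The version string is parsed rather than compared as text because Java 8 and earlier report themselves as "1.x", while later releases start with the major number.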

Next, download Nextflow into the current directory and make it executable with:

curl -s https://get.nextflow.io | bash
chmod +x nextflow

If you move the nextflow executable into a directory listed in your $PATH environment variable, you can call it without specifying its relative or full path.
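
For example, the following sketch moves the launcher into a personal bin directory and makes that directory available on PATH. The function name install_on_path and the default $HOME/bin location are assumptions: any directory you own works, as long as it ends up in $PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move an executable into a personal bin directory and
# make sure that directory is on PATH, now and in future login shells
install_on_path() {
  local exe="$1" bin_dir="${2:-$HOME/bin}"
  mkdir -p "$bin_dir"
  mv "$exe" "$bin_dir/"
  case ":$PATH:" in
    *":$bin_dir:"*) ;;  # already on PATH: nothing else to do
    *)
      # Persist the change for login shells; $PATH is written literally on purpose
      printf 'export PATH="%s:$PATH"\n' "$bin_dir" >> "$HOME/.profile"
      # Apply it to the current shell as well
      export PATH="$bin_dir:$PATH"
      ;;
  esac
}
```

Run install_on_path ./nextflow from the directory where you downloaded Nextflow, then check that the command resolves with nextflow -version.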

Hint

Nextflow is already installed in our shared core environment and can be called like any other command, since it is available through the $PATH environment variable.

Finally verify your installation with:

mkdir nf-hello
cd nf-hello
nextflow run hello

Install nf-core/tools

nf-core/tools is a Python package that complements Nextflow with helper tools for the nf-core community: with it you can manage Nextflow pipelines and modules. You can install nf-core in many ways, but the recommended way is using pip:

pip install nf-core
nf-core --help

To update the package you can use:

pip install nf-core --upgrade

Note

You can install nf-core in a conda environment. Even though there is an nf-core conda package, it is better to install the PyPI package, since it is the most up-to-date release and avoids some dependency issues with bioconda:

conda create --name nf-core pip
conda activate nf-core
pip install nf-core

In any case, it’s better to do this in a fresh conda environment used only for Nextflow. Please see our considerations on channels.

Tip

You can add autocompletion for nf-core within a conda environment. Simply add the activation instruction eval "$(_NF_CORE_COMPLETE=bash_source nf-core)" to your $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh file, and the deactivation instruction complete -r nf-core to your $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh file. After reactivating the environment, test the nf-core autocompletion with:

complete -p nf-core

See Setting environment variables for more information.
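
The two files can also be created from the shell. This is a minimal sketch: the helper name setup_nfcore_completion is hypothetical, and it appends to env_vars.sh rather than overwriting it, so any variables already defined there are kept (run it only once per environment).

```shell
#!/usr/bin/env bash
# Hypothetical helper: register the nf-core completion hooks in a conda
# environment, given its prefix (normally "$CONDA_PREFIX" after activation)
setup_nfcore_completion() {
  local prefix="${1:?usage: setup_nfcore_completion <conda-env-prefix>}"
  mkdir -p "$prefix/etc/conda/activate.d" "$prefix/etc/conda/deactivate.d"
  # Single quotes keep the command substitution literal: it runs at activation time
  echo 'eval "$(_NF_CORE_COMPLETE=bash_source nf-core)"' \
    >> "$prefix/etc/conda/activate.d/env_vars.sh"
  echo 'complete -r nf-core' \
    >> "$prefix/etc/conda/deactivate.d/env_vars.sh"
}
```

Typical usage: conda activate nf-core, then setup_nfcore_completion "$CONDA_PREFIX", then deactivate and reactivate the environment before running complete -p nf-core.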

Hint

nf-core with autocompletion is already installed in an nf-core environment in our shared core VM:

conda activate nf-core
nf-core --help

Configuring Nextflow

Nextflow can be customized in different ways: configuration files can be used to customize a pipeline execution, while environment variables can be used to customize the Nextflow runtime and the underlying Java virtual machine. Configuration files can be stored in multiple locations, for example in your home directory, in the pipeline (project) directory, and in the directory where you launch the pipeline. They are loaded in a specific order, and settings from files loaded later override those loaded earlier: the lowest-priority file is the one in your home directory, while the nextflow.config in the launch directory has the highest priority among these (configuration files passed explicitly with the -c option, as well as parameters given on the command line or with -params-file, take precedence over it). This means that you can keep default settings in $HOME/.nextflow/config and pipeline-specific settings in the pipeline directory, with the latter overriding the former. More information on configuration files can be found in the Configuration file section of the Nextflow documentation.

Default configuration file

The default configuration file has the lowest priority and can be used to define options that apply to all your pipelines. It is located in $HOME/.nextflow/config and can be used, for example, to limit resource usage:

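executor {
  name = 'slurm'              // submit tasks as SLURM jobs
  queueSize = 50              // at most 50 jobs submitted to the scheduler at the same time
  submitRateLimit = '10 sec'  // submit at most 10 jobs per second
}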

In this way you can set up a default configuration for all your pipelines that limits job submission and avoids overloading the cluster scheduler. For more tips for HPC users, take a look at the 5 Nextflow Tips for HPC Users and Five more tips for Nextflow user on HPC articles.
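
To see the priority rules in practice, the following sketch creates a nextflow.config in a launch directory that lowers the queueSize set in $HOME/.nextflow/config, then prints the merged configuration with the nextflow config command when Nextflow is on PATH. The directory name nf-config-test and the value 10 are arbitrary examples.

```shell
#!/usr/bin/env bash
# Example launch directory (arbitrary name)
mkdir -p nf-config-test
cd nf-config-test || exit 1

# Launch-directory config: it has higher priority than $HOME/.nextflow/config,
# so this queueSize replaces the home value, while settings not repeated here
# (e.g. submitRateLimit) are still inherited from the home file
cat > nextflow.config <<'EOF'
executor {
  queueSize = 10
}
EOF

# Print the resolved configuration for the current directory in flat notation
if command -v nextflow >/dev/null 2>&1; then
  nextflow config -flat
fi
```

Configuration files are merged setting by setting, so a higher-priority file only needs to contain the values you want to change.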

Environment variables

Setting NXF_SINGULARITY_CACHEDIR

When using Nextflow with Singularity, you can define a directory where remote Singularity images are stored. This can greatly speed up pipeline execution, since images are downloaded once and then reused whenever needed. You can define the location of this directory by setting the NXF_SINGULARITY_CACHEDIR environment variable. Nextflow will create the directory for you and place every downloaded Singularity image inside it.

Hint

NXF_SINGULARITY_CACHEDIR is already defined for every user in our shared core infrastructure, and points by default to your ${HOME}/nxf_singularity_cache/ directory. If you want to change this value (for example, to use a shared cache folder), define the variable in your $HOME/.profile configuration file, for example:

# override nextflow singularity cache dir
export NXF_SINGULARITY_CACHEDIR=/home/core/nxf_singularity_cache/

You can also define the SINGULARITY_CACHEDIR environment variable, which Singularity itself uses to cache the image layers it downloads: this can help manage the Singularity cache more efficiently. See Set SINGULARITY_CACHEDIR for more information.

Warning

When using a computing cluster, the cache directory must be a shared folder accessible from all compute nodes.

Other Nextflow environment variables

There are other environment variables that can be useful to customize your Nextflow experience. You can find the complete list in the Environment variables section of the Nextflow documentation. Here is a selection of them:

Nextflow environment variables

Name | Description | Example
NXF_EXECUTOR | Defines the default process executor. | slurm
NXF_OPTS | Provides extra options for the Java and Nextflow runtime, as a blank-separated list of JVM options (for example -Xms/-Xmx memory settings or -Dkey[=value] properties). | -Xms500M -Xmx4G
NXF_SINGULARITY_CACHEDIR | Directory where remote Singularity images are stored. When using a computing cluster it must be a shared folder accessible from all compute nodes. | $WORK/nxf_singularity_cache
NXF_WORK | Directory where working files are stored (usually your scratch directory). | "$CINECA_SCRATCH/nxf_work"
NXF_OFFLINE | When true, disables the automatic download and update of projects from remote repositories (default: false). | true
NXF_ANSI_LOG | Enables or disables ANSI console output (default: true when an ANSI terminal is detected). | false
NXF_VER | Specifies the Nextflow version to use. This can be useful when working with multiple Nextflow versions, or when your pipeline requires an older version of Nextflow. | 25.10.4

These environment variables can be set in your $HOME/.profile (Debian) or $HOME/.bash_profile (Red Hat) configuration file, for example:

# Nextflow custom environment variables
export NXF_EXECUTOR=slurm
export NXF_OPTS="-Xms500M -Xmx4G"
export NXF_SINGULARITY_CACHEDIR="$WORK/nxf_singularity_cache"
export NXF_WORK="$CINECA_SCRATCH/nxf_work"
export NXF_OFFLINE='true'
export NXF_ANSI_LOG='false'

Access to private repositories

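The $HOME/.nextflow/scm file can store the credentials required to access private GitHub repositories. Since GitHub no longer accepts account passwords for Git operations, the password field must hold a personal access token, for example:

providers {
  github {
    user = '<your GitHub user>'
    password = '<your GitHub personal access token>'
  }
}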

You can find more information in the Git configuration section of the Nextflow documentation and in the Configure Git private repositories with Nextflow blog post.
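
Because this file holds a credential, it is good practice to make it readable only by you. A minimal sketch, assuming the hypothetical helper name write_scm_template: it creates the file with placeholder values only if it does not exist yet, and restricts its permissions in any case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a GitHub scm template (placeholders to be edited)
# in the given directory, defaulting to $HOME/.nextflow, without overwriting
write_scm_template() {
  local dir="${1:-$HOME/.nextflow}"
  mkdir -p "$dir"
  if [ -e "$dir/scm" ]; then
    echo "$dir/scm already exists: edit it by hand" >&2
  else
    # For GitHub, 'password' holds a personal access token
    cat > "$dir/scm" <<'EOF'
providers {
  github {
    user = '<your GitHub user>'
    password = '<your GitHub personal access token>'
  }
}
EOF
  fi
  # The file contains a credential: owner read/write only
  chmod 600 "$dir/scm"
}
```

After running write_scm_template, replace the two placeholders with your actual user name and token.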

Access to private nextflow modules

Warning

This section is quite old and could be outdated. Please check whether the following information is still valid.

To access the private nextflow-modules, you need the ~/.config/gh/hosts.yml file, which is required to work with private modules through the nf-core modules command. The easiest way to create this file is to log in with the GitHub CLI:

gh auth login

See the gh auth login documentation for more information.
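
To check whether the hosts file is in place, the following sketch looks for it where the GitHub CLI stores its configuration: GH_CONFIG_DIR if set, otherwise $XDG_CONFIG_HOME/gh, otherwise ~/.config/gh (Windows paths are not covered). The function name gh_hosts_file is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path where gh keeps hosts.yml
# (GH_CONFIG_DIR, else $XDG_CONFIG_HOME/gh, else ~/.config/gh)
gh_hosts_file() {
  local dir="${GH_CONFIG_DIR:-${XDG_CONFIG_HOME:-$HOME/.config}/gh}"
  echo "$dir/hosts.yml"
}

if [ -f "$(gh_hosts_file)" ]; then
  echo "GitHub CLI hosts file found: $(gh_hosts_file)"
else
  echo "No hosts file at $(gh_hosts_file): run 'gh auth login' first"
fi
```

gh auth status also reports which account, if any, the GitHub CLI is logged in with.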