Troubleshooting

Error executing process

While writing a new pipeline or running an existing one, you may encounter an error in a particular step, which interrupts the nextflow execution and lets you fix the issue, for example:

[a1/60b160] process > mirdeep2 (null)                                [100%] 1 of 1, failed: 1 ✘
Error executing process > 'mirdeep2 (null)'

Caused by:
  Missing output file(s) `novel_mature.fa` expected by process `mirdeep2 (null)`

Command executed:

  miRDeep2.pl all_samples.fasta ARS-UCD1.2_chrOnly_chrY.fa all_samples.arf bta_mature.fa.fix chi-oar-hsa_mature.fa.fix bta_hairpin.fa.fix -P

Command exit status:
  0

Command output:

<omitting lines>

Work dir:
  /home/cozzip/nf-mirna/work/a1/60b160bf3021f4891bbf42746173b0

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The error message reports the work directory in which the failed step was executed. You can get the same information from the nextflow logs. For example, supposing that our last run is named sharp_feynman (you can get the run names with nextflow log or nextflow log -quiet), you can get the work directory of each step by printing specific fields with nextflow log, for example:

$ nextflow log sharp_feynman -f 'process,status,exit,hash,duration,workdir'
remove_whitespaces      COMPLETED       0       bd/2ebe9a       551ms   /home/cozzip/nf-mirna/work/bd/2ebe9a9f2e1703a18059fbdf1191e7
fastqc  CACHED  0       93/a6692c       1m 39s  /home/cozzip/nf-mirna/work/93/a6692cb2a6c04c08546f71b1814772
trim_galore     CACHED  0       88/984f53       1m 59s  /home/cozzip/nf-mirna/work/88/984f537d2e9641238d42906a959b17
mirdeep_input   CACHED  0       97/d4cd0b       77ms    /home/cozzip/nf-mirna/work/97/d4cd0bf8379abaeee37e4de1297127
mirdeep CACHED  0       b8/36c9c9       3m 38s  /home/cozzip/nf-mirna/work/b8/36c9c9a2265e03124eeaf39ca539b0
mirdeep2        FAILED  0       a1/60b160       1h 3m 22s       /home/cozzip/nf-mirna/work/a1/60b160bf3021f4891bbf42746173b0

In this example, the only failed step is mirdeep2, and its work directory is the same one reported in the nextflow error message.
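When a run has many tasks, you can let the shell filter the nextflow log output for you. The following is a minimal sketch, assuming nextflow log separates the requested fields with tabs (its default) and prints them in the order used above; the function name failed_workdirs is hypothetical.

```shell
# Hypothetical helper: print the process name and work directory of every FAILED task.
# Reads tab-separated lines from stdin with the fields
#   process, status, exit, hash, duration, workdir
# in this order, as requested with:
#   nextflow log <run_name> -f 'process,status,exit,hash,duration,workdir'
failed_workdirs() {
  awk -F '\t' '$2 == "FAILED" { print $1 "\t" $6 }'
}

# Example usage (requires nextflow and an existing run):
#   nextflow log sharp_feynman -f 'process,status,exit,hash,duration,workdir' | failed_workdirs
```

Filtering on the status column rather than on the exit code matters here, since, as in the mirdeep2 case, a task can fail with exit status 0.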

Tip

Get all field names with nextflow log -l, or see the execution report table in the Trace report documentation.

Note

This task failed even though its exit status is 0. If you inspect the nextflow error, you can see that the command itself completed without errors; however, nextflow expected some output files (novel_mature.fa in this case) that the analysis did not produce. This could be caused by an error in the nextflow configuration.

Now it is time to understand what happened. Enter the work directory of the failed job and list all files (including hidden ones) with ls -a:

$ ls -a .command*
.command.begin  .command.err  .command.log  .command.out  .command.run  .command.sh

In this example, we chose to display only the .command* hidden files: these files are generated by nextflow and contain the output of the step as well as the command used to perform it. In particular, .command.run contains all the instructions to prepare the working directory and to launch .command.sh, which contains the script defined for this process in the pipeline.
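Once inside a work directory, the files you usually look at first are the script and its error stream. Here is a minimal sketch of a hypothetical show_task helper that prints the last lines of .command.sh, .command.err and .command.out for a given work directory; the choice of files and the default of 20 lines are assumptions, so adjust them to your needs.

```shell
# Hypothetical helper: show the tail of the main hidden files of a task work directory.
# Usage: show_task <work_dir> [lines]   (lines defaults to 20)
show_task() {
  local dir=$1 n=${2:-20}
  local f
  for f in .command.sh .command.err .command.out; do
    # skip files that nextflow did not create for this task
    if [ -f "$dir/$f" ]; then
      printf '==> %s <==\n' "$dir/$f"
      tail -n "$n" "$dir/$f"
    fi
  done
}

# Example usage:
#   show_task /home/cozzip/nf-mirna/work/a1/60b160bf3021f4891bbf42746173b0 50
```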

To get more information on the error, we can execute the failed step manually. First of all, we need to export an environment variable to increase nextflow verbosity:

export NXF_DEBUG=2

Next, we can execute the .command.run script, which is the script run by nextflow and which calls .command.sh:

bash .command.run

The command is expected to fail (since nextflow previously reported an error). However, with NXF_DEBUG=2 set, we can see all the commands launched by nextflow, in particular the singularity command. We can then take this command, simplify it and open a shell inside the same singularity container used by our pipeline step, in order to test our command interactively, for example with:

singularity exec -B $HOME -B /home/ -B $PWD/ /home/core/nxf_singularity_cache/bunop-mirdeep2.img  /bin/bash

Here, each -B parameter indicates a folder that will be mounted inside the container: our $HOME directory, the /home directory (the common location of our input files) and $PWD (the work directory of the failed step, where we are now). Next comes the physical location of the singularity image (/home/core/nxf_singularity_cache/bunop-mirdeep2.img in this example), followed by the command we want to run: in this case a new shell, since we want to run .command.sh manually and see why it raises an error.

Tip

If you cannot recover from the error, you can apply a custom configuration file to the pipeline in order to ignore the failed step. You can find more information in the Handling failing jobs section of this guide.

Failed to pull singularity image

Sometimes singularity cannot download an image from https://quay.io/. In such a case, nextflow raises an error and stops the execution, like this:

Error executing process > 'RNASEQ:QUANTIFY_SALMON:SALMON_SE_TRANSCRIPT (salmon_tx2gene.tsv)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name quay.io-biocontainers-bioconductor-summarizedexperiment-1.18.1--r40_0.img.pulling.1610634041691 docker://quay.io/biocontainers/bioconductor-summarizedexperiment:1.18.1--r40_0 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    Getting image source signatures
    Copying blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
    Copying blob sha256:77c6c00e8b61bb628567c060b85690b0b0561bb37d8ad3f3792877bddcfe2500
    Copying blob sha256:3aaade50789a6510c60e536f5e75fe8b8fc84801620e575cb0435e2654ffd7f6
    Copying blob sha256:00cf8b9f3d2a08745635830064530c931d16f549d031013a9b7c6535e7107b88
    Copying blob sha256:7ff999a2256f84141f17d07d26539acea8a4d9c149fefbbcc9a8b4d15ea32de7
    Copying blob sha256:d2ba336f2e4458a9223203bf17cc88d77e3006d9cbf4f0b24a1618d0a5b82053
    Copying blob sha256:dfda3e01f2b637b7b89adb401f2f763d592fcedd2937240e2eb3286fabce55f0
    Copying blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
    Copying blob sha256:10c3bb32200bdb5006b484c59b5f0c71b4dbab611d33fca816cd44f9f5ce9e3c
    Copying blob sha256:f981c3bfe61f7355e034d40b620e60aefc6b272a8d0ac10fa9e1892bb6b17b56
    Copying config sha256:ff870dedc9d11d9622344d7a4ff0c0c25a890f2233a84926b6cb0e67f422500e
    Writing manifest to image destination
    Storing signatures
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: no descriptor found for reference "70c154f9aee9152d9e03c474cd4b5e5eee5856cda5b62c46b10c4ae7932e763d"

In such cases, you can solve the error by manually downloading the singularity image into the $NXF_SINGULARITY_CACHEDIR cache directory. Find the failed command line in the nextflow output, move into the $NXF_SINGULARITY_CACHEDIR directory and run that command manually. After downloading the image, rename the file by removing the .pulling.[0-9]* suffix from its name (nextflow images should end with the .img extension). For example, in the previous case:

cd $NXF_SINGULARITY_CACHEDIR
singularity pull  --name quay.io-biocontainers-bioconductor-summarizedexperiment-1.18.1--r40_0.img.pulling.1610634041691 docker://quay.io/biocontainers/bioconductor-summarizedexperiment:1.18.1--r40_0 > /dev/null
mv quay.io-biocontainers-bioconductor-summarizedexperiment-1.18.1--r40_0.img.pulling.1610634041691 quay.io-biocontainers-bioconductor-summarizedexperiment-1.18.1--r40_0.img
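If several pulls failed, you can repeat the renaming step for every leftover file. The following hypothetical fix_pulled_images function is a sketch that assumes the .img.pulling.<digits> naming shown above; run it only after the manual pulls have completed successfully, otherwise it will also rename incomplete images.

```shell
# Hypothetical helper: strip the ".pulling.<digits>" suffix from manually pulled images,
# so that nextflow finds them as "<name>.img".
# Usage: fix_pulled_images [cache_dir]   (defaults to $NXF_SINGULARITY_CACHEDIR)
fix_pulled_images() {
  local dir=${1:-$NXF_SINGULARITY_CACHEDIR}
  local f target
  for f in "$dir"/*.img.pulling.*; do
    [ -e "$f" ] || continue                 # no match: the glob stays literal
    case ${f##*.pulling.} in
      ''|*[!0-9]*) continue ;;              # skip if the suffix is not purely numeric
    esac
    target=${f%.pulling.*}                  # drop ".pulling.<timestamp>"
    mv -- "$f" "$target"
    echo "renamed $(basename "$f") -> $(basename "$target")"
  done
}
```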

After that, you can resume your nextflow pipeline by adding the -resume option to your command line, in order to use the cached results of the previous calculations.

Note

Singularity containers used by nextflow pipelines are moving from quay.io to depot.galaxyproject.org: the latter seems to offer better download performance.

Nextflow version doesn’t match the required version

When running a pipeline with nextflow, you may get an error like this:

Nextflow version 20.10.0 does not match workflow required version: >=20.11.0-edge

In such a case, you have two options. The first is to run a previous version of the pipeline that is compatible with your nextflow version. You can find information on pipeline versions on the nf-core pipelines page or directly in the GitHub project of the nf-core organization. Once you have found the desired version, declare it with the -r parameter when calling nextflow, for example:

nextflow run nf-core/rnaseq -r 2.0 -profile test,singularity -resume
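Before choosing between the two options, you may want to check programmatically whether your nextflow satisfies the requirement. This is a minimal sketch of a hypothetical version_ge function; it relies on GNU sort -V and ignores suffixes such as -edge, so it only compares the numeric part of the versions.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Suffixes such as "-edge" are stripped, so only the numeric parts are compared.
version_ge() {
  local have=${1%%-*} want=${2%%-*}
  # the smaller of the two versions sorts first; if it is the required one, we are fine
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example usage, assuming "nextflow -v" prints a line like "nextflow version X.Y.Z.build":
#   version_ge "$(nextflow -v | awk '{print $3}')" 20.11.0-edge || echo "please upgrade nextflow"
```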

The second option is to upgrade your nextflow version. You can install a specific version of nextflow from the nextflow releases page: copy the link of the nextflow asset available in each release, then install nextflow like this:

wget -qO- https://github.com/nextflow-io/nextflow/releases/download/v20.12.0-edge/nextflow-20.12.0-edge-all | bash

This downloads all the requirements and puts the nextflow executable in your current directory. Change its permissions to 755 and move it to a directory that comes first in your $PATH environment variable, for example $HOME/bin.
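The permission and PATH steps can be wrapped in a small function. This is a sketch with a hypothetical install_bin name; it prepends the target directory to PATH only for the current shell and only if the directory is not already listed, so to make the change permanent you should add the same export line to your shell startup file (for example ~/.bashrc).

```shell
# Hypothetical helper: make a downloaded executable runnable and move it to a bin directory.
# Usage: install_bin <file> [bin_dir]   (bin_dir defaults to $HOME/bin)
install_bin() {
  local src=$1 bin=${2:-$HOME/bin}
  mkdir -p "$bin"
  chmod 755 "$src"
  mv -- "$src" "$bin/"
  case ":$PATH:" in
    *":$bin:"*) ;;                      # directory already listed in PATH
    *) export PATH="$bin:$PATH" ;;      # prepend it, for the current shell only
  esac
}

# Example usage, after the download step above:
#   install_bin ./nextflow
```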

Cannot find pipeline version

Sometimes nextflow cannot find a specific version of a pipeline that you know exists in the remote repository, and fails with an error like this:

Cannot find revision `x.x.x` -- Make sure that it exists in the remote repository

This can happen if your local copy of the pipeline (in $HOME/.nextflow/assets/) is not up to date with the remote repository. In this case, you need to synchronize your local copy with the remote repository, for example:

nextflow pull nf-core/methylseq

You can also pull a specific version of the pipeline, for example:

nextflow pull nf-core/methylseq -r 2.7.1

This updates your local copy of the pipeline, after which you will be able to run the desired version.

Cannot execute nextflow interactively

In HPC environments where resources on the login nodes are limited, nextflow cannot be executed interactively. In such a case, nextflow needs to be submitted as a job to the scheduler. For example, in a SLURM environment, you can define a nextflow job like this:

#!/bin/bash
#SBATCH --nodes=1                       # 1 node
#SBATCH --ntasks-per-node=1             # 1 task per node
#SBATCH --cpus-per-task=2               # 2 CPUs per task
#SBATCH --time=4-00:00:00               # time limit (if your cluster enforces one)
#SBATCH --mem=16G                       # 16 GB of memory for the nextflow process
#SBATCH --error=nextflow.err            # standard error file
#SBATCH --output=nextflow.out           # standard output file
#SBATCH --job-name=nf-core-rnaseq       # job name
#SBATCH --account=<your account>        # account name
#SBATCH --partition=<your partition>    # partition where this job will run
#SBATCH --qos=<your QoS>                # quality of service (if any)
nextflow run nf-core/rnaseq -r 3.12.0 -profile "singularity,..." \
  -resume -config custom.config -params-file rnaseq-nf-params.json

Next, you will need to configure nextflow for non-interactive execution and to limit some resources. For example, you may need to disable the ANSI log, since you are not working interactively and all your standard output will be redirected to a file. You can do this by setting the NXF_ANSI_LOG environment variable to false:

export NXF_ANSI_LOG='false'

Take a look at the Environment variables and Configuring nextflow sections of this guide to see all the environment variables you can set to configure your nextflow execution.

Terminating nextflow execution

If you need to terminate a nextflow execution, you can interrupt it with Ctrl+C (which sends a SIGINT signal) or send it a SIGTERM signal, for example with kill. This will terminate all running processes and stop the pipeline execution, removing the temporary lock files. If a running process cannot be terminated by nextflow, you will need to terminate it manually, for example with scancel in a SLURM environment, or by killing the process if you are running nextflow with the local executor.
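When nextflow cannot be reached with Ctrl+C (for example because it was started in the background), the usual pattern is to send SIGTERM, wait, and only then escalate to SIGKILL. The following stop_pid function is a hypothetical sketch of that pattern, where the PID of the nextflow process is something you look up yourself (for example with ps). Keep in mind that SIGKILL gives nextflow no chance to clean up its running tasks or lock files, so jobs already submitted to a scheduler may still need to be cancelled with scancel.

```shell
# Hypothetical helper: ask a process to stop with SIGTERM, wait up to N seconds,
# then fall back to SIGKILL as a last resort.
# Usage: stop_pid <pid> [timeout_seconds]   (timeout defaults to 60)
stop_pid() {
  local pid=$1 timeout=${2:-60} i=0
  kill -TERM "$pid" 2>/dev/null || return 0     # process already gone
  while kill -0 "$pid" 2>/dev/null; do          # still alive?
    if [ "$i" -ge "$timeout" ]; then
      echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      break
    fi
    sleep 1
    i=$((i + 1))
  done
}
```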

Running nextflow offline

Nextflow can operate in environments without internet access by preparing all necessary resources in advance. This includes the pipeline code, software dependencies, reference genomes, and any required data.

You need to download all the necessary resources on a system with internet access, and then transfer them to the offline system using the available methods. Moreover, some extra steps are needed to manage the workflow properly. For more information on how to run nextflow offline, see the Running offline nextflow documentation.

Set environment variables

You need to set the NXF_OFFLINE environment variable to run nextflow offline:

export NXF_OFFLINE='true'

This tells nextflow to run in offline mode, disabling all attempts to download resources from the internet: this includes test files, institutional configurations, software dependencies, reference genomes and plugins. Therefore, all those resources must already be available locally when running nextflow. You can find more information in the Environment variables section of this guide and in the official nextflow Environment variables documentation.

Download the pipeline and its dependencies

You can download a pipeline and its dependencies with the nf-core tools utility, using the nf-core pipelines download command. For example, to download the rnaseq pipeline and its dependencies you can use:

nf-core pipelines download nf-core/rnaseq

The utility will ask whether you want to download the singularity container images along with the pipeline (usually yes), and whether you want to copy the images into the pipeline download folder or to amend the $NXF_SINGULARITY_CACHEDIR folder, i.e. store the images there. Choose the latter if you are downloading the container images into a shared folder that is also accessible during nextflow execution (i.e. you are on a login node of an HPC infrastructure with internet access, while the compute nodes have no internet access). Otherwise, you will need to copy all the downloaded files to your target HPC infrastructure and place the container images where they can be found during execution (usually in the $NXF_SINGULARITY_CACHEDIR directory). This guide also has a section about setting up nf-core tools.
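Before launching an offline run, a quick sanity check can save a failed submission. This is a minimal sketch of a hypothetical offline_preflight function; it only checks that NXF_OFFLINE is set to true and that $NXF_SINGULARITY_CACHEDIR contains at least one .img file, so it assumes the images were stored in the cache directory (the amend choice above) rather than in the pipeline download folder.

```shell
# Hypothetical pre-flight check for an offline run.
# Returns non-zero if offline mode is not enabled or no images are found in the cache.
offline_preflight() {
  local ok=0 n
  if [ "${NXF_OFFLINE:-}" != "true" ]; then
    echo "NXF_OFFLINE is not set to 'true'" >&2
    ok=1
  fi
  if [ -z "${NXF_SINGULARITY_CACHEDIR:-}" ] || [ ! -d "$NXF_SINGULARITY_CACHEDIR" ]; then
    echo "NXF_SINGULARITY_CACHEDIR is unset or not a directory" >&2
    ok=1
  else
    # count the singularity images stored directly in the cache directory
    n=$(find "$NXF_SINGULARITY_CACHEDIR" -maxdepth 1 -name '*.img' | wc -l | tr -d ' ')
    echo "found $n image(s) in $NXF_SINGULARITY_CACHEDIR"
    [ "$n" -gt 0 ] || ok=1
  fi
  return "$ok"
}
```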

Clone institutional configuration files

The institutional configuration files should be cloned locally in order to be used by the pipelines when running nextflow in offline mode. Simply clone the repository in a local directory:

git clone https://github.com/nf-core/configs.git

Pipelines usually contain statements that disable the use of institutional configurations when running offline. For example, in the nf-core/rnaseq pipeline, you can find these statements in the nextflow.config file:

// Load nf-core custom profiles from different Institutions
includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"

// Load nf-core/rnaseq custom profiles from different institutions.
includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/rnaseq.config" : "/dev/null"

These include statements are completely ignored when the NXF_OFFLINE environment variable is set. To use institutional configuration files when running offline, you should pass the paths of these files with the -c (or -config) option, and the path of the whole institutional configuration folder with the --custom_config_base option, for example:

export CUSTOM_CONFIG_BASE=<path/to/institutional/configs>

nextflow run nf-core/rnaseq -r 3.12.0 \
  --custom_config_base ${CUSTOM_CONFIG_BASE} \
  -config ${CUSTOM_CONFIG_BASE}/nfcore_custom.config \
  -config ${CUSTOM_CONFIG_BASE}/pipeline/rnaseq.config \
  -profile <institution> -resume -params-file <params-file>

This solution is pretty verbose, but it lets you specify the desired profile using the same syntax used when running nextflow with internet access.

Warning

Not all pipelines have a pipeline-specific configuration file, like rnaseq.config in the previous example. Please check whether this file exists in the pipeline folder of the institutional configuration repository before using it.
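Following this warning, you can build the -config arguments so that the pipeline-specific file is added only when it exists. This is a sketch with a hypothetical offline_config_args function that assumes the layout of the nf-core/configs clone shown above (nfcore_custom.config at the top level, pipeline files under pipeline/); it prints one argument per line so that the output can be collected in a bash array with mapfile.

```shell
# Hypothetical helper: print the extra nextflow arguments for an offline run, one per line,
# adding the pipeline-specific config only when it exists.
# Usage: offline_config_args <configs_dir> <pipeline_name>
offline_config_args() {
  local base=$1 pipeline=$2
  printf '%s\n' --custom_config_base "$base" -config "$base/nfcore_custom.config"
  if [ -f "$base/pipeline/$pipeline.config" ]; then
    printf '%s\n' -config "$base/pipeline/$pipeline.config"
  else
    echo "no pipeline-specific config for $pipeline, skipping" >&2
  fi
}

# Example usage (bash 4+):
#   mapfile -t cfg_args < <(offline_config_args "$CUSTOM_CONFIG_BASE" rnaseq)
#   nextflow run nf-core/rnaseq -r 3.12.0 "${cfg_args[@]}" \
#     -profile <institution> -resume -params-file <params-file>
```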

Tip

You can also download a copy of the institutional configuration files by passing --download-configuration yes to the nf-core pipelines download command. See the Download a pipeline with nf-core section of this guide.

Hint

At cnr-ibba we have a forked version of the nf-core/configs repository with custom options and profiles, which is available at https://github.com/cnr-ibba/nf-configs/.

Install nextflow plugins

Some pipelines require nextflow plugins, which are normally downloaded and installed the first time the pipeline runs. Before running nextflow offline, you can install them using the nextflow plugin install command, for example:

nextflow plugin install nf-schema@2.3.0

This installs version 2.3.0 of the nf-schema plugin in your nextflow environment. You will need to inspect the pipeline's nextflow.config file to see which plugins are required and install them individually. If the plugin version is not specified in the pipeline configuration file, you can pin it in a custom configuration file, for example:

plugins {
  id 'nf-schema@2.3.0'
}

This applies when you have internet access while installing the plugins (for example, on a login node of an HPC environment). If your environment has no internet connection at all, you should copy the ${HOME}/.nextflow/plugins folder from a working environment into your offline environment.
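To avoid missing a plugin, you can extract the declared plugin ids from the pipeline configuration with a simple text match. The following list_plugins function is a hypothetical sketch: it is not a real parser of nextflow configuration files, and it assumes each plugin is declared on its own line as id 'name@version' (or with double quotes), as in the example above.

```shell
# Hypothetical helper: print the plugin ids declared as  id 'name@version'  in a config file.
# This is a plain text match on lines starting with "id", not a configuration parser.
list_plugins() {
  sed -n "s/^[[:space:]]*id[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1"
}

# Example usage, on a machine with internet access:
#   for p in $(list_plugins nextflow.config); do
#     nextflow plugin install "$p"
#   done
```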

Download reference genomes and other files

You should manually download all the reference genomes and other files required by the pipeline. If you plan to run the pipeline with the test profile, you need to ensure that all the required files are available locally. Pay attention to the samplesheet.csv of the test profile, which is a mandatory input for most community pipelines: it usually refers to files available on the internet, so you should download them locally and modify the samplesheet.csv file accordingly. Then pass the modified samplesheet.csv file to the pipeline using the proper CLI parameter or through a JSON file with the -params-file option.
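The samplesheet step can be partly automated. This is a minimal sketch with two hypothetical functions: samplesheet_urls lists the remote URLs found in a CSV samplesheet (to be downloaded on a machine with internet access), and localize_samplesheet writes a copy in which each URL is replaced by <data_dir>/<file name>. It assumes that different URLs point to files with different names and that data_dir contains no '#' or '&' characters.

```shell
# Hypothetical helper: print the unique http(s)/ftp URLs found in a CSV samplesheet.
samplesheet_urls() {
  grep -oE '(https?|ftp)://[^,"[:space:]]+' "$1" | sort -u
}

# Hypothetical helper: write a copy of the samplesheet where every URL is replaced by
# <data_dir>/<last path component>, i.e. where the file is expected after downloading.
# Usage: localize_samplesheet <samplesheet.csv> <data_dir> <output.csv>
localize_samplesheet() {
  local in=$1 data_dir=$2 out=$3
  sed -E "s#(https?|ftp)://[^,\"[:space:]]*/([^,\"[:space:]/]+)#${data_dir}/\2#g" "$in" > "$out"
}

# Example usage:
#   samplesheet_urls samplesheet.csv | wget -P /path/to/data -i -      # with internet access
#   localize_samplesheet samplesheet.csv /path/to/data samplesheet.local.csv
```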