Running Nextflow
================

.. contents:: Table of Contents

A note on containers
--------------------

Although nextflow can be run using :doc:`conda <../general/conda>`,
:doc:`singularity <../general/singularity>`, :doc:`docker <../general/docker>`
or other container runtimes, the recommended container application is
**singularity**: this solution manages all the software dependencies in a
single image file, which can be cached and reused to speed up the calculation
process (see :ref:`set-nxf-singularity-cache` for more information). You can
find more information about singularity in the :ref:`singularity ` section of
these guidelines.

You can select the container runtime with the ``-profile`` option, for
example:

.. code-block:: bash

    nextflow run nf-core/rnaseq -profile test,singularity -resume

.. warning::

    Downloading software dependencies can take a lot of time and is subject to
    network errors, which are not related to the pipeline or the data but can
    slow down or break the pipeline execution. For this reason, it's better to
    configure **caches** for the downloaded software: the singularity cache can
    be configured in the `singularity scope `_ or, better, using the
    ``$NXF_SINGULARITY_CACHEDIR`` environment variable. See
    :ref:`Setting NXF_SINGULARITY_CACHEDIR ` for more information.

Nextflow parameters and pipeline parameters
-------------------------------------------

There are two types of parameters you can pass to nextflow: *nextflow
parameters* and *pipeline parameters*. Nextflow parameters are related to
nextflow itself, like ``-resume`` or ``-log``. Pipeline parameters are related
to the pipeline you are running, like ``--input`` or ``--output``. In general,
nextflow parameters have only one ``-`` before the parameter name, while
pipeline parameters have two ``--``.
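
When you script a run, it can help to keep the two kinds of options apart. The
following bash sketch stores nextflow options and pipeline parameters in two
separate arrays and prints the resulting command line. The array names are
arbitrary, and ``--outdir`` is only assumed here as an example of a pipeline
parameter: check the pipeline ``--help`` for the real ones.

.. code-block:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Nextflow options: a single dash, read by nextflow itself
    nf_options=(-profile test,singularity -resume)

    # Pipeline parameters: a double dash, forwarded to the pipeline
    # (--outdir is only an example: check the pipeline --help)
    pipeline_params=(--outdir results)

    cmd=(nextflow run nf-core/rnaseq "${nf_options[@]}" "${pipeline_params[@]}")

    # Print the command line; run "${cmd[@]}" instead to actually launch it
    echo "${cmd[*]}"

Keeping the arguments in arrays preserves quoting, so a value containing spaces
(for example a path) is still passed as a single argument when you run
``"${cmd[@]}"``.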
To get a full list of the available options, you can call nextflow with the
``-h`` parameter or without any parameter::

    $ nextflow -h

To get the list of parameters of a specific pipeline, you can call the
pipeline with the ``--help`` option, for example::

    $ nextflow run nf-core/rnaseq --help

Another important aspect is that pipeline parameters can be written in a JSON
file and provided to nextflow with the ``-params-file`` option. This is useful
when you have a lot of parameters to provide to the pipeline, or when you want
to save a configuration for later use. For example, to provide a JSON file
with parameters to the pipeline, you can do::

    $ nextflow run nf-core/rnaseq -params-file params.json

where ``params.json`` is a JSON file with the following content::

    {
        "input": "samplesheet.csv",
        "fasta": "path/to/genome.fasta"
    }

Nextflow parameters and pipeline parameters are not the only way to customize
a pipeline: nextflow lets you define custom configuration files in which you
can customize other aspects of the pipeline, like the number of CPUs to use,
the memory to allocate, environment variables and also settings specific to
the environment in which the pipeline is run. For more information, see the
`Configuration file `_ section of the nextflow documentation. See also the
:ref:`Configuring a pipeline ` section of these guidelines.

For more information on CLI and pipeline options, please see
`Command line `_, and both `CLI reference `_ and `Pipeline parameters `_ in
the nextflow documentation.

Execute a community pipeline
----------------------------

Nextflow lets you build and share bioinformatics pipelines across the
community. The simplest way to use nextflow is to identify the pipeline you
need, check its requirements and then launch it on your data. Since all the
nextflow community pipelines are public, you can download and modify them
according to your needs.
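
Before launching a pipeline, a quick way to check that the tools used in this
guide are available on your ``PATH`` is a small bash helper like the following.
``check_requirements`` is a hypothetical function, not part of nextflow or
nf-core, and it only checks that the executables exist, not the *versions*
required by the pipeline.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: report which of the given tools are on PATH.
    # Returns 0 if all the tools are found, 1 otherwise.
    check_requirements() {
        local tool missing=0
        for tool in "$@"; do
            if command -v "$tool" >/dev/null 2>&1; then
                echo "found:   $tool ($(command -v "$tool"))"
            else
                echo "missing: $tool"
                missing=1
            fi
        done
        return "$missing"
    }

For example, ``check_requirements nextflow singularity nf-core`` lists which of
the three tools are installed; use ``nextflow -version`` to see which nextflow
version you have.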
Search for a community pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Community pipelines are available at the `nf-core pipelines `_ site: you can
search for a pipeline and browse its documentation in the
`nf-core website `_. For example, by searching for ``rnaseq`` you can reach
the `rnaseq pipeline `_ project page and get documentation on its usage by
clicking on the `Usage `_ tab.

You can download a pipeline using ``nextflow pull`` followed by the pipeline
name, in the form ``/``, for example:

.. code-block:: bash

    nextflow pull nf-core/rnaseq

This will download a copy of the pipeline in a nextflow *cache* folder, which
is usually ``$HOME/.nextflow/assets``: the pipeline will be placed in a
subfolder named after the organization and the pipeline (in this case
``nf-core/rnaseq``). The container files required to execute the pipeline
will be downloaded when the pipeline is executed for the first time, so make
sure you have an internet connection during the pipeline execution. If it's
not possible to download the containers, you can run nextflow offline: please
see :ref:`Running nextflow offline ` in this documentation and the official
`Running offline `_ nextflow documentation for more information.

.. hint::

    The organization name is the GitHub organization which hosts the pipeline,
    like `nf-core GitHub `_ or `cnr-ibba `_, while the pipeline name is the
    name of the GitHub repository which contains the pipeline. You can derive
    the pipeline name by removing ``https://github.com/`` from the repository
    URL. For example, from ``_ you can derive the pipeline name
    ``nf-core/rnaseq``.

.. tip::

    You can get a list of the available `nf-core pipelines `_ using
    `nf-core/tools `_ with the ``nf-core pipelines list`` command. You can
    also add a pattern to search for a specific pipeline, for example::

        nf-core pipelines list rna

    to get a list of pipelines related to RNA analysis.
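
The hint above on deriving the pipeline name from the repository URL can be
expressed as a small bash function. ``pipeline_name_from_url`` is a
hypothetical helper written for this guide: it only handles
``https://github.com/`` URLs, optionally ending with ``/`` or ``.git``.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: turn a GitHub repository URL into the
    # <organization>/<pipeline> name used by nextflow pull and nextflow run
    pipeline_name_from_url() {
        local url="${1%/}"           # drop a trailing slash, if any
        url="${url%.git}"            # drop a trailing .git, if any
        echo "${url#https://github.com/}"
    }

For example, ``nextflow pull "$(pipeline_name_from_url https://github.com/nf-core/rnaseq)"``
is equivalent to ``nextflow pull nf-core/rnaseq``.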
To download the pipeline and its software, and to test everything in your
local environment (which is recommended to check that everything works as
intended, see :ref:`run-a-pipeline-with-test-data`), you can directly run the
nextflow pipeline on *test data*, for example for the *rnaseq* pipeline:

.. code-block:: bash

    mkdir nf-rnaseq
    cd nf-rnaseq
    nextflow run nf-core/rnaseq -profile test,singularity -resume

.. hint::

    Calling ``nextflow run`` with a remote pipeline will place the ``work`` and
    ``results`` directories in the current working directory, together with
    some hidden files used to log the pipeline execution. For this reason,
    it's better to create an empty project directory in which to call
    ``nextflow run``.

.. tip::

    The community pipelines have a ``--help`` option to show all the
    supported parameters. Try:

    .. code-block:: bash

        nextflow run nf-core/rnaseq --help

    to get a full list of the available options.

.. warning::

    The nextflow version required by the pipeline may differ from the
    nextflow version you have installed, and in that case you may not be able
    to execute the pipeline. Please see :ref:`this section ` of nextflow
    troubleshooting.

When you call nextflow with a community pipeline, like
``nextflow run nf-core/rnaseq``, nextflow will download the latest pipeline
version and will place a local copy of the pipeline in your
``$HOME/.nextflow/assets`` folder. This local copy is used every time you call
``nextflow run`` with the same pipeline. If you need a particular version or
branch of the pipeline, you can specify it with the ``-r`` option, for
example::

    $ nextflow pull nf-core/rnaseq -r 3.0

.. warning::

    Whenever you pull a pipeline version different from the latest, you
    **MUST** declare the same version or branch when calling nextflow, for
    example::

        $ nextflow run nf-core/rnaseq -r 3.0 --help

If you need to update your local pipeline to the latest version, see the
:ref:`Update a pipeline ` section.

.. _manage-community-pipelines:

Manage community pipelines with ``nf-core``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Search for a pipeline
^^^^^^^^^^^^^^^^^^^^^

Whenever you run a community pipeline, nextflow will download and cache it
(in your ``$HOME/.nextflow/assets/`` folder). You can check your installed
community pipelines with::

    nextflow list

You can list all the available ``nf-core`` pipelines with::

    nf-core pipelines list

You can search for a specific pipeline by providing a name as an argument::

    nf-core pipelines list rna

.. _nf-core-pipelines-download:

Download a pipeline
^^^^^^^^^^^^^^^^^^^

You can download a pipeline together with its container dependencies. This is
helpful when running nextflow in an environment without an internet
connection::

    nf-core pipelines download nf-core/rnaseq -r 3.12.0

This command offers the possibility to *amend* the singularity images in your
``$NXF_SINGULARITY_CACHEDIR``: images missing from that folder are downloaded
into your local ``$NXF_SINGULARITY_CACHEDIR`` instead of being placed in the
archive.

.. hint::

    Using the option ``--download-configuration yes``, you can also download
    the institutional configuration files for *offline* usage. This is useful
    when you need to run a pipeline in an environment without an internet
    connection. For more information see
    :ref:`institutional-configuration-files` and
    :ref:`running-nextflow-offline`.
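
Since both ``nf-core pipelines download`` and ``nextflow run`` rely on
``$NXF_SINGULARITY_CACHEDIR``, it's worth making sure the variable is set and
the folder exists before downloading anything. A minimal bash sketch follows;
the default path ``$HOME/.nxf-singularity-cache`` is only an assumption, so
choose a location with enough free space on your system.

.. code-block:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Keep an already defined value, otherwise fall back to a hypothetical
    # folder in your home (prefer a large, shared filesystem if available)
    export NXF_SINGULARITY_CACHEDIR="${NXF_SINGULARITY_CACHEDIR:-$HOME/.nxf-singularity-cache}"

    # Make sure the folder exists before images are written into it
    mkdir -p "$NXF_SINGULARITY_CACHEDIR"
    echo "singularity images cache: $NXF_SINGULARITY_CACHEDIR"

The ``export`` only affects the current shell session: add the same line to
your ``~/.bashrc`` (or equivalent) to make the setting permanent.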
Run a pipeline
^^^^^^^^^^^^^^

The most interesting feature is the possibility to configure the pipeline
parameters interactively with::

    $ nf-core pipelines launch rnaseq

This command downloads the pipeline in the ``assets`` folder and then opens a
web browser or an interactive CLI session to let you configure the pipeline
parameters. You can also save the configuration in a file and use it later
with the nextflow ``-params-file`` option. See
:ref:`Install nf-core/tools ` to get the ``nf-core/tools`` software installed.

.. tip::

    nextflow creates a lot of files in the current working directory. It's
    better to create a dedicated directory in which to call nextflow.

Execute a shared custom pipeline
--------------------------------

Nextflow can manage pipelines outside the scope of the **nf-core** team, if
they are shared in public repositories. For example, to execute a pipeline
available on GitHub, call nextflow with ```` like the following example::

    nextflow run cnr-ibba/nf-resequencing-mem -resume -profile singularity \
        --input --genome_fasta

where `cnr-ibba/nf-resequencing-mem `_ is the repository which contains the
nextflow pipeline.

.. tip::

    You can configure nextflow to store your GitHub access credentials, see
    :ref:`Access to private repositories ` in these guidelines.

Nextflow best-practices
-----------------------

Here are some tips that could be useful while running nextflow.

.. _run-a-pipeline-with-test-data:

Run a pipeline with test data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you run a pipeline for the first time, it's better to use test data in
order to check that the pipeline works as expected. All the community
pipelines have a ``test`` profile (``-profile test``) which downloads a small
dataset and runs the pipeline on it. For example, to run the
``nf-core/rnaseq`` pipeline with test data, you can do::

    nextflow run nf-core/rnaseq -profile test,singularity -resume

This will also download the required dependencies (like the singularity
images).
The next time you run the pipeline, nextflow will use the cached images and
will not download them again.

Getting information from logs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By calling ``nextflow log`` you can get information on your last nextflow
runs, including timestamp, duration, status, *run name* and the command used
to call the pipeline::

    $ nextflow log
    TIMESTAMP            DURATION  RUN NAME          STATUS  REVISION ID  SESSION ID                            COMMAND
    2021-10-27 12:40:32  54.8s     serene_engelbart  OK      c44b10f3aa   598f0939-a7b0-497f-a16f-b2431a7e5ee3  nextflow run . -profile test,docker
    2021-10-27 12:49:05  43.6s     evil_ride         OK      c44b10f3aa   a70a75e2-61fc-4407-aba4-19ac33f31774  nextflow run . -profile test,docker

``RUN NAME`` is an arbitrary name assigned to your pipeline run. By calling
``nextflow log`` again and providing such name, you can retrieve more
information on the single execution steps::

    $ nextflow log serene_engelbart
    /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/5d/6ff357b9b679198557bf22d24adf1e
    /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/ff/dd919f582e8583a16aecc58f6cc093
    /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/74/944e234214bcca20209637a94c0ac2
    /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/31/b075adb744673b9cc8fb214729c455

By default, ``nextflow log `` returns only the working directories; to get
more informative results you need to specify some columns using the ``-f``
parameter, for example::

    $ nextflow log serene_engelbart -f 'process,status,exit,hash,duration,workdir'
    NFCORE_RESEQUENCING:RESEQUENCING:INPUT_CHECK:SAMPLESHEET_CHECK  COMPLETED  0  5d/6ff357  1.8s  /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/5d/6ff357b9b679198557bf22d24adf1e
    NFCORE_RESEQUENCING:RESEQUENCING:FASTQC                         COMPLETED  0  ff/dd919f  7.2s  /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/ff/dd919f582e8583a16aecc58f6cc093
    NFCORE_RESEQUENCING:RESEQUENCING:FASTQC                         COMPLETED  0  74/944e23  5.2s  /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/74/944e234214bcca20209637a94c0ac2
    NFCORE_RESEQUENCING:RESEQUENCING:FASTQC                         COMPLETED  0  31/b075ad  7.2s  /home/paolo/Projects/NEXTFLOWetude/nf-core-resequencing/work/31/b075adb744673b9cc8fb214729c455

Call ``nextflow log -l`` to get the full list of available columns.

Resume calculations
~~~~~~~~~~~~~~~~~~~

By default, nextflow executes every calculation in a subfolder of the
``work`` directory in your current working directory. Every step is executed
in a separate subfolder, and nextflow takes care of the *inputs* and *outputs*
between related steps. It is common to call nextflow multiple times, for
example while modifying a pipeline, tuning parameters or solving issues. In
such cases, you can save a lot of computation time (and disk space) by
*resuming* a pipeline (i.e. jobs that completed successfully are not run
again). To achieve this, it's important to add the ``-resume`` option when
calling nextflow::

    $ nextflow run -resume

.. note::

    nextflow parameters have only one ``-`` before the parameter name, while
    pipeline parameters always have ``--`` in front of them. Nextflow
    commands, like ``run``, ``info``, ``log``, ... don't have any ``-`` in
    front of them.

Cleanup
~~~~~~~

After a pipeline has completed successfully, it's better to clean up the
``work`` directory in order to save space. All the desired outputs **need to
be saved outside** this folder, so that temporary data can be safely removed.
The nextflow `clean `_ command safely removes temporary files and nextflow
logs.
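
Before cleaning, you may want to check how much space the ``work`` directory
actually takes. A minimal sketch follows: ``work_usage`` is a hypothetical
helper built on the standard ``du`` command, not a nextflow feature.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: report the disk space used by a nextflow work
    # directory (default: ./work) using the standard du command
    work_usage() {
        local workdir="${1:-work}"
        if [ ! -d "$workdir" ]; then
            echo "no '$workdir' directory found in $(pwd)" >&2
            return 1
        fi
        du -sh "$workdir"
    }

Calling ``work_usage`` in your project folder before and after ``nextflow
clean`` shows how much space was recovered.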
You can get information on nextflow runs by calling ``nextflow log`` inside
your project folder::

    $ nextflow log
    TIMESTAMP            DURATION  RUN NAME          STATUS  REVISION ID  SESSION ID                            COMMAND
    2021-01-14 18:31:18  34m 17s   magical_roentgen  OK      3643a94411   fa1714cf-1dbf-45ec-9910-9dcb27aab52b  nextflow run nf-core/rnaseq -profile test,singularity -resume --max_cpus=24
    2021-01-15 15:38:02  -         magical_rosalind  -       3643a94411   fa1714cf-1dbf-45ec-9910-9dcb27aab52b  nextflow run nf-core/rnaseq -profile test,singularity -resume --max_cpus=24

Then you can remove a specific run using its name, for example::

    $ nextflow clean magical_roentgen -f

See the `nextflow clean `_ documentation for more info.

.. note::

    When calling ``nextflow log``, you can inspect the command line used to
    execute the pipeline. You can also get information about execution times.
    For more information, take a look at the `nextflow log `_ documentation.

.. hint::

    Although singularity images are written in ``$NXF_SINGULARITY_CACHEDIR``,
    there are also cache files stored inside your ``$HOME/.singularity/cache``
    directory. Free some space with::

        $ singularity cache clean

    The previous command will not affect the singularity images downloaded in
    your ``$NXF_SINGULARITY_CACHEDIR`` folder: if you want to remove them, you
    have to do it manually. See the :ref:`Clean up Singularity ` section of
    these guidelines for more information.

.. warning::

    Calling ``nextflow clean -f`` without a *session id* or a *run name* will
    only remove temporary files from the last nextflow run, without removing
    files from previous sessions. If you want to remove **ALL** your nextflow
    cache directories with a single command, you can do::

        $ nextflow clean $(nextflow log -q) -f

    where ``nextflow log -q`` simply returns the *run name* of every nextflow
    run in your working folder.

.. _update-a-pipelines:

Update a pipeline
~~~~~~~~~~~~~~~~~

If you manage community pipelines using the ``nextflow`` or ``nf-core``
software (not using ``git``), you can get information on outdated pipelines
with the ``nf-core pipelines list`` command::

    $ nf-core pipelines list
    ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Pipeline Name     ┃ Stars ┃ Latest Release ┃      Released ┃  Last Pulled ┃ Have latest release? ┃
    ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
    │ rnaseq            │   323 │            3.1 │   2 weeks ago │  2 hours ago │ Yes (v3.1)           │
    │ methylseq         │    66 │          1.6.1 │   3 weeks ago │ 4 months ago │ No (v1.5)            │
    └───────────────────┴───────┴────────────────┴───────────────┴──────────────┴──────────────────────┘

In this example, the ``rnaseq`` pipeline is up to date, while the local copy
of ``methylseq`` is outdated and needs to be updated.

.. hint::

    You can search for a specific pipeline with ``nf-core pipelines list ``,
    for example::

        $ nf-core pipelines list rnaseq

.. note::

    When you manage pipelines using the nextflow software, pipelines are
    downloaded locally in your ``$HOME/.nextflow/assets/`` folder (see
    :ref:`Manage community pipelines with nf-core <manage-community-pipelines>`):
    the information you see reflects the updates of the community pipelines
    compared to your local assets.

To update a community pipeline, you need to call ``nextflow pull``, for
example::

    $ nextflow pull nf-core/rnaseq

This updates your local assets by downloading the latest default revision of
the pipeline. If you need a specific version (or branch), you need to specify
it with the ``-r`` option::

    $ nextflow pull nf-core/rnaseq -r 3.0

.. tip::

    You can get a list of the available revisions and versions with::

        $ nextflow info nf-core/rnaseq

    This information refers to the local copy of the pipeline in your assets
    folder: make sure to run this after a ``nextflow pull`` command to collect
    the latest information.

.. hint::

    The same considerations apply to custom shared pipelines, for example::

        $ nextflow pull cnr-ibba/nf-resequencing-mem -r issue-1

.. warning::

    If you download a specific version with ``nextflow pull``, you have to
    specify it with the same ``-r`` option when you call ``nextflow run``.
    This is required if you need to run your analyses with an old pipeline
    version, or if your ``nextflow`` executable doesn't support the latest
    pipeline version.

Delete the local copy of a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To remove the local copy of a pipeline (a pipeline installed in your cache
using ``nextflow pull`` or ``nextflow run``), simply type::

    $ nextflow drop

where ```` is a single row returned by ``nextflow list``
(*GitHub organization/pipeline name*).