Getting started
===============

.. contents:: Table of Contents

Introduction
------------

Nextflow is a bioinformatics workflow manager with which we can develop or reuse
bioinformatics analysis pipelines. It supports many execution platforms, such as
local machines, HPC clusters, the cloud and so on. The aim is to develop a
pipeline which can work locally on your laptop and eventually in HPC
environments or on any other resource that can scale with your data. This can
be achieved by writing pipelines in the `Nextflow scripting `_ language (or
`DSL2 `_) and managing your software requirements with
:doc:`conda <../general/conda>`, :doc:`singularity <../general/singularity>` or
:doc:`docker <../general/docker>`.

.. _learning-nextflow:

Learning Nextflow
~~~~~~~~~~~~~~~~~

A number of resources and tutorials about Nextflow are available online. The
first is `this youtube playlist `_ (here are `the tutorial notes `_ to code
along and a `git repository `_ adapted to work in a local environment). Next
there is the Nextflow `reference documentation `_, which explains things in
detail. The community-built pipelines can be found in the `pipeline section `_
of the `nf-core `_ community site, while DSL2 pipeline modules can be found in
the `nf-core/modules `_ GitHub repository.

Installing Nextflow
-------------------

The official Nextflow installation page is located at ``_. In order to install
Nextflow in your local environment, ensure you have Java installed:

.. code-block:: bash

   java -version

.. warning::

   You don't need a full release of Java, the *openjdk* version is enough.
   Starting from Nextflow ``24.10.3``, support for Java versions lower than 11
   has been dropped, even for the `vscode nextflow extensions `_.
   If you don't have Java installed, you can install it as a user using
   `SdkMan `_. See `nextflow installation requirements `_ for more information.

Next, download ``nextflow`` into your local directory and make it executable
with:

.. code-block:: bash

   curl -s https://get.nextflow.io | bash
   chmod +x nextflow

If you install ``nextflow`` in a directory listed in your ``$PATH`` environment
variable, you don't need to specify the relative or full path when calling it.

.. hint::

   Nextflow is already installed in our shared **core** environment, and can be
   called like any other command since it is available via the ``$PATH``
   environment variable.

Finally, verify your installation with:

.. code-block:: bash

   mkdir nf-hello
   cd nf-hello
   nextflow run hello

.. _install-nf-core:

Install nf-core/tools
~~~~~~~~~~~~~~~~~~~~~

`nf-core/tools `_ is a Python package which integrates with Nextflow and is a
helper tool for the Nextflow community. Using the ``nf-core`` software you can
manage Nextflow pipelines and modules. You can install ``nf-core`` in
`many ways `_, but the recommended way is using pip:

.. code-block:: bash

   pip install nf-core
   nf-core --help

To update the package you can use:

.. code-block:: bash

   pip install nf-core --upgrade

.. note::

   You can install ``nf-core`` in a conda environment. Even though there is a
   ``nf-core`` conda package, it is better to install the **pypi** package
   version, since it is the most up-to-date release and avoids some dependency
   issues with **bioconda**:

   .. code-block:: bash

      conda create --name nf-core pip
      conda activate nf-core
      pip install nf-core

   However, it's better to do this in a fresh conda environment used only for
   Nextflow. Please see our considerations :ref:`on channels `.

.. tip::

   You can add autocompletion for ``nf-core`` within a conda environment. Simply
   add the activation instruction
   ``eval "$(_NF_CORE_COMPLETE=bash_source nf-core)"`` to your
   ``$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh``, and the deactivation
   instruction ``complete -r nf-core`` to your
   ``$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh``. Test the ``nf-core``
   autocompletion with:

   .. code-block:: bash

      complete -p nf-core

   See :ref:`Setting environment variables ` for more information.

.. hint::

   ``nf-core`` with autocompletion is already installed in a ``nf-core``
   environment in our shared **core** *VM*:

   .. code-block:: bash

      conda activate nf-core
      nf-core --help

.. _configuring_nextflow:

Configuring nextflow
--------------------

Nextflow can be customized in different ways: there are configuration files,
which can be used to customize a single pipeline execution, and environment
variables, which can be used to customize the Nextflow runtime and the
underlying Java virtual machine. Configuration files can be stored in multiple
locations, for example in your home directory, in the pipeline directory and in
the directory where you are running the pipeline. They are loaded in a specific
order, and settings in a file loaded later override those loaded earlier: the
lowest priority configuration file is the one in the home directory, while the
highest priority configuration files are those in the directory where you are
running the pipeline. This means that you can have a default configuration file
in ``$HOME/.nextflow/config`` and a pipeline-specific configuration file in the
pipeline directory, and the latter will override the former. More information
on configuration files can be found in the `Configuration file `_ section of
the Nextflow documentation.

Default configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~

The default configuration has the lowest priority and can be used to define
options that apply to all your pipelines. This file is located in
``$HOME/.nextflow/config`` and can be used, for example, to limit resource
usage::

   executor {
       name = 'slurm'
       queueSize = 50
       submitRateLimit = '10 sec'
   }

In this way it is possible to set up a default configuration for all your
pipelines, limiting job submission in order to avoid overloading the cluster
scheduler. There are some tips for HPC users: please take a look at the
`5 Nextflow Tips for HPC Users `_ and
`Five more tips for Nextflow user on HPC `_ articles on the nextflow forum.

.. _environment-variables:

Environment variables
~~~~~~~~~~~~~~~~~~~~~

.. _set-nxf-singularity-cache:

Setting ``NXF_SINGULARITY_CACHEDIR``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using Nextflow with Singularity, you can define a directory where remote
Singularity images are stored. This can speed up pipeline execution **a lot**,
since images are downloaded once and then reused when needed. You can define the
location of this directory by setting the ``NXF_SINGULARITY_CACHEDIR``
environment variable. Nextflow will create the directory for you and will place
every downloaded Singularity image inside it.

.. hint::

   ``NXF_SINGULARITY_CACHEDIR`` is already defined for every user in our shared
   **core** infrastructure, and points by default to your
   ``${HOME}/nxf_singularity_cache/`` directory. If you want to change this
   value (for example, by setting a shared cache folder), you have to define
   this variable in your ``$HOME/.profile`` configuration file, for example::

      # override nextflow singularity cache dir
      export NXF_SINGULARITY_CACHEDIR=/home/core/nxf_singularity_cache/

You can also define the ``SINGULARITY_CACHEDIR`` environment variable, which is
used by Singularity itself to cache layers and temporary files: this can help
in managing the Singularity cache in a more efficient way. See
:ref:`Set SINGULARITY_CACHEDIR ` for more information.

.. warning::

   When using a computing cluster, the cache directory must be a shared folder
   accessible from all compute nodes.

.. _nextflow_environment_variables:

Other nextflow environment variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are other environment variables which can be useful to set in order to
customize your Nextflow experience. You can find a list of them in the
`Environment variables `_ section of the Nextflow documentation. Here is a
selection of them:

.. list-table:: Nextflow environment variables
   :header-rows: 1
   :widths: 25 50 25

   * - Name
     - Description
     - Example
   * - NXF_EXECUTOR
     - Defines the default process executor
     - ``slurm``
   * - NXF_OPTS
     - | Provides extra options for the Java and Nextflow runtime.
       | It must be a blank-separated list of ``-Dkey[=value]`` properties
     - ``-Xms500M -Xmx4G``
   * - NXF_SINGULARITY_CACHEDIR
     - | Directory where remote Singularity images are stored.
       | When using a computing cluster it must be a shared
       | folder accessible from all compute nodes.
     - ``$WORK/nxf_singularity_cache``
   * - NXF_WORK
     - | Directory where working files are stored
       | (usually your scratch directory)
     - ``$CINECA_SCRATCH/nxf_work``
   * - NXF_OFFLINE
     - | When true, disables the automatic download and update
       | of projects from remote repositories (default: ``false``).
     - ``true``
   * - NXF_ANSI_LOG
     - | Enables/disables ANSI console output
       | (default ``true`` when ANSI terminal is detected).
     - ``false``

These environment variables can be set in your ``$HOME/.profile`` (Debian) or
``$HOME/.bash_profile`` (Red Hat) configuration files, for example:

.. code-block:: bash

   # Nextflow custom environment variables
   export NXF_EXECUTOR=slurm
   export NXF_OPTS="-Xms500M -Xmx4G"
   export NXF_SINGULARITY_CACHEDIR="$WORK/nxf_singularity_cache"
   export NXF_WORK="$CINECA_SCRATCH/nxf_work"
   export NXF_OFFLINE='true'
   export NXF_ANSI_LOG='false'

.. _nextflow-private-repo:

Access to private repositories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The file ``$HOME/.nextflow/scm`` can store the configuration required to access
private repositories on GitHub, for example::

   providers {
       github {
           user = ''
           password = ''
       }
   }

You can find more information in the `Git configuration `_ section of the
Nextflow documentation and in the
`Configure Git private repositories with Nextflow `_ blog post.

Access to private nextflow modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::

   This section is quite old and could be outdated. Please check whether the
   following information is still valid.

In order to get access to the private `nextflow-modules `_, you need to
configure `GitHub CLI `_ so that it creates the ``~/.config/gh/hosts.yml``
file, which is a fundamental requirement for dealing with private modules with
``nf-core modules``. The easiest way to create this configuration is through
*GitHub CLI*::

   gh auth login

See the documentation on `gh auth login `_ for more information.
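
As a quick sanity check after logging in, the following sketch tests whether
the ``hosts.yml`` file mentioned above exists and is not empty. The
``check_gh_hosts`` function is a hypothetical helper written for this guide
(it is not part of ``gh`` or ``nf-core``), and it assumes the default location
``~/.config/gh/hosts.yml`` unless another path is given:

.. code-block:: bash

   # Hypothetical helper, not part of gh or nf-core: report whether the
   # GitHub CLI hosts file exists and is non-empty.
   # Usage: check_gh_hosts [path-to-hosts.yml]
   check_gh_hosts() {
       # Assumption: default path as described in this section
       local hosts_file="${1:-$HOME/.config/gh/hosts.yml}"
       if [ -s "$hosts_file" ]; then
           echo "found: $hosts_file"
       else
           echo "missing: $hosts_file (run 'gh auth login' first)" >&2
           return 1
       fi
   }

Calling ``check_gh_hosts`` without arguments checks the default path. It can be
combined with ``gh auth status``, which reports the account the GitHub CLI is
currently authenticated with.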