Conda
=====
.. contents:: Table of Contents
About Conda
-----------
Conda is a software which allows you to manage software installations in distinct
environments. It was born to support the python ecosystem, however most softwares
has been supported by conda, for example
`R with Anaconda `_
and its packages, and there are
channels like `bioconda`_, which collect and maintain a lot of useful softwares.
The main advantage in using conda environments is that packages could be installed
directly with their dependencies, without the needing to compile everything. Moreover
conda and its environments can be installed by an user without administrative privileges.
Packages and dependencies are installed inside user directories, and a complete
uninstallation can be done by erasing the conda installation folder.
From the `conda official docs `_:
.. epigraph::
Conda is an open source package management system and environment management
system that runs on Windows, macOS and Linux. Conda quickly installs, runs and
updates packages and their dependencies. Conda easily creates, saves, loads
and switches between environments on your local computer. It was created for
Python programs, but it can package and distribute software for any language.
Installing Conda
----------------
Is Conda already installed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Conda isn't installed by default on your system. However on a shared resource or
a remote machine could be already installed by the system administrator. Try to
understand if conda is installed using ``which``, for example::
(base) cozzip@cloud1:~$ which conda
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/conda
(base) cozzip@cloud1:~$ which python
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/python
In such case, conda is installed and currently active (The ``(base)`` near username
in the bash prompt, is the environment name currently active in the terminal)
.. hint::
Conda is already installed and initialized in our shared **core** environment.
When you log in you should see the ``(base)`` default environment activated.
This installation let you use the provided environments managed by the system
administrator, and to define your local environments in your ``$HOME`` folder.
Should I install Conda or Miniconda?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Conda is installed with a a lot of dependencies, like spyder editor, jupyter notebook
and many other packages. Miniconda is a lighter version of anaconda, which installs
only the minimal packages required to work correctly with conda. In general, you could
decide to install the whole Conda in a local installation, since in your personal computer
you could exploit the benefit of the editors and the graphical user interfaces.
When working on a remote server, using Miniconda is recommended since you have the
full control on what is installed and generally you don't need starting graphical
interfaces on a remote servers. If you are in doubt, please see the
`Anaconda or Miniconda`_ section of conda installation guide.
.. _`Anaconda or Miniconda`: https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda
Download and install Conda
~~~~~~~~~~~~~~~~~~~~~~~~~~
You could install `Anaconda `_
or `miniconda `_ respectively.
Then follow the installation instructions provided by Anaconda or miniconda.
Managing environments with conda
--------------------------------
Choose an environment
~~~~~~~~~~~~~~~~~~~~~
You can explore the conda environment available with::
$ conda env list
# conda environments:
#
R-4.3 /home/cozzip/.conda/envs/R-4.3
base * /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64
nf-core /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/envs/nf-core
The environment with ``*`` is the current active environment. Is the same you see
in the bash prompt.
.. hint::
``conda env list`` is different from ``conda list`` which tells you which
packages are installed in your current environment.
You could enable a conda environment using ``conda activate``, for example::
$ conda activate R-4.3
You should see that the environment name near the bash prompt changed to the desired
environment. In order to exit the current environment (and return to your previous
environment), you have to deactivate with::
$ conda deactivate
Create a new environment
~~~~~~~~~~~~~~~~~~~~~~~~
You can create a new environment by specify the environment name using ``--name``
option. You could also specify which package to install when creating an environment::
conda create --name [package1] [package2]
See `Managing environment `_
in conda documentation for more information
.. hint::
You can save time by specifying package version (ex. ``python=3.8``): conda will
have less dependencies to evaluate
A note on channels
^^^^^^^^^^^^^^^^^^
.. _a-note-on-channels:
Channels are repository where conda store packages. The ``default`` contains packages
maintained by conda developers. There are others channels like `bioconda `_,
which contains a lot of bioinformatics packages, `R channel `_,
which store *R* and its packages, `conda-forge `_, which
contains community packages, often more updated that the official channels. If you
search or want to install a package in a different channel than the ``default``, you
have to specify with the ``--channel`` option::
$ conda search --channel R r-base=4.3
$ conda create --channel R --name R-4.3 r-base=4.3
You can find more information on `Managing channels `_
in conda documentation.
.. warning::
different channels could have different dependencies: for example could be difficult
install both ``rstudio`` package from ``R`` channel and ``R-base=4.0`` from ``conda-forge``.
Moreover channels like ``conda-forge`` could have more updates than the default
one, and could be difficult install or updating packages in those channels. Instead
of installing our your requirements in a single environment, you should install
software in dedicated environments, and use custom channels only if its necessary.
Export a conda environment
~~~~~~~~~~~~~~~~~~~~~~~~~~
You could export conda environment in a file. First, you have to activate the environment
that you want to import, for example::
$ conda activate R-4.3
$ conda env export > R-4.3.yml
.. hint::
When you export an environment with conda, yon don't simply export information
to re-build your environment relying on package version, but you also track information
about the **package build version**, in order to be able to download the same file
required to install a particular library.
Sometimes is difficult to be able to re-create an exported environment, for example
if you use packages in ``conda-forge`` channel: packages could be updated very
often and maybe it is not possible to retrieve the same package file you used
during environment import. For such cases, its better to export a conda
environment without **build specifications**, like this::
$ conda env export --no-builds > R-4.3.yml
This will track all your package version without the file hash stored in conda
channels. This require more time when restoring an environment, however you will
be able to restore an environment after years even if you require some non-standard
channels
Import a conda environment
~~~~~~~~~~~~~~~~~~~~~~~~~~
You could create a new environment relying on the exported file, for example on
a different machine::
$ conda env create -f R-4.3.yml
Conda-pack
~~~~~~~~~~
Conda-pack is a tool which allows you to pack a conda environment in a single
file. This file can be moved to a different machine and unpacked in a different
location. This is useful when you want to move a conda environment to a different
machine without internet connection. You can install conda-pack with::
$ conda install conda-pack
Then you can pack an environment with::
$ conda pack -n R-4.3 -o R-4.3.tar.gz
.. hint::
``conda-pack`` is already installed in our shared **core** environment using
the default ``base`` conda environment
.. warning::
``conda-pack`` will made a copy of all dependencies of your environment, thus
the resulting file could be very large. You will make not use of conda packages
caches, consider to use ``conda-pack`` only when is impossible to make an
environment using the standard conda commands.
You can unpack the environment in a different location with::
$ mkdir R-4.3
$ cd R-4.3
$ tar -xzf ../R-4.3.tar.gz
$ source bin/activate
.. hint::
If you unpack the environment in the conda environment folder (ie. ``$HOME.conda/envs``),
you can activate the environment without specifying the full path (using the
standard *conda activate* command, like ``conda activate R-4.3``), since conda
will search for environments in the default location. Remember that you have to
create the destination path, since the archive will not create it for you.
Remove an environment
~~~~~~~~~~~~~~~~~~~~~
You can remove an environment by specifying its *name*: this environment shouldn't
be active when removing::
$ conda env remove --name R-4.3
Conda best practices
--------------------
Specify package version if possible
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Specifying package version could save a lot of time, for example when you need
to resolve dependencies with channels::
$ conda create --channel conda-forge --channel R --name R-4.3 r-base=4.3
Clean up
~~~~~~~~
Conda will download and save packages in a local cache when installing or updating packages.
You can save some time when you install a cached package, however this can consume
a lot of disk space. You can free conda cache with::
$ conda clean --all
See `conda clean `_
for more options.
Setting environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _conda_environment_variables:
In order to define specific environment variables in a conda environment, you
can use the `config API `_
or create specific `environment files `_
where variables are changed and restored respectively by activating and deactivating
the conda environment. The *config API* is the recommended and the easiest way
to define environment variables. In this example we will add a specific *JAVA library*
path to ``LD_LIBRARY_PATH``: first locate the directory with the *shared library*
to include, then call ``conda env config vars set`` to define and store the environment
variable. For the *JAVA* version we want to include, this library is located in
``$(JAVA_HOME)/lib/server``, where ``JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64``,
so::
$ cd /usr/lib/jvm/java-11-openjdk-amd64/lib/server
$ conda env config vars set LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
After doing this, the conda environment should be *reactivated* (you could deactivate and
reactivate the same environment again) in order to get effects. You can inspect
the new environment variable by calling ``echo ``, for example::
$ echo $LD_LIBRARY_PATH
or get the full list of custom variables using::
$ conda env config vars list
Remember that when defining environment variables as collection of paths, the desired
path should be *prepended* to current paths, in order to retrieve the desired files
before the other positions. The current path should be updated and not replaced since it
could contains useful information.
.. warning::
It's a bad idea to set the ``$PATH`` environment variable using the *config API*,
since when disabling the conda environment, the ``$PATH`` will be unset, causing
your terminal not working correctly. If you need to add a path to ``$PATH``, you
need to manually edit the ``env_vars.sh`` files. Ensure to activate your desired
environment (in order to resolve the ``$CONDA_PREFIX`` environment variable) and
then:
.. code-block:: bash
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Next, edit the ``./etc/conda/activate.d/env_vars.sh`` file and modify the ``$PATH``
variable, for example:
.. code-block:: bash
#!/bin/sh
export PATH="/home/core/software/sratoolkit/bin:$PATH"
If you desire, you can restore the previous ``$PATH`` value by editing the
``./etc/conda/deactivate.d/env_vars.sh`` file:
.. code-block:: bash
#!/bin/sh
# remove a particular directory from $PATH (define a new $PATH without it)
# see: https://unix.stackexchange.com/a/496050
export PATH=$(echo $PATH | tr ":" "\n" | grep -v '/home/core/software/sratoolkit/bin' | xargs | tr ' ' ':')
See conda `Managing environments `_
for more information.