Conda

About Conda

Conda is a tool that lets you manage software installations in distinct environments. It was created to support the Python ecosystem, but it now covers much more: R and its packages are available through Anaconda, for example, and channels such as bioconda collect and maintain a large amount of useful software. The main advantage of conda environments is that packages are installed together with their dependencies, without the need to compile everything. Moreover, conda and its environments can be installed by a user without administrative privileges. Packages and dependencies are installed inside user directories, and a complete uninstallation can be done by deleting the conda installation folder. From the official conda docs:

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

Installing Conda

Is Conda already installed?

Conda isn’t installed by default on your system. However, on a shared resource or a remote machine it may already have been installed by the system administrator. You can check whether conda is installed using which, for example:

(base) cozzip@cloud1:~$ which conda
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/conda
(base) cozzip@cloud1:~$ which python
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/python

In this case, conda is installed and currently active: the (base) next to the username in the bash prompt is the name of the environment currently active in the terminal.
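The same check can be scripted, which helps in job scripts where no prompt is visible. The sketch below is only an illustration; the conda_status variable is a hypothetical name introduced here.

```shell
# Report whether conda is reachable from this shell.
# command -v also detects conda when it is defined as a shell function.
if command -v conda >/dev/null 2>&1; then
    conda_status="installed"
    echo "conda found: $(command -v conda)"
else
    conda_status="missing"
    echo "conda is not available in this shell"
fi
```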

Hint

Conda is already installed and initialized in our shared core environment. When you log in you should see the (base) default environment activated. This installation lets you use the environments provided and managed by the system administrator, and define your own local environments in your $HOME folder.

Should I install Anaconda or Miniconda?

Anaconda is installed with a lot of dependencies, such as the Spyder editor, Jupyter Notebook and many other packages. Miniconda is a lighter version of Anaconda, which installs only the minimal packages required for conda to work correctly. In general, you could install the full Anaconda distribution on your personal computer, where you can benefit from the editors and graphical user interfaces. When working on a remote server, Miniconda is recommended, since you have full control over what is installed and you generally don’t need to start graphical interfaces on a remote server. If you are in doubt, please see the Anaconda or Miniconda section of the conda installation guide.

Download and install Conda

You can install either Anaconda or Miniconda; then follow the installation instructions provided for the one you chose.

Managing environments with conda

Choose an environment

You can list the available conda environments with:

$ conda env list
# conda environments:
#
R-4.3                    /home/cozzip/.conda/envs/R-4.3
base                  *  /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64
nf-core                  /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/envs/nf-core

The environment marked with * is the currently active environment. It is the same one you see in the bash prompt.
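The active environment can also be read without parsing the list, since conda sets the CONDA_DEFAULT_ENV variable when an environment is activated. A minimal sketch: active_env is a hypothetical variable name, and "none" is just a placeholder used when no environment is active.

```shell
# CONDA_DEFAULT_ENV is set by conda when an environment is activated.
# Fall back to "none" if the variable is unset or empty.
active_env="${CONDA_DEFAULT_ENV:-none}"
echo "active conda environment: $active_env"
```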

Hint

conda env list is different from conda list, which tells you which packages are installed in your current environment.

You can activate a conda environment using conda activate, for example:

$ conda activate R-4.3

You should see that the environment name next to the bash prompt has changed to the desired environment. To exit the current environment (and return to your previous one), deactivate it with:

$ conda deactivate

Create a new environment

You can create a new environment by specifying its name with the --name option. You can also specify which packages to install when creating the environment:

conda create --name <env name> [package1] [package2]

See Managing environments in the conda documentation for more information.

Hint

You can save time by specifying the package version (e.g. python=3.8): conda will have fewer dependencies to evaluate.

A note on channels

Channels are the repositories where conda stores packages. The default channel contains packages maintained by the conda developers. There are other channels, such as bioconda, which contains a lot of bioinformatics packages; the R channel, which stores R and its packages; and conda-forge, which contains community packages that are often more up to date than those in the official channels. If you want to search for or install a package from a channel other than the default, you have to specify it with the --channel option:

$ conda search --channel R r-base=4.3
$ conda create --channel R --name R-4.3 r-base=4.3

You can find more information in Managing channels in the conda documentation.

Warning

Different channels can have different dependencies: for example, it could be difficult to install both the rstudio package from the R channel and r-base=4.0 from conda-forge. Moreover, channels like conda-forge can receive more frequent updates than the default one, which can make it difficult to install or update packages from those channels. Instead of installing all your requirements in a single environment, you should install software in dedicated environments, and use custom channels only if necessary.

Export a conda environment

You can export a conda environment to a file. First, activate the environment that you want to export, then run conda env export, for example:

$ conda activate R-4.3
$ conda env export > R-4.3.yml

Hint

When you export an environment with conda, you don’t simply export the package versions needed to rebuild your environment: you also record the build of each package, so that exactly the same files can be downloaded when installing a particular library. Sometimes it is difficult to re-create an exported environment, for example if you use packages from the conda-forge channel: packages can be updated very often, and the exact package file you originally installed may no longer be available. In such cases, it’s better to export the conda environment without build specifications, like this:

$ conda env export --no-builds > R-4.3.yml

This records all your package versions without the build strings that pin the exact files stored in the conda channels. Restoring the environment takes more time, but you will be able to restore it even after years, even if you rely on some non-standard channels.
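If only a full export is available and the original environment is gone, the build strings can be stripped afterwards. The sketch below runs on a made-up sample file (the build string and versions are invented for illustration) and assumes the usual name=version=build layout of exported dependencies; the file names are hypothetical.

```shell
# Work in a temporary directory so nothing real is overwritten.
workdir="$(mktemp -d)"

# Made-up sample of a full export (the build string is invented).
cat > "$workdir/full.yml" <<'EOF'
name: R-4.3
channels:
  - defaults
dependencies:
  - r-base=4.3.1=h1234567_0
  - pip:
    - requests==2.31.0
EOF

# Drop the trailing "=build" from conda entries.
# pip entries use "==" and are left untouched.
sed -E 's/^( *- [^=]+=[^=]+)=.*$/\1/' "$workdir/full.yml" > "$workdir/no-builds.yml"

cat "$workdir/no-builds.yml"
```

Re-exporting with --no-builds from the live environment remains the safer route when that environment still exists.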

Import a conda environment

You can create a new environment from the exported file, for example on a different machine:

$ conda env create -f R-4.3.yml

Conda-pack

Conda-pack is a tool that lets you pack a conda environment into a single file. This file can be moved to a different machine and unpacked in a different location. This is useful when you want to move a conda environment to a machine without an internet connection. You can install conda-pack with:

$ conda install conda-pack

Then you can pack an environment with:

$ conda pack -n R-4.3 -o R-4.3.tar.gz

Hint

conda-pack is already installed in our shared core environment, in the default base conda environment.

Warning

conda-pack makes a copy of all the dependencies of your environment, so the resulting file can be very large. It also makes no use of the conda package cache. Consider using conda-pack only when it is impossible to build an environment with the standard conda commands.

You can unpack the environment in a different location with:

$ mkdir R-4.3
$ cd R-4.3
$ tar -xzf ../R-4.3.tar.gz
$ source bin/activate

Hint

If you unpack the environment in the conda environments folder (i.e. $HOME/.conda/envs), you can activate it without specifying the full path (using the standard conda activate command, e.g. conda activate R-4.3), since conda searches for environments in the default location. Remember that you have to create the destination directory yourself, since unpacking the archive will not create it for you.
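The need to create the destination first can be rehearsed safely. In the sketch below a scratch directory stands in for the conda environments folder and a tiny dummy archive stands in for a real conda-pack file; the scratch and envs_dir names are hypothetical.

```shell
# Scratch directories stand in for the real locations.
scratch="$(mktemp -d)"
envs_dir="$scratch/envs"            # stands in for the conda environments folder

# Build a tiny dummy archive in place of a real conda-pack file.
mkdir -p "$scratch/src/bin"
echo "# placeholder" > "$scratch/src/bin/activate"
tar -czf "$scratch/R-4.3.tar.gz" -C "$scratch/src" .

# tar does not create the target directory, so create it first.
mkdir -p "$envs_dir/R-4.3"
tar -xzf "$scratch/R-4.3.tar.gz" -C "$envs_dir/R-4.3"

ls "$envs_dir/R-4.3/bin"
```

With a real archive, the same mkdir -p and tar -xzf ... -C pair would point at the actual environments folder instead of the scratch directory.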

Remove an environment

You can remove an environment by specifying its name. The environment must not be active when you remove it:

$ conda env remove --name R-4.3
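Since the environment must not be active when it is removed, a script can check first. A minimal sketch, assuming the environment was activated by name so that CONDA_DEFAULT_ENV holds that name; target_env and can_remove are hypothetical names, and the actual removal is left commented out.

```shell
target_env="R-4.3"

# CONDA_DEFAULT_ENV holds the name of the currently active environment, if any.
if [ "${CONDA_DEFAULT_ENV:-}" = "$target_env" ]; then
    can_remove="no"
    echo "$target_env is active: run 'conda deactivate' first"
else
    can_remove="yes"
    echo "$target_env is not active: safe to remove"
    # conda env remove --name "$target_env"
fi
```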

Conda best practices

Specify package version if possible

Specifying the package version can save a lot of time, for example when conda needs to resolve dependencies across several channels:

$ conda create --channel conda-forge --channel R --name R-4.3 r-base=4.3

Clean up

Conda downloads and saves packages in a local cache when installing or updating packages. Installing a cached package saves some time, but the cache can consume a lot of disk space. You can clear the conda cache with:

$ conda clean --all

See conda clean for more options.

Setting environment variables

To define specific environment variables in a conda environment, you can either use the config API or create environment files in which variables are changed and restored when the environment is activated and deactivated, respectively. The config API is the recommended and easiest way to define environment variables. In this example we will add a specific Java library path to LD_LIBRARY_PATH: first locate the directory with the shared library to include, then call conda env config vars set to define and store the environment variable. For the Java version we want to include, this library is located in $JAVA_HOME/lib/server, where JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64, so:

$ cd /usr/lib/jvm/java-11-openjdk-amd64/lib/server
$ conda env config vars set LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH

After doing this, the conda environment must be reactivated (deactivate it and activate it again) for the change to take effect. You can inspect the new environment variable by calling echo $<variable name>, for example:

$ echo $LD_LIBRARY_PATH

or get the full list of custom variables using:

$ conda env config vars list

Remember that when an environment variable holds a collection of paths, the desired path should be prepended to the current value, so that the desired files are found before those in the other locations. The current value should be extended rather than replaced, since it may contain useful information.
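The prepend rule can be illustrated on scratch variables. This sketch uses the hypothetical names DEMO_PATH and EMPTY_PATH with made-up starting values, so it does not touch your real settings. The ${VAR:+...} expansion avoids leaving a trailing colon when the current value is empty, since an empty entry in a search path is typically treated as the current directory.

```shell
new_dir="/usr/lib/jvm/java-11-openjdk-amd64/lib/server"

# Case 1: the variable already has a value, so the new directory goes in front.
DEMO_PATH="/opt/lib:/usr/local/lib"
DEMO_PATH="$new_dir${DEMO_PATH:+:$DEMO_PATH}"
echo "$DEMO_PATH"

# Case 2: the variable is empty, so no trailing colon is added.
EMPTY_PATH=""
EMPTY_PATH="$new_dir${EMPTY_PATH:+:$EMPTY_PATH}"
echo "$EMPTY_PATH"
```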

Warning

It’s a bad idea to set the $PATH environment variable using the config API, since when the conda environment is deactivated $PATH will be unset, and your terminal will stop working correctly. If you need to add a directory to $PATH, you have to edit the env_vars.sh files manually. Make sure your desired environment is activated (so that the $CONDA_PREFIX environment variable is defined) and then:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Next, edit the ./etc/conda/activate.d/env_vars.sh file and modify the $PATH variable, for example:

#!/bin/sh

export PATH="/home/core/software/sratoolkit/bin:$PATH"

If you wish, you can restore the previous $PATH value by editing the ./etc/conda/deactivate.d/env_vars.sh file:

#!/bin/sh

# remove a particular directory from $PATH (define a new $PATH without it)
# see: https://unix.stackexchange.com/a/496050
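# match the whole entry (-x) as a fixed string (-F) and rejoin with paste,
# so that directories containing spaces are preserved
export PATH="$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -vxF '/home/core/software/sratoolkit/bin' | paste -sd ':' -)"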
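The removal logic can be tried on a scratch variable before it goes into the deactivate script. The sketch below uses a hypothetical path_remove helper and a sample value instead of the real $PATH; it matches whole entries only, so directories that merely contain the same text are kept.

```shell
# path_remove LIST DIR: print the colon-separated LIST without entries equal to DIR.
path_remove() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -vxF -- "$2" | paste -sd ':' -
}

# Sample value standing in for $PATH.
sample="/home/core/software/sratoolkit/bin:/usr/local/bin:/usr/bin"
cleaned="$(path_remove "$sample" "/home/core/software/sratoolkit/bin")"
echo "$cleaned"
```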

See Managing environments in the conda documentation for more information.