Conda
About Conda
Conda is a tool that lets you manage software installations in separate environments. It was created to support the Python ecosystem, but it now supports much more software: R and its packages are available through Anaconda, for example, and channels like bioconda collect and maintain a lot of useful software. The main advantage of conda environments is that packages are installed together with their dependencies, without the need to compile anything. Moreover, conda and its environments can be installed by a user without administrative privileges. Packages and dependencies are installed inside user directories, and a complete uninstallation can be done by deleting the conda installation folder. From the official conda docs:
Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.
Installing Conda
Is Conda already installed?
Conda isn’t installed by default on your system. However, on a shared resource or
a remote machine it may already have been installed by the system administrator.
You can check whether conda is installed using which, for example:
(base) cozzip@cloud1:~$ which conda
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/conda
(base) cozzip@cloud1:~$ which python
/usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/bin/python
In this case, conda is installed and currently active: the (base) label next to the
username in the bash prompt is the name of the environment currently active in the terminal.
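The same check can be scripted, for instance at the top of a job script. The following is a minimal sketch that relies only on the POSIX command -v builtin and on the CONDA_DEFAULT_ENV variable that conda sets when an environment is active; the messages and the conda_status variable are illustrative:

```shell
# Check whether a conda executable (or shell function) is reachable
if command -v conda >/dev/null 2>&1; then
    conda_status="installed"
    echo "conda found at: $(command -v conda)"
    # CONDA_DEFAULT_ENV holds the name of the active environment, if any
    echo "active environment: ${CONDA_DEFAULT_ENV:-none}"
else
    conda_status="missing"
    echo "conda not found in PATH"
fi
```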
Hint
Conda is already installed and initialized in our shared core environment.
When you log in you should see the (base) default environment activated.
This installation lets you use the environments provided and managed by the system
administrator, and define your own local environments in your $HOME folder.
Should I install Anaconda or Miniconda?
Anaconda is installed with a lot of dependencies, like the Spyder editor, Jupyter Notebook and many other packages. Miniconda is a lighter version of Anaconda, which installs only the minimal packages required for conda to work correctly. In general, you could install the full Anaconda distribution on your personal computer, where you can benefit from the editors and the graphical user interfaces. When working on a remote server, Miniconda is recommended, since you have full control over what is installed and you generally don’t need to start graphical interfaces on a remote server. If you are in doubt, please see the Anaconda or Miniconda section of the conda installation guide.
Download and install Conda
You can download either Anaconda or Miniconda, then follow the installation instructions provided for the one you chose.
Managing environments with conda
Choose an environment
You can list the available conda environments with:
$ conda env list
# conda environments:
#
R-4.3                    /home/cozzip/.conda/envs/R-4.3
base                  *  /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64
nf-core                  /usr/local/Miniconda3-py38_4.8.3-Linux-x86_64/envs/nf-core
The environment marked with * is the currently active environment. It is the same
one you see in the bash prompt.
Hint
conda env list is different from conda list, which tells you which
packages are installed in your current environment.
You can activate a conda environment using conda activate, for example:
$ conda activate R-4.3
You should see the environment name in the bash prompt change to the one you activated. To leave the current environment (and return to the previous one), deactivate it with:
$ conda deactivate
Create a new environment
You can create a new environment by specifying its name with the --name
option. You can also specify which packages to install when creating the environment:
conda create --name <env name> [package1] [package2]
See Managing environments in the conda documentation for more information.
Hint
You can save time by specifying the package version (e.g. python=3.8): conda will
have fewer dependencies to evaluate.
A note on channels
Channels are the repositories where conda stores packages. The default channel contains
packages maintained by Anaconda, Inc. There are other channels, like bioconda,
which contains a lot of bioinformatics packages; the R channel,
which stores R and its packages; and conda-forge, which
contains community packages that are often more up to date than those in the official
channels. If you want to search for or install a package from a channel other than the
default, you have to specify it with the --channel option:
$ conda search --channel R r-base=4.3
$ conda create --channel R --name R-4.3 r-base=4.3
You can find more information in Managing channels in the conda documentation.
Warning
Different channels can have different dependencies: for example, it could be difficult
to install both the rstudio package from the R channel and r-base=4.0 from conda-forge.
Moreover, channels like conda-forge are updated more often than the default
one, so it could be difficult to install or update packages across those channels. Instead
of installing all your requirements in a single environment, you should install
software in dedicated environments, and use custom channels only if necessary.
Export a conda environment
You can export a conda environment to a file. First, activate the environment that you want to export, then run conda env export, for example:
$ conda activate R-4.3
$ conda env export > R-4.3.yml
Hint
When you export an environment with conda, you don’t simply record the package
versions needed to rebuild your environment: you also track the build string of
each package, so that the exact same file used to install a particular library
can be downloaded again.
Sometimes it is difficult to re-create an exported environment, for example
if you use packages from the conda-forge channel: packages can be updated very
often, and it may no longer be possible to retrieve the same package file you used
when the environment was created. In such cases, it’s better to export a conda
environment without build specifications, like this:
$ conda env export --no-builds > R-4.3.yml
This will track all your package versions without the build strings that identify the exact files stored in conda channels. Restoring the environment requires more time, but you will be able to restore it even after years, even if you rely on some non-standard channels.
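To make the difference concrete, the sketch below writes a small sample file in the format produced by conda env export and strips the build strings from it with sed. The package versions and build strings in the sample are made up for illustration, and the file names are hypothetical; on a real environment, conda env export --no-builds remains the supported way to get this result.

```shell
# Sample export with build strings; versions and build strings are made up
cat > R-4.3-sample.yml <<'EOF'
name: R-4.3
channels:
  - R
  - defaults
dependencies:
  - r-base=4.3.1=h0123456_0
  - readline=8.2=h0123456_1
EOF

# Keep "name=version" and drop the trailing "=build" of each conda dependency
sed -E 's/^(  - [^=]+=[^=]+)=.*$/\1/' R-4.3-sample.yml > R-4.3-sample-nobuilds.yml
cat R-4.3-sample-nobuilds.yml
```

Channel entries contain no = sign, so they are left untouched; only the dependency lines lose their third field.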
Import a conda environment
You can create a new environment from the exported file, for example on a different machine:
$ conda env create -f R-4.3.yml
Conda-pack
Conda-pack is a tool that lets you pack a conda environment into a single file. This file can be moved to a different machine and unpacked in a different location. This is useful when you want to move a conda environment to a machine without an internet connection. You can install conda-pack with:
$ conda install conda-pack
Then you can pack an environment with:
$ conda pack -n R-4.3 -o R-4.3.tar.gz
Hint
conda-pack is already installed in the default base conda environment of
our shared core environment.
Warning
conda-pack will make a copy of all the dependencies of your environment, so
the resulting file could be very large. You will not benefit from the conda package
cache: consider using conda-pack only when it is impossible to build an
environment with the standard conda commands.
You can unpack the environment in a different location with:
$ mkdir R-4.3
$ cd R-4.3
$ tar -xzf ../R-4.3.tar.gz
$ source bin/activate
Hint
If you unpack the environment in the conda environments folder (i.e. $HOME/.conda/envs),
you can activate the environment without specifying the full path (using the
standard conda activate command, like conda activate R-4.3), since conda
will search for environments in the default location. Remember that you have to
create the destination path, since the archive will not create it for you.
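The steps above can be sketched as a small script that unpacks the archive directly into the environments folder. The archive name and the target path are assumptions to adapt to your setup. The sketch also runs conda-unpack, a script that conda-pack ships inside the archive to rewrite the prefixes recorded on the original machine:

```shell
# Assumed locations: adjust the archive name and the envs folder to your setup
archive="R-4.3.tar.gz"
target="$HOME/.conda/envs/R-4.3"

if [ -f "$archive" ]; then
    # The archive has no top-level folder, so create the destination first
    mkdir -p "$target"
    tar -xzf "$archive" -C "$target"
    # Activate the unpacked environment, then fix the hard-coded prefixes
    . "$target/bin/activate"
    conda-unpack
else
    echo "archive $archive not found, nothing to unpack"
fi
```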
Remove an environment
You can remove an environment by specifying its name. The environment must not be active when you remove it:
$ conda env remove --name R-4.3
Conda best practices
Specify package version if possible
Specifying the package version can save a lot of time, for example when conda needs to resolve dependencies across several channels:
$ conda create --channel conda-forge --channel R --name R-4.3 r-base=4.3
Clean up
Conda downloads and saves packages in a local cache when installing or updating packages. Installing a cached package saves time, but the cache can consume a lot of disk space. You can clear the conda cache with:
$ conda clean --all
See conda clean for more options.
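Before deleting anything, it can be useful to preview what would be removed. The following is a minimal sketch using the --dry-run option of conda clean; the guard and the cache_check variable are illustrative and only make the snippet safe to run on a machine without conda:

```shell
if command -v conda >/dev/null 2>&1; then
    cache_check="done"
    # List the cached files that would be removed, without deleting them
    conda clean --all --dry-run
else
    cache_check="skipped"
    echo "conda not found in PATH, skipping cache check"
fi
```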
Setting environment variables
In order to define specific environment variables in a conda environment, you
can use the config API
or create specific environment files,
in which variables are set and restored when the conda environment is activated
and deactivated, respectively. The config API is the recommended and easiest way
to define environment variables. In this example we will add a specific Java library
path to LD_LIBRARY_PATH: first locate the directory with the shared library
to include, then call conda env config vars set to define and store the environment
variable. For the Java version we want to include, this library is located in
$JAVA_HOME/lib/server, where JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64,
so:
$ cd /usr/lib/jvm/java-11-openjdk-amd64/lib/server
$ conda env config vars set LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
After doing this, the conda environment must be reactivated (deactivate it and
activate it again) for the change to take effect. You can inspect
the new environment variable by calling echo $<variable name>, for example:
$ echo $LD_LIBRARY_PATH
or get the full list of custom variables using:
$ conda env config vars list
Remember that when an environment variable holds a list of paths, the desired path should be prepended to the current value, so that the desired files are found before those in the other locations. The current value should be updated and not replaced, since it could contain useful information.
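One detail to watch when prepending: if the variable is empty, a plain new:$OLD leaves a trailing colon, and the dynamic linker treats an empty entry in LD_LIBRARY_PATH as the current directory. The sketch below shows a way to add the separator only when needed, using standard shell parameter expansion. The current variable stands in for the existing value, and /opt/example/lib is a hypothetical entry:

```shell
# Directory to prepend, taken from the Java example above
new_dir="/usr/lib/jvm/java-11-openjdk-amd64/lib/server"

# Case 1: the existing value is empty, so no separator is added
current=""
value_when_empty="${new_dir}${current:+:$current}"
echo "$value_when_empty"

# Case 2: an existing value is kept after the new directory
current="/opt/example/lib"   # hypothetical existing entry
value_when_set="${new_dir}${current:+:$current}"
echo "$value_when_set"
```

The expansion ${current:+:$current} produces a colon followed by the old value only when the old value is set and non-empty, so the result never ends with a stray colon.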
Warning
It’s a bad idea to set the $PATH environment variable using the config API,
since when deactivating the conda environment $PATH will be unset, leaving
your terminal unable to work correctly. If you need to add a path to $PATH, you
need to manually edit the env_vars.sh files. Make sure to activate your desired
environment (in order to resolve the $CONDA_PREFIX environment variable) and
then:
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Next, edit the ./etc/conda/activate.d/env_vars.sh file and modify the $PATH
variable, for example:
#!/bin/sh
export PATH="/home/core/software/sratoolkit/bin:$PATH"
If you wish, you can restore the previous $PATH value by editing the
./etc/conda/deactivate.d/env_vars.sh file:
#!/bin/sh
# remove a particular directory from $PATH (define a new $PATH without it)
# see: https://unix.stackexchange.com/a/496050
# grep -x -F matches the whole entry literally, so similar paths are kept
export PATH="$(printf '%s' "$PATH" | tr ":" "\n" | grep -vxF '/home/core/software/sratoolkit/bin' | paste -sd ':' -)"
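To see the effect of removing a directory without touching the real $PATH, the same idea can be tried on a sample value. This is a minimal sketch: path_remove is a hypothetical helper, and the bin-extra directory is made up to show that whole-entry matching leaves similarly named directories in place.

```shell
# Hypothetical helper: print the colon-separated list $1 without the entry $2
path_remove() {
    printf '%s' "$1" | tr ':' '\n' | grep -vxF -- "$2" | paste -sd ':' -
}

# Sample value; only the exact sratoolkit/bin entry should disappear
sample="/home/core/software/sratoolkit/bin:/home/core/software/sratoolkit/bin-extra:/usr/bin"
cleaned=$(path_remove "$sample" "/home/core/software/sratoolkit/bin")
echo "$cleaned"
```

With a plain substring match (grep -v without -x), the bin-extra entry would be dropped as well.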
See Managing environments in the conda documentation for more information.