The ultimate modulefile for conda | Oxford Protein Informatics Group

Environment Modules is a great tool for high-performance computing: it is a modular system to quickly and painlessly enable preset configurations of environment variables. For example, a user may be provided with a modulefile for an antiquated version of a tool and another for a bleeding-edge alpha version of that same tool, and can easily load whichever they wish. In many clusters the modules are created with a tool called EasyBuild, which delivers an out-of-the-box installation. This works for things like a single binary, but for conda it falls severely short, as many configuration changes are needed.

Activating conda

Conda is a curious fish to start with. It is not distributed in Linux package managers. It does have a licence that needs accepting (automatically accepted with the -b flag), but so do many Linux packages. Out of the box it needs activating. If it is installed without the -b (batch) flag, or if conda init is run, a bunch of messy bash commands get appended to .bashrc, which are primarily a bunch of failsafes around this command:

eval "$("$CONDA_PREFIX/bin/conda" shell.bash hook 2> /dev/null)"

Namely, the conda binary is run ($(...)) with the arguments shell.bash and hook, its error messages get sent to /dev/null, and its output is a shell script, which gets evaluated.
In bash, running a script with source (or . or eval) and running it with bash (or, sort of, exec) can have different effects: source runs the shell script in the same shell, while bash runs it in a different one. The environment initialisation therefore needs to be sourced, an obvious but important detail.
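A minimal sketch of that difference, using a throwaway script as a stand-in for the conda initialisation (the temporary file and the DEMO_VAR name are made up for illustration):

```bash
# Throwaway script that only sets a variable (stand-in for conda.sh).
demo_script="$(mktemp)"
echo 'DEMO_VAR=set_by_script' > "$demo_script"

unset DEMO_VAR
bash "$demo_script"                 # child shell: the variable dies with it
after_bash="${DEMO_VAR:-unset}"

. "$demo_script"                    # `.` is the portable spelling of `source`
after_source="${DEMO_VAR:-unset}"

unset DEMO_VAR
eval "$(cat "$demo_script")"        # same pattern as evaluating the hook output
after_eval="${DEMO_VAR:-unset}"

echo "bash: $after_bash | source: $after_source | eval: $after_eval"
rm -f "$demo_script"
```

Only the sourced and evaluated runs leave DEMO_VAR behind in the current shell, which is why the conda setup has to go through one of those two routes.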

There are three ways to initialise conda.

Allowing the messy snippet to be added to $HOME/.bashrc and sourcing it, which happens on logging in. I.e. one is sourcing/evaluating the output of conda shell.bash hook.
Sourcing the $CONDA_PREFIX/etc/profile.d/conda.sh script, which is basically the same evaluation. Personally, I prefer this option.
Adding the variables manually.
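The second option could be wrapped roughly as follows. This is only a sketch: init_conda is a hypothetical helper name, and the install folder in the usage comment is an assumption to adjust to your system.

```bash
# Hypothetical helper: initialise conda from the folder it is installed in.
init_conda() {
    conda_hook="$1/etc/profile.d/conda.sh"
    if [ -r "$conda_hook" ]; then
        # Sourced, not executed, so the `conda` shell function lands in this shell.
        . "$conda_hook"
    else
        echo "No conda.sh under $1" >&2
        return 1
    fi
}

# Usage (the install path is an assumption):
# init_conda "$HOME/miniconda3" && conda activate base
```

The guard only avoids a confusing error when the path is wrong; the actual work is the single sourcing line.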

Conda variables

Once this is done, one can activate the base environment via conda activate or conda activate base, or a virtual environment via conda activate ENVNAME. Misleadingly, conda activate without base, when run in an already activated base environment, will fail and tell you that environment variables are missing, which is a lie.

Various environment variables get set in doing so. printenv prints your environment variables, which makes it really handy for checking them.

The key one is PATH, which gets the bin folder of the conda folder added to it. This can be tampered with outside of conda. Folders prepended at the front get priority, while those appended to the back are the last resort.
CONDA_ROOT: the folder where conda lives, say $HOME/.conda
CONDA_PREFIX: the folder of the current environment; for base, $CONDA_ROOT = $CONDA_PREFIX
CONDA_EXE: $CONDA_ROOT/bin/conda, i.e. the conda executable of the base install, whichever environment is active
CONDA_PYTHON_EXE: $CONDA_ROOT/bin/python, likewise from the base install
PKG_CONFIG_PATH: an alternative search path for system packages (pkg-config), nothing to do with Python packages. But you might have a $CONDA_PREFIX/lib/pkgconfig folder
CONDA_SHLVL: the conda shell level. 1 is base.
CONDA_PROMPT_MODIFIER: the text that gets prepended to $PS1, which is the text that appears before your cursor. In my .bashrc I have export PS1="[\u@\h \W]\$", which makes my prompt remind me of my username ($USER), the hostname ($HOSTNAME) and my working directory ($PWD).
CONDA_DEFAULT_ENV: the name of the active environment
There are many more possible conda environment variables, as (on paper) any setting in a .condarc file can be given as an environment variable by uppercasing it and prefixing it with CONDA_. For example, $CONDA_SOLVER, $CONDA_CHANNELS, $CONDA_YES and $CONDA_ENVS_PATH. $CONDARC is related: it points to the .condarc file itself.

A big caveat needs raising regarding that last point: always check. For example, $CONDA_CHANNELS does not work in all versions.
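The naming convention for .condarc settings can be sketched as a one-liner. condarc_key_to_env is a hypothetical helper, and it only produces the expected name: whether a given conda version honours that variable still has to be checked, for instance against the output of conda config --show-sources.

```bash
# Hypothetical helper: derive the environment variable name for a .condarc key.
condarc_key_to_env() {
    printf 'CONDA_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

condarc_key_to_env solver        # CONDA_SOLVER
condarc_key_to_env envs_path     # CONDA_ENVS_PATH
```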

Modulefile

Now that we have gone over how to get conda activated, we need to configure the various environment variables for the module command to use.

A modulefile is a file that tells the module command how to load it. It is written in Tcl. There is generally a collection of modulefiles written by the sysadmin of your cluster, but you can add your own by appending the folder of your modulefiles to $MODULEPATH, e.g. export MODULEPATH="$MODULEPATH:my-path-with-modulefiles". The main commands to remember are setenv, set-alias, prepend-path, system and puts stderr/stdout.
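A slightly more defensive version of that export, as a sketch: the folder name $HOME/modulefiles is an assumption, and the ${VAR:+...} expansion only avoids a stray leading colon when MODULEPATH starts out empty or unset.

```bash
# Hypothetical folder holding personal modulefiles.
my_modulefiles="$HOME/modulefiles"

# Append it, without a leading colon if MODULEPATH was empty or unset.
export MODULEPATH="${MODULEPATH:+$MODULEPATH:}$my_modulefiles"
echo "$MODULEPATH"
```

Placed in .bashrc, this makes the personal modulefiles show up in module avail on every login.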

#%Module
proc ModulesHelp { } {}
module-whatis {}

puts stderr "This is shown to the user on module load and unload"

# set variables within this code: env variables from shell are called via $env(...)
set root path-where-conda-lives
set userhome $env(HOME)
set userconda $env(HOME)/.conda

conflict conda

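# conda deactivate on unload: this has to be evaluated by the user's shell,
# as a `system` call runs in a child process and cannot change the parent shell
if {[module-info command unload]} {puts stdout "conda deactivate ;"}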

# add envs (on unload they will be unset or replaced)
prepend-path	MANPATH		$root/man
prepend-path	MANPATH		$root/share/man
# See footnote?
# prepend-path	PATH		$root/bin
# prepend-path	PATH		$root/sbin
prepend-path	PKG_CONFIG_PATH		$root/lib/pkgconfig

# where `pip install --user` puts packages
setenv PYTHONUSERBASE   $userhome/.local

# user created envs go here:
setenv CONDA_ENVS_PATH	$userconda/envs
setenv JUPYTER_CONFIG_PATH $userhome/.jupyter

# base config for jupyter.
setenv JUPYTER_CONFIG_DIR $root/.jupyter
# CONDA_ENVS_DIR is not the base config

# if not using a .condarc file, environment variables can do the job
setenv CONDARC	$userhome/.condarc
setenv	CONDA_SOLVER	libmamba
setenv CONDA_YES	true
setenv CONDA_CHANNELS "conda-forge,nvidia,bioconda"
# etc.

if {[module-info command load]} {
    # create the user's files and folders if missing
    system touch $env(HOME)/.condarc
    system mkdir -p $userconda/envs
    system mkdir -p $userhome/.local
    system mkdir -p $userhome/.jupyter
    # enable
    puts stdout "source $root/etc/profile.d/conda.sh ;"
}
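Once such a module is loaded, a quick sanity check from the shell can confirm that the variables actually arrived. The sketch below assumes nothing about the module system itself: check_env_vars is a hypothetical helper, and the module name in the usage comment is an assumption.

```bash
# Hypothetical helper: report which of the named variables are missing
# from the environment; returns non-zero if any of them is missing.
check_env_vars() {
    missing=0
    for name in "$@"; do
        if printenv "$name" > /dev/null; then
            echo "ok      $name=$(printenv "$name")"
        else
            echo "MISSING $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# After `module load conda` (module name is an assumption):
# check_env_vars CONDA_ENVS_PATH CONDARC PYTHONUSERBASE JUPYTER_CONFIG_DIR
```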

The footnote is that one could also add a folder to the PATH of a single environment instead. Conda has its own system for this, which even allows one to “subclass” one virtual environment into another.

conda env config vars set PATH=$PATH:/Users/matteo/.conda/bin:some-other-env-bin

In the above, the virtual environment will search its own folders, then /usr/local/bin etc., and lastly the other environment.
The big catch is that this variable is stored in the conda-meta/state file in the environment and not in an environment-specific .condarc, which is not a thing; conda env export output is instead generated on the fly. This means that the user would need to be made aware of the alteration, as there is no way they are going to check, so the modulefile way (maybe along with a puts stderr call) is much clearer.
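The search order can be checked without touching conda at all. In this sketch the two temporary folders and the demo-tool script are made up for illustration; one folder is prepended to PATH and the other appended, mirroring how the other environment is tacked on at the end above.

```bash
# Two throwaway folders, each with its own `demo-tool` script.
first_dir="$(mktemp -d)"
last_dir="$(mktemp -d)"
printf '#!/bin/sh\necho first\n' > "$first_dir/demo-tool"
printf '#!/bin/sh\necho last\n'  > "$last_dir/demo-tool"
chmod +x "$first_dir/demo-tool" "$last_dir/demo-tool"

# Prepended folder wins; the appended folder is only a fallback.
# The change to PATH stays inside the $(...) subshell.
winner="$(PATH="$first_dir:$PATH:$last_dir"; demo-tool)"
echo "$winner"

rm -rf "$first_dir" "$last_dir"
```

The script in the prepended folder is the one that runs, so anything appended this way only gets used when nothing earlier in PATH provides the same command.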

Footnote: Can I borrow this?

As mentioned, adding a folder to $MODULEPATH results in the modulefiles therein being visible when running module avail. This means one can have one’s personal modulefile collection. One could copy these from other clusters: the headers in the output of module avail tell you the path. The file will state where the binary files are.
Then one can share in turn. Except permissions get in the way. To make a folder visible to someone with whom one shares a common group, one needs to first change the group ownership to that group (chgrp -R COMMONGROUPNAME FILEPATH) and then change the group permissions (chmod -R g+rX FILEPATH). Uppercase X means give x only to directories and to files that are already executable by someone (equivalent to g=u,g-w basically). To make a folder visible to all: chmod -R o+rX FILEPATH.
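The effect of the uppercase X can be seen on a throwaway folder. This is a sketch only: the folder layout is made up, and the chgrp step is left as a comment because COMMONGROUPNAME has to be a real group on your system.

```bash
# Throwaway folder with a subfolder, a data file and an executable script.
share_dir="$(mktemp -d)"
mkdir "$share_dir/sub"
echo 'data' > "$share_dir/sub/notes.txt"
printf '#!/bin/sh\necho hi\n' > "$share_dir/sub/tool.sh"
chmod -R go-rwx "$share_dir"          # start fully private
chmod u+x "$share_dir/sub/tool.sh"    # only the script is executable

# chgrp -R COMMONGROUPNAME "$share_dir"   # real group name needed here
chmod -R g+rX "$share_dir"

# Group permission triplet as shown by ls (characters 5 to 7).
group_bits() { ls -ld "$1" | cut -c5-7; }
dir_bits="$(group_bits "$share_dir/sub")"
data_bits="$(group_bits "$share_dir/sub/notes.txt")"
tool_bits="$(group_bits "$share_dir/sub/tool.sh")"
echo "folder: $dir_bits | data file: $data_bits | script: $tool_bits"

rm -rf "$share_dir"
```

The folder and the script gain group execute, while the plain data file only gains read, which is the behaviour one wants when sharing a tree of mixed files.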

Author

Matteo Ferla
