Manage project dependencies with conda environments

TL;DR: Create a conda environment for each project, capture exact versions when possible, and automate activation and updating with a bash function.

I often work on several different projects involving software: Python libraries, papers, presentations, posters, this website, etc. Each project has different dependencies and there is a non-zero chance that these dependencies conflict with each other. For example, I need Python 2.7 to work on a tesseroid modeling paper with a student, while my current work on GMT/Python and a GPS interpolation project is Python 3.5+ only. Clearly, I can't have everything under the same Python installation. That's where virtual environments come in.

Virtual environments allow you to create multiple separate Python installations ("environments"). You can install different packages on each and switch between them easily. Currently, you can do this using Python's virtualenv or using the conda package manager. I use conda for all my package management because I need non-Python packages and multiple Python versions. If you're new to conda, please go check out Eric Ma's great tips for working with conda.

In this post, I'll share some more tips and a bash function I made for managing environments.

When to create environments

First of all, I want to reiterate Eric's second hack: create one conda environment for each project.

I have been doing this for a few years now and even included a default environment file in my research group's paper template. As soon as I start a new project repository, I'll create an environment.yml with the configuration I need:

# The name of the environment matching the repository name
name: same-as-repository
# I prefer conda-forge packages for my projects
channels:
- conda-forge
- defaults
# Start with Python and include everything you need
dependencies:
- python=3.7
- pip
- numpy
...

With this file in the repository, you can create the new environment by running:

conda env create
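conda env create takes the environment's name from the name: field, so the convention of matching it to the repository name is worth enforcing. Here is a small sketch using a hypothetical check_env_name helper (not a conda command). It assumes the repository's directory carries the repository's name.

```shell
# Hypothetical helper (not part of conda): check that the "name:" field in
# an environment file matches the name of the current directory.
check_env_name() {
    local envfile="${1:-environment.yml}"
    local envname dirbase
    # Take the first top-level "name:" line and strip the key
    envname=$(sed -n -e 's/^name: *//p' "$envfile" | head -n 1)
    dirbase=$(basename "$PWD")
    if [ "$envname" = "$dirbase" ]; then
        echo "ok: $envname"
    else
        echo "mismatch: environment '$envname' vs directory '$dirbase'" >&2
        return 1
    fi
}
```

Run it from the repository root before conda env create to catch a copy-pasted name from another project.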

The advantage of always having the environment file is that I always know what each project needs. This is particularly useful when switching back and forth between a laptop and desktop or when returning to a project after a while.

Now you can activate the environment using conda activate same-as-repository (or source activate same-as-repository on conda versions older than 4.4) to get access to a completely separate Python installation. When switching to a different project, always activate that project's environment first and then run your code.
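Activating before running code is easy to forget. As a sketch, a project's scripts could start with a guard like the hypothetical require_env function below. It relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment.

```shell
# Hypothetical guard (not part of conda): fail unless the expected
# environment is the active one, according to CONDA_DEFAULT_ENV.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "Activate the '$1' environment first (active: '${CONDA_DEFAULT_ENV:-none}')" >&2
        return 1
    fi
}

# Example: stop a project script early if the wrong environment is active
# require_env same-as-repository || exit 1
```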

See the conda docs for more information on environments.

Be as specific as you can

When creating environments for papers, it's a good idea to capture the exact versions of every package so that you can rebuild the environment later on. Otherwise, there is the risk of dependencies updating and your code no longer running. You might not want to do this if you're still in the middle of the project and adding new dependencies.
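To see how far an environment file is from being fully pinned, a quick check helps. The hypothetical list_unpinned function below is a rough sketch, not a YAML parser: it assumes dependencies are listed one per line under a top-level dependencies: key, as in the file above, and it reports every entry whose spec contains no "=".

```shell
# Hypothetical helper: print dependencies that have no "=" version pin.
list_unpinned() {
    awk '
        /^dependencies:/ { in_deps = 1; next }
        /^[^ #-]/        { in_deps = 0 }   # another top-level key ends the section
        in_deps && /^ *- / && !/=/ && !/:$/ {
            sub(/^ *- */, "")              # drop the leading "- "
            print
        }
    ' "${1:-environment.yml}"
}
```

Entries ending in a colon (such as a nested "- pip:" section) are skipped; their nested items are checked like any other line.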

Once a paper is accepted, I'll usually export the environment with exact version numbers using:

conda env export > environment.yml
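The exported file includes a prefix: line with the absolute path of the environment on your machine, which means nothing on a collaborator's computer. A minimal sketch that drops it is shown below. The strip_prefix and export_env names are hypothetical, not part of conda. If platform-specific build strings cause trouble, conda env export also accepts --no-builds.

```shell
# Hypothetical filter: remove the machine-specific "prefix:" line
strip_prefix() {
    sed -e '/^prefix: /d'
}

# Hypothetical wrapper: export the active environment without the prefix
export_env() {
    conda env export | strip_prefix > "${1:-environment.yml}"
}
```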

Automate the boring parts

I have a git repository for nearly everything I do and most of them have an environment.yml file. With so many environments, it can be really hard to remember all their names and type out conda activate paper-moho-inversion-tesseroids. Instead of using really short names, let's automate the activation and some other useful commands with a bash function:

function errcho() {
    # Assumed helper: cenv calls errcho, which is not defined in this
    # snippet. This one-liner is a stand-in that prints to stderr.
    >&2 echo "$@"
}

function cenv() {

# Usage and help message
local CENV_HELP
read -r -d '' CENV_HELP <<-'EOF'
Usage: cenv [COMMAND] [FILE]

Detect, activate, delete, and update conda environments.
FILE should be a conda .yml environment file.
If FILE is not given, assumes it is environment.yml.
Automatically finds the environment name from FILE.

Commands:

  None     Activates the environment
  rm       Delete the environment
  up       Update the environment

EOF

    # Local variables so they don't leak into the interactive shell
    local envfile="environment.yml"
    local cmd envname

    # Parse the command line arguments
    if [[ $# -gt 2 ]]; then
        errcho "Invalid argument(s): $*"
        return 1
    elif [[ $# == 0 ]]; then
        cmd="activate"
    elif [[ "$1" == "--help" ]] || [[ "$1" == "-h" ]]; then
        echo "$CENV_HELP"
        return 0
    elif [[ "$1" == "rm" ]]; then
        cmd="delete"
        if [[ $# == 2 ]]; then
            envfile="$2"
        fi
    elif [[ "$1" == "up" ]]; then
        cmd="update"
        if [[ $# == 2 ]]; then
            envfile="$2"
        fi
    elif [[ $# == 1 ]]; then
        envfile="$1"
        cmd="activate"
    else
        errcho "Invalid argument(s): $*"
        return 1
    fi

    # Check if the file exists
    if [[ ! -e "$envfile" ]]; then
        errcho "Environment file not found: $envfile"
        return 1
    fi

    # Get the environment name from the first "name:" line, anchored to the
    # start of the line so only the top-level key matches
    envname=$(sed -n -e 's/^name: *//p' "$envfile" | head -n 1)
    if [[ -z "$envname" ]]; then
        errcho "No 'name:' field found in $envfile"
        return 1
    fi

    # Execute one of these actions: activate, update, delete.
    # conda activate/deactivate need conda's shell hook (see "conda init bash");
    # on conda older than 4.4, use "source activate"/"source deactivate" instead.
    if [[ $cmd == "activate" ]]; then
        conda activate "$envname"
    elif [[ $cmd == "update" ]]; then
        errcho "Updating environment: $envname"
        conda activate "$envname"
        conda env update -f "$envfile"
    elif [[ $cmd == "delete" ]]; then
        errcho "Removing environment: $envname"
        conda deactivate
        conda env remove --name "$envname"
    fi
}

Copy this code into your ~/.bashrc file and restart your terminal. Now you can activate an environment using the cenv command:

(base) $ cd papers/my-long-project-name

(base) $ ls -F
code/ manuscript/ data/ README.md LICENSE.txt environment.yml

(base) $ head -n 1 environment.yml
name: my-long-project-env-name

(base) $ cenv

(my-long-project-env-name) $

With no arguments, cenv finds the environment.yml in the current directory, extracts the environment name, and activates it. You can also pass the file as an argument. I prefer this to conda-auto-env, as suggested in Eric's post, because conda is not the fastest program and having it run on every cd made changing directories frustratingly slow.
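cenv only looks for environment.yml in the current directory, so it fails from a subdirectory such as code/. One possible extension, sketched here as a hypothetical find_envfile helper that is not part of the original cenv, walks up the parent directories until it finds the file.

```shell
# Hypothetical helper: print the path of the nearest environment.yml,
# checking the current directory first and then each parent directory.
find_envfile() {
    local dir="$PWD"
    while :; do
        if [ -e "$dir/environment.yml" ]; then
            echo "$dir/environment.yml"
            return 0
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
    return 1
}
```

Its output can be passed to cenv as the FILE argument, for example cenv "$(find_envfile)".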

If you add new dependencies to environment.yml, you can update the environment by running:

$ cenv up

Or you can delete the environment using:

$ cenv rm

With these commands, activating and updating environments is quick to type, so there is no excuse not to use them liberally.

NOTE: If you're using Jupyter notebooks, the cenv function might not be that useful. In that case, I recommend installing the nb_conda package. It allows you to specify which environment you want your notebook to run under when you create a new notebook or change the kernel.

Final thoughts

The main takeaways here are:

  1. Use a tool to manage your dependencies (whatever works for you)
  2. Automate the process so you won't be lazy
  3. Specify exact version numbers for long(er) term reproducibility

I use conda because it suits my needs, but similar features exist in other package managers. If you prefer pip with pipenv, by all means use them.

The source code for the cenv function is in my dotfiles repository and is MIT licensed. The code shown here is based on .bash/functions.sh at commit e95f6d9. Additions and contributions are more than welcome!

What are your conda workflow/productivity hacks? Please share below in the comments or on Twitter!


The thumbnail image for this post is derived from "Green Anaconda in Trivandrum Zoo" by Mithun.M.Das and both are licensed CC-BY-SA.


Comments? Leave one below or let me know on Twitter @leouieda.
Found a typo/mistake? Send a fix through GitHub and I'll happily merge it (plus you'll feel great because you helped someone). All you need is an account and 5 minutes!
