Python

Available Python Versions

When you log on to the clusters, the Python versions immediately available are the system installations.

[user@login0b ~]# which python
/bin/python

[user@login0b bank]# python --version
Python 2.7.5

You will most likely want to use a newer version of Python than 2.7.5. The clusters support various versions of Python, all accessible via the LMOD module system.

For general use, we suggest the following modules:

  • python/anaconda3.5-4.2.0
  • python/anaconda3.6-5.2.0
  • python/3.7.0

A full list of available Python versions can be viewed with the command module spider python.

 

Loading Python Via the Module System

Load a newer version of Python and see that it has replaced the system installation:

[user@login0b ~]# module load python/3.7.0
[user@login0b ~]# which python
/ihome/crc/install/python/miniconda3-3.7/bin/python
[user@login0b ~]# python --version
Python 3.7.0
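
The same check can be made from inside Python, which is useful in scripts and notebooks where the shell is not at hand. This is a minimal sketch; the helper name interpreter_info is hypothetical and not part of any cluster tooling.

```python
import sys


def interpreter_info():
    """Return the path and version of the running Python interpreter."""
    # sys.version_info[:3] holds (major, minor, micro) as integers.
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


if __name__ == "__main__":
    info = interpreter_info()
    print("Interpreter:", info["executable"])
    print("Version:", info["version"])
```

After loading a module, the reported interpreter path should point at the module's installation rather than /bin/python.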

All of the modules are Anaconda distributions of Python that support pip and conda installation commands.

Users do not have privileges to install Python packages to the system, so keep these options in mind when you are setting up your environment:

  1. Manage your python package installation environment using a conda virtual environment (recommended). You can create any number of virtual environments for your various projects, and they are great for preventing dependency conflicts when switching between workflows that use different software tools. 
  2. pip install --user <package> will install a Python package directly into a directory in your /ihome location. You have full permissions to access these files/directories. Be wary of using this method as it is easier to run into the dependency conflicts mentioned above. When using pip, if you are unsure of what version of the package you want to install you can use pip install <package>== to print the available versions.
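
To see where pip install --user places packages for the interpreter you have loaded, you can ask Python's standard site module. This is a small sketch; the function name user_install_locations is an assumption chosen for illustration.

```python
import site
import sys


def user_install_locations():
    """Return where `pip install --user` places packages for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # True when the user site directory is currently on sys.path.
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print("{}: {}".format(key, value))
```

Because the user site directory is shared by every interpreter of the same Python minor version that has it enabled, packages installed there can leak into otherwise separate environments, which is one way the dependency conflicts mentioned above arise.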

 

Creating a Virtual Environment

Here is an example of creating a conda virtual environment:

[user@login0b ~]$ conda create -n my_env python=3.8 
Collecting package metadata (current_repodata.json): done 
Solving environment: done ...

The newly created environment my_env is installed in the directory: /ihome/<your group name>/<your username>/.conda/envs/my_env

Executables installed into the environment are placed in its “bin” folder; Python packages are placed under its “lib” folder.

Note: You can also provide a prefix to install your conda environment to a particular location. For example, if you want your environment in a location on iX instead:

[user@login0b ~]$ conda create --prefix=/ix/<your group name>/<your user name>/envs/my_env python=3.7

 

Activate your Virtual Environment

Activate your environment with the source command:

[user@login0b ~]$ source activate my_env
(my_env) [user@login0b ~]$ 

When this is done, the environment name should be visible at the beginning of your terminal prompt.

If you created your conda environment with a prefix, pass the full prefix path to activate it:

[user@login0b ~]$ source activate /ix/mygroup/user/envs/my_env
(my_env) [user@login0b ~]$
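
Besides the prompt, you can confirm from Python which environment is active. The sketch below assumes the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda's activation scripts normally export; the function names are hypothetical.

```python
import os
import sys


def active_environment(environ=None):
    """Describe the conda environment the current shell has activated, if any."""
    environ = os.environ if environ is None else environ
    return {
        # Normally set by conda's activation scripts; None when absent.
        "name": environ.get("CONDA_DEFAULT_ENV"),
        "prefix": environ.get("CONDA_PREFIX"),
        # Installation prefix of the interpreter that is actually running.
        "python_prefix": sys.prefix,
    }


def python_matches_environment(info):
    """True when the running interpreter lives inside the activated environment."""
    if not info["prefix"]:
        return False
    return os.path.realpath(info["python_prefix"]) == os.path.realpath(info["prefix"])


if __name__ == "__main__":
    info = active_environment()
    print("Active environment:", info["name"])
    print("Interpreter matches environment:", python_matches_environment(info))
```

If the second line prints False after activation, the python on your PATH is probably not the one from the environment you intended to use.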

 

Important Note on conda activate

Do not use `conda activate` to activate your environment on the clusters.

More recent Anaconda distributions will tell you to use this command instead of source activate to activate your newly created environment. If you use conda activate, you will be prompted to issue the command conda init. Do not do this.

When you load one of our Anaconda modules, you have effectively loaded the base conda environment for that Anaconda version. However, conda tries to manage activation of its own base environment by modifying the user's .bashrc file.

Even if you have installed your own local version of Anaconda or miniconda, do not use conda init. When conda init runs, it places commands into your .bashrc file that will stop certain things from working on the system. Many modules use their own python environments. Activating a specific conda environment in your .bashrc file can break these.

 

Workaround for using conda activate anyway:

The conda init command places code in your .bashrc file that modifies, among other things, the PATH environment variable by prepending the path of the base conda environment to it. This occurs before the default system modules are loaded. Other modules may also provide libraries that hide Anaconda libraries and cause errors.

If you must use conda activate with a local Anaconda or Miniconda installation:

  1. Run conda init, and then immediately open .bashrc with a file editor.
  2. Remove the code that was added by conda init and place it in another script file (for example, conda_init.sh).
  3. After the login process completes, run the code in the script file:

    source conda_init.sh

You should now be able to use conda activate.

 

Install Packages in your Virtual Environment

Installing Python modules with conda

The conda package manager is recommended for maintaining your environment.

Take a look at the Conda cheat sheet, which lists the most commonly used commands. More detailed documentation is in the Conda User Guide.

To install a new package, run:

conda install [packagename]

Conda channels are the remote repositories that conda searches when looking for and downloading packages. If you want to install a package that is not in the default Anaconda channel, tell conda which channel contains it so that conda can find and install it. To install a new package from the bioconda channel, run:

conda install -c bioconda [packagename]

Installing Python modules with pip

When a Python package is not available as a conda package, you can use the Python pip installer. We recommend using pip only as a last resort, since you lose the flexibility of the conda packaging environment (automatic conflict resolution and version upgrade/downgrade). See conda's references.

To install a module using pip, run:

pip install [packagename]
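
After installing with conda or pip, it can be worth confirming that the active interpreter can actually find the module. The following sketch uses only the standard library; is_installed and missing_modules are hypothetical helper names, and the check is intended for top-level module names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level `module_name` can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(module_names):
    """Return the subset of `module_names` that cannot be found."""
    return [name for name in module_names if not is_installed(name)]


if __name__ == "__main__":
    # Replace this list with the modules your workflow imports.
    wanted = ["numpy", "torch"]
    print("Missing modules:", missing_modules(wanted))
```

Note that the importable module name can differ from the package name used at install time (for example, the scikit-learn package is imported as sklearn).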

 

Running a SLURM Job using your Virtual Environment

Here is a sample batch script that uses tools that were installed via a Conda environment:

#!/bin/bash
#SBATCH --job-name=conda
#SBATCH -N 1 
#SBATCH -t 3-00:00 
#SBATCH --cpus-per-task=1 

#Load python via LMOD
module load python/bioconda-3.7-2019.03

#Activate your environment
source activate /ix/group/user/envs/samtools

#Run commands utilizing your loaded Python tool
samtools --help

 

Advanced Conda Usage 

Conda Environment Configuration Files

The conda configuration file, .condarc, is an optional runtime configuration file that allows advanced users to configure various aspects of conda, such as which channels it searches for packages, proxy settings, and environment directories.

The following displays a sample .condarc file:

[fmu@login0b ~]$ cat .condarc
pkgs_dirs:
  - /ihome/fmu/fmu/.conda/pkgs
channels:
  - r
  - conda-forge
  - bioconda
  - defaults

 

Custom Miniconda Installation and Usage

You can use the Miniconda installer located under /ihome/crc/build/python.

You can also download the Miniconda installer using the wget command.

Then run the installer, pointing it to the directory where you want to install it.

[fmu@login0b ~]$ cd /ihome/crc/build/python
[fmu@login0b python]$ bash Miniconda3-py38_4.9.2-Linux-x86_64.sh -b -p ~/python_env/myenv -s 
PREFIX=/ihome/fmu/fmu/python_env/myenv 
... 
[fmu@login0b python]$ cd ~/python_env/myenv/bin
[fmu@login0b bin]$ pwd
/ihome/fmu/fmu/python_env/myenv/bin
[fmu@login0b bin]$ # ./conda install -c conda-forge jupyterlab # You can install other conda packages into your Miniconda environment ...

The '-b' flag forces an unattended (batch) installation, which, among other things, does not add Miniconda to your default environment. The '-p' flag sets the installation directory. The '-s' flag skips the installer's pre/post-link and install scripts. Setting up your environment to use this Miniconda is done in the next section with an environment module.

The full path to this local Miniconda installation directory is /ihome/fmu/fmu/python_env/myenv.

 

Custom Miniconda Environment Module

To easily set up a Miniconda environment, create a user environment module.

First create a directory where the user environment module hierarchy will reside, and then copy our miniconda module file to this directory.

[user@login0b ~]$ mkdir modulefiles
[user@login0b ~]$ module use ~/modulefiles
[user@login0b ~]$ cd modulefiles
[user@login0b modulefiles]$ cp /ihome/crc/modules/Core/python/ondemand-jupyter-python3.8.lua myenv.lua
[user@login0b modulefiles]$ vi myenv.lua # edit myenv.lua to set: local package_root = "/ihome/fmu/fmu/python_env/myenv"

The user module environment must be loaded into the default module environment with the module use command.

After that, load the user space miniconda module.

[user@login0b ~]$ module purge
[user@login0b ~]$ module use ~/modulefiles
[user@login0b ~]$ module load myenv
[user@login0b ~]$ which conda
~/python_env/myenv/bin/conda

 

Alternate Instructions for Virtual Environments with virtualenvwrapper

We strongly encourage using virtual environments, which give you complete control over which versions of packages are installed.

Here is an example of installing PyTorch (CPU Version) into a Virtual Environment using virtualenvwrapper:

$ module load python/3.7.0 venv/wrap # Available for the versions of Python listed above
$ mkvirtualenv pytorch
$ workon pytorch
$ pip install numpy torch torchvision
$ python
Python 3.7.0 (default, Jun 28 2018, 13:15:42) 
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.6022, 0.5194, 0.3726],
        [0.0010, 0.7181, 0.7031],
        [0.7442, 0.5017, 0.2003],
        [0.1068, 0.4622, 0.2478],
        [0.8989, 0.8953, 0.0129]])
>>> 

To list your environments, use workon with no arguments.

The nice thing about this tool is that you can "hot-swap" Python environments.

If you want to swap pytorch for tensorflow you can do workon tensorflow.

 

Machine Learning

At CRC, we are currently running CUDA 10.1 and the corresponding drivers. PyTorch's current stable release is 1.6.0. To install it for use on GPUs at CRC, run:

pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

The recommended version of TensorFlow is 2.1.0, which can be installed with:

pip install tensorflow==2.1.0

 

Access Virtual Environment from JupyterHub

To access a virtual environment from JupyterHub, you need to install ipykernel. In the following code snippet, ray is an arbitrary name; you can use any name you like (but make sure to replace all instances of ray). /location/to/python is the path to the version of Python you want to use. If you want to use the one from python/3.7.0, you can omit the -p option. To get the path, use which python.

module purge
module load python/3.7.0 venv/wrap
mkvirtualenv -p /location/to/python ray
workon ray
pip install ipykernel
python -m ipykernel install --user --name=ray

 

Simple Slurm job submission script

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=11
#SBATCH --time=01:00:00
#SBATCH --cluster=gpu
#SBATCH --partition=gtx1080
#SBATCH --gres=gpu:2

module purge
module load python/3.7.0 venv/wrap # Available for the versions of Python listed above
mkvirtualenv pytorch # skip this line if you have already created your env
workon pytorch # switch to your env, whatever you named it
pip install numpy torch torchvision # skip this line if you have already created your env
srun python test.py
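
The job script above runs test.py, whose contents are not shown. As an illustration only, a hypothetical test.py could report whether PyTorch can see the GPUs allocated to the job. The function name describe_gpus is an assumption, and the import is guarded so the script still runs in an environment without PyTorch.

```python
def describe_gpus():
    """Return a one-line summary of the GPUs PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    return "PyTorch sees {} CUDA device(s)".format(torch.cuda.device_count())


if __name__ == "__main__":
    print(describe_gpus())
```

Running a quick check like this at the start of a job can help catch a missing environment or a CPU-only PyTorch build before a long computation begins.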