Integrating with the CLCbio Genomics Server
This page explains how to connect your CLCbio Genomics Workbench to the CLCbio Genomics Server installation on the HTC cluster, so that you can offload analyses to the cluster.
We currently maintain three CLCbio servers: clcbio.crc.pitt.edu, clcbio.sam.pitt.edu and clcbio-stage.sam.pitt.edu. clcbio.crc.pitt.edu and clcbio.sam.pitt.edu run CLC Genomics Server 10.0.1; clcbio-stage.sam.pitt.edu runs CLC Genomics Server 10.0.
The Biomedical Genomics Server Extension, CLC Genome Finishing Server Extension and CLC Microbial Genomics Server Extension are enabled on clcbio.crc.pitt.edu.
CLC Assembly Cell 5.0.3 is available on the HTC cluster.
The following clients correspond to CLC Genomics Server 10.0.1:
CLC Genomics Workbench 11.0.1
Biomedical Genomics Workbench 5.0.1
CLC Command Line Tools 5.0.1
We recommend running the client versions that correspond to your CLC Genomics Server. However, CLC Genomics Workbench 11.0, Biomedical Genomics Workbench 5.0 and CLC Command Line Tools 5.0 can also connect to CLC Genomics Server 10.0.1. Tools that have changed between versions cannot be launched when using a compatible, but not corresponding, client-server combination.
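The pairing rules above can be captured as a small lookup table. This is only an illustration: the version strings are the ones listed on this page, the function name `classify_client` is hypothetical, and versions are compared as exact strings rather than parsed.

```python
# Client versions that correspond to CLC Genomics Server 10.0.1 (from this page).
CORRESPONDING = {
    "CLC Genomics Workbench": "11.0.1",
    "Biomedical Genomics Workbench": "5.0.1",
    "CLC Command Line Tools": "5.0.1",
}
# Older clients that can still connect; tools that changed between
# versions cannot be launched from these.
COMPATIBLE = {
    "CLC Genomics Workbench": "11.0",
    "Biomedical Genomics Workbench": "5.0",
    "CLC Command Line Tools": "5.0",
}


def classify_client(product: str, version: str) -> str:
    """Classify a client against CLC Genomics Server 10.0.1.

    Returns "corresponding", "compatible" or "not listed" (exact string match).
    """
    if CORRESPONDING.get(product) == version:
        return "corresponding"
    if COMPATIBLE.get(product) == version:
        return "compatible"
    return "not listed"


print(classify_client("CLC Genomics Workbench", "11.0"))  # prints "compatible"
```

A "compatible" result still connects, but as noted above, tools that changed between versions will not launch, so "corresponding" is the pairing to aim for.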
Server plugins (clcbio.crc.pitt.edu, clcbio.sam.pitt.edu and clcbio-stage.sam.pitt.edu)
Additional Alignments Server Plugin
Advanced Peak Shape Tools Server Plugin (Beta)
Annotate with GFF file server plugin
Bisulfite Sequencing Server Plugin
Histone ChIP-Seq Server Plugin
Ingenuity Pathway Analysis Server Plugin
Transcript Discovery Server Plugin (Beta)
Biomedical-enabled CLC Genomics Servers only (clcbio.crc.pitt.edu)
Ingenuity Variant Analysis Server Plugin
QIAGEN GeneRead Panel Analysis Server Plugin
QIAseq Targeted Panel Analysis Server Plugin
Commercially available Server Extensions (clcbio.crc.pitt.edu)
CLC Genome Finishing Server Extension
CLC Microbial Genomics Server Extension
CLC Workbench download links
CLC Genomics Workbench
Version: 11.0.1 - Release date: 14. Mar 2018
Download Mac OS X 10.7 or later - 188.3 MB (.dmg) http://download.clcbio.com/CLCGenomicsWorkbench/11.0.1/CLCGenomicsWorkbe...
Download Linux (RedHat/SuSE) installer - 64bit - 221.6 MB (.sh) http://download.clcbio.com/CLCGenomicsWorkbench/11.0.1/CLCGenomicsWorkbe...
Download Windows - 64bit - 173.0 MB (.exe) http://download.clcbio.com/CLCGenomicsWorkbench/11.0.1/CLCGenomicsWorkbe...
Biomedical Genomics Workbench
Version: 5.0.1 - Release date: 14. Mar 2018
Download Mac OS X 10.7 or later - 190.5 MB (.dmg) http://download.clcbio.com/BiomedicalGenomicsWorkbench/5.0.1/BiomedicalG...
Download Linux (RedHat/SuSE) installer - 64bit - 223.3 MB (.sh) http://download.clcbio.com/BiomedicalGenomicsWorkbench/5.0.1/BiomedicalG...
Download Windows - 64bit - 174.9 MB (.exe) http://download.clcbio.com/BiomedicalGenomicsWorkbench/5.0.1/BiomedicalG...
Ensure you have the most up-to-date version of the CLCbio Genomics Workbench (the software tells you at startup when a more recent version is available, or you can check the download page on the CLCbio website).
If you have not already done so, request a user account/allocation on the Center for Research Computing (CRC) cluster by filling out the required information on the CRC account request page.
If your computer is not connected to the Pitt network (e.g. you are working from home or traveling), or you are using a laptop connected to the Pitt wireless network, set up the Pitt SSLVPN so that you can communicate with the Center for Research Computing (CRC) cluster (the CLCbio servers run on the HTC cluster).
Start up the CLC Genomics Workbench.
If you have not done so already, install the CLC Workbench Client Plugin. Click the Plug-ins button in the toolbar at the top of the CLC Genomics Workbench window to open the Manage Plug-ins and Resources dialog box. Find the CLC Workbench Client Plugin and click the Download and Install button, then close the dialog box and restart the CLC Genomics Workbench (choose Yes when asked whether you want to restart the workbench now).
From the File menu, choose the "CLC Server Login" option. Click the triangle next to "Advanced" to show the server information section. The Server host is clcbio.crc.pitt.edu and the Server port is 7777. Fill in your Pitt username and password, then check the boxes to save this information and to have the software log in to the server automatically (only do this if the software is on your own computer, not on a publicly accessible machine). Note that usernames are case-sensitive and must be entered in all lowercase. Refer to the image below for an example of how the settings in this box should look:
Your workbench software will now attempt to connect to the CLCbio Genomics Server installation on the CRC cluster. The most noticeable change is the appearance of new folders in your Navigation Area: CLC_Server_Data and CLC_User_Data, each with a blue S on the folder icon:
These folders hold your data on the CRC cluster. Inside them you will find folders corresponding to your group, which you should have access to (the naming convention is the first letter of the faculty member's first name followed by their last name):
This folder is your group's working directory. Copying files in the workbench from your local folders to the folders on the server copies your data over to the CRC cluster. File permissions restrict access to your data to members of your group; if you need special permissions, or if you do not find a folder matching your group, please open a support ticket on the CRC main page.
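If the CLC_Server_Data and CLC_User_Data folders never appear in your Navigation Area, a quick first check is whether your machine can reach the server at all (for example, whether the Pitt SSLVPN is connected). The sketch below only tries to open a TCP connection to clcbio.crc.pitt.edu on port 7777, the host and port from the login step; it does not log in or speak the CLC protocol, and the function name `server_reachable` is hypothetical.

```python
import socket


def server_reachable(host: str = "clcbio.crc.pitt.edu", port: int = 7777,
                     timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the socket is closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `server_reachable()` with no arguments tests the CRC server. A `False` result usually points to a network or VPN issue (or the server being down) rather than a problem with your username or password.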
Computational genomics tasks require reference genomes. The CLC_References and User_References folders, each with a blue S on the folder icon, contain the reference genomes. CLC_References is associated with the Biomedical Genomics Server, and its contents include human, mouse and rat genomes. Reference genomes for other species are installed under User_References. If you need a reference genome that is not available, please open a support ticket on the CRC main page.
Running an analysis on the HTC cluster works in much the same way as running an analysis on your own computer; however, in the dialog box that opens when you first select a tool to run, you will now see additional options:
To run on the HTC cluster, always select the "Grid" option (do not run analyses using the "CLC Server" option: counterintuitively, these will fail). The drop-down menu under the "Grid" option lets you select an appropriate grid preset, which controls how many cores are assigned to your job and how long the job is allowed to run:
In our experience, most jobs do not require more than 24 hours to complete (in fact, most finish in less than 4 hours). Aligning large exome data sets to a reference genome typically takes about 2 hours using 24 cores, even for data sets with up to 100x coverage. Aligning whole genome data sets (especially those with high coverage) is best done with 48 cores and typically takes less than 24 hours: recent alignments of 100x whole genome data (nearly 1 billion reads) completed in 6 hours using 48 cores, and even larger data sets (1.5 billion reads) completed in 15 hours using 48 cores. Note, however, that variant calling can take much longer than alignment (sometimes almost twice as long), although it does not use as many cores. In our experience, variant calling takes on the order of 6 hours for whole exome data sets and about 30 hours for whole genome data sets (both using 6 cores). Minimizing the number of cores your jobs use, and the amount of time reserved for them, is essential, as the resources available to the CLC server are limited.
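To compare these examples on one scale, multiply the cores reserved by the wall-clock hours. This core-hour arithmetic only illustrates why keeping the core count low matters; it is not a description of how CRC allocations are accounted, and it uses the approximate figures quoted in the paragraph above.

```python
# (task, cores, hours): approximate figures quoted on this page.
EXAMPLES = [
    ("Exome alignment", 24, 2),
    ("Whole genome alignment, 100x (~1 billion reads)", 48, 6),
    ("Whole genome alignment (~1.5 billion reads)", 48, 15),
    ("Whole exome variant calling", 6, 6),
    ("Whole genome variant calling", 6, 30),
]


def core_hours(cores: int, hours: float) -> float:
    """Cores reserved multiplied by wall-clock hours."""
    return cores * hours


for task, cores, hours in EXAMPLES:
    print(f"{task}: {cores} cores x {hours} h = {core_hours(cores, hours)} core-hours")
```

Whole genome variant calling runs five times longer than a 6-hour whole genome alignment, yet it reserves fewer core-hours (180 versus 288) because it uses 6 cores instead of 48.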
If you think your job requires a grid preset that is not currently available, please send Dr. Fangping Mu an email: email@example.com
Occasionally (such as when you are running an import tool), you will also see a dialog box asking you where your data is located:
Your selection here determines which folders can be searched for files in the subsequent steps of the tool. Import tools can convert data from, for example, FASTQ format to the CLCbio format and transfer the CLCbio-format file to the server in a single step. We can assign each group (faculty member) an import/export directory on mobydisk under /mnt/mobydisk/groupshares/. Members of the group share this import/export directory with read/write permissions. Please open a support ticket on the CRC main page if you do not find a folder matching your group.
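If you prepare FASTQ files outside the Workbench, you can copy them into your group's import/export directory before running an import tool. This is a sketch under stated assumptions: it assumes the directory under /mnt/mobydisk/groupshares/ is reachable from the machine running the script (for example, a cluster login node), and the folder name `jsmith`, the source folder and the function `stage_fastq` are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical example paths; replace "jsmith" with your group's folder name.
EXAMPLE_SOURCE = Path.home() / "fastq_run"
EXAMPLE_IMPORT_DIR = Path("/mnt/mobydisk/groupshares/jsmith")


def stage_fastq(source: Path, dest: Path) -> List[Path]:
    """Copy *.fastq and *.fastq.gz files (non-recursively) from source into dest."""
    if not dest.is_dir():
        # The import/export directory is assigned by CRC, so it is not created here.
        raise FileNotFoundError(f"import/export directory not found: {dest}")
    copied = []
    for f in sorted(source.iterdir()):
        if f.is_file() and f.name.endswith((".fastq", ".fastq.gz")):
            target = dest / f.name
            shutil.copy2(f, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

For example, `stage_fastq(EXAMPLE_SOURCE, EXAMPLE_IMPORT_DIR)` copies the FASTQ files and returns their new paths, which you can then select in the import tool.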
Once you start a job running on HTC cluster, you will see the usual progress bars in the Process section of the Toolbox. When the job status is listed as "Running", you can close your Workbench software, and the job will continue running on the remote server. When you relaunch your workbench, it will again connect to the server (as long as you checked "Automatic login" above - otherwise you can manually log in again), and the status of your job will be updated.
Both the working directory and the import/export directory are assigned on /mnt/mobydisk. Note that /mnt/mobydisk is not backed up, so be diligent about backing up your data to your own personal drives.
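Because /mnt/mobydisk is not backed up, a periodic copy to a drive you control is worthwhile. The sketch below is one possible approach in Python: it copies files that are missing from the backup or differ in size or modification time, and it never deletes anything. The function name `backup_changed` is hypothetical; tools such as rsync do the same job more efficiently.

```python
import shutil
from pathlib import Path
from typing import List


def backup_changed(source: Path, backup: Path) -> List[Path]:
    """Copy files under source into backup (keeping relative paths) when the
    backup copy is missing or differs in size or whole-second modification time."""
    copied = []
    for f in sorted(source.rglob("*")):
        if not f.is_file():
            continue
        target = backup / f.relative_to(source)
        st = f.stat()
        if target.exists():
            tt = target.stat()
            # Compare whole seconds, since some drives store coarser timestamps.
            if tt.st_size == st.st_size and int(tt.st_mtime) == int(st.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 preserves the modification time
        copied.append(target)
    return copied
```

Running it a second time copies only files that changed since the last run, so it can be repeated cheaply.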
At the moment, the CLCbio software does not provide fine control of data access at the individual user level. The access permissions are enforced at the group level. What this means is that if User_A and User_B are both within Group_Z, then both will have read/write access to data stored within the Group_Z directory.
Each group from the schools of health sciences is assigned a group quota of 2TB on mobydisk. If your group requires more disk space on mobydisk, please contact us.
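To see how close your group is to the 2TB quota, you can total the file sizes in your group's directory. This sketch assumes decimal terabytes (2 x 10^12 bytes) and counts apparent file sizes, which can differ from how the storage system charges usage; the names `directory_usage` and `quota_fraction` are hypothetical. Scanning a large directory tree can take a while.

```python
import os
from pathlib import Path

# 2TB group quota, assumed here to mean decimal terabytes.
QUOTA_BYTES = 2 * 10**12


def directory_usage(path: Path) -> int:
    """Sum apparent sizes of regular files under path; symlinks are skipped."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def quota_fraction(path: Path, quota: int = QUOTA_BYTES) -> float:
    """Fraction of the quota used by files under path."""
    return directory_usage(path) / quota
```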