Integrating with the CLCbio Genomics Server
Mobydisk retirement note: The original CLC_Server_Data folder is on Mobydisk. We have brought up a new clcbio server, clcbio-stage.crc.pitt.edu, and renamed the original folder to CLC_Server_Data_to_be_retired.
July 12: Mobydisk will be fully decommissioned. Any data still stored on Mobydisk will be lost and unrecoverable.
Submit a help ticket (http://crc.pitt.edu/tickets) to migrate your folders under CLC_Server_Data_to_be_retired to the newly created CLC_Server_Data on the BeeGFS system.
This page contains directions on how to connect your CLCbio Genomics Workbench to the CLCbio Genomics Server installation on the HTC cluster, allowing you to offload analyses to the cluster.
We currently maintain two clcbio servers: clcbio.crc.pitt.edu (currently down for an upgrade) and clcbio-stage.crc.pitt.edu. Both servers run CLC Genomics Server 11.0.
The CLC Genome Finishing Server Extension and the CLC Microbial Genomics Server Extension are enabled on clcbio-stage.crc.pitt.edu.
Biomedical Genomics Analysis Server Plugin 1.0: Installing this plugin on a CLC Genomics Server provides the functionality formerly available by installing a Biomedical Genomics Server Extension license on a CLC Genomics Server and installing the now-retired QIAseq Targeted Panel Analysis Server Plugin.
CLC Assembly Cell 5.0.3 is available on the HTC cluster.
The following are the corresponding clients for CLC Genomics Server 11.0:
CLC Genomics Workbench 12.0
CLC Command Line Tools 6.0
We recommend running the client versions that correspond to the CLC Genomics Server version.
Server plugins (clcbio.crc.pitt.edu and clcbio-stage.crc.pitt.edu)
Additional Alignments Server Plugin
Advanced Peak Shape Tools Server Plugin (Beta)
Annotate with GFF file server plugin
Biomedical Genomics Analysis Server Plugin
Histone ChIP-Seq Server Plugin
Ingenuity Pathway Analysis Server Plugin
Transcript Discovery Server Plugin (Beta)
Commercially available Server Extensions (clcbio-stage.crc.pitt.edu)
CLC Genome Finishing Server Extension
CLC Microbial Genomics Server Extension
CLC workbench download link
CLC Genomics Workbench
Version: 12.0 - Release date: 28. Nov 2018
Download macOS Installer - 268.9 MB (.dmg) http://download.clcbio.com/CLCGenomicsWorkbench/12.0/CLCGenomicsWorkbenc...
Download Linux 64 bit installer - 297.7 MB (.sh) http://download.clcbio.com/CLCGenomicsWorkbench/12.0/CLCGenomicsWorkbenc...
Download Windows 64 bit installer - 250.6 MB (.exe) http://download.clcbio.com/CLCGenomicsWorkbench/12.0/CLCGenomicsWorkbenc...
Ensure you have the most up-to-date version of the CLCbio Genomics Workbench (the software should tell you if there is a more recent version when you start it, or you can check this page on the CLCbio website).
If you have not already done so, request a user account/allocation on the Center for Research Computing (CRC) cluster by filling out the required information on this page.
If your computer is not connected to the Pitt network (e.g. you are working from home or traveling), or you are working from a laptop that is connected to the UPMC network, make sure you set up Pitt SSLVPN so that you can communicate with the Center for Research Computing (CRC) cluster (the clcbio servers use the HTC cluster). Make sure that "Server URL" (4) is sremote.pitt.edu and "Please select a Role" (14) is Firewall-SAM-USERS-Pulse. Note that there are many different VPN roles; only the Firewall-SAM-USERS-Pulse role can connect to the CRC clusters. If your VPN was installed by system administrators and you are not sure which role is used, open Pulse Secure, click the + sign, and follow the instructions in these figures (https://crc.pitt.edu/htc#Off-campus-access).
Start up the CLC Genomics Workbench.
If you have not done so already, install the External Applications Client Plugin by clicking on the Plug-ins button in the toolbar at the top of the CLC Genomics Workbench window. This will bring up the Manage Plugins dialog box. Find the External Applications Client Plugin, click the Download and Install button, then close the Manage Plugins dialog box and restart the CLC Genomics Workbench (choose Yes when the dialog box asks whether you want to restart the workbench now).
From the File menu, choose the "CLC Server Login" option. Click the triangle next to "Advanced" to find the server information section. The Server host is clcbio.crc.pitt.edu, and the Server port is 7777. Fill in your Pitt username and password, then check the boxes to have this information saved and to have the software log in to the server automatically (assuming the software you are using is on your own computer, and not a publicly accessible machine). Please note that the username is case sensitive and must be entered in all lowercase letters. Refer to the image below for an example of how the settings in this box should look:
Your workbench software will now attempt to connect to the CLCbio Genomics Server installation on the CRC cluster. One of the only noticeable changes will be the appearance of new folders in your Navigation Area. You will find folders named CLC_Server_Data and CLC_User_Data, each with a blue S on the folder icon:
This is the data folder on the CRC cluster. Inside it you will find a folder corresponding to your group, which you should have access to (the naming convention is the first letter of the first name plus the last name of the faculty member):
This folder is your group's working directory. Copying files in the workbench from your local folders to the folders on the server will copy your data over to CRC. File permissions have been set to restrict access to your data to members of your group only; if you need any special permissions, or if you do not find a folder matching your group, please open a support ticket on the CRC main page.
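If the server folders do not appear after you log in, it can help to confirm that the server port is reachable from your machine before opening a ticket. The following is a minimal sketch using only the Python standard library; the host and port are the ones given in the login step, while the function name can_connect and the 5-second timeout are illustrative choices. A successful TCP connection only shows that the network path (including the VPN) works; it does not verify your credentials.

```python
import socket

# Host and port as given in the login instructions on this page.
CLC_HOST = "clcbio.crc.pitt.edu"
CLC_PORT = 7777


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Any network-level failure (DNS lookup error, refused connection, timeout)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect(CLC_HOST, CLC_PORT) from a machine on the Pitt network or the VPN should return True when the server is up. A False result off campus usually points to the VPN setup described earlier, although the server itself may also be down for maintenance.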
Computational genomics tasks require various reference genomes. CLC_References and User_References, each with a blue S on the folder icon, are the folders for reference genomes. CLC_References is associated with the Biomedical Genomics Server, and its contents include the human, mouse, and rat genomes. Reference genomes for other species are installed under User_References. If you need any other reference genome, please open a support ticket on the CRC main page.
Running an analysis on the HTC cluster works in much the same way as running an analysis on your own computer. However, in the dialog box that opens when you first select a tool to run, you will now see additional options:
To run on the HTC cluster, always select the "Grid" option (do not attempt to run analyses using the "CLC Server" option as, counterintuitively, these will fail). The drop-down menu under the "Grid" option allows you to select an appropriate grid preset, which controls how many cores are assigned to your job and how long the job is allowed to run:
The "HTC Data" grid presets are designed for data import/export; these jobs are assigned only 16 GB of RAM and 1 core. In our experience, most RNA-Seq Analysis jobs do not require more than 24 hours to complete (most of them finish in less than 4 hours) using "HTC Job (64GB, 4 cores, 24 hours)".
Aligning large exome data sets to a reference genome can typically be done using 8 cores in about 4 hours (even for data sets with up to 100x coverage). Aligning whole genome data sets (especially those with high coverage) is best done with 16 cores and will typically require less than 24 hours: recent alignments of 100x whole genome data (nearly 1 billion reads) completed in 12 hours using 16 cores, and even larger data sets (1.5 billion reads) completed in 24 hours using 16 cores.
Note, however, that variant calling requires much more time than alignment (sometimes almost twice as much) but does not use as many cores. In our experience, variant calling for whole exome data sets typically takes on the order of 6 hours (using 4 cores), while variant calling for whole genome data sets takes closer to 30 hours (using 4 cores). Because the resources currently available to the CLC server are limited, it is essential to minimize both the number of cores your jobs use and the amount of time blocked off for them.
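To make the trade-off between cores and run time easier to compare, the figures quoted above can be collected into a small lookup table. This is only a sketch: the task keys and the names Guidance, GUIDANCE, and core_hours are hypothetical and do not correspond to preset names in the Workbench drop-down menu, and the hours are the rough figures from our experience above (using the upper figure where a range is given), not guarantees. The "HTC Data" presets are left out because no run time is quoted for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    cores: int           # cores suggested in the text above
    approx_hours: float  # rough wall-clock hours quoted in the text above


# Hypothetical task keys; values are taken from the guidance paragraph.
GUIDANCE = {
    "rnaseq": Guidance(cores=4, approx_hours=4),
    "exome_alignment": Guidance(cores=8, approx_hours=4),
    "wgs_alignment": Guidance(cores=16, approx_hours=24),
    "exome_variant_calling": Guidance(cores=4, approx_hours=6),
    "wgs_variant_calling": Guidance(cores=4, approx_hours=30),
}


def core_hours(task: str) -> float:
    """Return cores multiplied by approximate hours for a task key.

    Raises KeyError if the task key is not in GUIDANCE.
    """
    g = GUIDANCE[task]
    return g.cores * g.approx_hours


if __name__ == "__main__":
    for task, g in GUIDANCE.items():
        print(f"{task}: {g.cores} cores, ~{g.approx_hours} h, ~{core_hours(task)} core-hours")
```

Core-hours give a rough sense of how much of the shared resource a job blocks off, which is why picking the smallest preset that fits the job matters.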
If you think your job requires a grid preset that is not currently available, please send Dr. Fangping Mu an email: email@example.com
Occasionally (such as when you are running an import tool), you will also see a dialog box asking you where your data is located:
Your selection here determines which folders can be searched for files in the subsequent steps of the tool. Import tools can be used to simultaneously convert data from FASTQ format (for example) to the CLCbio format and transfer the CLCbio format file to the server. We can assign each group (faculty) an import/export directory on Mobydisk under /mnt/mobydisk/groupshares/. Members of the group share this import/export directory with read/write permissions. Please open a support ticket on the CRC main page if you do not find a folder matching your group.
Once you start a job running on the HTC cluster, you will see the usual progress bars in the Process section of the Toolbox. When the job status is listed as "Running", you can close your Workbench software, and the job will continue running on the remote server. When you relaunch your workbench, it will again connect to the server (as long as you checked "Automatic login" above; otherwise you can log in again manually), and the status of your job will be updated.
Working directories and import/export directories are assigned on /mnt/mobydisk. Note that /mnt/mobydisk is not backed up, so you will need to be diligent about backing up your data to your own personal drives.
At the moment, the CLCbio software does not provide fine control of data access at the individual user level. The access permissions are enforced at the group level. What this means is that if User_A and User_B are both within Group_Z, then both will have read/write access to data stored within the Group_Z directory.
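The group-level rule described above can be illustrated with a short sketch. It is a toy model of the policy only, not how the CLC server or the file system actually enforces permissions. User_A, User_B, and Group_Z come from the example above; Group_Y, User_C, and the function name can_read_write are hypothetical additions for illustration.

```python
# Hypothetical membership table; real group membership is managed by CRC.
GROUP_MEMBERS = {
    "Group_Z": {"User_A", "User_B"},
    "Group_Y": {"User_C"},
}


def can_read_write(user: str, group: str) -> bool:
    """Group-level rule: every member of a group has read/write access to
    that group's directory, and non-members have no access."""
    return user in GROUP_MEMBERS.get(group, set())
```

Under this model User_A and User_B both have read/write access to the Group_Z directory, and there is no way to restrict one of them without also restricting the other.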
Each group from the schools of the health sciences is assigned a group quota of 2 TB on Mobydisk. If your group requires more disk space on Mobydisk, please contact us.
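To get a rough idea of how close a directory is to the group quota, you can total the file sizes beneath it. The sketch below uses only the Python standard library and makes two assumptions: "2 TB" is read as 2 * 1024**4 bytes (the storage system may count differently), and the sum of file sizes approximates what the quota system measures. The function names are illustrative, and walking a large directory tree this way can be slow.

```python
import os

# Assumption: the 2 TB group quota is interpreted as 2 * 1024**4 bytes.
GROUP_QUOTA_BYTES = 2 * 1024**4


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count data that lives elsewhere
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def quota_fraction(used_bytes: int, quota_bytes: int = GROUP_QUOTA_BYTES) -> float:
    """Return the fraction of the quota used (1.0 means the quota is full)."""
    return used_bytes / quota_bytes
```

For example, quota_fraction(directory_size_bytes(path)) with path set to your group's directory gives an estimate between 0 and 1; a value approaching 1 is a good reason to clean up or to contact us about more space.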