If you are a first-time user, you can read through this guide from the beginning. If you already have your own workflow, you might want to jump to the Tips section to see whether a few small changes could make it more efficient.
SSH client: handy tool makes a handyman
First things first: choose a suitable SSH client. Of course, you can use a terminal or PowerShell to ssh into the cluster, but a GUI can make things easier. If you have access to the GitHub Student Developer Pack and don't like tweaking things yourself, try Termius; if you consider yourself on the hardcore side, then maybe VS Code is your thing.
Termius
Termius has all SSH-related things prepared for you. From snippets to host connection settings, it is possibly the easiest way to get started. You can even use Termius to connect to the cluster from your mobile device.
However, to navigate the remote file system easily, you need to be familiar with the command line or install Ranger (see the Tips section).
It also has an SFTP client, which is a good way to upload/download files. However, transfers between two remote hosts seem to route through the local machine, which reduces the speed.
Get the most out of it by claiming your GitHub Student Developer Pack: https://education.github.com/pack
VS Code
Since VS Code has remote SSH support (via the Remote - SSH extension), you can use it to connect to the cluster. You can also install all kinds of extensions to make it more convenient: from auto-formatters to Copilot to CodeLens, there are a lot of useful ones.
https://code.visualstudio.com/docs/remote/ssh
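The Remote - SSH extension reads your ~/.ssh/config, so it helps to define the cluster there once; a minimal sketch in which the alias, address, and username are placeholders:

    # ~/.ssh/config -- alias, address, and username below are placeholders
    Host cluster
        HostName login.mycluster.example.edu   # your cluster's login node
        User myuser                            # your cluster username
        ServerAliveInterval 60                 # helps keep idle connections alive

With an entry like this, both a plain "ssh cluster" in the terminal and the VS Code connection dialog can reuse the same alias.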
The Explorer in the sidebar provides a good way to navigate the remote file system and preview files. It can be helpful for organizing or demonstrating your file structure.
Troubleshooting for dropped connections caused by the server-side cache:
https://earlruby.org/2021/06/fixing-vscode-when-it-keeps-dropping-ssh-connections/
How to correctly interact with the cluster
Basic concepts
As you SSH into the head/login node, you are sharing that node with all the other users, so avoid using it for anything computationally intensive. All real computation should be done on the compute nodes.
Every operation costs some computing power, and some also require network bandwidth. It is the user's responsibility to make sure things are executed in a reasonable way. For example, if you are transferring huge files, use the xfer node; if you are extracting a huge number of compressed archive files, consider making it a job and submitting it to a batch node.
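For example, a minimal Slurm batch script for such an extraction job might look like the sketch below; the partition name, path, and resource numbers are placeholders, so check your cluster's documentation for the real ones:

    #!/bin/bash
    #SBATCH --job-name=extract        # a name to recognize the job by
    #SBATCH --partition=batch         # placeholder partition name
    #SBATCH --ntasks=1
    #SBATCH --mem=4G
    #SBATCH --time=02:00:00

    cd /scratch/myuser/data           # placeholder path
    tar -xzf huge_archive.tar.gz      # the actual work

Save it as extract.sh and submit it with sbatch extract.sh.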
If you need to do something computationally intensive in an interactive way, you can use the qlogin command to get a node with more computing power, or use something like srun --pty -p inter_p --mem=16G --nodes=1 --ntasks-per-node=16 --time=12:00:00 --job-name=qlogin /bin/bash -l to request a node with a specific configuration. (You can actually ssh directly to a given job node with ssh <job_node_name>, but it is not a good thing to do if someone else is using that node.)
Proposed Workflow
Here I propose a typical researcher workflow based on Dr. Casey Bergman's workflow.
(Slide from Dr. Casey Bergman)
In this workflow, we produce functional scripts on local machines, taking advantage of a powerful GUI, and rely on GitHub not only to synchronize but also to back up all the code. Of course, we can modify code while logged in to the cluster using pure text editors like Vim or Emacs, but not everyone can do that efficiently.
To further expand this workflow as a student, we also have to think about data management and about what can aid the learning process and the whole community.
(1) At UGA, files on the /scratch/ volume are purged after 90 days if not touched, so we have to carefully back up files, for example with the scp command, to the lab's own storage or by following the cluster's backup policies. However, how to back up data correctly, or data management more generally, is a much bigger topic that is out of the scope of this little guide.
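A hedged example of such a backup copy, run from the lab machine, with made-up host and path names (large transfers are better routed through the xfer node mentioned earlier):

    # pull results off /scratch/ before the purge; host and paths are placeholders
    scp -r myuser@xfer.mycluster.example.edu:/scratch/myuser/project1/results ./project1_backup/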
(2) The second point is basically Dr. Casey Bergman's workflow: utilize version control tools such as Git/GitHub to keep track of everything and to synchronize between the local machine and the cluster.
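In practice this just means keeping the code directory a Git repository on both the local machine and the cluster; a minimal sketch with hypothetical file and branch names:

    # on the local machine: develop, commit, push
    git add run_analysis.sh
    git commit -m "tweak alignment parameters"
    git push origin main

    # on the cluster: pull the tested code before submitting jobs
    git pull origin main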
(3) The third point is about personal or lab-wide knowledge base management. Installing and using all kinds of bioinformatics software requires a lot of trial and error, so it is always good to take notes on troubleshooting and on computational resource requirements. Installing software on the cluster is sometimes tricky, and many people run into the same issues. A record of resource requirements is also very helpful when trying to request just enough resources.
(Use the seff command to check how much computing resource a finished job actually used, so you can optimize the parameters next time.)
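For instance, once a job has finished (the job ID below is made up):

    seff 1234567    # reports CPU efficiency and peak memory use for the finished job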
If the information is not confidential, put it on the internet for everyone to access. If a note contains unpublished information, record it with some shareable note-taking software for lab-wide access. Both website hosting and private notes can be handled through GitHub.
Miscellaneous things to keep in mind
Avoid keeping a lot of files under a single directory; things get messy.
In addition, it is best practice to set up a virtual environment for each piece of software you are trying to install; conda is one of the best package and environment managers.
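A minimal sketch of that pattern, using a hypothetical environment name and an example bioinformatics package:

    # one isolated environment per tool; names and channels are examples
    conda create -n samtools_env -c bioconda samtools
    conda activate samtools_env
    samtools --version                # confirm the install inside the environment
    conda deactivate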
Tips and Tricks: Icing on the cake
Set up snippets
Have you ever felt tired of typing repetitive commands and just wanted to make a few mouse clicks instead? Set up those frequently used commands as snippets. Snippets are just small pieces of code or commands that you can reuse in different scenarios. You can set them up in terminal emulators like iTerm2 or in Termius, or even just keep your own text file as a boilerplate to save time. Here are some personal recommendations for snippets to set up (a shell-alias sketch follows the list):
Job submission system-specific:
resource lookup: sinfo | grep "idle" or python ~/NodeStat/node_stat.py -q highmem_p (check the next tip!)
current job tracking: squeue --me
qlogin with specifications: srun --pty -p inter_p --mem=16G --nodes=1 --ntasks-per-node=16 --time=12:00:00 --job-name=qlogin /bin/bash -l
File system navigation: go to some frequently visited directories
Development utilities: switch between virtual environments or manage your repos quickly with snippets
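If you prefer plain shell to a GUI snippet manager, the same ideas can live in your ~/.bashrc as aliases; a minimal sketch in which the paths, partition, and environment names are placeholders:

    # ~/.bashrc -- adjust partitions, paths, and environment names to your own setup
    alias idle='sinfo | grep "idle"'            # quick resource lookup
    alias myjobs='squeue --me'                  # current job tracking
    alias qlogin='srun --pty -p inter_p --mem=16G --nodes=1 --ntasks-per-node=16 --time=12:00:00 --job-name=qlogin /bin/bash -l'
    alias proj='cd /scratch/myuser/project1'    # jump to a frequently visited directory
    alias work='conda activate work_env'        # switch to a development environment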
(Snippets page in my Termius)
(Snippets in iTerm2)
Use NodeStat to share the road on the HPC system
This is a script tailor-made for resource checking on Sapelo2, and it is one of my most-used snippets. During peak hours there may be no batch nodes available, but there might be some idle high-memory nodes. Checking the resources before submitting can make life easier.
It is a simple Python script: just clone the repo to get it, and then either add it to your $PATH or set up an alias for easy access.
https://github.com/pbasting/nodestat
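Setup is roughly as follows; where you clone it and what you call the alias are up to you:

    git clone https://github.com/pbasting/nodestat ~/NodeStat
    # either put the script directory on your PATH ...
    export PATH="$HOME/NodeStat:$PATH"
    # ... or set up an alias for the query you run most often
    alias highmem='python ~/NodeStat/node_stat.py -q highmem_p'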
Use Ranger to navigate through file systems
New users usually get lost in the non-graphical user interface on the cluster; Ranger is here to solve that problem. Not only can you navigate the file system with the arrow keys, the built-in previewer also lets you inspect file contents quickly. On top of that, I have found it very useful during meetings when demonstrating a file structure to other people. Ranger provides many more functions, so take your time and explore it gradually.
https://github.com/ranger/ranger
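If Ranger is not already available on the cluster, a user-level install is usually enough; a sketch assuming pip is available (ranger-fm is the package name on PyPI):

    pip install --user ranger-fm    # user-level install, no admin rights needed
    ranger                          # launch it in the current directory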
Make the shell your style with Oh-my-bash
I ran out of words; I just want to see some color in the terminal. Depending on your shell, Oh-my-bash or Oh-my-zsh usually gives a satisfyingly customized result.
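Installation is a one-liner; the sketch below mirrors the command in the Oh My Bash README at the time of writing, so double-check the project page before running it:

    # installs Oh My Bash into ~/.oh-my-bash and updates your ~/.bashrc
    bash -c "$(curl -fsSL https://raw.githubusercontent.com/ohmybash/oh-my-bash/master/tools/install.sh)"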