Today’s MKI cluster consists of over 1,000 CPU cores with 2-4GB of memory per core. A petabyte of storage space is available on six storage servers. The cluster has seen several major updates to its operating system and workload scheduler. We are contemplating another major upgrade, so the following description is likely to change.

Sections:
Logging in from Linux or Mac
Logging in from Windows
Displaying Graphics from the Cluster
Running Cluster Jobs
Transferring Files

How to Access the MKI cluster

Once you have a cluster account, you need to know how to access the cluster from your desktop or laptop (referred to as your “local” machine).  Access is restricted to applications that use SSH protocols.

Logging in from Linux or Mac

Linux and OSX (Mac) operating systems are usually distributed with an SSH client included, so to log in you simply:

    • Open a terminal session on your local computer
    • SSH to a cluster login node, e.g., antares.mit.edu

Before you can log in, you will need to create a pair of ssh keys, one private and one public, unless you already have them. To create ssh keys, type the following:

    • ssh-keygen -t rsa

You’ll be prompted to specify where the key files are to be written, and to supply a “pass-phrase”, one or more words that will become your password into the cluster. Do not forget this pass-phrase: if you do, you won’t be able to use the keys and you’ll have to create and register a new pair! The destination can be any “private” directory on your local computer, i.e., one that only you can read. The default is the “.ssh” sub-directory in your home directory, or “~/.ssh” for short. If you don’t have one, here’s how to create it:

    • mkdir ~/.ssh ; chmod 700 ~/.ssh
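
For reference, a key-generation session typically looks something like the following; the exact prompts vary with your OpenSSH version, and the file names shown are simply the defaults:

    • ssh-keygen -t rsa
      Generating public/private rsa key pair.
      Enter file in which to save the key (/home/<username>/.ssh/id_rsa):
      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:
      Your identification has been saved in /home/<username>/.ssh/id_rsa
      Your public key has been saved in /home/<username>/.ssh/id_rsa.pub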

Once you have created the key files, email the public key file – the one whose name ends in “.pub” – to Paul Hsi <paul@space.mit.edu>. Once he has installed it on the cluster, you’ll be ready to log into the login node, e.g.,

    • ssh <username>@antares.mit.edu

If your local user name is the same as the one you use on the cluster, you can just type “ssh antares.mit.edu”. If your ssh key is not in “~/.ssh” on your local computer, you’ll need to specify its location with the -i flag, e.g.:

    • ssh -i <ssh_key_file> <username>@antares.mit.edu
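
To avoid typing these options every time, you can add an entry for the cluster to your local “~/.ssh/config”. The following is only a sketch: the host alias “antares” and the key file name are examples, so substitute your own values:

    • Host antares
      HostName antares.mit.edu
      User <username>
      IdentityFile ~/.ssh/id_rsa

With this entry in place, “ssh antares” is all you need to type.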

Logging in from Windows

  • Point your browser at the PuTTY download page
  • Download and install the PuTTY and PuTTYgen applications
  • Run PuTTYgen to create SSH key pairs – public and private
  • Select “SSH-2 RSA” and click the “Generate” button
  • Select a “pass-phrase” when prompted. Remember it.
  • Click “Save private key” and note the location where you saved it.
  • Highlight the public key near the top of the PuTTYgen window, copy and paste it into a text editor, and save it as a public key text file in the same location as your private key.
  • Paste the public key in an email and send it to Paul Hsi <paul@space.mit.edu>
  • Alternatively, attach your public key text file in the email.
  • Close your PuTTYgen application.

Once you receive confirmation that your public key has been installed on the cluster, start PuTTY and fill in its configuration window as follows:

  • In the “Host Name” window, enter <username>@host, where host is a cluster login node, e.g., antares.mit.edu
  • In the Port window, enter “22”
  • Select the Connection type as SSH
  • Choose a name and enter it in the “Saved Sessions” window
  • For “Close window on exit”, choose “Only on clean exit”
  • Click the “+” sign in front of “SSH” in the “Category” window
  • The Category list will expand. Click the “Auth” item
  • PuTTY will display its “SSH Authentication” window
  • Click the “Browse…” button and locate and select your private key file
  • Click the “Open” button and you will be logged into the cluster

PuTTY will save this information in its “Saved Sessions” list, so the next time you log in, just select that session name and click the “Load” button to its right.

Displaying Graphics from the Cluster

Most Linux systems come with X11 already installed; it is what draws your terminal windows. Some Mac systems don’t, and you’ll have to install an X11 application. We recommend XQuartz, which is available from https://www.xquartz.org/releases/. At the present time (summer 2020), the latest stable version is 2.7.11, which you can download from https://dl.bintray.com/xquartz/downloads/XQuartz-2.7.11.dmg. Once downloaded, double-click the icon and follow the instructions.

With X11 installed on your local computer, you must also add a -X or -Y flag to your login command before you’ll be able to display graphics from the cluster, e.g.,

    • ssh -X <username>@antares.mit.edu

Both -X and -Y tell ssh to forward graphical windows from the cluster to your local screen; the choice of flag depends on how your local ssh is configured. If you get the following error while running a graphics command, e.g., xterm, on the cluster:

    • Unable to access the X Display, is $DISPLAY set properly?

try logging in with the other flag and, if this doesn’t work, check that the following line is present in your local /etc/ssh/ssh_config:

    • XAuthLocation /opt/X11/bin/xauth

If it is missing, either add it (but you’ll need root permission to edit this file), or add it to your local ~/.ssh/config. In the latter case, that file should also contain the following:

    • Host *
    • ForwardX11 yes

If there is no “~/.ssh/config”, create it with a text editor and insert those three lines. Your graphics applications should now display from the cluster.
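
A quick way to confirm that forwarding works is to log in with the -X (or -Y) flag and start a simple X client; this sketch assumes xterm is installed on the login node:

    • ssh -X <username>@antares.mit.edu
    • echo $DISPLAY   # should print something like localhost:10.0
    • xterm &         # a new terminal window should appear on your local screen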

Running Cluster Jobs

The MKI cluster uses the SLURM job scheduler and all jobs that run on the cluster’s compute nodes must be submitted through this scheduler. If a job is not scheduled via SLURM’s sbatch or srun commands, it will run on the login node and is liable to be killed without warning since it will be interfering with other users.

1. Batch jobs

These are submitted through the sbatch command, an example of which is provided below. Create a command file containing special “#SBATCH” instructions for SLURM, followed by ordinary UNIX “shell” commands that start the job. This example will run myProgram using 2 compute nodes with a total of 24 CPUs and 45MB RAM for no longer than 35 minutes:

  • #!/bin/sh
    #SBATCH -N 2
    #SBATCH -n 24
    #SBATCH -p <partition name>
    #SBATCH -J <job name>
    #SBATCH -t 0-00:35:00
    #SBATCH --mem=45M
    #SBATCH --error=job.%J.err --output=job.%J.out
    echo Start Job at `/bin/date`
    echo The Master Node of this job is `hostname`
    myProgram

where:

  • N is the number of compute nodes
  • n is the total number of compute cores (tasks)
  • p is the partition name
  • J is the job name
  • t is the time limit allowed (days-hours:minutes:seconds)
  • --mem is the memory required per node
  • --error and --output name the files that receive the job’s standard error and standard output
  • myProgram is the name of your executable program
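
To submit the job, save these lines in a file (here called myjob.sh, an arbitrary name) and pass it to sbatch, which replies with the ID assigned to your job:

    • sbatch myjob.sh
      Submitted batch job <jobid>

You can then follow the job’s progress with the squeue and scontrol commands described below.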

More information about sbatch and its “#SBATCH” directives is available from https://slurm.schedmd.com/sbatch.html

2. Heterogeneous jobs

To find out whether your job can be run by sbatch or requires the added refinements of srun, consult https://slurm.schedmd.com/heterogeneous_jobs.html. Jobs started by sbatch leave the choice of assigning tasks to cores and nodes to SLURM. The srun command gives the user more control over these assignments, which is useful if some parts of the job require more resources than others. srun can be executed from the command line, as in the following example:

  • srun -N 2 -n 24 -t 0:01:00 -p <partition name> -l <myProgram>

The command line options are equivalent to those in #SBATCH directives. Alternatively, multiple srun commands can be included in a single sbatch command file. More information about srun and its many options and environment variables is available from https://slurm.schedmd.com/srun.html.
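
As a sketch of the latter approach (the program names preprocess and myProgram are placeholders), each srun call launches one job step inside the resources allocated by sbatch:

  • #!/bin/sh
    #SBATCH -N 2
    #SBATCH -n 24
    #SBATCH -p <partition name>
    #SBATCH -t 0-00:35:00
    srun -N 1 -n 4 ./preprocess    # a small first step on one node
    srun -N 2 -n 24 ./myProgram    # the main step using the full allocation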

3. Some other useful SLURM commands

– Show which partitions are available to you

    • sinfo

– Display the status of your jobs

    • squeue -u <username>
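
The output will look something like the following; the columns are squeue’s defaults, the values here are purely illustrative, and ST is the job state (e.g., R = running, PD = pending):

      JOBID PARTITION      NAME      USER ST   TIME  NODES NODELIST(REASON)
     123456    <name> myProgram <username>  R  12:34      2 <nodelist>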

– Display the status of a specific job

    • scontrol show job <jobid>

– Display the current cluster usage

    • sview &

If you are configured to receive X11 graphics (see above), sview opens a window with a graphical overview of the cluster’s nodes, partitions, and jobs.

If you are familiar with other schedulers but not with SLURM, the SLURM documentation includes a comparison of commands in some popular schedulers to help with your transition. The SLURM Workload Manager is fully documented at https://slurm.schedmd.com/documentation.html, and its two-page summary of sbatch and srun options should also help.

File Transfer

Your data can be stored remotely on the clusters in a high-performance filesystem, but to copy files to and from your local computer, you must run a program that uses SSH protocols, e.g., scp or sftp on Linux or macOS systems, WinSCP on Windows, or Cyberduck on macOS.

Transferring to/from Windows

In Windows, there is no built-in client for SSH file transfer. We recommend WinSCP, which offers an easy graphical interface for copying files to and from your local desktop.

  • First, download WinSCP and install it on your desktop
  • Use it to access one of the login nodes on the clusters, e.g., antares.mit.edu
  • Note: you may need to specify a port number of 22 for secure access
  • Within WinSCP, you can browse the local and cluster directories, and copy files between them.

Transferring to/from Linux, Solaris, etc.

To copy test.txt to your home directory on the cluster, type the following command on your local machine:

    • scp test.txt <username>@antares.mit.edu:

To copy test.txt in the reverse direction:

    • scp <username>@antares.mit.edu:test.txt .
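
If your ssh key is not in “~/.ssh”, add the -i flag just as for ssh, and add -r to copy a whole directory; the directory name below is just an example:

    • scp -r -i <ssh_key_file> myDataDir <username>@antares.mit.edu: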

Please refer to the scp man page for more options.

Transferring to/from MacOSX

The scp instructions above will also work from a window of the Terminal app on MacOSX systems. An alternative is to use a file transfer application such as Cyberduck.