IIAI Development Lab - 4622 SC

With tremendous support from Engineering Computing Services (ECS), a new facility for AI method development and preparation for "production-level" use on ARGON GPU cluster queues has been built and is being configured in 4622 SC.

Policies

At this time, the policies are simple, and we will develop them further as the need arises.

  • Please feel free to use the resource as much as you can to work on problems requiring GPU usage

  • Please do not use the resource to work on computational problems that do not require GPU usage

  • This entire resource is currently a work in progress. Please use this wiki page to share your experiences, the shortcomings of the environment, and usage hints and observations with all of us

  • The plan is to open access to others around Oct 15, by which time we hope to have the environment tested and stable, with you, our test users, having mastered its use

  • Soon after, we will start testing the ease of access from this resource to the GPU queues on ARGON, i.e., how to program in this 4622 SC environment in a way that makes porting programs to ARGON as seamless as possible, and we will develop a sort of cookbook to help others do that. The plan is to use the Singularity environment. Singularity v3.7 is installed and available as a loadable "module" ( module load singularity ).

  • 4622 SC is for development and initial small-size testing. Do NOT use this environment for "production-level" work ... use the ARGON GPU queues for that.

  • The room is access controlled. Please contact Kim Glynn to be added to the access list.

Support Information

For now, please email Kim Glynn with general support inquiries. We will identify a contact in ECS as we move on.

  • Phone number: (319) 384-1024
  • Email: kimberly-a-glynn@uiowa.edu

4622 SC Environment:

Hardware:

  • 8 x HP Z2 series systems, each with:
    • 32 GB of RAM
    • 1 TB of NVMe storage
    • one RTX A4500 video/GPU card

Software:

  • Base OS:
    • Ubuntu 20.04 LTS
  • Added tools (available via "module"):
    • docker 1.13
    • miniconda3 (conda 4.10)
    • spyder3
    • jupyter 4.3.0
    • Singularity 3.7
  • Deep learning software currently installed:
    • NVIDIA CUDA v11.5
    • cuDNN libraries v8
    • NCCL v2.3.4
    • TensorFlow-GPU v1.8.0 with Python 3 interface

IIAI Development Lab - S207 - Tippie College of Business

Things to do before using IIAI lab:

  • Step 1: Log in to GitLab with your University of Iowa HawkID. Once you have logged in, send us an email requesting access to the GitLab project "IIAI / DeepLearningLab4622SC" (email: amudireddy@uiowa.edu).
  • Step 2: Register for an account on the IIAI (The Iowa Initiative for Artificial Intelligence) website. Here is the link to register.
  • Step 3: Familiarize yourself with basic Linux commands by following this User guide. Please note that you will not be able to access this guide unless you have a GitLab account.
  • Step 4: To build and deploy solutions (especially deep learning/AI-based ones), the research community uses a command-line virtualization tool named "Singularity". In simple terms, it is a framework for building and operating a mini operating system inside a host operating system. Instructions on how to use the tool can be found here. Familiarize yourself with this software.

Hardware:

  • 4 x HP series systems (16-core hyper-threaded Intel(R) Core(TM) i7-9800X @ 3.30GHz), each with:
    • 32 GB of RAM
    • GeForce RTX 2080 Ti video/GPU card (11 GB of memory)
    • 1 TB of NVMe storage ( /localscratch ephemeral storage, similar to the Argon cluster)
    • 1 TB of SSD storage (system)

Software:

  • Base OS: Ubuntu 18.04 LTS
  • Deep learning software currently installed: NVIDIA CUDA v10.2
  • Environment module manager (Lmod), similar to the Argon cluster

Lab usage instructions:

Step 1:

Connect to the University VPN through the Cisco AnyConnect Secure Mobility Client. Here is the link to the tutorial on how to download, install, and configure the VPN.

https://its.uiowa.edu/support/article/1876

Step 2:

Connect to the AI lab using FastX2. Here is the link to the tutorial on how to download, install, and configure FastX2.

https://etc.engineering.uiowa.edu/help-desk/how-use/installing-fastx2

Under the connection manager section of that page, the example configuration uses “login.engineering.uiowa.edu”. Instead, enter one of the following host addresses of the 4 machines in the AI lab. The Name field can be anything you choose. Do not change the Sci section.

  • pbb-s220-dl000.biz.uiowa.edu
  • pbb-s220-dl001.biz.uiowa.edu
  • pbb-s220-dl002.biz.uiowa.edu
  • pbb-s220-dl003.biz.uiowa.edu

Once you have configured the connection, choose Ubuntu to launch the session.

Step 3:

If you are not familiar with operating an Ubuntu/Linux machine, follow this tutorial.

https://research-git.uiowa.edu/iiai/DeepLearningLab4622SC/blob/master/User_Manual_IIAI_Lab/IIAI_Manual.md

(Make sure that you log in to GitLab with your University of Iowa HawkID. Once you are registered, you can email David Funk for permission to access the above tutorial.)

If you are comfortable operating a Linux machine, skip this step.

Step 4:

Three storage locations are available to you.

  1. Your local user session on the machine that you log in to.
  2. /localscratch/User/ is the second location. It is shared by all users of that machine. Make sure that you create a folder named with your HawkID. Of the three, we highly recommend this one because it is very fast.
  3. /nfsscratch/User/ is the third location. It is shared by all users of all machines in the network. Make sure that you create a folder named with your HawkID. This storage is slow, but data stored here is available on all the machines in that shared network.

Please note that data stored in localscratch or nfsscratch expires (usually after 30 days). If you do not use your folder for longer than the expiry period, it will be purged.
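
Creating your personal folder can be sketched as a small shell helper. This is a minimal sketch, not an official lab script: the function name make_scratch_dir is hypothetical, and it assumes that your Linux login name ($USER) is the same as your HawkID unless you pass a name explicitly. Restricting the permissions is an optional extra, not a stated lab policy.

```shell
# Create a personal folder under a scratch root and print its path.
# Assumption: $USER matches your HawkID (override with a second argument).
make_scratch_dir() {
    local root="$1"                          # e.g. /localscratch/User or /nfsscratch/User
    local name="${2:-${USER:-$(id -un)}}"    # folder name, defaults to the login name
    local dir="${root}/${name}"
    mkdir -p "$dir" || return 1              # no error if the folder already exists
    chmod 700 "$dir"                         # optional: keep the folder private
    printf '%s\n' "$dir"
}

# Example, run on a lab machine:
#   make_scratch_dir /localscratch/User
#   make_scratch_dir /nfsscratch/User
```

Because mkdir -p is idempotent, the helper can be run at the start of every session without harm.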

Step 5:

You must load specific modules before you can start working on the machines.

Environment Modules:

The Environment Modules package is a tool that simplifies shell initialization and lets users easily modify their environment. Using modules, you can manipulate your environment to gain access to new software or to different versions of the same package. Here are some module commands you may find useful:

  • module avail 
    List the available modules. Note that if there are multiple versions of a single package, one will be denoted as (default). If you load the module without a version number, you will get this default version.
  • module load MODULE 
    Load the named module
  • module unload MODULE 
    Unload the named module
  • module list 
    List the modules that are currently loaded.

(Loaded modules apply only to the current shell session, so you may need to load them again in every new command line terminal you open.)
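
The commands above could be wrapped as follows to load what you need at the start of each terminal session. This is a minimal sketch: the function name load_lab_modules is hypothetical, and "singularity" is used only because it is the module name mentioned on this page. Check module avail for the names actually installed.

```shell
# Load each named module, then show what is loaded.
# Returns non-zero if the module command is missing or a load fails.
load_lab_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on a lab machine" >&2
        return 1
    fi
    local m
    for m in "$@"; do
        module load "$m" || return 1
    done
    module list
}

# Example, run on a lab machine:
#   load_lab_modules singularity
```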

Step 6:

We recommend using Singularity containers instead of the base Ubuntu system.

Singularity is a container platform. It allows you to create and run containers that package up pieces of software in a way that is portable and reproducible. You can build a container using Singularity on your laptop, and then run it on many of the largest HPC clusters in the world, local university or company clusters, a single server, in the cloud, or on a workstation down the hall. Your container is a single file, and you don’t have to worry about how to install all the software you need on each different operating system and system.

Here is the link to Singularity Introduction:

https://sylabs.io/guides/3.5/user-guide/introduction.html

Here is the link to Sample shell commands:

https://sylabs.io/guides/3.5/user-guide/quick_start.html

Here is the link to full documentation:

https://sylabs.io/guides/3.5/user-guide/index.html

We have made 3 Singularity containers for you. Untar them in your localscratch/Users/Your_Hawk_ID folder and use them.

The containers are located at:

/opt/nfsopt/singularity-images/
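
Unpacking and running one of these containers might look like the sketch below. The archive and image names (example.tar, example.sif) are hypothetical placeholders, so list the directory to see the real file names. The destination path assumes the /localscratch/User/ location from Step 4 and that $USER is your HawkID. The --nv flag of singularity exec makes the host NVIDIA GPU available inside the container.

```shell
IMAGE_DIR=/opt/nfsopt/singularity-images

# Unpack a container archive into a destination folder, creating it if needed.
unpack_container() {
    local archive="$1" dest="$2"
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Example, run on a lab machine (file names are hypothetical):
#   ls "$IMAGE_DIR"
#   unpack_container "$IMAGE_DIR/example.tar" "/localscratch/User/$USER"
#   module load singularity
#   singularity exec --nv "/localscratch/User/$USER/example.sif" nvidia-smi
```

Running nvidia-smi inside the container is a quick way to confirm that the GPU is visible before starting real work.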

Step 7:

The command to check for an existing GPU job or current GPU utilization is:

nvidia-smi

The command to check CPU/RAM utilization is:

top
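
For a more compact view than the default output, the two checks can be sketched as helper functions. The names gpu_status and cpu_status are hypothetical. The nvidia-smi query options used here are standard ones, and top -b -n 1 prints a single snapshot in batch mode instead of opening the interactive view.

```shell
# Print GPU utilization and memory, followed by the processes using the GPU.
gpu_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; run this on a lab machine" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
}

# Print one snapshot of CPU/RAM usage: the summary lines plus the busiest processes.
cpu_status() {
    top -b -n 1 | head -n 15
}

# Example, run on a lab machine:
#   gpu_status
#   cpu_status
```

Checking gpu_status before launching a job helps avoid competing with another user for the single GPU in each machine.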

Note: You can also connect to these machines through SSH.
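
An SSH connection could be sketched as below. This assumes that your login name on the machines is your HawkID and that you are already connected to the University VPN (Step 1). The helper name lab_ssh_command is hypothetical, and it only prints the command for one of the four hosts listed in Step 2 so that you can review it before running it.

```shell
# Print the ssh command for lab machine 0-3.
# Assumption: the login name is your HawkID.
lab_ssh_command() {
    local hawkid="$1"
    local index="${2:-0}"    # machine number, 0 to 3
    printf 'ssh %s@pbb-s220-dl00%s.biz.uiowa.edu\n' "$hawkid" "$index"
}

# Example:
#   lab_ssh_command your_hawkid 2
#   ssh your_hawkid@pbb-s220-dl002.biz.uiowa.edu
```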