The supercomputer can be tricky to use, but it is very helpful for these projects: all our files live there, and it is where we base our work. There is a channel of introductory videos for the supercomputer; they are helpful, but less so before you have actually used it, so I would try a few things on your own first.
Submitting jobs is the standard way of running long programs on the supercomputer. When a program runs too long (around 45 minutes to an hour), the supercomputer kills it. Those not familiar with machine learning may never have run a program that takes that long. However, training a machine learning model can take far, far longer than that (many of our programs simply train for an arbitrarily long time and stop when they are forced to). So we generally submit a job whenever we are training a model.
Job scripts are special shell scripts that you use to submit jobs. There is a script generator on BYU's website that is helpful. We generally make job scripts by copying old ones and editing them as needed.
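A job script is just a shell script with scheduler directives at the top. Here is a minimal sketch assuming the SLURM scheduler; the resource numbers, job name, environment name, and training script name are all hypothetical, so generate a real script from the website's generator or copy an old one:

```shell
#!/bin/bash
#SBATCH --time=24:00:00      # walltime: the job is killed after this long
#SBATCH --ntasks=1           # number of processor cores
#SBATCH --mem-per-cpu=8G     # memory per core
#SBATCH --gres=gpu:1         # request one GPU, since we are usually training
#SBATCH -J "train_hwr"       # job name (hypothetical)

# Activate the Anaconda environment the training code needs (name hypothetical)
conda activate hwr_env

# Run the training script (name hypothetical)
python train.py
```

You would submit it with something like `sbatch myjob.sh`, check on it with `squeue -u <your-netid>`, and kill it with `scancel <jobid>`.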
Generally you should run jobs from your home directory on the supercomputer; this keeps the shared folders from getting cluttered. You can still put things in the shared directories if you think that will be helpful, as long as you keep them organized.
Anaconda is a super helpful package manager. With it you can make isolated environments for your Python packages without having to worry about the other dependencies you've installed. Dependencies can be super hard to sort out, especially on the supercomputer: it's kind of a dinosaur, and there is a lot already installed on it that can interfere with the dependencies we want to use.
Previously we each installed our own copy of Anaconda. This is a hassle but not impossible (it took me maybe an hour to figure out). However, Dr. Clement has recently installed Anaconda in one of the shared directories, so anyone should be able to use it.
One of the main perks of Dr. Clement installing Anaconda in a shared location is TensorFlow. It can be a beast to get working, but Dr. Clement's installation in the shared Anaconda environments seems to work. We don't yet know what individual users might need to do to get it working, other than simply activating the environment.
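Once the shared environments are available to your shell, activating one and checking TensorFlow might look like the following sketch. The environment name here is a guess; use `conda env list` to see what actually exists:

```shell
# See which shared environments exist
conda env list

# Activate one (name hypothetical)
conda activate tf_env

# Check that TensorFlow imports and can see a GPU
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

If the last command prints an empty list, TensorFlow imported fine but no GPU is visible, which is expected on a login node; GPUs only show up inside a job that requested one.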
The directories described below, down to compute, form the file path that contains most of what we do. You will follow this path to get to most things.
One important thing (that is not terribly difficult) is getting to know the filesystem. This is easy to do on your own, but it’s important to know a few things once you start looking around.
Everyone has a home directory. This is fairly straightforward: only you can access it (admins can as well, but don't worry about that). You should run jobs from there (not a rule, just what we've generally done; see above).
fslgroups
From your home directory, you can access the fslgroups folder. This is where all the lab group directories are, but you can only access the groups you belong to.
fslghandwriting
This is the main directory for the Handwriting lab group.
In this directory our storage quota is limited. The exception is its subdirectory called compute, where we have much, much more space available. So don't store excess things here; put them somewhere in compute.
compute
Each lab group has a special directory called compute. (Each user also has their own, accessible from their home directory, but don't worry about that.) This is where we mostly work.
After this point, there are a large number of subdirectories with important things in them. Here I will just list a few of the most important ones:
software
Among other things, this contains the Anaconda installation. There is a script in here which, when run, allows a user to use the shared Anaconda environments.
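A sketch of what using that script might look like. The script name below is hypothetical (look in the software directory for the real one), and the path simply follows the directory layout described above, so your actual path may differ:

```shell
# Go to the shared software directory (path per the layout above)
cd ~/fslgroups/fslghandwriting/compute/software

# Source the setup script so this shell can use the shared Anaconda
# (script name hypothetical -- find the actual one in this directory)
source setup_anaconda.sh

# The shared environments should now be visible
conda env list
```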
death
This is where Stanley's system for MaskRCNN death-record recognition is kept. For more information, see: Ohio Death Records Documentation
generalized_hwr
This is where the handwriting recognition models are kept. It is a pretty intricate system which, when fully operational, can output the contents of the census. We still need to train recognition models for most of the fields, but that is more of a future project. Within this directory is the file “batch_run.py”, which runs all the recognition; a config folder holds the settings for batch_run.
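A sketch of a typical run, assuming batch_run.py picks up its settings from the config folder: edit the config first, then run the script from inside generalized_hwr. The path below just follows the directory layout described above:

```shell
# Navigate to the recognition system (path per the layout above)
cd ~/fslgroups/fslghandwriting/compute/generalized_hwr

# Settings live in the config folder -- edit those before running

# Run all the recognition
python batch_run.py
```

For a long recognition run you would wrap this in a job script and submit it rather than running it directly (see the jobs section above).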