I have a cunning plan to promote some better (in my opinion) practices for research computing in TTU’s College of Engineering over the next year or two. This plan is partially derived from things I’ve already been trying to promote, and some other goals laid out by the Python Software Carpentry folks. I’d define success as remedying any or all of the following potential problems for a given researcher (student or faculty), depending on what type of work they do:
- By default, the majority of a researcher’s files are stored on a single drive (hard drive in the office, hard drive at home, or flash drive). A single hardware failure can mean the loss of days, weeks, months, or years of work.
- If a researcher makes a mistake on a particular file or set of files that can’t be easily undone, they may have considerable difficulty in restoring those files to their former state. Most people don’t have automated backups of any kind, and many just manually save copies of their work into different folders periodically.
- Advisors don’t have automatic access to their students’ research files upon graduation. Rarely do they have easy access to them during the research period. Collaboration is often reduced to emailing drafts back and forth, or to shuffling materials around on removable media on an ad hoc basis. Simple supervision uses the same methods, but on an even more infrequent basis.
- Researchers don’t have a standard storage location for their research materials that is large enough to contain all relevant files, accessible from around the world, and secured against unauthorized access. GMail doesn’t count.
- Researchers don’t have a standard place to publish works in progress, completed papers, and anything else that would be of use to the larger research community (at TTU and/or elsewhere).
- Researchers don’t have a facility for others to comment on completed projects, and to collaborate on works in progress.
- Researchers with computational needs tend to focus on the types of problems they can solve with PCs in ITS labs, PCs on their desk, or PCs they can purchase on a project budget. These PCs are often underpowered, underutilized, and redundant purchases when you consider multiple projects. They also limit the scope and scale of problems that can be solved.
Why would anyone related to TTU’s Engineering research activities care?
- We shouldn’t limit our computational research unnecessarily. We should work to the limit of our available facilities, and use those facilities as efficiently as possible.
- Researchers shouldn’t have to worry about their storage media’s integrity. They should be able to trust that if they save files somewhere safe, those files will be there the next time they’re needed. They also shouldn’t have to constantly worry about keeping multiple copies organized.
- Especially for projects with more than one researcher, and also for projects of interest to a supervisor or advisor, the ability to track code and other changes automatically would be great. Even on single-researcher projects, tracking all the details of changes means being able to revert those changes as needed.
- Many techniques that are new to a particular research group may be established procedure for another. This could include image processing, Groebner bases, boundary element methods, LaTeX tips, etc. If the various research groups can see the works in progress of other groups, then they at least have an opportunity to comment and suggest alternative strategies. Similarly, if groups are in the habit of constantly publishing their daily successes and failures, then others at TTU or worldwide can avoid reinventing the wheel and/or offer suggestions on how to work around the particular problems.
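To make the change-tracking point a bit more concrete, here is a minimal sketch in Python of what "track a change, then revert it" means mechanically. It is only an illustration, not how the CAE version control service works: the function names `snapshot` and `revert`, the `history.json` log, and the example file names are all hypothetical, and a real version control system does far more (diffs, branching, merging, multi-user access).

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def snapshot(path: Path, store: Path) -> str:
    """Copy `path` into `store` under its SHA-256 digest and log the version.

    Hypothetical helper for illustration only.
    """
    store.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    shutil.copy2(path, store / digest)  # identical content maps to one stored file
    log = store / "history.json"
    history = json.loads(log.read_text()) if log.exists() else []
    history.append({"file": path.name, "digest": digest})
    log.write_text(json.dumps(history, indent=2))
    return digest


def revert(path: Path, store: Path, digest: str) -> None:
    """Overwrite `path` with the stored version identified by `digest`."""
    shutil.copy2(store / digest, path)


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    work = tmp / "analysis.py"          # hypothetical research file
    store = tmp / "snapshots"           # hypothetical snapshot location
    work.write_text("result = 1 + 1\n")
    good = snapshot(work, store)        # record a known-good version
    work.write_text("result = 1 +\n")   # a mistake that is tedious to undo by hand
    snapshot(work, store)
    revert(work, store, good)           # restore the known-good version
    print(work.read_text(), end="")
    shutil.rmtree(tmp)
```

Every saved version gets an entry in the log, so going back to an earlier state is a lookup rather than a hunt through manually copied folders.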
My basic strategy is as follows (some of these have already been done to varying degrees):
1. Define the types of services that address each of the above goals, and set up the specific hardware and software for those services.
2. Identify faculty and student champions/testers for each of these types of services. Ideally, these researchers would be spread around all departments in Engineering.
3. Tune service offerings to meet unexpected needs from the testing group.
4. Branch out to other faculty and student researchers not in the initial testing group. The testing group would lend some credence to the methodologies tested in step 3, and could demonstrate how the business of conducting research had improved as a result.
So what does step 1 look like? The following table shows the various services already available in the CAE network, and how they help solve the problems given above:
Service | Data Safety | Data Recovery | Student File Retention | Universal File Access | Publishing | Collaboration | High-Performance Pooled Computing |
---|---|---|---|---|---|---|---|
File Server | x | x | x | x |  |  |  |
Web Server | x | x | x | x | x |  |  |
Version Control Server | x | x | x | x | x | x |  |
Software Configuration Management Server | x | x | x | x | x | x |  |
Blog Server | x | x | x | x | x |  |  |
Mailing Lists Server | x | x | x | x | x |  |  |
Cluster Systems |  |  |  |  |  |  | x |
Next up, identifying my champions/testers in different departments, tuning the offerings to meet their unexpected needs, and starting on documentation and publicity materials for the rest of the college.