I Killed the Mail Server Today


It all started so simply: I was going to set up a little Xen instance to be my next cluster submit host, and needed a spare address for it:

  1. I started setting up an instance for ch208i.cae.tntech.edu, since it was no longer on the Xen host like it was several months ago. Crap, the reason it’s no longer on the Xen instance is because I moved it to its own dedicated hardware — it’s still my main ftp/mirror server. Ctrl-C that one.
  2. Hmm, what’s available from old Xen instances? mail2.cae.tntech.edu.cfg from when I was testing out a new mail server setup last fall — doesn’t ping, doesn’t show up in xm list, no problem.
    xen-create-image --hostname=mail2.cae.tntech.edu --ip= \
        --gateway= --netmask= --size=10Gb --memory=256Mb \
        --swap=1Gb --debootstrap --force

    A few minutes later, my instance is debootstrapped and ready to go.

  3. Oh, crap. Why am I getting an error on xm create that says my LVM is already in use on a domU somewhere?
  4. Further crap. Looking in /etc/xen/mail.cae.tntech.edu.cfg for the production mail server, it apparently uses the old mail2.cae.tntech.edu LVMs. Wonderful. ssh mail? It works since sshd was already memory-resident, but /root/.profile doesn’t exist. And neither does much of anything else.
  5. Great. I’ve just killed the mail server. Off to the Amanda server to do a quick restore of its data. What? I never put mail.cae.tntech.edu into the backup list? Not normally the end of the world, since the mail stores are accessed over NFS from the main file server, but what about my dovecot and postfix configurations?
  6. Oh, well. Time to see how good my puppet manifests are for the mail server.

Not too bad, as it turns out. Total downtime was only a couple of hours, including redoing the postfix and dovecot configurations (which were then copied off to the puppetmaster). I still have a few more things to fix, but mail delivery is up, and imap is running. TLS support for sending mail from home isn’t up yet, but it’ll be fixed shortly.

I still need to fix that submit host, though. Next time, I think I’ll use an IP address reserved for my office.

Update: after getting a partial TLS/SASL setup going late Wednesday night, I went to sleep without realizing I’d killed mail delivery again. Finally got it straightened out Thursday morning.

Better Inventory Management Through Python

One good thing about the switch from the legacy administrative computing system to the new Banner setup is now I can more easily get an unformatted table of my department’s equipment inventory instead of a printed report. This matters to me for a few reasons:

  • The default inventory reports always come out sorted by inventory tag. I have no idea why. If you have anything more than a handful of items to inventory, you would sort them by room, just like you’d identify them when walking around. Since I have somewhere around 100 items to inventory across 6-7 buildings, it’s a pain and rather error-prone to manually mark the ones in a particular building.
  • Even when I got a report sorted by location, that doesn’t help a great deal with items that I only see during inventory time, or during a random audit. My memory just isn’t that good, and occasionally an item will move from one room to another without my knowledge. Photos of the items would help greatly in finding them, but the default reports don’t include any.

So this year, within a few hours of first asking for it, I got a nice CSV file of our inventory from the Business Office (thanks, Matt). It included lines like

74396,"Fork Lift, Electric",SU 803647,F30,CATERPILLAR,5/1/2001,CH106,13900,Center for Manufacturing Research,6700

that I could work with. I had text files from previous years on the old system, but they were much weirder to parse. This opened right up in Excel without issue, so I had hope for a much cleaner implementation of my old Python inventory scripts.

Man, I love Python. There’s a built-in CSV module that parses lines into lists, and separates values on each line into separate items in those lists. No parsing required on my part. There’s a nice third-party module for generating HTML markup (original site is down, but see this link from archive.org for the module). And it’s a mature enough language to where a years-old Usenet post on sorting is still useful.
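As a sketch of how little code that takes (modern Python 3 shown; the column positions are inferred from the sample line above, so treat the field names as my guesses rather than anything documented in the Banner export):

```python
import csv
import io

# One line from the Business Office CSV (shown earlier in the post)
sample = ('74396,"Fork Lift, Electric",SU 803647,F30,CATERPILLAR,'
          '5/1/2001,CH106,13900,Center for Manufacturing Research,6700')

# The csv module handles the quoted comma in "Fork Lift, Electric" for us
rows = list(csv.reader(io.StringIO(sample)))
tag, description, room = rows[0][0], rows[0][1], rows[0][6]

# Sorting the whole inventory by room (column 7 in the sample, which is
# where the room appears to live) instead of by inventory tag:
inventory = sorted(rows, key=lambda row: row[6])
```

With a real file you’d pass the open file object straight to csv.reader and skip the io.StringIO wrapper.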

On a procedural rather than technical front, I started going around a couple of years ago and snapping pictures of items as I inventoried them. The old camera-phone pictures aren’t great, but they’re good enough to identify an item and find its property tag. So now I had a table of inventory data and a big folder of JPGs named according to each item’s property tag. The next step was to convert them into useful web pages:
Continue reading “Better Inventory Management Through Python”

Transforming Research Computing Practices in TTU’s College of Engineering

I have a cunning plan to promote some better (in my opinion) practices for research computing in TTU’s College of Engineering over the next year or two. This plan is partially derived from things I’ve already been trying to promote, and some other goals laid out by the Python Software Carpentry folks. I’d define success as remedying any or all of the following potential problems for a given researcher (student or faculty), depending on what type of work they do:

  1. By default, the majority of a researcher’s files are stored on a single drive (hard drive in the office, hard drive at home, or flash drive). A single hardware failure can mean the loss of days, weeks, months, or years of work.
  2. If a researcher makes a mistake on a particular file or set of files that can’t be easily undone, they may have considerable difficulty in restoring those files to their former state. Most people don’t have automated backups of any kind, and many just manually save copies of their work onto different folders periodically.
  3. Advisors don’t have automatic access to their students’ research files upon graduation. Rarely do they have easy access to them during the research period. Collaboration is often reduced to emailing drafts back and forth, or to shuffling materials around on removable media on an ad hoc basis. Simple supervision uses the same methods, but on an even more infrequent basis.
  4. Researchers don’t have a standard storage location for their research materials that is large enough to contain all relevant files, accessible from around the world, and secured against unauthorized access. GMail doesn’t count.
  5. Researchers don’t have a standard place to publish works in progress, completed papers, and anything else that would be of use to the larger research community (at TTU and/or elsewhere).
  6. Researchers don’t have a facility for others to comment on completed projects, and to collaborate on works in progress.
  7. Researchers with computational needs tend to focus on the types of problems they can solve with PCs in ITS labs, PCs on their desk, or PCs they can purchase on a project budget. These PCs are often underpowered, underutilized, and redundant purchases when you consider multiple projects. They also limit the scope and scale of problems that can be solved.

Why would anyone related to TTU’s Engineering research activities care?

  1. We shouldn’t limit our computational research unnecessarily. We should work to the limit of our available facilities, and use those facilities as efficiently as possible.
  2. Researchers shouldn’t have to worry about their storage media’s integrity. They should be able to trust that if they save files somewhere safe, that they’ll be there the next time they’re needed. They also shouldn’t have to always worry about keeping multiple copies organized.
  3. Especially for projects where there is more than one researcher, and also for projects of interest to a supervisor or advisor, the ability to automatically track code and other changes would be great. Even on single-researcher projects, the ability to track all the details of changes means the ability to revert those changes as needed.
  4. Many techniques that are new to a particular research group may be established procedure for another. This could include image processing, Groebner bases, boundary element methods, LaTeX tips, etc. If the various research groups can see the works in progress of other groups, then they at least have an opportunity to comment and suggest alternative strategies. Similarly, if groups are in the habit of constantly publishing their daily successes and failures, then others at TTU or worldwide can avoid reinventing the wheel and/or offer suggestions on how to work around the particular problems.

My basic strategy is as follows (some of these have already been done to varying degrees): Continue reading “Transforming Research Computing Practices in TTU’s College of Engineering”

Why Should I Fill Out a Contact Form to Download FlexLM Utilities?

So some years ago, Macrovision buys FlexLM, and a few months ago, spins it off into Acresso Software. And now I’m ready to start using Cacti to monitor FlexLM license usage, but I need Linux versions of lmutil and related utilities. I could just pull them from a package that uses FlexLM, but I don’t want to worry about whether or not I got the absolute most recent version (newer lmutils will talk to older servers just fine, but older ones might not talk to newer servers).

Off we go to Acresso’s downloads page. Fill out a form including my email address, phone, name, and other information. Eventually, I get redirected to this Acresso page, where the lmutil downloads actually are held. I grumble a bit because I’ve filled out the form on something other than the server where I need lmutil, so odds are, they’ll prevent me from doing the download since I won’t have some kind of cookie from the original form.

Nope. No security there. Copy the link to the 32-bit Linux lmutil, paste it into a terminal on the server, and go. So from now on, I plan to just return to their target page to do the FlexLM downloads.

I really don’t know what they’re trying to accomplish here. 99% of the people looking for lmutil or anything else on that download page aren’t potential Acresso customers, they’re just looking for newer releases of programs already provided by other software vendors.

But regardless, I’ve got my graphs running now:
Matlab License Usage
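For anyone wiring this into Cacti: the data input method is just a script that runs lmutil lmstat and pulls the counts out of the summary lines. A rough sketch of the parsing half (the sample line is the typical "Users of FEATURE:" format that FlexLM servers print; check your own lmstat output before trusting the regex):

```python
import re

# A typical feature-summary line from `lmutil lmstat -a`
sample = ("Users of MATLAB:  (Total of 10 licenses issued;  "
          "Total of 3 licenses in use)")

pattern = re.compile(
    r"Users of (\S+):\s+\(Total of (\d+) licenses? issued;\s+"
    r"Total of (\d+) licenses? in use\)")

m = pattern.search(sample)
feature, issued, in_use = m.group(1), int(m.group(2)), int(m.group(3))

# Cacti's script data input wants name:value pairs on stdout
print(f"issued:{issued} in_use:{in_use}")
```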

Some Days, I Just Hate Solaris

Back in 2000, when some of us in engineering were talking about how best to improve our facilities for high-performance and research computing for our graduate students, we came to a few conclusions:

  • Software was more important than hardware.
  • Some software ran only under Windows, some had no Windows version at all.
  • Of the non-Windows software people cared about, there was always a version for Solaris. There was often a version for most other Unixes, but regardless of the company, they always had a Solaris version.
  • Sun’s matching grant program for education was awesome.

And to be fair, for some operations, our Sun Blade 1000 workstations blow the doors off of our Dell Precision Workstations with 3x the clock cycles. We’ve had very little hardware trouble from the Suns, and the aforementioned matching grant program and judicious use of third-party upgrade vendors let us buy two decked out Ultra 80 workstations on a budget that was originally allocated for one decked out workstation and one considerably lower-specced one.

But there’s little to no excuse for the following:

  • patchadd rewrites every byte of /var/sadm/install/contents every time you do a file operation. During jumpstarts, I manage to put that file in a tmpfs for faster access, but before that, I couldn’t do a single Solaris-only Jumpstart install in less than half a day.
  • Solaris 10 includes Samba. Solaris 10’s Samba includes winbind, which is what I use on my Debian systems to convert Active Directory accounts to Unix ones. But the Solaris 10 winbind doesn’t include the idmap_rid backend for consistently converting an Active Directory RID into a Unix UID, which confuses NFS mightily. I thought blastwave’s or sunfreeware’s Samba packages might be better, but they weren’t. I found these instructions for configuring winbind and idmap_rid for Solaris, but they’re squirreled off in a manual for Sun Cluster Data Services. What reason might they have for not compiling in idmap_rid by default? Am I the only person who uses Active Directory to generate UIDs for a central NFS and Samba server?
  • Today, during an attempt to install and test Matlab 7.6, I found that X11 forwarding is broken on recently-patched Solaris systems like mine. A similar bug came up in 2005 and sat unfixed for a few months. The usual fix of telling sshd to only listen on IPv4 interfaces in sshd_config isn’t enough, though. You actually have to add the -4 argument to the sshd service file.

I hate throwing away tens of thousands of dollars of perfectly functional hardware. I could install Debian’s sparc port on them, but why? I’d lose access to Ansys, Matlab, and all the other packages that are the reason I have these systems in the first place. And letting them languish like they did for years before I got into the managed infrastructure business seems a waste. Solaris 10, puppet, and the newer firmware that allows PXE booting is such a vast improvement over earlier versions for what I need to do, but there’s still some distance to go before it’s up to Debian standards.

If we went through the same evaluation process in 2002, I’d probably not have any Solaris systems at all. Matlab, Maple, Ansys, Abaqus, etc. were all coming out with (or had already come out with) Linux versions. We’d have spent a lot less on hardware, and some jobs just love the extra clock cycles available on an Intel CPU.

Simple Time Log for OpenOffice (and soon, Excel)

Last year, I threw together a mildly intelligent Excel spreadsheet that would help me add up release time entries for various projects, and posted it to an earlier version of this 43 Folders topic. Since they converted from their old forum software to Drupal, my spreadsheet attachment got lost. And since I couldn’t find it in my documents folder, I just went and reconstructed it.

Making Solaris Packages from Commercial Software

Creating a managed infrastructure can go pretty slowly when you’re beset with a combination of bare competence and a work schedule that’s overrun with non-infrastructural tasks. So yes, it’s been just under a year since I wrote up how to make Debian packages from commercial software. On to getting similar capabilities out of the Solaris systems.

The packages

I already use Blastwave and pkg-get to install third-party free software applications, so I figured it would be easiest to use the same tools on my packaging. So for a first example, I installed Maple 11.00 manually into /opt/maple/11 on a Solaris 10 system. Then I made a temporary working folder and build folder, made an opt folder there, and moved the maple folder from the regular opt to my build folder’s opt. I also made a usr/local/bin in my build folder, and made relative symlinks from the main Maple executables to their assumed homes in usr/local/bin. The abridged results from the temporary working folder looked like this:

# pwd
# ls -l
total 6
drwxr-xr-x   4 root     root         512 May 22 09:27 build
-rw-r--r--   1 root     root          41 May 21 17:59 copyright
-rw-r--r--   1 root     root           0 May 22 09:36 depend
-rw-r--r--   1 root     root         143 May 22 09:35 pkginfo
# cat copyright
Copyright MapleSoft, All Rights Reserved
# cat pkginfo
DESC=Interactive computer algebra system
# ls -al build/opt/maple/11
total 504
drwxrwxr-x  17 root     other        512 May 22 08:43 .
drwxrwxr-x   3 root     other        512 May 22 08:42 ..
drwxr-xr-x   2 root     other        512 May 22 08:42 afm
drwxr-xr-x   2 root     other        512 May 22 08:42 bin
drwxr-xr-x   3 root     other       2048 May 22 08:42 bin.SUN_SPARC_SOLARIS
drwxr-xr-x   9 root     other        512 May 22 08:42 data
drwxr-xr-x   2 root     other        512 May 22 08:42 etc
drwxr-xr-x   2 root     other       3072 May 22 08:42 examples
drwxr-xr-x   3 root     other        512 May 22 08:42 extern
-rw-r--r--   1 root     other     153861 May 21 14:12 Install.html
drwxr-xr-x   2 root     other       1536 May 22 08:42 java
drwxrwxr-x   7 root     other        512 May 22 08:42 jre.SUN_SPARC_SOLARIS
drwxr-xr-x   4 root     other       1536 May 22 08:43 lib
drwxr-xr-x   2 root     other        512 May 22 08:43 license
drwxr-xr-x   3 root     other        512 May 22 08:43 man
-rw-rw-r--   1 root     other      60064 May 21 14:15 Maple_11_InstallLog.log
-rw-r--r--   1 root     other      10285 May 21 14:12 readme.txt
drwxr-xr-x   6 root     other        512 May 22 08:43 samples
drwxr-xr-x   2 root     other        512 May 22 08:43 test
drwxr-xr-x   2 root     other        512 May 22 08:42 X11_defaults
# ls -al build/usr/local/bin
total 10
drwxr-xr-x   2 root     root         512 May 22 08:56 .
drwxr-xr-x   3 root     root         512 May 22 08:47 ..
lrwxrwxrwx   1 root     root          31 May 22 08:55 maple11 -> ../../../opt/maple/11/bin/maple
lrwxrwxrwx   1 root     root          30 May 22 08:56 mint11 -> ../../../opt/maple/11/bin/mint
lrwxrwxrwx   1 root     root          32 May 22 08:55 xmaple11 -> ../../../opt/maple/11/bin/xmaple

Now, given that folder structure, I could adapt Blastwave’s package creation instructions to create some workable Solaris packages:

# (echo "i pkginfo"; echo "i copyright" ; echo "i depend" ; cd build ; find . | pkgproto ) > prototype
# pkgmk -b / -a `uname -p`
# filename=maple11-11.00-SunOS`uname -r`-`uname -p`.pkg
# pkgtrans -s /var/spool/pkg /root/$filename MAPLmaple11
# cd /root
# gzip $filename

Once pkgmk, pkgtrans, and gzip are all done with their work, I have a valid maple11-11.00-SunOS5.10-sparc.pkg.gz Solaris package in my /root folder. After testing it with regular pkgadd, I’m ready to put it into a private pkg-get repository.

The pkg-get repository

Compared to a Debian repository, a pkg-get repository is pretty simple. From the top-level folder in the repository on the ftp server:

# find sparc -print

A pkg-get repository’s top-level folders are named by processor type, i.e., the results of uname -p. Each processor type folder contains folders for each OS release level (from uname -r). Each release level folder contains packages for that CPU and OS, plus a descriptions and a catalog file.

The catalog file is created with Phil Brown’s makecontents script. It could potentially handle creating the descriptions file, too, but I guess he never needed one. But the pkg-get script I got from blastwave.org definitely wants a descriptions file, so I’ll need to create that myself.

The way I’m creating the descriptions file is with the following script (on a Debian ftp server, so there may be some GNU-isms or bash-isms in the following code):

for arch in sparc i386; do
    if [ -d $arch ]; then
        cd $arch
        for version in 5*; do
            if [ -d $version ]; then
                cd $version
                for package in *.gz; do
                    name=`grep $package catalog | awk '{print $1}'`
                    echo -ne "$name - "
                    zcat $package | head | strings | grep DESC= | cut -d= -f2-
                done > descriptions
                cd ..
            fi
        done
        cd ..
    fi
done
Between makecontents and the script above, I’m left with a catalog file containing (so far, since I’ve only made one package):

maple11 11.00 MAPLmaple11 maple11-11.00-SunOS5.10-sparc.pkg.gz

and a descriptions file containing:

maple11 - Interactive computer algebra system

And now I can install them on a second host that’s never seen Maple installed before with:

pkg-get -s ftp://host/path/to/repository/ -U ; pkg-get -s ftp://host/path/to/repository/ install maple11

and afterwards get:

# which maple11
# maple11
    |\^/|     Maple 11 (SUN SPARC SOLARIS)
._|\|   |/|_. Copyright (c) Maplesoft, a division of Waterloo Maple Inc. 2007
 \  MAPLE  /  All rights reserved. Maple is a trademark of
 <____ ____>  Waterloo Maple Inc.
      |       Type ? for help.
> quit
bytes used=412112, alloc=393144, time=0.07

Giving a Presentation at the Tennessee Higher Education IT Symposium

I’m heading to the IT Symposium this morning to give a talk on creating a managed Unix infrastructure from scratch, somewhat of a summary of several things I’ve posted here over the last year or so. Thanks to the folks on #puppet who read over them and gave editing suggestions.

Update: So yesterday, I get an email regarding my presentation (well, the slides, at least). No reason to clutter up the main page with it though, so if you’re not happy with the slides and want to express your displeasure, read the rest after the jump and see if I’ve addressed your concerns already. Continue reading “Giving a Presentation at the Tennessee Higher Education IT Symposium”

Converting National Instruments LVM Timestamps to Excel (UPDATED: and Matlab)

A few days ago, I had a student looking into what would be required to periodically log some temperatures and pressures from a long-running furnace experiment, so that he doesn’t have to babysit it and come back every 30 minutes to record his data. We borrowed a National Instruments USB-6008 data acquisition device and downloaded NI SignalExpress LE to try some things out.

There wasn’t much of a problem with actually capturing the data, but the timestamps (in column 1) were odd-looking:

LabVIEW Measurement	
Writer_Version	0.92
Reader_Version	1
Separator	Tab
Multi_Headings	Yes
X_Columns	One
Time_Pref	Absolute
Date	2008/04/11
Time	16:24:24.354863
Channels	1	
Samples	100	
Date	2008/04/11	
Time	16:22:45.354864	
X_Dimension	Time	
X0	0.0000000000000000E+0	
Delta_X	1.000000	
X_Value	Voltage - Dev1_ai0	Comment
3.29079376535486410E+9	2.64872102540972910E+0
3.29079376635486410E+9	2.64872102540972910E+0
3.29079376735486410E+9	2.64872102540972910E+0
3.29079376835486410E+9	2.64872102540972910E+0
3.29079376935486410E+9	2.65280301912685700E+0
3.29079377035486410E+9	2.65280301912685700E+0

There may be a simple way to reformat them inside SignalExpress LE, but it wasn’t obvious enough when we looked. So here’s one solution.

The primary difference between National Instruments’ timestamp format and Excel’s is that NI counts a real-valued number of seconds since January 1, 1904, while Excel counts a real-valued number of days since January 1, 1900. So
the formula (A6/86400)+365*4+2 (convert seconds into days, add four years plus two leap days for 1900 and 1904) will convert NI’s timestamp value of 3290793765.35486 in cell A6 into an Excel equivalent 39549.8908, or April 11, 2008.

But that’s not quite enough. The test run we made was at 4:22 PM, and that’s nowhere near 89% of a full day. Sure enough, putting 39549.8908 into a time format gave us 9:22:45 PM, a full five hours ahead of local time. So there’s a timezone shift in there, too, and we’re currently on GMT-5. So strip off the fractional part of the Excel timestamp, offset it by 5/24, and convert it back into a time. The final resulting spreadsheet in tabular form:

The columns of row 6, with their headers from row 5:

  A: Original LVM Time                     3290793765.35486
  B: Original LVM Voltage                  2.64872102540972
  C: Convert Secs to Days                  =A6/86400
  D: Offset by 4 Years (plus 2 leap days)  =C6+365*4+2
  E: Final Date                            =D6
  F: Time (before timezone offset)         =D6
  G: Decimal part of Timestamp             =D6-ROUNDDOWN(D6,0)
  H: Offset for Timezone                   =G6+$C$3/24
  I: Final Time                            =H6
  J: Final Date/Time                       =ROUNDDOWN(E6,0)+H6

where cell C3 contained a -5 for our timezone shift from GMT. Columns E, F, I, and J were all in date, time, or date/time format as needed. The result in cell J6 could be cut down to =(A6/86400+365*4+2)+$C$3/24 if you’re in a rush to just convert it.

There may be a math or other error that’s been compensated for somewhere in here, since I’m not 100% positive about the 2 leap days. But it does equal out down to the second.
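The arithmetic does check out if you run the conversion directly from the 1904 epoch. A quick cross-check, sketched in Python rather than Excel (the -5 is the same GMT-5 assumption as above; LabVIEW timestamps count seconds from 1904-01-01 UTC):

```python
from datetime import datetime, timedelta

NI_EPOCH = datetime(1904, 1, 1)  # LabVIEW counts seconds from here (UTC)
TZ_OFFSET_HOURS = -5             # same GMT-5 shift as cell C3

ni_seconds = 3290793765.35486    # cell A6 from the spreadsheet above
utc = NI_EPOCH + timedelta(seconds=ni_seconds)
local = utc + timedelta(hours=TZ_OFFSET_HOURS)

# Should match the 16:22:45 start time in the LVM header
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2008-04-11 16:22:45
```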

Matlab Addendum:

Matlab has another time format altogether — it uses a real-valued number of days since the zeroth of January, year 0000, whatever that means:

>> datestr(0,0)

ans =

00-Jan-0000 00:00:00

>> datestr(1,0)

ans =

01-Jan-0000 00:00:00

>> datestr(1/86400,0)

ans =

00-Jan-0000 00:00:01

Since that goes back far enough to include the current Gregorian calendar, Julian calendar, and possibly even the pre-Julian Roman calendar standards, the conversion equation is a bit more obtuse (in particular, the 97 day offset was found entirely by trial and error):

grid on;

resulting in the correct plot for the brief experiment:
Time and Temperature Plot
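For what it’s worth, the trial-and-error offset hunting can be avoided: a Matlab serial date is the proleptic Gregorian day count starting at 00-Jan-0000, which runs exactly 366 days ahead of Python’s date.toordinal() (year 0 contributes 366 days). A sketch that derives the NI epoch’s Matlab serial directly; the 695422 figure is computed here, not quoted from Matlab, so verify it against datenum(1904,1,1) before relying on it:

```python
from datetime import date

# Matlab: days since 00-Jan-0000.  Python: ordinals since 0001-01-01.
# So datenum(Y,M,D) == date(Y,M,D).toordinal() + 366.
MATLAB_OFFSET = 366
ni_epoch_serial = date(1904, 1, 1).toordinal() + MATLAB_OFFSET

ni_seconds = 3290793765.35486  # first timestamp from the LVM file
matlab_serial = ni_seconds / 86400 + ni_epoch_serial

# Convert back to a calendar date to confirm it lands on the
# experiment day (in UTC, before any timezone shift)
recovered = date.fromordinal(int(matlab_serial) - MATLAB_OFFSET)
```

In Matlab itself, the equivalent would be datestr(ni_seconds/86400 + datenum(1904,1,1)), with the timezone shift handled the same way as in the Excel version.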