IRAF Newsletter -- Number 14 -- April 1998


Archiving Data with "Save The Bits"

The NOAO/IRAF "Save The Bits" archive (STB) has been in operation on Kitt Peak since July 1993, automatically archiving newly acquired KPNO and NSO nighttime optical and IR digital images from eight different telescopes. A duplicate STB archive was installed at CTIO in April 1996, saving data from four additional NOAO telescopes. Over 1.5 million images have been archived to date, amounting to approximately 3.5 Terabytes of scientific imagery. This total is currently increasing at over one Terabyte/year.

This rather large rate of astronomical data production is likely to more than double with the addition of data from the BTC Mosaic camera (with four 2K x 2K CCDs) at CTIO last Fall, and from the NOAO Mosaic (with eight 4K x 2K CCDs) at KPNO in February 1998. Archiving the NOAO Mosaic images (large multi-extension FITS files at 135 MB each) required a port of STB to Solaris and is accomplished with dedicated Exabyte 8705 drives attached directly to the Sun UltraSparc used by the Mosaic Data Handling System. The smaller (but still impressive) BTC data stream was simply attached to the general CTIO STB installation.

Other observatories that use STB include UCO/Lick and the W. M. Keck Observatory, where STB has been in continuous service since February 1995.

Each of the original NOAO STB installations consists of four Exabyte 8505 tape drives arranged as two pairs, each pair writing duplicate tape copies. One copy of each tape is shipped to the central NOAO data center in Tucson, while the second copy is retained on site at each observatory. The data from all of the telescopes on each mountaintop are transported via the local Ethernet to a central archive computer. There, the images from the several telescopes and instruments in operation on any given night are interleaved onto the same media and packed into large FITS image extension files so that later retrieval is efficient.

STB uses the Berkeley Unix lpd print spooler to provide the underlying network queuing mechanism. Queuing via lpd has been used with robust results at NOAO for many years to simplify file transfers to other facilities such as photographic hardcopy units. In the simplest case, the data acquisition software for an astronomical instrument writes a FITS file to disk on a computer in the telescope dome. This FITS file is then simply passed to a standard network lpd queue, "bits", which passes the file to the queue on the central archive computer.
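The hand-off from the dome computer to the queue can be sketched in a few lines of Python. This is only an illustration: it assumes the standard Berkeley lpr client is the way files enter the "bits" queue, and the function names and file path are hypothetical, not part of STB.

```python
import subprocess
from pathlib import Path


def lpr_command(fits_path, queue="bits"):
    """Build the lpr command line that spools one file to the named lpd queue."""
    return ["lpr", "-P", queue, str(Path(fits_path))]


def submit(fits_path, queue="bits"):
    """Hand the file to lpd; raises CalledProcessError if lpr rejects it."""
    subprocess.run(lpr_command(fits_path, queue), check=True)


if __name__ == "__main__":
    # Hypothetical path; a real acquisition system would pass its own file name.
    print(" ".join(lpr_command("/data/obj001.fits")))
```

Because lpd spools to local disk before forwarding, the acquisition software only needs to know that the file was accepted by the local queue, not that the archive computer is reachable at that moment.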

On the archive computer, the bits lpd queue passes each FITS file in turn to the actual STB archive daemon, "bitf". The bitf daemon is responsible for adding a serial numbered keyword to each FITS header and for translating simple FITS images into the FITS image extension format. A FITS checksum is calculated for each FITS extension to support later tape verification. After a sufficient volume of data (approximately 50 Mbytes) accumulates, the individual FITS extensions are assembled into a complete FITS multi-extension file on tape. The duplicate tape copies are written at the same time by interleaving writes from the same data buffer to separate tape drives.
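The text above does not say which checksum algorithm bitf applies. As an illustration only, the following Python sketch assumes a 32-bit ones'-complement sum over big-endian words, which is the scheme used by the FITS checksum convention. The function names are hypothetical.

```python
FITS_RECORD_BYTES = 2880  # FITS files are made of 2880-byte logical records


def ones_complement_sum32(data: bytes) -> int:
    """Sum big-endian 32-bit words with end-around carry (ones'-complement)."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    total = 0
    for i in range(0, len(data), 4):
        total += int.from_bytes(data[i:i + 4], "big")
    # Fold any overflow above bit 31 back into the low 32 bits.
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def verify(data: bytes, stored_sum: int) -> bool:
    """Compare a freshly computed sum with the value recorded at write time."""
    return ones_complement_sum32(data) == stored_sum


if __name__ == "__main__":
    record = bytes(FITS_RECORD_BYTES)  # one all-zero record as a toy extension
    print(verify(record, 0))
```

A sum of this kind is order-independent and cheap to accumulate record by record, which suits verification of a tape that is read back sequentially.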

After a pair of tapes is full, the first duplicate pair of Exabyte drives is rewound and the tapes are verified against the checksums, as well as against each other, while the second pair continues with taping duties. The capacity of an Exabyte 8505 format tape is conservatively rated at 4.12 Gbytes (this corresponds to a round number of 1.5 million FITS records of 2880 bytes each). When empty tapes have been newly mounted in both pairs of duplicate drives, and allowing for 2 Gbytes of queue spooling area on the archive computer, the total capacity of the online media is greater than 10 Gbytes. Note that if both pairs of tapes are allowed to fill up without swapping the media, and the spool area on the archive computer then fills up as well, lpd provides its normal service of continuing to queue the data onto disk on the data acquisition computer in each dome. After the tapes are swapped (or after some network or computer outage ends), the network queue will start back up and drain to the central archive computer with no loss of data.
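The capacity figures above can be checked with a short calculation. One assumption is needed: the 4.12 figure is reproduced when a Gbyte is taken as 1000 Mbytes of 2^20 bytes each. That unit convention is an inference from the numbers, not something the text states.

```python
FITS_RECORD_BYTES = 2880
RECORDS_PER_TAPE = 1_500_000
MBYTE = 1024 ** 2          # assumed: 1 Mbyte = 2**20 bytes
MBYTES_PER_GBYTE = 1000    # assumed: 1 Gbyte = 1000 Mbytes (inferred convention)
SPOOL_GBYTES = 2.0         # queue spooling area on the archive computer
TAPE_PAIRS = 2             # each pair holds one tape's worth of unique data


def tape_capacity_gbytes():
    """Rated capacity of one tape, in the mixed units assumed above."""
    tape_bytes = FITS_RECORD_BYTES * RECORDS_PER_TAPE
    return tape_bytes / MBYTE / MBYTES_PER_GBYTE


def online_capacity_gbytes():
    """Unique data that fits online: one tape per duplicate pair plus spool."""
    return TAPE_PAIRS * tape_capacity_gbytes() + SPOOL_GBYTES


if __name__ == "__main__":
    print(f"per tape: {tape_capacity_gbytes():.2f} Gbytes")
    print(f"online:   {online_capacity_gbytes():.2f} Gbytes")
```

Only one tape per pair counts toward capacity because the second tape in each pair is a duplicate copy.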

Accounting for standard Ethernet bandwidth (~80 Gbytes/day), for the speed of writing to an Exabyte (~20 Gbytes/day) and for the overhead of the robustly conservative data handling practices designed into STB, the total throughput of either the KPNO or CTIO archive comfortably exceeds 10 Gbytes/day. By substituting a faster network, faster tape drive and/or a faster computer, this throughput could be increased significantly.
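The reasoning here is a bottleneck argument: the slowest stage sets a ceiling, and handling overhead reduces the achieved rate below it. A minimal sketch using only the rates quoted above (the stage names are labels chosen for illustration):

```python
# Stage rates in Gbytes/day, as quoted in the text.
STAGE_RATES = {"ethernet": 80.0, "exabyte_write": 20.0}
QUOTED_THROUGHPUT = 10.0  # the archive "comfortably exceeds" this figure


def bottleneck(rates):
    """Return (name, rate) of the slowest stage, which caps the pipeline."""
    name = min(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    name, ceiling = bottleneck(STAGE_RATES)
    fraction = QUOTED_THROUGHPUT / ceiling
    print(f"ceiling: {ceiling} Gbytes/day, set by {name}")
    print(f"quoted throughput is at least {fraction:.0%} of that ceiling")
```

With these numbers the tape drive, not the network, is the limiting stage, which is why a faster drive is among the upgrades the text suggests.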

A monitor program, "bitmon", provides privileged access to the archive for swapping and verifying tapes and for performing various utility chores such as stopping the archive should the computer require servicing. STB is robust against tape drive failures, and can easily be reconfigured to produce different numbers of tape copies and to support additional tape drives (or pairs or multiples) to provide additional between-swap capacity. Only a single tape drive is actually required for normal archive operations. Duplicate copies are highly recommended, though, as the best way to safeguard an observatory's investment in its data.

In addition to the actual taped data files, STB produces two other data products: a FITS header catalog and a simple text index that cross-references the catalog with the particular tape ID, file number and image number within the large multi-extension FITS tape files. The header catalog format is easy to ingest into various commercial relational databases. Keeping the index separate means the catalog can later be recast without the painful need to read all the tapes, and the data tapes can be recast (perhaps onto DVD, for instance) without a wholesale rebuild of the catalog database.
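The role of the index can be illustrated with a small parser. The actual STB index layout is not given here, so the format below (one whitespace-separated line per image: serial number, tape ID, file number, image number) is purely hypothetical, as are the sample tape IDs.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    serial: int        # serial number added to the FITS header by the archive
    tape_id: str       # which tape holds the image
    file_number: int   # multi-extension FITS file on that tape
    image_number: int  # extension within that file


def parse_index(lines):
    """Map serial number -> IndexEntry, skipping blank and '#' comment lines."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        serial, tape_id, file_no, image_no = line.split()
        entry = IndexEntry(int(serial), tape_id, int(file_no), int(image_no))
        entries[entry.serial] = entry
    return entries


if __name__ == "__main__":
    sample = [
        "# serial tape file image   (hypothetical layout)",
        "1500001 KP0010 2 5",
        "1500002 KP0012 7 3",
    ]
    print(parse_index(sample)[1500002])
```

Because the catalog refers to images only by serial number, moving the data to new media would mean rewriting this small mapping rather than the catalog itself.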

The index file entries also serve as input to a simple IRAF CL script, "readbits", that prompts the user through mounting appropriate archive tapes as data are retrieved from the archive. Any image in the archive can be retrieved in a maximum of about ten minutes due to the efficient tape motions permitted by the large FITS tape files. Since STB produces standard FITS data files, any astronomical software package that supports FITS can be used to read the archive tapes, and an IRAF installation is not required to operate the archive. The STB software itself requires only about a Mbyte of disk space and could be installed on quite a modest workstation.
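One way a retrieval tool could keep tape motion efficient is to group requests by tape and read each tape front to back. The sketch below shows that idea in Python; it is not the readbits script, whose internal logic is not described here, and the tape IDs are made up.

```python
from collections import defaultdict


def retrieval_plan(requests):
    """Group (tape_id, file_number, image_number) requests by tape.

    Within each tape, positions are sorted so the drive only moves forward.
    Duplicate requests are collapsed.
    """
    by_tape = defaultdict(set)
    for tape_id, file_no, image_no in requests:
        by_tape[tape_id].add((file_no, image_no))
    return {tape: sorted(by_tape[tape]) for tape in sorted(by_tape)}


if __name__ == "__main__":
    wanted = [("KP0012", 7, 3), ("KP0010", 2, 5), ("KP0012", 1, 9), ("KP0012", 7, 1)]
    for tape, positions in retrieval_plan(wanted).items():
        print(f"mount {tape}: read {positions}")
```

Each tape is then mounted once, and every image on it is fetched in a single forward pass.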

STB is easily portable to other operating systems (NOAO runs it on Sun SPARC machines under SunOS and, now, Solaris) and to other astronomical instruments and telescopes, including those that may produce data in formats other than FITS and, potentially, data other than images. STB has also been successfully used with DAT tape drives, and would be easy to adapt to additional media formats. Compatibility with the Unix System V print spooler, lpsched, was provided for in the initial design and has been implemented by sites outside of NOAO. Our recent Solaris port of STB uses the excellent port of Berkeley lpd to Solaris that is available from:

Current STB development efforts center on an update to support the WIYN Data Archive and Distribution System, which will use writable CD-R as the primary data distribution format for observers, as well as to construct an online random access data archive. We anticipate fielding this system at the WIYN telescope about the time this Newsletter is published.

This brief article has only touched on the many issues involved in operating an astronomical data archive 24 hours/day, 365 days/year. More information is available from , or from the author at .

Rob Seaman


IRAF Group, National Optical Astronomy Observatories, P.O. Box 26732, Tucson, AZ 85726, Phone: (520) 318-8160, FAX: (520) 318-8360, Email

Posted: 07May1998