[Bruce Barnett introduces this topic in article 20.5. -JP]
In news posting <5932@tahoe.unr.edu> malc@equinox.unr.edu (Malcolm Carlock) asked how to make tar write a remote tape drive via rsh (1.33) and dd (35.6). Here's the answer:
% tar cf - . | rsh foo dd of=/dev/device obs=20b
Be forewarned that most incarnations of dd are extremely slow at handling this.
What is going on? This answer requires some background:
Tapes have "block sizes." Not all tapes, mind you - most SCSI tapes have a fixed block size that can, for the most part, be ignored. Nine-track tapes, however, typically record data in "records" separated by "gaps," and only whole records can be reread later.
In order to accommodate this, UNIX tape drivers generally translate each read( ) or write( ) system call into a single record transfer. The size of a written record is the number of bytes passed to write( ). (There may be some additional constraints, such as "the size must be even" or "the size must be no more than 32768 bytes." Note that phase-encoded (1600-bpi) blocks should be no longer than 10240 bytes, and GCR (6250-bpi) blocks should be no longer than 32768 bytes, to reduce the chance of an unrecoverable error.) Each read( ) call must ask for at least one whole record (many drivers get this wrong and silently drop trailing portions of a record that was longer than the byte count given to read( )); each read( ) returns the actual number of bytes in the record.
Network connections are generally "byte streams": the two host "peers" (above, the machine running tar, and the machine with the tape drive) will exchange data but will drop any "record boundary" notion at the protocol-interface level. If record boundaries are to be preserved, this must be done in a layer above the network protocol itself. (Not all network protocols are stream-oriented, not even flow-controlled, error-recovering protocols. Internet RDP and XNS SPP are two examples of reliable record-oriented protocols. Many of these, however, impose fairly small record sizes.)
rsh simply opens a stream protocol, and does no work to preserve "packet boundaries."
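The loss of boundaries can be seen without a network at all. The sketch below uses a local pipe as a stand-in for the rsh connection (an assumption for illustration: a pipe is also a byte stream, but it is not TCP). The function name merged_read is made up for this example. Two separate 300-byte writes go into the pipe; the reader waits a second and then issues one read( ) for up to 1024 bytes, and dd's "records in" line tells us how many reads it needed.

```shell
#!/bin/sh
# merged_read: two separate 300-byte writes go into a pipe; the reader
# sleeps so that both have arrived, then dd issues read() calls of up to
# 1024 bytes each. dd's "records in" line counts full+partial reads.
merged_read() {
    {
        dd if=/dev/zero bs=300 count=1 2>/dev/null   # first write()
        dd if=/dev/zero bs=300 count=1 2>/dev/null   # second write()
    } | {
        sleep 1
        LC_ALL=C dd bs=1024 of=/dev/null 2>&1
    } | grep 'records in'
}

result=$(merged_read)
echo "$result"
```

On a typical system this should report `0+1 records in`: one short read returned all 600 bytes, so nothing on the reading side can tell where the first write ended and the second began.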
dd works in mysterious ways:
dd if=x of=y
is the same as:
dd if=x of=y ibs=512 obs=512
which means: open files x and y, then loop doing read(fd_x) with a byte count of 512, take whatever you got, copy it into an output buffer for file y, and each time that buffer reaches 512 bytes, do a single write(fd_y) with 512 bytes.
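A small experiment makes the reblocking visible. This is only a sketch: the helper names lumps and reblock_default are invented here, /dev/null stands in for file y, and the 300-byte chunk size is arbitrary. Input arrives in three short chunks a second apart, so every read( ) comes back short, yet dd still collects the bytes into 512-byte writes.

```shell
#!/bin/sh
# lumps: emit three 300-byte chunks, one second apart, so that each
# read() on the other end of the pipe sees one short chunk.
lumps() {
    for i in 1 2 3; do
        dd if=/dev/zero bs=300 count=1 2>/dev/null
        sleep 1
    done
}

# reblock_default: run dd with its default ibs=512 obs=512 and print
# its record counts (full+partial) for input and output on one line.
reblock_default() {
    lumps | LC_ALL=C dd of=/dev/null 2>&1 | grep records | paste -s -d ' ' -
}

result=$(reblock_default)
echo "$result"
```

The expected report is `0+3 records in 1+1 records out`: three partial reads of 300 bytes each, one full 512-byte write, and one final partial write holding the remaining 388 bytes.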
On the other hand:
dd if=x of=y bs=512
means something completely different: open files x and y, then loop doing read(fd_x) with a byte count of 512, take whatever you got, and do a single write(fd_y) with that count. All of this means that:
% tar cf - . | rsh otherhost dd of=/dev/device
will write 512-byte blocks (not what you wanted), while:
% tar cf - . | rsh otherhost dd of=/dev/device bs=20b
will be even worse: it will take whatever it gets from stdin (which, being a TCP connection, will be arbitrarily lumpy depending on the underlying network parameters and the particular TCP implementation) and write essentially random-sized records. On purely "local" (Ethernet) connections, with typical implementations, you will wind up with 1024-byte blocks (a tar "block factor" of 2).
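The difference between bs=20b and obs=20b on lumpy input can be tried locally. In this sketch a pipe fed in 4000-byte chunks stands in for the TCP connection, and /dev/null stands in for the tape device; the chunk size and the helper names lumps and records_out are assumptions made for the example, not anything rsh or TCP guarantees.

```shell
#!/bin/sh
# lumps: three 4000-byte chunks, one second apart (12000 bytes total),
# imitating data that trickles in from a network connection.
lumps() {
    for i in 1 2 3; do
        dd if=/dev/zero bs=4000 count=1 2>/dev/null
        sleep 1
    done
}

# records_out OPERAND: feed the lumpy stream to dd with the given
# block-size operand and print dd's "records out" line (full+partial).
records_out() {
    lumps | LC_ALL=C dd "$1" of=/dev/null 2>&1 | grep 'records out'
}

bs_result=$(records_out bs=20b)     # each short read is written as-is
obs_result=$(records_out obs=20b)   # input is collected into 10240-byte writes
echo "bs=20b:  $bs_result"
echo "obs=20b: $obs_result"
```

With bs=20b the report should be `0+3 records out`: three writes of 4000 bytes each, which on a tape would be three odd-sized records. With obs=20b it should be `1+1 records out`: one full 10240-byte write, plus a short final write for the 1760 bytes left over at end of input.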
If a blocking factor of 2 is acceptable, and if cat forces 1024-byte blocks (both true in some cases), you can use:
% tar cf - . | rsh otherhost "cat >/dev/device"
but this depends on undocumented features in cat. In any case, on nine-track tapes, each gap occupies approximately 0.7 inches of otherwise useful tape space. A block size of 1024 therefore has ten times as many gaps as a block size of 10240, wasting 9x1600x0.7 = 10 kbytes' worth of tape at 1600 bpi for every 10240 bytes written; it has 32 times as many gaps as a size of 32768, wasting 31x6250x0.7 = 136 kbytes' worth of tape at 6250 bpi for every 32768 bytes written.
I say "approximately" because actual gap sizes vary. In particular, certain "streaming" drives (all too often called streaming because they do not - in some cases the controller is too "smart" to be able to keep up with the required data rate, even when fed back-to-back DMA requests) have been known to stretch the gaps to 0.9 inches.
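The gap arithmetic can be checked with shell integer arithmetic. This is a sketch built on the figures quoted above; the function names gap_waste and tape_efficiency are invented here, the gap is passed in tenths of an inch (default 7, or 9 for a drive that stretches its gaps), and the efficiency figure is an extra illustration rather than part of the original answer.

```shell
#!/bin/sh
# gap_waste SMALL LARGE BPI [GAP_TENTHS]
# Bytes' worth of tape lost to the extra gaps when LARGE bytes of data
# are written as SMALL-byte records instead of one LARGE-byte record.
gap_waste() {
    echo $(( ($2 / $1 - 1) * $3 * ${4:-7} / 10 ))
}

# tape_efficiency BLOCKSIZE BPI [GAP_TENTHS]
# Percentage of tape length holding data rather than gaps, rounded down.
# A record occupies BLOCKSIZE/BPI inches and is followed by one gap.
tape_efficiency() {
    echo $(( 1000 * $1 / (10 * $1 + ${3:-7} * $2) ))
}

gap_waste 1024 10240 1600      # 9 x 1600 x 0.7  = 10080
gap_waste 1024 32768 6250      # 31 x 6250 x 0.7 = 135625
gap_waste 1024 10240 1600 9    # same tape with 0.9-inch gaps: 12960
tape_efficiency 1024 1600      # 47
tape_efficiency 10240 1600     # 90
```

Under these assumptions, 1024-byte records leave less than half of a 1600-bpi tape holding data, while 10240-byte records bring that to about 90 percent.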
In general, because of tape gaps, you should use the largest record size that permits error recovery. Note, however, that some olid [2] hardware (such as that found on certain AT&T 3B systems) puts a ridiculous upper limit (5K) on tape blocks.
[2] Go ahead, look it up... it is a perfectly good crossword puzzle word. :-)
- in comp.unix.questions on Usenet, 3 April 1991