DATMAN TECHNICAL BULLETIN #022



From:    Kan Yabumoto           tech@datman.com
To:      DATMAN user
Subject: The Remaining Space in DATMAN Tape
Date:    1998-10-30
====================================================================

The remaining space on the DATMAN volume depends on a number of
factors.  If you are a user of the Freeware version, or a licensed
version (e.g., DATMAN-2GB) with a tape drive whose capacity is
much larger than the software-imposed limit, the answer is relatively
simple.

DATMAN-xGB keeps track of how many bytes it has written to the
current tape and subtracts that number from the artificial limit
(xGB; for example, 1 GB in the case of the freeware) to determine
the remaining space on the tape.  This value is not adjusted for
the tape drive's hardware data compression, whether it is enabled
or disabled.  A common misconception is that DATMAN's FORMAT command
somehow sets the available space on a tape when it is formatted.
This is not true.  As a matter of fact, a tape cartridge formatted
by the freeware version is nearly indistinguishable from one
formatted by DATMAN-PRO.  Therefore, when you upgrade to a DATMAN
version with a higher capacity limit, you will find more remaining
space on a tape that showed little or no room under the earlier
version of DATMAN.
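
As a rough illustration of the rule above, here is a minimal Python
sketch.  The function name and the decimal definition of a gigabyte
are our assumptions for the example; they are not taken from the
DATMAN code.

```python
GB = 1_000_000_000  # assumption: decimal gigabyte, for illustration only


def remaining_under_limit(limit_bytes: int, bytes_written: int) -> int:
    """Remaining space when the software-imposed limit is the binding one.

    Hardware compression plays no part here: only the bytes handed to
    the drive are counted against the limit.
    """
    return max(0, limit_bytes - bytes_written)


written = 1 * GB  # the same tape, holding the same data

print(remaining_under_limit(1 * GB, written))  # freeware limit
print(remaining_under_limit(2 * GB, written))  # DATMAN-2GB limit
```

The tape itself is unchanged between the two calls; only the limit
differs, which is why an upgrade makes "full" tapes show free space.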

On the other hand, when DATMAN reports the remaining space based on
how much space is actually left on the tape rather than on the
software-imposed limit (i.e., the actual remaining space is less than
the software-imposed limit), the answer becomes much murkier.  Many
factors influence the value reported by DATMAN.  To state the
conclusion first, the remaining space is only a rough estimate at
best.  DATMAN is simply unable to provide an accurate figure for
the remaining space, especially when the remaining space is relatively
large (25% or more of the tape capacity).

Many users ask us the following question:

  "When I first format a tape, it shows more than 4 GB left.  But,
  after recording about 3 GB, DATMAN says no room left.  How come?"

In most cases, this is related to the way data compression is handled.
In recent years, storage device manufacturers have been quoting the
capacity of a storage device as exactly twice its true (native)
capacity.  DATMAN also follows this convention when your tape drive
operates in data compression mode.  A compression ratio of 2.0 was
once a realistic value for typical combinations of files.  Today,
the typical compression ratio for many users is somewhere between
1.3 and 1.5, because more and more files on the disk are already
highly compressed and cannot be compressed much further.  Therefore,
when you actually record files on a DATMAN tape, the drive often
consumes more space than originally estimated, and you may get a
surprise "out of space" condition.  Note that the capacity reported
by DATMAN is only an estimate; DATMAN cannot guarantee that you can
use all of the space shown when the tape is freshly formatted.
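
The arithmetic behind the user's question can be sketched as
follows.  The 2 GB native capacity is a hypothetical figure chosen
to match the "4 GB shown, 3 GB recorded" example; the function names
are ours.

```python
def advertised_capacity(native_bytes: int, assumed_ratio: float = 2.0) -> float:
    """Capacity shown for a fresh tape, using the 2:1 marketing convention."""
    return native_bytes * assumed_ratio


def achievable_capacity(native_bytes: int, actual_ratio: float) -> float:
    """User data that actually fits at the compression ratio really achieved."""
    return native_bytes * actual_ratio


native = 2_000_000_000  # hypothetical native capacity of the tape

print("shown when formatted:", advertised_capacity(native))
for ratio in (1.3, 1.5):
    print(f"fits at ratio {ratio}:", round(achievable_capacity(native, ratio)))
```

With a ratio of 1.5 the tape fills after about 3 GB of user data,
even though 4 GB was shown when it was formatted.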

Data compression plays a major part in this mystery.  But there
are other factors in the tape which complicate the matter.


Here is how DATMAN reports the remaining space:

 1.  From time to time, DATMAN sends a SCSI command to the tape
     drive asking how much free space is left on the tape.

 2.  Based on the most recent value received from the drive and the
     number of bytes written to the tape since then, DATMAN computes
     the remaining space on the tape.

 3.  The computation involves several elements.  For example:
       a. the space reserved for the final catalog file, which is
          appended at the end,
       b. whether data compression is enabled or not,
       c. N-group writing for redundant recording of frames.

 4.  DATMAN truncates the number to a clean multiple of 1,000 or
     1,000,000 bytes, within the confines of the Win32 API between
     the system (KERNEL32.DLL) and the DATMAN File Engine (VxD).
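
Steps 1 to 3 can be pictured with the following Python sketch.  It
is only a model under stated assumptions: the class and field names
are hypothetical, and the adjustments for data compression and
N-group writing are left out because this bulletin does not give
their formulas.

```python
from dataclasses import dataclass


@dataclass
class TapeEstimate:
    """Hypothetical model of the bookkeeping between two polls."""

    last_polled_remaining: int   # value returned by the drive at the last poll
    catalog_reserve: int = 0     # assumed space held back for the final catalog
    written_since_poll: int = 0  # bytes sent to the drive since that poll

    def record_write(self, nbytes: int) -> None:
        """Account for a block handed to the drive (step 2)."""
        self.written_since_poll += nbytes

    def refresh(self, drive_remaining: int) -> None:
        """Take a fresh value from the drive and restart the count (step 1)."""
        self.last_polled_remaining = drive_remaining
        self.written_since_poll = 0

    def remaining(self) -> int:
        """Estimate from the last poll, the writes since, and the reserve (step 3)."""
        free = self.last_polled_remaining - self.written_since_poll
        return max(0, free - self.catalog_reserve)


est = TapeEstimate(last_polled_remaining=1_000_000, catalog_reserve=100_000)
est.record_write(300_000)
print(est.remaining())   # 600000
est.refresh(650_000)     # the drive's own figure differs from our running count
print(est.remaining())   # 550000
```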

Each of the four steps listed here adds uncertainty to the
remaining space.  The largest source of error is the drive itself.
Unlike sector-based storage devices such as the floppy disk and the
hard disk, tape drives do not allocate space by sectors.  The tape
drive simply records blocks (of software-chosen size) on the media
on the fly.  Most of the time, the recording is successful and the
space is efficiently used.  But when the drive detects a marginal
spot on the tape, it rewrites the frame (a recording unit in the
DDS technology) until it succeeds; typically, a DDS drive does not
give up until it has failed 127 times.  Such automatic rewrites are
normally invisible to the DATMAN software.  This alone makes the
remaining space on the tape uncertain.

Other factors affect how efficient the recording can be.  The
on-board firmware first packages the blocks of user data sent by
DATMAN (after performing hardware-assisted data compression, if
enabled) into so-called "groups" of approximately 126 KB, the native
data structure of the digital audio tape format.  When the rate of
data coming to the drive is very low, some of the groups recorded on
the tape are padded to full size, which reduces storage utilization.

All in all, the drive firmware determines the remaining space using
its best judgment.  According to Hewlett-Packard's literature, their
drives assume that 5 frames per group (of 22 frames) will require a
rewrite.  That is, when it estimates the remaining space, HP is so
pessimistic that it assumes about 23 percent of the recording will
go bad.  We find the other extreme in Sony's drives, which
overestimate the remaining space to such a degree that, by the time
they send the "low remaining space" warning to DATMAN, the tape no
longer has even the guaranteed space (this is definitely a firmware
bug).  In essence, the tape drive makes the first-stage estimate.
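
To show where the 23 percent figure comes from, here is a small
Python sketch.  The two estimating functions are one plausible
reading of how such an allowance could be applied; the actual
firmware logic is not documented here, and the frame count used in
the example is hypothetical.

```python
FRAMES_PER_GROUP = 22            # from the HP figure quoted above
ASSUMED_REWRITES_PER_GROUP = 5   # from the HP figure quoted above

# Fraction of frames assumed to need a rewrite: 5 / 22, about 23 percent.
rewrite_fraction = ASSUMED_REWRITES_PER_GROUP / FRAMES_PER_GROUP


def optimistic_groups(free_frames: int) -> int:
    """Groups that fit if no frame ever needs a rewrite."""
    return free_frames // FRAMES_PER_GROUP


def pessimistic_groups(free_frames: int) -> int:
    """Groups that fit if every group costs 5 extra frames (our assumption)."""
    return free_frames // (FRAMES_PER_GROUP + ASSUMED_REWRITES_PER_GROUP)


free_frames = 27_000  # hypothetical number of free frames on the tape
print(f"{rewrite_fraction:.1%}")        # 22.7%
print(optimistic_groups(free_frames))   # 1227
print(pessimistic_groups(free_frames))  # 1000
```

The gap between the two results shows how much the reported space
can differ depending only on the allowance the firmware builds in.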

Additional discrepancies arise when DATMAN tries to compute the
actual space consumption between polls of the tape drive for the
remaining space.  In theory, DATMAN should request the remaining
space immediately after every block is written to the tape.  This
is not practical because of the substantial performance penalty
(it breaks the streaming).  DATMAN samples the remaining space only
sparingly at first, and with increasing frequency as the remaining
space nears zero.  If you perform a large-scale tape-write operation,
you may observe the remaining space zigzagging (sometimes it even
increases), or suddenly decreasing by more than the size of the
file just written.  This irregularity is due to the recalibration
that occurs immediately after DATMAN refreshes the remaining space.
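
The zigzag can be reproduced with a toy simulation in Python.  It
is a sketch under simplifying assumptions: the numbers are made up,
the polling interval is fixed rather than adaptive, and the function
name is ours.

```python
def simulate(writes, poll_every, start_remaining):
    """Return the remaining space displayed after each write.

    writes: list of (bytes_sent, bytes_consumed_on_tape) pairs.  The two
    differ because of compression, rewrites and padding inside the drive.
    """
    drive_remaining = start_remaining  # what the drive itself knows
    displayed = start_remaining        # the running estimate shown to the user
    history = []
    for count, (sent, consumed) in enumerate(writes, start=1):
        drive_remaining -= consumed
        displayed -= sent              # between polls, subtract what was sent
        if count % poll_every == 0:
            displayed = drive_remaining  # recalibrate from the drive
        history.append(displayed)
    return history


# Two files that compress well, followed by two that hit rewrites.
writes = [(100, 20), (100, 20), (100, 250), (100, 250)]
print(simulate(writes, poll_every=2, start_remaining=1000))
# [900, 960, 860, 460]
```

The displayed value rises at the first refresh (900 to 960) and
drops by 400 at the second one, although each write sent only 100.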

Why does DATMAN deliberately add even more inaccuracy by
truncating the value?  By the time DATMAN gets to report the value
for remaining space, the number to be reported is already quite
inaccurate.  A truncation error of a few megabytes is by then
negligible in comparison to the overall precision of the reported
value.  Therefore, we feel it is more convenient to provide a number
which is much easier to read.
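
The truncation itself is simple.  A minimal Python sketch, in which
the function name is ours and the choice of unit is left to the
caller, because the rule for picking 1,000 or 1,000,000 is not
described in this bulletin:

```python
def truncate_to_multiple(nbytes: int, unit: int) -> int:
    """Round nbytes down to a clean multiple of unit."""
    return (nbytes // unit) * unit


estimate = 3_217_654_321  # hypothetical raw estimate in bytes

print(truncate_to_multiple(estimate, 1_000))      # 3217654000
print(truncate_to_multiple(estimate, 1_000_000))  # 3217000000
```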

Lastly, because some application programs (such as COMMAND.COM)
cannot process numbers larger than an unsigned 32-bit quantity
(4 GB), DATMAN refrains from reporting a number larger than 4 GB
even when the volume has more than 4 GB.  Therefore, in a DOS box,
you may see only 4 GB of remaining space while the DATMAN Command
Center indicates much more space on the tape.
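
A minimal Python sketch of such a cap follows.  The exact value
DATMAN reports at the ceiling is not stated in this bulletin; the
sketch assumes the largest unsigned 32-bit number, and the function
name is ours.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295: largest unsigned 32-bit value


def clamp_for_32bit(nbytes: int) -> int:
    """Cap the reported value so it fits in an unsigned 32-bit field."""
    return min(nbytes, UINT32_MAX)


actual = 5_000_000_000  # hypothetical remaining space above 4 GB

print(clamp_for_32bit(actual))  # 4294967295
# For comparison: a program keeping only the low 32 bits would see
print(actual & UINT32_MAX)      # 705032704
```

Capping the value is the safer choice: an uncapped number that
loses its high bits would show far less space than is really there.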

By the way, the DATMAN Command Center uses a funny kilobyte (kB)
unit which is exactly 1,000 bytes, as opposed to the 1,024-byte
kilobyte (KB) more commonly used in computing.  The funny kilobyte
(with lowercase k) is consistent with the metric convention rather
than the computer's big K (uppercase).  We feel it is more intuitive
to most users even though this method of display takes slightly
more computation.
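
The difference between the two units, as a short Python sketch
(the function names are ours):

```python
def to_metric_kB(nbytes: int) -> int:
    """Kilobytes of 1,000 bytes (lowercase k); needs a real division."""
    return nbytes // 1000


def to_binary_KB(nbytes: int) -> int:
    """Kilobytes of 1,024 bytes (uppercase K); a 10-bit shift is enough."""
    return nbytes >> 10


size = 2_048_000  # hypothetical byte count

print(to_metric_kB(size), "kB")  # 2048 kB
print(to_binary_KB(size), "KB")  # 2000 KB
```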
