SHORTEN(1) USER COMMANDS SHORTEN(1)
NAME
shorten - fast compression for waveform files
SYNOPSIS
shorten [-hl] [-a #bytes] [-b #samples] [-c #channels]
[-d #bytes] [-m #blocks] [-n #dB] [-p #order] [-q #bits]
[-r #bits] [-t filetype] [-v #version]
[waveform-file [shortened-file]]
shorten -x [-hl] [-a #bytes] [-d #bytes]
[shortened-file [waveform-file]]
DESCRIPTION
shorten reduces the size of waveform files (such as audio)
using Huffman coding of prediction residuals and optional
additional quantisation. In lossless mode the amount of
compression obtained depends on the nature of the waveform.
Those composed of low frequencies and low amplitudes give
the best compression, which may be 2:1 or better. Lossy
compression is selected by specifying a minimum acceptable
segmental signal to noise ratio or a maximum bit rate. It
operates by zeroing the lower order bits of the waveform,
so retaining waveform shape.
If both file names are specified then these are used as the
input and output files. The first file name can be replaced
by "-" to read from standard input and likewise the second
filename can be replaced by "-" to write to standard output.
Under UNIX, if only one file name is specified, then that
name is used for input and the output file name is generated
by adding the suffix ".shn" on compression and removing the
".shn" suffix on decompression. In these cases the input
file is removed on completion. The use of automatic file
name generation is not currently supported under DOS. If no
file names are specified, shorten reads from standard input
and writes to standard output. Whenever possible, the output
file inherits the permissions, owner, group, access and
modification times of the input file.
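The automatic file name generation described above can be
sketched as follows. This is a minimal Python illustration,
not code from shorten. The function name default_output_name
is hypothetical, and raising an error when the ".shn" suffix
is missing on decompression is an assumption, since this page
does not say what shorten does in that case.

```python
def default_output_name(input_name, extract=False):
    """Derive the output file name when only the input name is given."""
    suffix = ".shn"
    if not extract:
        # Compression: add the suffix to the input name.
        return input_name + suffix
    # Decompression: remove the suffix. The behaviour for a name
    # without the suffix is an assumption of this sketch.
    if not input_name.endswith(suffix) or input_name == suffix:
        raise ValueError("%r does not end in %r" % (input_name, suffix))
    return input_name[: -len(suffix)]
```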
OPTIONS
-a align bytes
Specify the number of bytes to be copied verbatim
before compression begins. This option can be used to
preserve fixed length ASCII headers on waveform files,
and may be necessary if the header length is an odd
number of bytes.
-b block size
Specify the number of samples to be grouped into a
block for processing. Within a block the signal elements
are expected to have the same spectral characteristics.
The default value works well for a large range of audio
files.
-c channels
Specify the number of independent interleaved channels.
For two signals, a(t) and b(t), the original data format
is assumed to be a(0),b(0),a(1),b(1)...
-d discard bytes
Specify the number of bytes to be discarded before
compression or decompression. This may be used to
delete header information from a file. Refer to the -a
option for storing the header information in the
compressed file.
-h Give a short message specifying usage options.
-l Print the software license specifying the conditions
for the distribution and usage of this software.
-m blocks
Specify the number of past blocks to be used to estimate
the mean and power of the signal. A value of zero
disables this prediction and the mean is assumed to lie
in the middle of the range of the relevant data type
(i.e. at zero for signed quantities). The default value
is non-zero for format versions 2.0 and above.
-n noise level
Specify the minimum acceptable segmental signal to
noise ratio in dB. The signal power is taken as the
variance of the samples in the current block. The
noise power is the quantisation noise incurred by coding
the current block assuming that samples are uniformly
distributed over the quantisation interval. The
bit rate is dynamically changed to maintain the desired
signal to noise ratio. The default value represents
lossless coding.
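The relation between the requested signal to noise ratio and
the number of low order bits that can be zeroed can be
sketched as follows. This is a minimal Python illustration
under stated assumptions, not shorten's implementation: the
noise power for a quantisation step q is taken as the
textbook value q*q/12 for uniformly distributed error, and
the names block_snr_db and max_shift_for_snr are
hypothetical.

```python
import math


def block_snr_db(block, shift):
    """Estimated SNR in dB if `shift` low order bits are zeroed."""
    n = len(block)
    mean = sum(block) / n
    # Signal power: variance of the samples in the block.
    signal_power = sum((x - mean) ** 2 for x in block) / n
    if signal_power == 0:
        return float("-inf")
    # Noise power: uniform error over a step of 2**shift (assumed model).
    step = 1 << shift
    noise_power = step * step / 12.0
    return 10.0 * math.log10(signal_power / noise_power)


def max_shift_for_snr(block, min_snr_db, max_shift=15):
    """Largest number of zeroed bits that still meets min_snr_db.

    Returns 0 (no bits zeroed, i.e. lossless) if even one bit is too many.
    """
    best = 0
    for shift in range(1, max_shift + 1):
        if block_snr_db(block, shift) < min_snr_db:
            break  # SNR only falls as more bits are zeroed
        best = shift
    return best
```

Because the choice is made per block, loud blocks lose more
low order bits than quiet ones, which is how the bit rate
varies while the ratio is held.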
-p prediction order
Specify the maximum order of the linear predictive
filter. The default value of zero disables the use of
linear prediction and a polynomial interpolation method
is used instead. The use of the linear predictive
filter generally results in a small improvement in
compression ratio at the expense of execution time.
This is the only option to use a significant amount of
floating point processing during compression.
Decompression still uses a minimal number of floating
point operations.
Decompression time is normally about twice that of the
default polynomial interpolation. For versions 0 and 1,
compression time is linear in the specified maximum
order as all lower values are searched for the greatest
expected compression (the number of bits required to
transmit the prediction residual is monotonically
decreasing with prediction order, but transmitting each
filter coefficient requires about 7 bits). For version 2
and above, the search is started at zero order
and terminated when the last two prediction orders give
a larger expected bit rate than the minimum found to
date. This is a reasonable strategy for many real
world signals; you may revert to the exhaustive
algorithm by setting -v1 to check that this works for
your signal type.
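The two search strategies described above can be compared
with the following sketch. This is a minimal Python
illustration, not code from shorten. The callback
expected_bits(order), which returns the expected number of
bits for the current block at a given prediction order, is a
hypothetical stand-in for the real cost estimate, and both
function names are invented for this example.

```python
def search_exhaustive(expected_bits, max_order):
    """Versions 0 and 1: try every order from 0 to max_order."""
    return min(range(max_order + 1), key=expected_bits)


def search_early_stop(expected_bits, max_order):
    """Version 2 and above: stop after two consecutive worse orders."""
    best_order = 0
    best_bits = expected_bits(0)
    worse_in_a_row = 0
    for order in range(1, max_order + 1):
        bits = expected_bits(order)
        if bits > best_bits:
            worse_in_a_row += 1
            if worse_in_a_row == 2:
                break  # last two orders both exceeded the minimum
        else:
            worse_in_a_row = 0
            if bits < best_bits:
                best_order, best_bits = order, bits
    return best_order
```

The early stop can miss a better order that lies beyond two
poor ones, which is why checking against the exhaustive
search is suggested for unusual signal types.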
-q quantisation level
Specify the number of low order bits in each sample
which can be discarded (set to zero). This is useful
if these bits carry no information, for example when
the signal is corrupted by noise.
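Setting the low order bits of a sample to zero can be
sketched as follows. This is a minimal Python illustration,
not code from shorten, and the name zero_low_bits is
hypothetical. Masking rounds every sample towards negative
infinity; whether shorten rounds differently is not stated
in this page.

```python
def zero_low_bits(sample, bits):
    """Clear the `bits` lowest order bits of an integer sample."""
    mask = ~((1 << bits) - 1)
    # Python integers behave as two's complement under &, so this
    # also works for negative samples.
    return sample & mask
```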
-r bit rate
Specify the expected maximum number of bits per sample.
The upper bound on the bit rate is achieved by setting
the low order bits of the sample to zero, hence
maximising the segmental signal to noise ratio.
-t file type
Gives the type of the sound sample file as one of
{ulaw,s8,u8,s16,u16,s16x,u16x,s16hl,u16hl,s16lh,u16lh}.
ulaw is the natural file type of ulaw encoded files
(such as the default Sun .au files). All the other
types have initial s or u for signed or unsigned data,
followed by 8 or 16 as the number of bits per sample.
No further extension means the data is in the natural
byte order, a trailing x specifies byte swapped data,
hl explicitly states the byte order as high byte followed
by low byte and lh the converse. The default is
s16, meaning signed 16 bit integers in the natural byte
order.
Specific optimisations are applied to ulaw files. If
lossless compression is specified then a check is made
that the whole dynamic range is used (useful for files
recorded on a SPARCstation with the volume set too
high). If lossy compression is specified then the
data is internally converted to linear. The lossy
option "-r4" has been observed to give little
degradation.
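The naming scheme for the linear file types can be made
concrete with the following sketch, which turns a type name
into a sample layout and decodes raw bytes accordingly. This
is a minimal Python illustration, not code from shorten. The
names parse_filetype and decode_samples are hypothetical,
and ulaw is deliberately left out.

```python
import sys

NATIVE = sys.byteorder  # "little" or "big" on the running machine
SWAPPED = "big" if NATIVE == "little" else "little"
# Suffix after the bit count -> byte order.
BYTE_ORDER = {"": NATIVE, "x": SWAPPED, "hl": "big", "lh": "little"}


def parse_filetype(name):
    """Return (signed, bits, byteorder) for a linear type such as 's16hl'."""
    if name[:1] not in ("s", "u"):
        raise ValueError("unknown file type: %r" % name)
    signed = name[0] == "s"
    for bits in (8, 16):
        if name[1:].startswith(str(bits)):
            suffix = name[1 + len(str(bits)):]
            break
    else:
        raise ValueError("unsupported file type: %r" % name)
    if suffix not in BYTE_ORDER or (bits == 8 and suffix):
        raise ValueError("unknown file type: %r" % name)
    return signed, bits, BYTE_ORDER[suffix]


def decode_samples(data, name):
    """Decode raw bytes into a list of integers; a trailing partial
    sample is ignored."""
    signed, bits, order = parse_filetype(name)
    width = bits // 8
    return [
        int.from_bytes(data[i:i + width], order, signed=signed)
        for i in range(0, len(data) - width + 1, width)
    ]
```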
-v version
Specify the binary format version number of compressed
files. Legal values are 0, 1 and 2, higher numbers
generally giving better compression. The current
release can write all format versions, although
continuation of this support is not guaranteed. Support
for decompression of all earlier format versions is
guaranteed.
-x extract
Reconstruct the original file. All other command line
options except -a and -d are ignored.
METHODOLOGY
shorten works by blocking the signal, making a model of each
block in order to remove temporal redundancy, then Huffman
coding the quantised prediction residual.
Blocking
The signal is read in a block of about 128 or 256 samples,
and converted to integers with expected mean of zero.
Sample-wise-interleaved data is converted to separate
channels, which are assumed independent.
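The conversion of sample-wise-interleaved data to separate
channels can be sketched as follows. This is a minimal
Python illustration, not code from shorten, and the name
deinterleave is hypothetical.

```python
def deinterleave(samples, channels):
    """Split a(0),b(0),a(1),b(1)... into one list per channel."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    # Channel c owns every `channels`-th sample starting at index c.
    return [samples[c::channels] for c in range(channels)]
```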
Decorrelation
Four functions are computed, corresponding to the signal,
difference signal, second and third order differences. The
one with the lowest variance is coded. The variance is
measured by summing absolute values for speed and to avoid
overflow.
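The selection among the four functions can be sketched as
follows. This is a minimal Python illustration, not code
from shorten, and the names differences and pick_predictor
are hypothetical. One simplification is an assumption of
this sketch: each difference is computed within the block
only, so the candidates are compared over the samples they
all have in common rather than using history from the
previous block.

```python
def differences(block):
    """Return the signal and its first, second and third differences."""
    out = [list(block)]
    for _ in range(3):
        prev = out[-1]
        out.append([b - a for a, b in zip(prev, prev[1:])])
    return out


def pick_predictor(block):
    """Return (order, residual) for the candidate with the smallest
    sum of absolute values."""
    if len(block) < 4:
        raise ValueError("need at least 4 samples")
    candidates = differences(block)
    common = len(candidates[-1])  # length of the shortest candidate
    costs = [sum(abs(v) for v in c[-common:]) for c in candidates]
    order = costs.index(min(costs))
    return order, candidates[order]
```

Summing absolute values needs no multiplication and grows
far more slowly than a sum of squares, which matches the
speed and overflow argument given above.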
Compression
It is assumed the signal has the Laplacian probability
density function of exp(-abs(x)). There is a computationally
efficient way of mapping this density to Huffman codes. The
code is in two parts: a run of zeros terminated by a
bounding one, and a mantissa of a fixed number of bits. The
number of leading zeros gives the offset from zero in
multiples of 2^n, where n is the number of mantissa bits.
Signed numbers are stored by calling the function for
unsigned numbers with the sign in the lowest bit. Some
examples for a 2 bit mantissa:
code      value
100       0
101       1
110       2
111       3
0100      4
0111      7
00100     8
0000100   16
This Huffman code was first used by Robert Rice. For more
details see the technical report CUED/F-INFENG/TR.156
included with the shorten distribution as files tr156.tex
and tr156.ps.
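The code described above can be sketched as follows; the
unsigned encoder reproduces the 2 bit mantissa examples
listed earlier in this section. This is a minimal Python
illustration that works on bit strings for readability, not
code from shorten, and all three function names are
hypothetical. The exact way the sign is folded into the
lowest bit is an assumption of this sketch.

```python
def rice_encode_unsigned(value, mantissa_bits):
    """Return the code for a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    zeros = value >> mantissa_bits  # length of the run of zeros
    mantissa = value & ((1 << mantissa_bits) - 1)
    low = format(mantissa, "b").zfill(mantissa_bits) if mantissa_bits else ""
    return "0" * zeros + "1" + low


def rice_decode_unsigned(code, mantissa_bits):
    """Invert rice_encode_unsigned for a single code word."""
    zeros = code.index("1")  # position of the bounding one
    low = code[zeros + 1:zeros + 1 + mantissa_bits]
    return (zeros << mantissa_bits) | (int(low, 2) if low else 0)


def rice_encode_signed(value, mantissa_bits):
    """Fold the sign into the lowest bit, then code as unsigned.

    Assumed mapping: 0, -1, 1, -2, 2 ... become 0, 1, 2, 3, 4 ...
    """
    folded = ((~value) << 1) | 1 if value < 0 else value << 1
    return rice_encode_unsigned(folded, mantissa_bits)
```

Small values, which dominate under the assumed Laplacian
density, get the shortest codes, and each additional
multiple of 2^n costs one more zero.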
SEE ALSO
compress(1), pack(1).
DIAGNOSTICS
Exit status is normally 0. A warning is issued if the file
is not properly aligned, i.e. a whole number of records
could not be read at the end of the file.
BUGS
There are no known bugs. An easy way to test shorten for
your system is to use "make test". If this fails, for
whatever reason, please report it.
No check is made for increasing file size, but valid
waveform files generally achieve some compression. Even
compressing a file of random bytes (which represents the
worst case waveform file) only results in a small increase
in the file length (about 6% for 8 bit data and 3% for 16
bit data).
There is no provision for different channels containing
different data types. Normally, this is not a restriction, but
it does mean that if lossy coding is selected for the ulaw
type, then all channels use lossy coding.
It would be possible for all options to be channel specific
as in the -r option. I could do this if anyone has a
really good need for it.
See also the files Change.log and README.dos for what might
also be called bugs, past and present.
Please mail me immediately at the address below if you do
find a bug.
AVAILABILITY
The latest version can be obtained by anonymous FTP from
svr-ftp.eng.cam.ac.uk, in directory comp.speech/sources.
The UNIX version is called shorten-?.??.tar.Z and the DOS
version is called short???.zip (where ? represents a digit).
AUTHOR
Copyright (C) 1992-1994 by Tony Robinson (ajr4@cam.ac.uk)
Shorten is available for non-commercial use without fee.
See the LICENSE file for the formal copying and usage
restrictions.