Initial revision

This commit is contained in:
Michael Brown
2005-05-17 16:44:57 +00:00
parent 75a5374d79
commit 1097cf8685
164 changed files with 24592 additions and 0 deletions

View File

@@ -0,0 +1,23 @@
The compression code as implemented in "lzhuf.c" was taken from a BBS
program written by Joachim Schurig <jschurig@zedat.fu-berlin.de>. He
states that the code can be used freely for programs that are covered
by a "freeware" license. This probably includes both BSD style
licenses and the GPL.
The code in "loader.asm" is a reimplementation of the uncompressor. It
has been written from scratch and is hereby placed under the
conditions of the GNU General Public License (GPL). The algorithm is
outlined in "algorithm.doc".
Thus, there are no copyright problems with using this code, but there
still might be difficulties with software patents. These patents are
not legal in most parts of the world, but if you live in a country
that honors software patents then you should verify that using these
algorithms is legally permitted. Unless you are absolutely sure, that
there are no legal obstacles, you should use the code for educational
purposes only (this assumes that your educational institution is
exempted from patent laws). The author cannot be held responsible for
using the program code in violation of applicable local laws.
If you are aware of patents that might affect the legality of using
the code in some parts of the world, please let me know.

View File

@@ -0,0 +1,58 @@
The compressor achieves an average compression rate of 60% of the
original size which is on par with "gzip". It seems that you cannot do
much better for compressing compiled binaries. This means that the
break even point for using compressed images is reached, once the
uncompressed size approaches 1.5kB. We can stuff more than 12kB into
an 8kB EPROM and more than 25kB into an 16kB EPROM. As there is only
32kB of RAM for both the uncompressed image and its BSS area, this
means that 32kB EPROMs will hardly ever be required.
The compression algorithm uses a 4kB ring buffer for buffering the
uncompressed data. Before compression starts, the ring buffer is
filled with spaces (ASCII character 0x20). The algorithm tries to
find repeated input sequences of a maximum length of 60 bytes. All
256 different input bytes plus the 58 (60 minus a threshold of 2)
possible repeat lengths form a set of 314 symbols. These symbols are
adaptively Huffman encoded. The algorithm starts out with a Huffmann
tree that assigns equal code lengths to each of the 314 symbols
(slightly favoring the repeat symbols over symbols for regular input
characters), but it will be changed whenever the frequency of any of
the symbols changes. Frequency counts are kept in 16bit words until
the total number of compressed codes totals 2^15. Then, all frequency
counts will be halfed (rounding to the bigger number). For unrepeated
characters (symbols 0..255) the Huffman code is written to the output
stream. For repeated characters the Huffmann code, which denotes the
length of the repeated character sequence, is written out and then the
index in the ring buffer is computed. From this index, the algorithm
computes the offset relative to the current index into the ring
buffer. Thus, for typical input data, one would expect that short to
medium range offsets are more frequent than extremely short or medium
range to long range offsets. Thus the 12bit (for a 4kB buffer) offset
value is statically Huffman encoded using a precomputed Huffman tree
that favors those offset values that are deemed to be more
frequent. The Huffman encoded offset is written to the output data
stream, directly following the code that determines the length of
repeated characters.
This algorithm, as implemented in the C example code, looks very good
and its operating parameters are already well optimized. This also
explains why it achieves compression ratios comparable with
"gzip". Depending on the input data, it sometimes excells considerably
beyond what "gzip -9" does, but this phenomenon does not appear to be
typical. There are some flaws with the algorithm, such as the limited
buffer sizes, the adaptive Huffman tree which takes very long to
change, if the input characters experience a sudden change in
distribution, and the static Huffman tree for encoding offsets into
the buffer. The slow changes of the adaptive Huffman tree are
partially counteracted by artifically keeping a 16bit precision for
the frequency counts, but this does not come into play until 32kB of
compressed data is output, so it does not have any impact on our use
for "etherboot", because the BOOT Prom does not support uncompressed
data of more then 32kB (c.f. doc/spec.doc).
Nonetheless, these problems do not seem to affect compression of
compiled programs very much. Mixing object code with English text,
would not work too well though, and the algorithm should be reset in
between. Actually, we might gain a little improvement, if text and
data segments were compressed individually, but I have not
experimented with this option, yet.

View File

@@ -0,0 +1,14 @@
/* Do not change these values unless you really know what you are doing;
the pre-computed lookup tables rely on the buffer size being 4kB or
smaller. The buffer size must be a power of two. The lookahead size has
to fit into 6 bits. If you change any of these numbers, you will also
have to adjust the decompressor accordingly.
*/
#define BUFSZ 4096
#define LOOKAHEAD 60
#define THRESHOLD 2
#define NCHAR (256+LOOKAHEAD-THRESHOLD)
#define TABLESZ (NCHAR+NCHAR-1)
#define NIL ((unsigned short)-1)

764
contrib/compressor/lzhuf.c Normal file
View File

@@ -0,0 +1,764 @@
/*
----------------------------------------------------------------------------
M. LZHuf Compression
This is the LZHuf compression algorithm as used in DPBOX and F6FBB.
----------------------------------------------------------------------------
*/
/**************************************************************
lzhuf.c
written by Haruyasu Yoshizaki 11/20/1988
some minor changes 4/6/1989
comments translated by Haruhiko Okumura 4/7/1989
minor beautifications and adjustments for compiling under Linux
by Markus Gutschke <gutschk@math.uni-muenster.de>
1997-01-27
Modifications to allow use as a filter by Ken Yap <ken_yap@users.sourceforge.net>.
1997-07-01
Small mod to cope with running on big-endian machines
by Jim Hague <jim.hague@acm.org)
1998-02-06
Make compression statistics report shorter
by Ken Yap <ken_yap@users.sourceforge.net>.
2001-04-25
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#ifndef VERBOSE
#define Fprintf(x)
#define wterr 0
#else
#define Fprintf(x) fprintf x
#if defined(ENCODE) || defined(DECODE)
static char wterr[] = "Can't write.";
#ifdef ENCODE
static unsigned long int codesize = 0;
#endif
static unsigned long int printcount = 0;
#endif
#endif
#ifndef MAIN
extern
#endif
FILE *infile, *outfile;
#if defined(ENCODE) || defined(DECODE)
static unsigned long int textsize = 0;
static __inline__ void Error(char *message)
{
Fprintf((stderr, "\n%s\n", message));
exit(EXIT_FAILURE);
}
/* These will be a complete waste of time on a lo-endian */
/* system, but it only gets done once so WTF. */
static unsigned long i86ul_to_host(unsigned long ul)
{
unsigned long res = 0;
int i;
union
{
unsigned char c[4];
unsigned long ul;
} u;
u.ul = ul;
for (i = 3; i >= 0; i--)
res = (res << 8) + u.c[i];
return res;
}
static unsigned long host_to_i86ul(unsigned long ul)
{
int i;
union
{
unsigned char c[4];
unsigned long ul;
} u;
for (i = 0; i < 4; i++)
{
u.c[i] = ul & 0xff;
ul >>= 8;
}
return u.ul;
}
#endif
/********** LZSS compression **********/
#define N 4096 /* buffer size */
/* Attention: When using this file for f6fbb-type compressed data exchange,
set N to 2048 ! (DL8HBS) */
#define F 60 /* lookahead buffer size */
#define THRESHOLD 2
#define NIL N /* leaf of tree */
#if defined(ENCODE) || defined(DECODE)
static unsigned char
text_buf[N + F - 1];
#endif
#ifdef ENCODE
static int match_position, match_length,
lson[N + 1], rson[N + 257], dad[N + 1];
static void InitTree(void) /* initialize trees */
{
int i;
for (i = N + 1; i <= N + 256; i++)
rson[i] = NIL; /* root */
for (i = 0; i < N; i++)
dad[i] = NIL; /* node */
}
static void InsertNode(int r) /* insert to tree */
{
int i, p, cmp;
unsigned char *key;
unsigned c;
cmp = 1;
key = &text_buf[r];
p = N + 1 + key[0];
rson[r] = lson[r] = NIL;
match_length = 0;
for ( ; ; ) {
if (cmp >= 0) {
if (rson[p] != NIL)
p = rson[p];
else {
rson[p] = r;
dad[r] = p;
return;
}
} else {
if (lson[p] != NIL)
p = lson[p];
else {
lson[p] = r;
dad[r] = p;
return;
}
}
for (i = 1; i < F; i++)
if ((cmp = key[i] - text_buf[p + i]) != 0)
break;
if (i > THRESHOLD) {
if (i > match_length) {
match_position = ((r - p) & (N - 1)) - 1;
if ((match_length = i) >= F)
break;
}
if (i == match_length) {
if ((c = ((r - p) & (N - 1)) - 1) < match_position) {
match_position = c;
}
}
}
}
dad[r] = dad[p];
lson[r] = lson[p];
rson[r] = rson[p];
dad[lson[p]] = r;
dad[rson[p]] = r;
if (rson[dad[p]] == p)
rson[dad[p]] = r;
else
lson[dad[p]] = r;
dad[p] = NIL; /* remove p */
}
static void DeleteNode(int p) /* remove from tree */
{
int q;
if (dad[p] == NIL)
return; /* not registered */
if (rson[p] == NIL)
q = lson[p];
else
if (lson[p] == NIL)
q = rson[p];
else {
q = lson[p];
if (rson[q] != NIL) {
do {
q = rson[q];
} while (rson[q] != NIL);
rson[dad[q]] = lson[q];
dad[lson[q]] = dad[q];
lson[q] = lson[p];
dad[lson[p]] = q;
}
rson[q] = rson[p];
dad[rson[p]] = q;
}
dad[q] = dad[p];
if (rson[dad[p]] == p)
rson[dad[p]] = q;
else
lson[dad[p]] = q;
dad[p] = NIL;
}
#endif
/* Huffman coding */
#define N_CHAR (256 - THRESHOLD + F)
/* kinds of characters (character code = 0..N_CHAR-1) */
#define T (N_CHAR * 2 - 1) /* size of table */
#define R (T - 1) /* position of root */
#define MAX_FREQ 0x8000 /* updates tree when the */
/* root frequency comes to this value. */
typedef unsigned char uchar;
/* table for encoding and decoding the upper 6 bits of position */
/* for encoding */
#ifdef ENCODE
static uchar p_len[64] = {
0x03, 0x04, 0x04, 0x04, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08
};
static uchar p_code[64] = {
0x00, 0x20, 0x30, 0x40, 0x50, 0x58, 0x60, 0x68,
0x70, 0x78, 0x80, 0x88, 0x90, 0x94, 0x98, 0x9C,
0xA0, 0xA4, 0xA8, 0xAC, 0xB0, 0xB4, 0xB8, 0xBC,
0xC0, 0xC2, 0xC4, 0xC6, 0xC8, 0xCA, 0xCC, 0xCE,
0xD0, 0xD2, 0xD4, 0xD6, 0xD8, 0xDA, 0xDC, 0xDE,
0xE0, 0xE2, 0xE4, 0xE6, 0xE8, 0xEA, 0xEC, 0xEE,
0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};
#endif
#ifdef DECODE
/* for decoding */
static uchar d_code[256] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09,
0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A,
0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B,
0x0C, 0x0C, 0x0C, 0x0C, 0x0D, 0x0D, 0x0D, 0x0D,
0x0E, 0x0E, 0x0E, 0x0E, 0x0F, 0x0F, 0x0F, 0x0F,
0x10, 0x10, 0x10, 0x10, 0x11, 0x11, 0x11, 0x11,
0x12, 0x12, 0x12, 0x12, 0x13, 0x13, 0x13, 0x13,
0x14, 0x14, 0x14, 0x14, 0x15, 0x15, 0x15, 0x15,
0x16, 0x16, 0x16, 0x16, 0x17, 0x17, 0x17, 0x17,
0x18, 0x18, 0x19, 0x19, 0x1A, 0x1A, 0x1B, 0x1B,
0x1C, 0x1C, 0x1D, 0x1D, 0x1E, 0x1E, 0x1F, 0x1F,
0x20, 0x20, 0x21, 0x21, 0x22, 0x22, 0x23, 0x23,
0x24, 0x24, 0x25, 0x25, 0x26, 0x26, 0x27, 0x27,
0x28, 0x28, 0x29, 0x29, 0x2A, 0x2A, 0x2B, 0x2B,
0x2C, 0x2C, 0x2D, 0x2D, 0x2E, 0x2E, 0x2F, 0x2F,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
};
static uchar d_len[256] = {
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
};
#endif
#if defined(ENCODE) || defined(DECODE)
static unsigned freq[T + 1]; /* frequency table */
static int prnt[T + N_CHAR]; /* pointers to parent nodes, except for the */
/* elements [T..T + N_CHAR - 1] which are used to get */
/* the positions of leaves corresponding to the codes. */
static int son[T]; /* pointers to child nodes (son[], son[] + 1) */
#endif
#ifdef DECODE
static unsigned getbuf = 0;
static uchar getlen = 0;
static int GetBit(void) /* get one bit */
{
int i;
while (getlen <= 8) {
if ((i = getc(infile)) < 0) i = 0;
getbuf |= i << (8 - getlen);
getlen += 8;
}
i = getbuf;
getbuf <<= 1;
getlen--;
return ((signed short)i < 0);
}
static int GetByte(void) /* get one byte */
{
unsigned short i;
while (getlen <= 8) {
if ((signed short)(i = getc(infile)) < 0) i = 0;
getbuf |= i << (8 - getlen);
getlen += 8;
}
i = getbuf;
getbuf <<= 8;
getlen -= 8;
return i >> 8;
}
#endif
#ifdef ENCODE
static unsigned putbuf = 0;
static uchar putlen = 0;
static void Putcode(int l, unsigned c) /* output c bits of code */
{
putbuf |= c >> putlen;
if ((putlen += l) >= 8) {
if (putc(putbuf >> 8, outfile) == EOF) {
Error(wterr);
}
if ((putlen -= 8) >= 8) {
if (putc(putbuf, outfile) == EOF) {
Error(wterr);
}
#ifdef VERBOSE
codesize += 2;
#endif
putlen -= 8;
putbuf = c << (l - putlen);
} else {
putbuf <<= 8;
#ifdef VERBOSE
codesize++;
#endif
}
}
}
#endif
/* initialization of tree */
#if defined(ENCODE) || defined(DECODE)
static void StartHuff(void)
{
int i, j;
for (i = 0; i < N_CHAR; i++) {
freq[i] = 1;
son[i] = i + T;
prnt[i + T] = i;
}
i = 0; j = N_CHAR;
while (j <= R) {
freq[j] = freq[i] + freq[i + 1];
son[j] = i;
prnt[i] = prnt[i + 1] = j;
i += 2; j++;
}
freq[T] = 0xffff;
prnt[R] = 0;
}
/* reconstruction of tree */
static void reconst(void)
{
int i, j, k;
unsigned f, l;
/* collect leaf nodes in the first half of the table */
/* and replace the freq by (freq + 1) / 2. */
j = 0;
for (i = 0; i < T; i++) {
if (son[i] >= T) {
freq[j] = (freq[i] + 1) / 2;
son[j] = son[i];
j++;
}
}
/* begin constructing tree by connecting sons */
for (i = 0, j = N_CHAR; j < T; i += 2, j++) {
k = i + 1;
f = freq[j] = freq[i] + freq[k];
for (k = j - 1; f < freq[k]; k--);
k++;
l = (j - k) * 2;
memmove(&freq[k + 1], &freq[k], l);
freq[k] = f;
memmove(&son[k + 1], &son[k], l);
son[k] = i;
}
/* connect prnt */
for (i = 0; i < T; i++) {
if ((k = son[i]) >= T) {
prnt[k] = i;
} else {
prnt[k] = prnt[k + 1] = i;
}
}
}
/* increment frequency of given code by one, and update tree */
static void update(int c)
{
int i, j, k, l;
if (freq[R] == MAX_FREQ) {
reconst();
}
c = prnt[c + T];
do {
k = ++freq[c];
/* if the order is disturbed, exchange nodes */
if (k > freq[l = c + 1]) {
while (k > freq[++l]);
l--;
freq[c] = freq[l];
freq[l] = k;
i = son[c];
prnt[i] = l;
if (i < T) prnt[i + 1] = l;
j = son[l];
son[l] = i;
prnt[j] = c;
if (j < T) prnt[j + 1] = c;
son[c] = j;
c = l;
}
} while ((c = prnt[c]) != 0); /* repeat up to root */
}
#endif
#ifdef ENCODE
#if 0
static unsigned code, len;
#endif
static void EncodeChar(unsigned c)
{
unsigned i;
int j, k;
i = 0;
j = 0;
k = prnt[c + T];
/* travel from leaf to root */
do {
i >>= 1;
/* if node's address is odd-numbered, choose bigger brother node */
if (k & 1) i += 0x8000;
j++;
} while ((k = prnt[k]) != R);
Putcode(j, i);
#if 0
code = i;
len = j;
#endif
update(c);
}
static void EncodePosition(unsigned c)
{
unsigned i;
/* output upper 6 bits by table lookup */
i = c >> 6;
Putcode(p_len[i], (unsigned)p_code[i] << 8);
/* output lower 6 bits verbatim */
Putcode(6, (c & 0x3f) << 10);
}
static void EncodeEnd(void)
{
if (putlen) {
if (putc(putbuf >> 8, outfile) == EOF) {
Error(wterr);
}
#ifdef VERBOSE
codesize++;
#endif
}
}
#endif
#ifdef DECODE
static int DecodeChar(void)
{
unsigned c;
c = son[R];
/* travel from root to leaf, */
/* choosing the smaller child node (son[]) if the read bit is 0, */
/* the bigger (son[]+1} if 1 */
while (c < T) {
c += GetBit();
c = son[c];
}
c -= T;
update(c);
return c;
}
static int DecodePosition(void)
{
unsigned i, j, c;
/* recover upper 6 bits from table */
i = GetByte();
c = (unsigned)d_code[i] << 6;
j = d_len[i];
/* read lower 6 bits verbatim */
j -= 2;
while (j--) {
i = (i << 1) + GetBit();
}
return c | (i & 0x3f);
}
#endif
#ifdef ENCODE
/* compression */
void Encode(void) /* compression */
{
int i, c, len, r, s, last_match_length;
unsigned long tw;
fseek(infile, 0L, 2);
textsize = ftell(infile);
#ifdef VERBOSE
if ((signed long)textsize < 0)
Fprintf((stderr, "Errno: %d", errno));
#endif
tw = host_to_i86ul(textsize);
if (fwrite(&tw, sizeof tw, 1, outfile) < 1)
Error(wterr); /* output size of text */
if (textsize == 0)
return;
rewind(infile);
textsize = 0; /* rewind and re-read */
StartHuff();
InitTree();
s = 0;
r = N - F;
for (i = s; i < r; i++)
text_buf[i] = ' ';
for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
text_buf[r + len] = c;
textsize = len;
for (i = 1; i <= F; i++)
InsertNode(r - i);
InsertNode(r);
do {
if (match_length > len)
match_length = len;
if (match_length <= THRESHOLD) {
match_length = 1;
EncodeChar(text_buf[r]);
} else {
EncodeChar(255 - THRESHOLD + match_length);
EncodePosition(match_position);
}
last_match_length = match_length;
for (i = 0; i < last_match_length &&
(c = getc(infile)) != EOF; i++) {
DeleteNode(s);
text_buf[s] = c;
if (s < F - 1)
text_buf[s + N] = c;
s = (s + 1) & (N - 1);
r = (r + 1) & (N - 1);
InsertNode(r);
}
if ((textsize += i) > printcount) {
#if defined(VERBOSE) && defined(EXTRAVERBOSE)
Fprintf((stderr, "%12ld\r", textsize));
#endif
printcount += 1024;
}
while (i++ < last_match_length) {
DeleteNode(s);
s = (s + 1) & (N - 1);
r = (r + 1) & (N - 1);
if (--len) InsertNode(r);
}
} while (len > 0);
EncodeEnd();
#ifdef LONG_REPORT
Fprintf((stderr, "input size %ld bytes\n", codesize));
Fprintf((stderr, "output size %ld bytes\n", textsize));
Fprintf((stderr, "input/output %.3f\n", (double)codesize / textsize));
#else
Fprintf((stderr, "input/output = %ld/%ld = %.3f\n", codesize, textsize,
(double)codesize / textsize));
#endif
}
#endif
#ifdef DECODE
void Decode(void) /* recover */
{
int i, j, k, r, c;
unsigned long int count;
unsigned long tw;
if (fread(&tw, sizeof tw, 1, infile) < 1)
Error("Can't read"); /* read size of text */
textsize = i86ul_to_host(tw);
if (textsize == 0)
return;
StartHuff();
for (i = 0; i < N - F; i++)
text_buf[i] = ' ';
r = N - F;
for (count = 0; count < textsize; ) {
c = DecodeChar();
if (c < 256) {
if (putc(c, outfile) == EOF) {
Error(wterr);
}
text_buf[r++] = c;
r &= (N - 1);
count++;
} else {
i = (r - DecodePosition() - 1) & (N - 1);
j = c - 255 + THRESHOLD;
for (k = 0; k < j; k++) {
c = text_buf[(i + k) & (N - 1)];
if (putc(c, outfile) == EOF) {
Error(wterr);
}
text_buf[r++] = c;
r &= (N - 1);
count++;
}
}
if (count > printcount) {
#if defined(VERBOSE) && defined(EXTRAVERBOSE)
Fprintf((stderr, "%12ld\r", count));
#endif
printcount += 1024;
}
}
Fprintf((stderr, "%12ld\n", count));
}
#endif
#ifdef MAIN
int main(int argc, char *argv[])
{
char *s;
FILE *f;
int c;
if (argc == 2) {
outfile = stdout;
if ((f = tmpfile()) == NULL) {
perror("tmpfile");
return EXIT_FAILURE;
}
while ((c = getchar()) != EOF)
fputc(c, f);
rewind(infile = f);
}
else if (argc != 4) {
Fprintf((stderr, "'lzhuf e file1 file2' encodes file1 into file2.\n"
"'lzhuf d file2 file1' decodes file2 into file1.\n"));
return EXIT_FAILURE;
}
if (argc == 4) {
if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
|| (s = argv[2], (infile = fopen(s, "rb")) == NULL)
|| (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
Fprintf((stderr, "??? %s\n", s));
return EXIT_FAILURE;
}
}
if (toupper(*argv[1]) == 'E')
Encode();
else
Decode();
fclose(infile);
fclose(outfile);
return EXIT_SUCCESS;
}
#endif