Lossy data compression and denoising
wzip [ -c | -d | -dn | -hdn ] num sf
This manual page documents the wzip command.
wzip is a program for LOSSY data compression and denoising. It reads from STDIN and writes to STDOUT. In compression mode the input is a sequence of ASCII floating-point values, and num is the number of these values. The output is a sequence of small integers, most of them zero in typical applications. This output is well suited to further compression with a standard lossless compression program such as gzip.
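The following minimal Python sketch illustrates why a stream of mostly zero integers compresses well with gzip. It does not run wzip: the integer stream is synthetic, and the 95 percent share of zeros, the value range, the one-value-per-line layout and the function names are assumptions made only for illustration.

```python
import gzip
import random


def synthetic_stream(n, zero_share=0.95, seed=0):
    """Return n small integers, most of them zero.

    This is a synthetic stand-in, not real wzip -c output.
    """
    rng = random.Random(seed)
    nonzero = [-3, -2, -1, 1, 2, 3]
    return [0 if rng.random() < zero_share else rng.choice(nonzero)
            for _ in range(n)]


def compressed_sizes(values):
    """Return (raw_bytes, gzip_bytes) for the values written as ASCII, one per line."""
    raw = "".join(f"{v}\n" for v in values).encode("ascii")
    return len(raw), len(gzip.compress(raw))


raw_size, gz_size = compressed_sizes(synthetic_stream(10000))
print("raw bytes:", raw_size, "gzip bytes:", gz_size)
```

The exact ratio depends on the data; the point is only that the redundancy of such a stream is what the lossless stage exploits.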
The program can also be used for denoising. In this case both input and output are sequences of ASCII floating-point values.
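As a rough sketch, wzip could be driven from Python as shown below. The argument order follows the synopsis (one option, then num, then sf). Everything else is an assumption: that wzip is installed and on the PATH, that it accepts one value per line as input, and the helper names wzip_command and denoise, which are hypothetical.

```python
import subprocess

# Option names taken from the synopsis; the dictionary keys are hypothetical labels.
MODES = {
    "compress": "-c",
    "decompress": "-d",
    "denoise": "-dn",
    "hard_denoise": "-hdn",
}


def wzip_command(mode, num, sf):
    """Build the argument list following the synopsis: wzip OPTION num sf."""
    return ["wzip", MODES[mode], str(num), str(sf)]


def denoise(values, sf, hard=False):
    """Pipe values through wzip and parse the floating-point output.

    Assumes wzip is on the PATH and reads one ASCII value per line.
    """
    text = "\n".join(repr(float(v)) for v in values) + "\n"
    cmd = wzip_command("hard_denoise" if hard else "denoise", len(values), sf)
    result = subprocess.run(cmd, input=text, capture_output=True,
                            text=True, check=True)
    return [float(token) for token in result.stdout.split()]


print(wzip_command("compress", 1024, 0.5))
```

Only wzip_command is exercised here; denoise needs a working wzip installation and raises subprocess.CalledProcessError if the program exits with an error.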
The scale factor sf determines the strength of compression or denoising. A higher scale factor means heavier compression and stronger denoising. Four times the standard deviation of the noise is a good starting value. If the noise level is not known, 5 percent of the overall signal amplitude may be used as a first estimate of a suitable scale factor.
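The two rules of thumb above can be sketched as a small helper. This is an illustration, not part of wzip. It assumes that "overall signal amplitude" means the peak-to-peak range of the data, and that the noise standard deviation can be estimated from a stretch of data without signal; the function name suggest_scale_factor and its parameters are hypothetical.

```python
import statistics


def suggest_scale_factor(values, noise_sample=None):
    """Return a first guess for the scale factor sf.

    values:       the data to be compressed or denoised
    noise_sample: optional values from a signal-free stretch, used to
                  estimate the noise standard deviation (assumption)
    """
    if noise_sample is not None and len(noise_sample) >= 2:
        # Rule 1: four times the standard deviation of the noise.
        return 4.0 * statistics.stdev(noise_sample)
    # Rule 2: 5 percent of the overall amplitude, taken here as max - min.
    return 0.05 * (max(values) - min(values))


print(suggest_scale_factor([0.0, 10.0, 20.0]))
```

Either value is only a starting point; the scale factor usually needs adjusting after inspecting the result.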
If the noise in the input data is strongly non-Gaussian, as with Poisson noise, the input data should first be transformed so that the noise is approximately Gaussian. Poisson-distributed input values, for example raw counts per channel in EDX or XPD, can be transformed in this way by applying y:=2.0*sqrt(x+0.25109) to each data point. The back transformation is y:=(x/2)^2. The summand 0.25109 compensates for the bias caused by the asymmetry of the Poisson distribution.
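A minimal Python sketch of the two transformations, written exactly as quoted above. The function names to_gaussian and from_gaussian are hypothetical; the forward transformation would be applied to each count before the data is passed to wzip, and the back transformation to each value of the result.

```python
import math

OFFSET = 0.25109  # summand quoted in the text


def to_gaussian(count):
    """Forward transformation: y := 2.0*sqrt(x + 0.25109)."""
    return 2.0 * math.sqrt(count + OFFSET)


def from_gaussian(y):
    """Back transformation: y := (x/2)^2, as given in the text."""
    return (y / 2.0) ** 2


counts = [0, 3, 25, 400]
transformed = [to_gaussian(c) for c in counts]
restored = [from_gaussian(y) for y in transformed]
# With the formulas as written, a noise-free round trip yields count + OFFSET.
print(transformed)
print(restored)
```

Square-root transformations of this kind bring the noise standard deviation close to 1 for sufficiently large counts, so the rule of four standard deviations would suggest a scale factor of about 4 for the transformed data. Treat that as a starting value only.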
Invoking the program without any options prints usage examples to STDERR.
Exactly one option must be given.
-c
Compression. Reads num ASCII floating-point values from STDIN and writes a highly redundant sequence of integers to STDOUT.
-d
Decompression. Reads from STDIN and writes a sequence of num ASCII floating-point values to STDOUT. These values approximate the original data, with larger deviations for higher scale factors.
-dn
Denoising. Reads num ASCII floating-point values from STDIN and writes a sequence of num ASCII floating-point values to STDOUT. These values approximate the original data with the noise reduced.
-hdn
Denoising with hard thresholding instead of wavelet shrinkage. Isolated noise peaks may pass through untouched in this mode. On the other hand, the slope of the signal is much less affected.
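The difference between wavelet shrinkage (soft thresholding) and hard thresholding can be sketched with the generic textbook rules, as described by Donoho and Johnstone. This is not wzip's source code, and how wzip derives its threshold from sf is not documented here; the threshold t, the sample coefficients and the function names are hypothetical.

```python
def soft_threshold(coeff, t):
    """Wavelet shrinkage: zero small coefficients, pull the rest toward zero by t."""
    if abs(coeff) <= t:
        return 0.0
    return coeff - t if coeff > 0 else coeff + t


def hard_threshold(coeff, t):
    """Hard thresholding: zero small coefficients, keep the rest unchanged."""
    if abs(coeff) <= t:
        return 0.0
    return coeff


coeffs = [-3.0, -0.4, 0.2, 1.5, 6.0]
t = 1.0
soft = [soft_threshold(c, t) for c in coeffs]
hard = [hard_threshold(c, t) for c in coeffs]
print(soft)  # [-2.0, 0.0, 0.0, 0.5, 5.0]
print(hard)  # [-3.0, 0.0, 0.0, 1.5, 6.0]
```

Shrinkage reduces every surviving coefficient, which smooths steep edges. Hard thresholding leaves large coefficients intact, so edges are preserved, but a noise coefficient just above the threshold also survives at full size.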
Donoho, D.L.; Johnstone, I.M.: Adapting to unknown smoothness via wavelet shrinkage, technical report 425, Department of Statistics, Stanford University, Stanford, June 1993, ftp://playfair.stanford.edu/pub/donoho/ausws.ps.Z
Franzen, A.: Compression of process data with a wavelet method, steel res. 69 (1998), No. 1, pp. 28/30
Franzen, A.: Non-linear denoising with wavelet transformation, Z. Metallkd. 89 (1998), No. 4, pp. 297/302
This manual page was written by Andreas Franzen <[email protected]>, for the Debian GNU/Linux system (but may be used by others).
Copyright (C) 1997 Andreas Franzen, placed under the GNU General Public License, see the file copyright for details.