PyLZMA

How to use

This document gives some usage examples for the PyLZMA library.

First, we need to import the module:

    >>> import pylzma

The simplest usage is compression and decompression in one step:

    >>> compressed = pylzma.compress('Hello world!')
    >>> pylzma.decompress(compressed)
    'Hello world!'

For compression, additional parameters can be specified:

    >>> compressed = pylzma.compress('Hello world!', dictionary=10)
    >>> pylzma.decompress(compressed)
    'Hello world!'

The available parameters are:

dictionary

Dictionary size (Range 0-28, Default: 23 (8MB))

The dictionary size is calculated as DictionarySize = 2^N bytes, so the maximum value of 28 corresponds to 256 MB = 2^28 bytes. To decompress a file that was compressed by the LZMA method with dictionary size D = 2^N, you need about D bytes of memory (RAM).
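
As a quick check of the 2^N relationship, the following sketch converts the dictionary parameter into a size in bytes. The helper name dictionary_size_bytes is made up for this illustration and is not part of pylzma.

```python
def dictionary_size_bytes(n):
    """Return the dictionary size in bytes for the `dictionary` parameter n."""
    # Range taken from the parameter description (0-28).
    if not 0 <= n <= 28:
        raise ValueError("dictionary must be in the range 0-28")
    return 1 << n  # 2^n bytes


default_size = dictionary_size_bytes(23)  # the default setting
maximum_size = dictionary_size_bytes(28)  # the largest allowed setting
```

With the default of 23 this gives 8 MB, which is also roughly the amount of memory needed to decompress the result.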

fastBytes

Range 5-255, default 128

A larger number usually gives a slightly better compression ratio at the cost of slower compression.

literalContextBits

Range 0-8, default 3

Sometimes literalContextBits=4 improves compression for big files.

literalPosBits

Range 0-4, default 0

This switch is intended for periodic data whose period is equal to 2^N. For example, for 32-bit (4 bytes) periodic data you can use literalPosBits=2. When you change the literalPosBits switch, it is often better to set literalContextBits=0 as well.
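
To illustrate the 2^N relationship, this sketch derives a literalPosBits value from a period given in bytes. The function pos_bits_for_period is a hypothetical helper, not a pylzma function; the upper limit of 4 comes from the range stated above.

```python
def pos_bits_for_period(period):
    """Return N such that period == 2**N, suitable as a literalPosBits value."""
    # A power of two has exactly one bit set.
    if period < 1 or period & (period - 1):
        raise ValueError("period must be a power of two")
    n = period.bit_length() - 1
    if n > 4:
        raise ValueError("period is too large for the 0-4 range")
    return n


# 32-bit (4 bytes) periodic data, as in the example in the text.
bits_for_32bit_data = pos_bits_for_period(4)
```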

posBits

Range 0-4, default 2

This switch is intended for periodic data whose period is equal to 2^N.

algorithm

Compression mode 0 = fast, 1 = normal, 2 = max (Default: 2)

The lower the number specified for algorithm, the faster the compression runs.

multithreading

Use multithreading if available? (Default yes)

Currently, multithreading is only available on Windows platforms.

eos

Should the End Of Stream marker be written? (Default yes)

You can save some bytes if the marker is omitted, but the total uncompressed size must be stored by the application and used when decompressing:

    >>> compressed1 = pylzma.compress('Hello world!', eos=1)
    >>> compressed2 = pylzma.compress('Hello world!', eos=0)
    >>> len(compressed1) > len(compressed2)
    True

    >>> pylzma.decompress(compressed2)
    Traceback (most recent call last):
    ...
    ValueError: data error during decompression

    >>> pylzma.decompress(compressed2, maxlength=12)
    'Hello world!'
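
One way for an application to keep track of the uncompressed size is to store it in front of the compressed bytes. The sketch below shows only that framing step. The names add_size_prefix and split_size_prefix are hypothetical, the 8-byte little-endian layout is an arbitrary choice for this example, and the payload is treated as opaque bytes, so no pylzma call is made here.

```python
import struct

SIZE_PREFIX = struct.Struct('<Q')  # 8-byte little-endian length (assumed layout)


def add_size_prefix(compressed, uncompressed_size):
    """Prepend the uncompressed size so it can be recovered later."""
    return SIZE_PREFIX.pack(uncompressed_size) + compressed


def split_size_prefix(blob):
    """Return (uncompressed_size, compressed) from a prefixed blob."""
    if len(blob) < SIZE_PREFIX.size:
        raise ValueError("blob is too short to contain a size prefix")
    (size,) = SIZE_PREFIX.unpack(blob[:SIZE_PREFIX.size])
    return size, blob[SIZE_PREFIX.size:]


# Stand-in bytes; in real use this would be the output of
# pylzma.compress(data, eos=0) and the size would be len(data).
blob = add_size_prefix(b'\x01\x02\x03', 12)
size, payload = split_size_prefix(blob)
```

The recovered size would then be passed as maxlength to pylzma.decompress, as in the example above.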

If you don't know the total uncompressed size, you can use the compatibility decompression function from pylzma version 0.0.3. Be aware that this old method is slower than the new decompression function, so you should use pylzma.decompress whenever possible.

    >>> pylzma.decompress_compat(compressed2)
    'Hello world!'

If you need to compress larger amounts of data, you should use the streaming version of the library. It supports compressing any file-like object:

    >>> from cStringIO import StringIO
    >>> fp = StringIO('Hello world!')
    >>> c_fp = pylzma.compressfile(fp, eos=1)
    >>> compressed = ''
    >>> while True:
    ...     tmp = c_fp.read(1)
    ...     if not tmp:
    ...         break
    ...     compressed += tmp
    ... 
    >>> pylzma.decompress(compressed)
    'Hello world!'
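
Reading one byte at a time keeps the example short; larger chunks are usually preferable in practice. The following sketch copies from any object with a read(size) method, such as the one returned by pylzma.compressfile, to an output file. The name copy_stream and the 64 KB chunk size are assumptions made for this illustration, and the demonstration uses an in-memory io.BytesIO source instead of pylzma.

```python
import io


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in chunks and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty result signals the end of the stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-in source; pylzma.compressfile(fp) would take its place.
source = io.BytesIO(b'x' * 200000)
target = io.BytesIO()
written = copy_stream(source, target)
```

Writing each chunk to a file as it arrives avoids holding the whole compressed result in one string.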

Using a similar technique, you can decompress large amounts of data without keeping everything in memory:

    >>> from cStringIO import StringIO
    >>> fp = StringIO(pylzma.compress('Hello world!'))
    >>> obj = pylzma.decompressobj()
    >>> plain = ''
    >>> while True:
    ...     tmp = fp.read(1)
    ...     if not tmp:
    ...         break
    ...     plain += obj.decompress(tmp)
    ... 
    >>> plain += obj.flush()
    >>> plain
    'Hello world!'
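
The loop above can be wrapped in a reusable function that writes output as it is produced instead of building one large string. This is only a sketch: stream_decompress is a hypothetical helper, and it assumes nothing more than a decompressor object with decompress(data) and flush() methods, which is how obj is used above. To stay runnable without pylzma, the demonstration substitutes zlib.decompressobj() from the standard library, which has methods of the same names.

```python
import io
import zlib


def stream_decompress(src, dst, decompressor, chunk_size=64 * 1024):
    """Feed src to the decompressor chunk by chunk, writing output to dst."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    # Collect whatever the decompressor is still buffering.
    dst.write(decompressor.flush())


# zlib is used here only as a stand-in with the same method names.
original = b'Hello world!' * 1000
src = io.BytesIO(zlib.compress(original))
dst = io.BytesIO()
stream_decompress(src, dst, zlib.decompressobj(), chunk_size=16)
result = dst.getvalue()
```

The intended use is to pass pylzma.decompressobj() as the decompressor and real files as src and dst.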

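However, this only works for streams that contain the End Of Stream marker. If the EOS marker is not included, you must provide the size of the decompressed data. In the example below, a wrong size (13) first triggers a data error; after resetting the object with the correct size (12), decompression succeeds: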

    >>> from cStringIO import StringIO
    >>> fp = StringIO(pylzma.compress('Hello world!', eos=0))
    >>> obj = pylzma.decompressobj(maxlength=13)
    >>> plain = ''
    >>> while True:
    ...     tmp = fp.read(1)
    ...     if not tmp: break
    ...     plain += obj.decompress(tmp)
    ... 
    >>> plain += obj.flush()
    Traceback (most recent call last):
    ...
    ValueError: data error during decompression

    >>> obj.reset(maxlength=12)
    >>> fp.seek(0)
    >>> plain = ''
    >>> while True:
    ...     tmp = fp.read(1)
    ...     if not tmp: break
    ...     plain += obj.decompress(tmp)
    ... 
    >>> plain += obj.flush()
    >>> plain
    'Hello world!'

Please note that the compressed data is not compatible with the lzma.exe command-line utility! To get compatible data, you can use the following utility function:

    >>> import struct
    >>> from cStringIO import StringIO
    >>> 
    >>> def compress_compatible(data):
    ...     c = pylzma.compressfile(StringIO(data))
    ...     # LZMA header
    ...     result = c.read(5)
    ...     # size of uncompressed data
    ...     result += struct.pack('<Q', len(data))
    ...     # compressed data
    ...     return result + c.read()
    ... 
    >>>
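
For reference, the 13 bytes that compress_compatible places before the compressed data follow the usual .lzma file header layout: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size. The sketch below decodes such a header. The name parse_lzma_header is hypothetical, and reading lc, lp and pb as the literalContextBits, literalPosBits and posBits parameters is an assumption about how pylzma names them.

```python
import struct


def parse_lzma_header(header):
    """Decode the 13-byte header: properties, dictionary size, data size."""
    if len(header) < 13:
        raise ValueError("need at least 13 bytes")
    props, dict_size, size = struct.unpack('<BIQ', header[:13])
    # The properties byte packs three values as (pb * 5 + lp) * 9 + lc.
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {'lc': lc, 'lp': lp, 'pb': pb,
            'dictionary_size': dict_size, 'uncompressed_size': size}


# Hand-built header for lc=3, lp=0, pb=2, an 8 MB dictionary and 12 bytes of data.
sample = struct.pack('<BIQ', (2 * 5 + 0) * 9 + 3, 1 << 23, 12)
info = parse_lzma_header(sample)
```

A header decoded this way can help when checking that the output of compress_compatible carries the expected settings and size.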