XMill is a new tool for compressing XML data efficiently.
It is based on a regrouping strategy that leverages the effect of
highly-efficient compression techniques in compressors such as gzip.
XMill groups XML text strings with respect to their meaning and exploits similarities between those text strings for compression.
Hence, XMill typically achieves much better compression rates than conventional compressors such as gzip.
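The grouping idea can be illustrated with a minimal sketch. This is not XMill's implementation: it only assumes that text strings are collected into one container per element path and that each container is compressed on its own, here with Python's standard `zlib` module (the same deflate algorithm gzip uses). The function names `group_text_by_path` and `compress_containers` and the sample log data are hypothetical. The sketch ignores attributes, tail text, and the tag structure itself, so it is not reversible and its sizes are not a benchmark.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_text_by_path(xml_text):
    """Collect element text into containers keyed by the element's path."""
    containers = defaultdict(list)
    root = ET.fromstring(xml_text)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            # Strings with the same meaning (same path) end up side by side.
            containers[path].append(text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container separately with zlib (deflate)."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in containers.items()
    }


# Hypothetical sample data: 200 log entries with a host and a status field.
sample = "<log>" + "".join(
    f"<entry><host>10.0.0.{i % 5}</host>"
    f"<status>{200 if i % 7 else 404}</status></entry>"
    for i in range(200)
) + "</log>"

containers = group_text_by_path(sample)
compressed = compress_containers(containers)

for path, blob in sorted(compressed.items()):
    print(path, len(containers[path]), "strings ->", len(blob), "bytes")
```

Each container holds only values of one kind (all host addresses, or all status codes), which is the kind of similarity a general-purpose compressor can exploit more easily than when the values are interleaved with markup.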
XML files are typically much larger than the same data represented in some reasonably efficient domain-specific data format. One of the most intriguing results of XMill is that converting proprietary data formats into XML will in fact improve compression: the compressed XML file can be as little as half the size of the compressed original file! And this astonishing improvement is achieved at about the same compression speed.