by Kaj Kandler

Rick Jelliffe, from O’Reilly, writes today about “Comparing XML office document formats: using XML Metrics”.

He took a large document, the ODF 1.0 specification (~735 pages, with tables and images), and saved it in several variants of OpenDocument Format (ODF) using OpenOffice 2.0 and of MS Office Open XML (MSOOXML) using the MS Office 2007 beta. He then ran tools over the results to measure XML complexity with various metrics. It makes an interesting read for anyone following the debate over the two office document formats, or simply interested in the value of XML metrics.

Rick concludes:

The numbers seem to support the interpretation that beta MSOOX may be quite a bit less complex than ODF 1.1 at this stage, at least in the sense of using fixed structures more, and simpler in these sense of using fewer elements and attributes. ODF is flatter and has smaller filesize but seems to include more style headers than the MOOX does. The metrics indicate that the use of attributes may be significantly different between the two formats, for example for people looking at data conversion estimation. On the application level, Open Office loads the ODT file much faster than the Word 2007 beta loads the DOCX file.
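
To give a feel for what such numbers look like, here is a minimal sketch of a few structural counts (elements, attributes, distinct element names, nesting depth) using Python's standard library. It is only my illustration, not the tooling or the metric definitions Rick used; the function name `xml_metrics` and the sample markup are made up for this post.

```python
import xml.etree.ElementTree as ET


def xml_metrics(xml_text):
    """Return a few simple structural counts for an XML string.

    Hypothetical helper for illustration only; these are not the
    metrics from Rick's article.
    """
    root = ET.fromstring(xml_text)
    element_count = 0
    attribute_count = 0
    names = set()
    max_depth = 0

    # Iterative walk so deeply nested documents do not hit the recursion limit.
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        element_count += 1
        attribute_count += len(node.attrib)
        names.add(node.tag)
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in node)

    return {
        "elements": element_count,
        "attributes": attribute_count,
        "distinct_element_names": len(names),
        "max_depth": max_depth,
    }


# Made-up sample markup, not taken from either office format.
sample = '<doc><p style="a">Hello <b>world</b></p><p style="a" id="x"/></doc>'
metrics = xml_metrics(sample)
print(metrics)
```

On a real document you would run something like this over the XML parts unpacked from the ODT or DOCX archive. A lower maximum depth is roughly what "flatter" means in the quote above.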

A quick warning: Rick admits that he is comparing against a beta version of MS Office 2007. He states that “it seems possible that the Word 2007 beta saves a lot of information in bin64 encoded form that ODF exposes as attribute values.” and that this might be temporary “while the thing [MS Office 2007 and the MSOOXML] is under development.”

In any case, this is a story I’ll follow up on.