I was surprised to see that a plain Word document compressed more than a similarly sized OOXML file when I zipped them. I'd expected OOXML to be, well, XML. Text usually compresses a lot better than proprietary binary stuff, even something as inefficient as .doc.

adding: OOXML.docx (deflated 22%)
adding: wordDoc.doc (deflated 89%)


Opening up the docx file showed that it wasn't anything like XML. Turns out there was a reason for that.

C#, docx - .NET C#:

Docx files are not XML. They are zip files that contain the XML files. To open them and manipulate their contents, the OP will want to look at the System.IO.Packaging namespace from .NET 3.0 and above.
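
To see what "a zip file that contains the XML files" means in practice, here is a minimal sketch. It uses Python's standard zipfile module rather than the System.IO.Packaging route the answer suggests, purely because it is quick to try. The helper names (list_parts, read_part, build_demo_package) are made up for illustration, and the demo package is a stand-in holding one XML part, not a valid Word document. It also hints at why zipping the .docx again gained so little: the XML inside is presumably already deflated.

```python
import io
import zipfile


def list_parts(package_bytes):
    """Return the names of the members stored inside a zip-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_part(package_bytes, part_name):
    """Return one member of the package, decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(part_name).decode("utf-8")


def build_demo_package():
    """Build an in-memory zip holding one XML part (NOT a valid .docx)."""
    # Repetitive markup, similar in spirit to what a word processor emits.
    body = "<w:p><w:r><w:t>Hello, OOXML</w:t></w:r></w:p>" * 200
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<w:document>" + body + "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue(), xml


package_bytes, xml = build_demo_package()
print(list_parts(package_bytes))
print(len(xml), "bytes of XML ->", len(package_bytes), "bytes inside the package")
```

With a real file, passing the path straight to zipfile.ZipFile (for example zipfile.ZipFile("OOXML.docx")) and calling namelist() should show the parts. The main body of a Word document normally lives in word/document.xml, but check the listing rather than taking that path for granted.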


Great. Well, off to google "OOXML to XML" converters.