Here's a nice command template for Pandoc, which, in spite of its foibles, is still a very cool tool.

It turns out to be quite simple to convert a docx to markdown. The following example is from the Pandoc demos site.

pandoc -s example30.docx -t markdown -o example35.md

However, the generated markdown from the above command has a few issues.

The lines are only 80 characters long. I do not know why an 80-character line length is the default but I do not like it. This is fortunately quite easy to fix with the option --no-wrap.

Links do not use the reference style. I prefer reference-style links because they make the text less cluttered by moving the link itself to the bottom of the file. This is also easy to fix with the option --reference-links.

With the two options added, the command looks like this.

pandoc -s example30.docx --no-wrap --reference-links -t markdown -o example35.md

Now the generated markdown is very readable and close to what I would write myself. 

Okay, I might quibble with, well, vehemently deny that "quite simple" and "very readable" are accurate. I just tried it on a complicated outline that was making Word throw up. You know, the kind of doc where you try to add another line to your outline and Word gives up, failing to match the indent or numbering with the rest of the content.

I wanted to move to Markdown so I could define precisely where each bullet level was and what belonged to it. It's painful that we're in 2021 and we still haven't come to the realization that you have to use some sort of markup code for complex document authoring. No rich text editor will ever achieve true WY[intend]IWYG.

Pandoc results were... not great. Here's a snippet:

3.  Unchecked and Enabled when instructor has access for none of the selected questions (I-I/F) -- TC 697849
iii. 
iv. User perms none \<\<\< Impossible! User owns some in this situation.
    1.  
    2.  
    3.  
```{=html}
<!-- -->
```
e)  Transfer Ownership is visible -- TC 694132
f)  Remove is visible -- TC 694132

Yuck. It didn't even convert the lists lettered a), b), c) into Markdown lists at all. To be clear, I don't have empty 1., 2., 3. items in the original. Those are hidden in Word's DOM somewhere. Ugh.
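
Some of that debris is mechanical enough to strip before opening an editor. What follows is only a rough sketch, not something I'd trust blindly: `clean_pandoc_md` is a made-up name, and the two patterns are assumptions based on the snippet above (list markers with nothing after them, and the three-line `{=html}` blocks). It will also delete any raw HTML block you actually wanted, so diff the result.

```shell
# Hypothetical helper: filter Pandoc's markdown output on stdin or from files.
# Assumption: every ```{=html} ... ``` block is throwaway, as in the snippet above.
clean_pandoc_md() {
  sed -E \
    -e '/^```\{=html\}$/,/^```$/d' \
    -e '/^[[:space:]]*([0-9]+|[ivxlc]+)\.[[:space:]]*$/d' \
    "$@"
}
```

Something like `clean_pandoc_md out.md > out.clean.md` would be the usage. It does nothing for the a), b), c) items that never became a list; those still need fixing by hand.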

So now I'm in Vim setting things right.

But it's still a good command template:

pandoc -s in.docx --no-wrap --reference-links -t markdown -o out.md
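
If you'd rather not retype the template, it can be wrapped in a small shell function. This is a minimal sketch: the name `docx2md` is made up, and it assumes a newer Pandoc release, where, as far as I know, `--no-wrap` has been replaced by `--wrap=none`. If your Pandoc still accepts `--no-wrap`, swap it back in.

```shell
# Hypothetical wrapper around the template: docx2md IN.docx OUT.md
# Assumption: this Pandoc version accepts --wrap=none in place of --no-wrap.
docx2md() {
  if [ "$#" -ne 2 ]; then
    echo "usage: docx2md IN.docx OUT.md" >&2
    return 2
  fi
  pandoc -s "$1" --wrap=none --reference-links -t markdown -o "$2"
}
```

Drop it in your shell startup file and run `docx2md in.docx out.md`.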
