Wednesday, February 23, 2005

» We're lacking a silver bullet for documentation

Bruce blogged some good thoughts about tools and ways to write documentation, though I can't agree with Matt Raible on Word being a better tool. First off, I've done a lot with OpenOffice.org and it's superior to Word in terms of robustness and styling options. It actually handles large documents pretty well. As an example, at work our product documentation is an 800-page MS Word document, and sometimes even Word chokes on it. It just pops up an unhelpful error message (well, what would you expect from a Microsoft product anyway) and dies. There goes the document. It happened several times, on various Word versions and on different PCs. And how did we resurrect that document? Import it into OpenOffice.org, export it to MS Word format and open it again in MS Word... and yep, it works again. That's pretty amazing when you think of it.

Nevertheless, I (like Bruce) also think that markup languages are the superior option, as they let you use simple text editors, process the documents in various ways (e.g. include JavaDocs), etc. Having a closed, proprietary, binary format (read: Microsoft Office) is really unsustainable, as you're closing off all of the options above (and many more) right from the start. OpenOffice.org might actually be something in between, but even though its XML format is very clean and might even be written by hand, it's not as typesetting-oriented as DocBook XML or LaTeX (see below for some OpenOffice.org markup).

I also wrote some documents and whitepapers with DocBook XML (mostly using the excellent Emacs nxml mode by James Clark) and although it's very nice for most things, some of them really are a pain in the ... My gripes are mostly with tables and lists. Especially lists! It's not like they're uncommon, especially in technical documentation, but DocBook XML has chosen a very annoying markup, where you have to go three levels deep to write a list element (itemizedlist » listitem » para).
You may call me lazy here, but I use a lot of bullet lists in technical documents, and XHTML's markup for that, for example, is much quicker to use (ul » li). Sure, I could just use my very own pseudo-DocBook format and write an XSL stylesheet that transforms it to "correct" DocBook XML, but then I wouldn't be sticking with the standard.

Furthermore, Apache FOP is very much stagnant and far from perfect. When generating PDFs, FOP doesn't match LaTeX's rendering quality in terms of paragraph layout. On the other hand, DocBook XML is XML, and LaTeX is not. LaTeX would prove very cumbersome and complex to parse, so forget about writing transformations or other special-purpose processors for that format. With DocBook you can simply use XSL stylesheets. Another option might be to use PassiveTeX (which I haven't done (yet?)), but having an all-Java toolchain (e.g. with Ant tasks hiding away the complexity of XSL+FOP) would be much more interesting... Also, I can imagine that installing PassiveTeX on Windows isn't as easy as "apt-get install passivetex". I don't really care much about Windows... actually I don't care about Microsoft and Windows at all, but most people still use that inferior operating system, so it has to be taken into account.

I also tried Vex but... hmm... it's nice but... not that helpful actually. At least not yet.

About having a WYSIWYG frontend... it may increase productivity, but that really depends on its quality. It's true that the word completion feature of OpenOffice.org really helps a lot, but on the other hand those Word-like frontends (which include MS Word and OpenOffice.org Writer) are way too unstructured and permissive. Things like changing font attributes should be banned, to enforce the use of styles. They're all just typewriters when you don't use them properly. OpenOffice.org not being very light on memory and resources isn't that much of an issue IMHO... I mean, Eclipse is pretty heavy as well.
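The "pseudo-DocBook" idea mentioned above (write lists in the short ul » li form, then transform them into proper DocBook) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard ElementTree instead of the XSL stylesheet the post talks about, and the function name to_docbook_list is made up for this example.

```python
import xml.etree.ElementTree as ET


def to_docbook_list(ul):
    """Convert shorthand <ul><li>...</li></ul> into DocBook's
    <itemizedlist><listitem><para>...</para></listitem></itemizedlist>.

    Hypothetical helper for illustration; not part of any DocBook tooling.
    """
    itemized = ET.Element("itemizedlist")
    for li in ul.findall("li"):
        listitem = ET.SubElement(itemized, "listitem")
        para = ET.SubElement(listitem, "para")
        # Carry over the text and any inline children of <li> into <para>.
        para.text = li.text
        para.extend(list(li))
    return itemized


shorthand = ET.fromstring(
    "<ul>"
    "<li>open format</li>"
    "<li>plays well with <emphasis>version control</emphasis></li>"
    "</ul>"
)
print(ET.tostring(to_docbook_list(shorthand), encoding="unicode"))
```

The trade-off stays the same as described above: the shorthand is quicker to type, but the source files are no longer standard DocBook until the conversion has run.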
OpenOffice.org's file format is pretty nice, actually much cleaner than one would expect. I wrote a Java application at work that extracts content from (using JDOM) and generates very large OpenOffice.org Writer files (using XSLT). So, believe me, I really know what the format looks like ;)
<text:h style-name="Heading 1" text-level="1">This is a heading</text:h>
<text:p style-name="Text body">This is a <text:span style-name="emphasized">really</text:span> stupid paragraph.</text:p>
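To give an idea of how content can be pulled out of markup like this, here is a minimal sketch. It is not the Java/JDOM application mentioned above; it uses Python's standard ElementTree, and both the wrapping body element and the namespace URI are placeholders assumed for the example (a real OpenOffice.org file declares its own namespaces on its root element).

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI so that the "text:" prefix resolves.
TEXT_NS = "urn:example:text"

# The two elements quoted above, wrapped in a hypothetical <body> root.
SAMPLE = f"""<body xmlns:text="{TEXT_NS}">
<text:h style-name="Heading 1" text-level="1">This is a heading</text:h>
<text:p style-name="Text body">This is a <text:span style-name="emphasized">really</text:span> stupid paragraph.</text:p>
</body>"""


def extract_blocks(xml_source):
    """Return (kind, style name, plain text) for each heading and paragraph."""
    root = ET.fromstring(xml_source)
    blocks = []
    for element in root:
        kind = element.tag.split("}")[-1]  # drop the "{namespace}" part
        if kind in ("h", "p"):
            text = "".join(element.itertext())  # flattens inline spans
            blocks.append((kind, element.get("style-name", ""), text))
    return blocks


for block in extract_blocks(SAMPLE):
    print(block)
```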
Too bad it's not hierarchical like DocBook XML (paragraphs being children of section elements). It's more or less a flat format.

The perfect solution would:
  • use an open, standardized, typesetting-oriented, XML-based format for storage (DocBook XML or maybe the OpenOffice.org XML format)
  • play nicely with version control (i.e. be a text format, which includes XML)
  • be well suited for processing, transformation and generation (read: XML)
  • have a WYSIWYG frontend that is explicitly restricted in its options to enforce the use of style catalogs, with maybe just bold, italic and underlined being permitted as inline styles (although... even using those is wrong: LaTeX has "emphasize" (em))
  • use an all-Java, all-open-source toolchain
  • be able to output PDF and XHTML
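On the flat-versus-hierarchical point above: a processor can rebuild the nesting from heading levels. The sketch below assumes the flat content has already been reduced to simple tuples; the tuple layout, the article root and the name nest_sections are all invented for this example and do not come from any OpenOffice.org or DocBook tool.

```python
import xml.etree.ElementTree as ET


def nest_sections(blocks):
    """Turn a flat sequence of ("h", level, text) and ("p", None, text)
    entries into nested DocBook-style <section> elements."""
    root = ET.Element("article")
    stack = [(0, root)]  # (heading level, element that receives content)
    for kind, level, text in blocks:
        if kind == "h":
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # Paragraphs become children of the innermost open section.
            ET.SubElement(stack[-1][1], "para").text = text
    return root


flat = [
    ("h", 1, "Tools"),
    ("p", None, "Intro paragraph."),
    ("h", 2, "Editors"),
    ("p", None, "Nested paragraph."),
    ("h", 1, "Output"),
    ("p", None, "Last paragraph."),
]
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

Heading levels are assumed to start at 1, so the root entry on the stack is never popped.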

6 Comments:

Anonymous said...

You might want to try XMLmind Editor. I've found it a very usable DocBook editor, and it comes with a free standard edition.

I've had some plans to write my master's thesis in DocBook format, but I'm a bit afraid of all the XSLT modifications needed to match the university's thesis guidelines :/

20:42  
Anonymous said...

Pls make amarok work in kde 3.4-rc1!

15:26  
Blogger Loki said...

Please send me a mail for such things, and don't post unrelated comments on my blog.
I'll support KDE 3.4 when it is released as final, not as a release candidate.
You may rebuild the package yourself though; that's pretty easy with the source RPM (look at the rpmbuild manpage).

08:20  
Blogger Eric Barroca said...

This comment has been removed by a blog administrator.

15:18  
Blogger Eric Barroca said...

You might want to have a look at OOo2Dbk. It's a Python / XSLT based tool that transforms OpenOffice.org documents to semantic DocBook XML.
It thus combines the power of OpenOffice.org for WYSIWYG authoring with the power of DocBook as a semantic XML format for storage, indexing and processing.
OOo2Dbk supports a large subset of DocBook: tables (with borders and alignment), page orientation, glossary and bibliography terms, metadata, pictures, OLE objects (copy-pasted from Calc, Impress or Draw), etc.

You can find it here: http://indesko.com/sites/en/downloads/ooo2dbk

15:25  
Anonymous said...

Use LyX. See Document processing with LyX and SGML.

19:29  
