Tuesday, January 17, 2017

Example of open source tools for converting many Word files (docx with a template) to HTML: part of Macmillan’s Bookmaker toolchain

GitHub: macmillanpublishers/WordXML-to-HTML

From the macmillanpublishers Word template to other formats

XSL to convert MS Word-generated XML to HTML

The wordtohtml.xsl transforms are a key part of Macmillan’s Bookmaker toolchain. These core transforms convert Word XML to HTML that conforms to the HTMLBook spec, and a handful of other Ruby and XSL transforms build on them to create an HTML file that plugs into the larger Macmillan workflow. Specifically, these XSL transforms are part of the bookmaker_htmlmaker process; you can read about the entire HTML transformation set here.

For well-formed HTMLBook, the wordtohtml.xsl transforms require Word documents to use Macmillan’s Microsoft Word template, a set of predefined paragraph and character styles that add semantic tagging to the different pieces of a manuscript. You can read about the template here. wordtohtml.xsl is built to look for specific Word style names and apply HTMLBook elements accordingly. This means that, to get predictable HTMLBook, Word documents must use the Macmillan tag set correctly. You can read about some of the specific markup requirements here.
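
To make the style-name lookup described above more concrete, here is a minimal Python sketch of the general idea. It is not the wordtohtml.xsl implementation (which is XSL), and the style names `ExampleChapterTitle`, `ExampleBodyText`, and `ExampleExtract` are hypothetical stand-ins, not names from the Macmillan template. The output is also simplified: real HTMLBook wraps content in sectioning elements with `data-type` attributes, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by WordprocessingML (the XML inside a .docx file).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical style names, for illustration only: the real Macmillan
# template defines its own set. Each maps to (HTML tag, optional class).
STYLE_TO_HTML = {
    "ExampleChapterTitle": ("h1", None),
    "ExampleBodyText": ("p", None),
    "ExampleExtract": ("p", "extract"),
}


def paragraph_to_html(p):
    """Convert one w:p element to an HTML string based on its style name."""
    style = p.find("w:pPr/w:pStyle", NS)
    style_name = style.get(f"{{{W_NS}}}val") if style is not None else None
    # Concatenate the text of every run in the paragraph.
    text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
    # Unknown or missing styles fall back to a generic, flagged paragraph.
    tag, css_class = STYLE_TO_HTML.get(style_name, ("p", "unstyled"))
    attr = f' class="{css_class}"' if css_class else ""
    return f"<{tag}{attr}>{escape(text)}</{tag}>"


def convert(word_xml):
    """Return one HTML string per paragraph in the given Word XML."""
    root = ET.fromstring(word_xml)
    return [paragraph_to_html(p) for p in root.iterfind(".//w:p", NS)]


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:pStyle w:val="ExampleChapterTitle"/></w:pPr><w:r><w:t>One</w:t></w:r></w:p>
<w:p><w:r><w:t>Plain &amp; unstyled</w:t></w:r></w:p>
</w:body></w:document>"""

if __name__ == "__main__":
    for line in convert(SAMPLE):
        print(line)
```

The second sample paragraph has no style, so it lands in the fallback branch. That is the situation the template requirement is meant to avoid: without a recognized style name, the converter has nothing to base a semantic element on.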

Welcome to the Bookmaker toolchain! Bookmaker comprises a series of scripts that turn a Word document into an HTML document, and then into a PDF and/or EPUB file.

Each script in the Bookmaker sequence performs a distinct set of actions that builds on the scripts that came before, and depends on any number of other scripts or tools. While most of these scripts were originally written for internal use at Macmillan, we've done our best to hone them down to a cross-platform, generic core that can be used out of the box (though there are still a number of dependencies, discussed further down). The scripts all live here, in the core directory.
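
The idea of each script building on the ones before it can be sketched as a chain of stages that pass along a growing set of artifacts. This is only an illustration in Python under stated assumptions: the actual Bookmaker scripts are separate programs, and the stage names (`html_stage`, `pdf_stage`, `epub_stage`) and the file-name handling here are hypothetical placeholders that do no real conversion.

```python
from typing import Callable, Dict, List

Artifacts = Dict[str, str]
Stage = Callable[[Artifacts], Artifacts]


def html_stage(artifacts: Artifacts) -> Artifacts:
    # Placeholder for the Word-to-HTML step; only derives a file name.
    return {**artifacts, "html": artifacts["docx"].replace(".docx", ".html")}


def pdf_stage(artifacts: Artifacts) -> Artifacts:
    # Depends on the HTML produced by the earlier stage.
    return {**artifacts, "pdf": artifacts["html"].replace(".html", ".pdf")}


def epub_stage(artifacts: Artifacts) -> Artifacts:
    # Also depends on the HTML, not on the PDF.
    return {**artifacts, "epub": artifacts["html"].replace(".html", ".epub")}


def run_pipeline(stages: List[Stage], artifacts: Artifacts) -> Artifacts:
    """Run each stage in order, feeding it everything produced so far."""
    for stage in stages:
        artifacts = stage(artifacts)
    return artifacts


result = run_pipeline([html_stage, pdf_stage, epub_stage], {"docx": "book.docx"})

if __name__ == "__main__":
    print(result)
```

Running `pdf_stage` before `html_stage` would raise a `KeyError`, which mirrors the ordering dependency between the scripts: the PDF and EPUB steps need the HTML to exist first.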

It's important to note that correct transformation depends on correct application of the Macmillan Word template, a set of styles and rules for Microsoft Word manuscripts that create the initial structure each manuscript needs in order to cleanly transform into valid HTMLBook HTML.
