Design notes
Output
Document structure:
- Part 1
- Part 2
  - Chapter 1
  - Chapter 2
    - Section 1
    - Section 2
Parts (e.g. background, installation guide, user manual, tutorial, cookbook, API reference):
- HTML:
  - parts listed as tabs in navbar across top of page
  - TOC in sidebar lists all chapters in the current part
Chapters:
- HTML:
  - one page per chapter
  - TOC in sidebar lists all sections in the current chapter (not in 2nd sidebar)
- PDF:
  - start on new right-hand page
Sections:
- HTML: multiple sections on the same page
- PDF: no page break between sections
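As a rough illustration of the HTML mapping above, here is a Python sketch (Python because docutils is a Python library). The Part, Chapter and HtmlPage classes and the html_pages() function are hypothetical, and sections are reduced to their titles; the point is only the shape of the mapping: one page per chapter, one navbar tab per part, and a sidebar TOC that lists the current part's chapters and expands only the current chapter's sections.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    title: str
    sections: list[str] = field(default_factory=list)  # section titles only


@dataclass
class Part:
    title: str
    chapters: list[Chapter] = field(default_factory=list)


@dataclass
class HtmlPage:
    navbar_tabs: list[str]  # one tab per part, across the top of every page
    current_part: str       # the tab to highlight
    # Sidebar TOC: every chapter in the current part; only the current
    # chapter is expanded to show its sections.
    sidebar_toc: list[tuple[str, list[str]]]
    chapter: str
    sections: list[str]     # all of the chapter's sections share this page


def html_pages(parts: list[Part]) -> list[HtmlPage]:
    """Return one HTML page per chapter, in reading order."""
    tabs = [part.title for part in parts]
    pages = []
    for part in parts:
        for chapter in part.chapters:
            toc = [(c.title, c.sections if c is chapter else [])
                   for c in part.chapters]
            pages.append(HtmlPage(tabs, part.title, toc,
                                  chapter.title, chapter.sections))
    return pages


book = [
    Part("User manual", [Chapter("Getting started", ["Install", "First run"]),
                         Chapter("Configuration", ["Options"])]),
    Part("API reference", [Chapter("Core", ["Types"])]),
]
pages = html_pages(book)
print(len(pages))  # 3
```

On the PDF side, LaTeX’s book class already starts each \chapter on a right-hand page by default (its openright option), so the “start on new right-hand page” rule may need no extra code.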
HTML might also have an in-page TOC at the top level (root) of each part, and perhaps also at the top of each chapter. In PDF, the equivalents would be a per-part in-page TOC and an optional per-chapter TOC, in addition to the outline displayed in the PDF viewer’s sidebar. Such in-page TOCs are optional, since they are redundant with the sidebar TOC / PDF outline, but they might be useful if they include (or require) prose to orient the reader.
Index rather than search box for finding the relevant section by topic. (Site-wide or whole-book free-text search is less useful than ^F on the index, whose entries point to sections explicitly tagged by the author.) Generated text (e.g. API docs) could have auto-generated index entries. Separate index for each part, formatted as a separate page/chapter.
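A minimal sketch of per-part index generation, assuming author-tagged entries have already been collected as (part, term, target anchor) triples; IndexEntry and build_indices() are hypothetical names, not an existing API. Terms are sorted case-insensitively, and a term tagged in several sections gets several links, which is what makes ^F on the resulting page useful.

```python
from collections import defaultdict
from typing import NamedTuple


class IndexEntry(NamedTuple):
    part: str    # each part gets its own index
    term: str    # term tagged by the author (or auto-generated, e.g. for API docs)
    target: str  # anchor of the section the term points to


def build_indices(entries: list[IndexEntry]) -> dict[str, list[tuple[str, list[str]]]]:
    """Group entries per part, merge targets per term, sort terms case-insensitively."""
    by_part: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        targets = by_part[entry.part][entry.term]
        if entry.target not in targets:  # one link per section, even if tagged twice
            targets.append(entry.target)
    return {part: sorted(terms.items(), key=lambda item: item[0].casefold())
            for part, terms in by_part.items()}


entries = [
    IndexEntry("User manual", "configuration file", "config-intro"),
    IndexEntry("User manual", "Caching", "build-cache"),
    IndexEntry("User manual", "configuration file", "config-options"),
    IndexEntry("API reference", "Client", "api-client"),
]
indices = build_indices(entries)
```

Rendering each part’s list as its own page/chapter would then be left to the formatting phase.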
Input
The book author explicitly specifies the document tree structure. Conceptually, all source files are collated into a single document. (In practice, might want to allow source files to be processed individually, to enable caching? But resolving cross-references, collecting index/glossary/bibliography entries, etc. would be simpler if all content is parsed from scratch, and there would be no stale-cache bugs.)
The alternative is to let the output organisation mirror the organisation of the source files. But this:
- doesn’t work for linear formats (PDF, EPUB);
- requires in-page navigation to show the reader the canonical reading order;
- rules out features based on the reading order, e.g. next/previous links in HTML.
The author’s toctree (or equivalent) directive should not be rendered as an in-page TOC: the TOC in the sidebar or at the top of the part should be enough. Nor should the author be able to request a TOC at arbitrary locations in the document (for consistency, less reader surprise, and fewer decisions for the author to make).
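To illustrate why an explicit tree is useful, the sketch below (hypothetical Node class and function names; the source paths are made up) walks an author-specified tree in pre-order to get the canonical reading order, then derives next/previous links from it. The tree, not the directory layout, determines the order, so the source files could live anywhere.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    source: str  # source file for this part/chapter; path is independent of output layout
    children: list["Node"] = field(default_factory=list)


def reading_order(root: Node) -> list[str]:
    """Pre-order depth-first walk: the order in which sources are collated."""
    order = [root.source]
    for child in root.children:
        order.extend(reading_order(child))
    return order


def prev_next_links(order: list[str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each source to its (previous, next) neighbour in reading order."""
    return {
        src: (order[i - 1] if i > 0 else None,
              order[i + 1] if i + 1 < len(order) else None)
        for i, src in enumerate(order)
    }


book = Node("index.rst", [
    Node("manual/part.rst", [Node("manual/install.rst"), Node("manual/usage.rst")]),
    Node("api/part.rst", [Node("api/core.rst")]),
])
order = reading_order(book)
links = prev_next_links(order)
print(links["manual/usage.rst"])  # ('manual/install.rst', 'api/part.rst')
```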
Author must supply document-level metadata (author, date, etc.)
Input format must support these features: file inclusion, in-source metadata, index entries. Candidates: rST, LaTeX, DocBook
Processing
Phases (conceptually at least; some might be collapsed in practice):
Parsing: rst → document model
Feasible parsers:
- docutils
- pandoc + custom Lua
Preprocessing: model → model
Text generation, typically by calling external commands, e.g.:
- syntax highlighting code blocks
- generating diagrams
  - create image file on disk
  - replace diagram description node with image/figure node
- auto-documenting APIs (Sphinx autodoc extension, godoc)
  - insert appropriate role prefixes (e.g. :py, :go) for code objects
  - insert index entry & cross-reference target/anchor
  - don’t insert subsection heading? (or insert unnumbered subsection to exclude from TOC)
- converting spans with certain roles (e.g. :py, :go) to internal refs, or to external refs with the appropriate role-specific base URL (https://pkg.go.dev/ etc.)
- generating indices, glossaries, bibliographies (after other preprocessors)
Preprocessing also transforms custom directives into elements that docutils can handle natively.
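A sketch of the role-to-reference conversion, under several assumptions: the API generator is assumed to create anchors named <role>-<target> (a hypothetical scheme), and only the :go: role has an external URL builder, following the pkg.go.dev convention https://pkg.go.dev/<import path>#<symbol>. A reference resolves internally if the book documents the object, externally otherwise, and fails loudly if neither applies.

```python
from typing import Callable


def go_url(target: str) -> str:
    """'net/http.Client' -> 'https://pkg.go.dev/net/http#Client'.

    Assumes the first '.' in the last path segment separates package from
    symbol, so packages whose last segment contains a dot (e.g. 'yaml.v3')
    are not handled.
    """
    head, slash, last = target.rpartition("/")
    package, dot, symbol = last.partition(".")
    url = f"https://pkg.go.dev/{head}{slash}{package}"
    return f"{url}#{symbol}" if dot else url


# Hypothetical role -> URL builder table; only :go: is filled in here.
EXTERNAL_URLS: dict[str, Callable[[str], str]] = {"go": go_url}


def resolve(role: str, target: str, internal_anchors: set[str]) -> tuple[str, str]:
    """Return ("internal", anchor) if documented in this book, else ("external", url)."""
    anchor = f"{role}-{target}"  # hypothetical anchor scheme used by the API doc generator
    if anchor in internal_anchors:
        return ("internal", anchor)
    if role in EXTERNAL_URLS:
        return ("external", EXTERNAL_URLS[role](target))
    raise KeyError(f"cannot resolve :{role}: reference {target!r}")


anchors = {"py-mylib.Config"}
print(resolve("go", "net/http.Client", anchors))  # ('external', 'https://pkg.go.dev/net/http#Client')
print(resolve("py", "mylib.Config", anchors))     # ('internal', 'py-mylib.Config')
```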
Specify/customise which preprocessors to run via a list of callables (Callable[..., Any]) in the config file? If not all transformations can be done in a single pass, might instead need to define a dict (e.g. directive name → preprocessor) and repeatedly run all preprocessors until no matching directives remain in the document.
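The “dict plus repeated passes” option could look roughly like this. The document is a toy flat list of (kind, payload) tuples rather than a docutils tree, and the autodoc/index directives are hypothetical; the max_passes guard is an added assumption, there to stop a preprocessor that keeps re-emitting its own directive from looping forever.

```python
from typing import Callable

Node = tuple[str, str]  # (kind, payload): a toy stand-in for a docutils node
Preprocessor = Callable[[str], list[Node]]


def run_preprocessors(doc: list[Node], table: dict[str, Preprocessor],
                      max_passes: int = 10) -> list[Node]:
    """Expand directive nodes, pass after pass, until none listed in `table` remain."""
    passes = 0
    while any(kind in table for kind, _payload in doc):
        if passes == max_passes:
            raise RuntimeError(f"directives still unexpanded after {max_passes} passes")
        expanded: list[Node] = []
        for kind, payload in doc:
            if kind in table:
                # A preprocessor may emit further directives (handled next pass).
                expanded.extend(table[kind](payload))
            else:
                expanded.append((kind, payload))
        doc = expanded
        passes += 1
    return doc


# Hypothetical directives: "autodoc" emits an "index" directive, which a
# separate preprocessor turns into an index entry on the next pass.
table: dict[str, Preprocessor] = {
    "autodoc": lambda name: [("target", name), ("paragraph", f"API docs for {name}"),
                             ("index", name)],
    "index": lambda term: [("index_entry", term)],
}
result = run_preprocessors([("paragraph", "Intro"), ("autodoc", "mylib.Config")], table)
```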
Formatting: model → HTML | LaTeX
Document elements are transformed recursively (bottom-up) into the target format, then plugged into layout templates, incl. sidebar TOCs (HTML).
Output customisation via subclasses of docutils translators (e.g. HTMLTranslator, LaTeXTranslator) that override the default visit_*/depart_* methods.
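A toy sketch of bottom-up formatting with an overriding subclass. Element, Formatter and the fmt_* method names are hypothetical; docutils’ real translators are node visitors with visit_*/depart_* methods rather than this children-first scheme, so this only illustrates the idea described above. The HTML and LaTeX formatters share the traversal, a book-specific subclass customises one method, and the result is substituted into a layout template.

```python
import html
from dataclasses import dataclass, field
from string import Template
from typing import Union


@dataclass
class Element:
    tag: str  # "section", "paragraph" or "emphasis" in this toy model
    children: list[Union["Element", str]] = field(default_factory=list)
    title: str = ""


class Formatter:
    """Bottom-up: format all children first, then let the parent wrap their output."""

    def format(self, node: Union[Element, str]) -> str:
        if isinstance(node, str):
            return self.text(node)
        body = "".join(self.format(child) for child in node.children)
        return getattr(self, f"fmt_{node.tag}")(node, body)

    def text(self, s: str) -> str:
        return s


class HtmlFormatter(Formatter):
    def text(self, s: str) -> str:
        return html.escape(s)

    def fmt_section(self, node: Element, body: str) -> str:
        return f"<section><h2>{html.escape(node.title)}</h2>{body}</section>"

    def fmt_paragraph(self, node: Element, body: str) -> str:
        return f"<p>{body}</p>"

    def fmt_emphasis(self, node: Element, body: str) -> str:
        return f"<em>{body}</em>"


class LatexFormatter(Formatter):
    # Escaping of LaTeX special characters is omitted for brevity.
    def fmt_section(self, node: Element, body: str) -> str:
        return f"\\section{{{node.title}}}\n{body}"

    def fmt_paragraph(self, node: Element, body: str) -> str:
        return f"{body}\n\n"

    def fmt_emphasis(self, node: Element, body: str) -> str:
        return f"\\emph{{{body}}}"


class BookHtmlFormatter(HtmlFormatter):
    """Customisation: override just one default method."""

    def fmt_emphasis(self, node: Element, body: str) -> str:
        return f'<em class="term">{body}</em>'


# The formatted body is then plugged into a (hypothetical) page layout template.
PAGE = Template("<nav>$toc</nav><main>$content</main>")

doc = Element("section", [
    Element("paragraph", ["Use ", Element("emphasis", ["key words"]), " here."]),
], title="Intro")
page = PAGE.substitute(toc='<a href="#intro">Intro</a>',
                       content=BookHtmlFormatter().format(doc))
```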
Compilation: output to disk
- HTML: copying pages, images, CSS to output folder
- PDF: compiling .tex & images to PDF
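A sketch of the compilation phase, assuming the formatting phase writes pages, images and CSS into separate subfolders of a build folder (a hypothetical layout) and that latexmk is installed for the PDF step. latexmk is used here as one convenient option because it reruns LaTeX, plus makeindex/BibTeX where needed, until cross-references, TOC and index are stable.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical layout of the build folder produced by the formatting phase.
HTML_SUBDIRS = ("pages", "images", "css")


def copy_html(build_dir: Path, out_dir: Path) -> None:
    """Copy rendered pages, images and CSS into the output folder."""
    for name in HTML_SUBDIRS:
        src = build_dir / name
        if src.is_dir():
            shutil.copytree(src, out_dir / name, dirs_exist_ok=True)


def latex_command(tex_name: str, out_dir: Path) -> list[str]:
    """latexmk reruns pdflatex (and makeindex/BibTeX as needed) until references settle."""
    return ["latexmk", "-pdf", "-interaction=nonstopmode",
            f"-outdir={out_dir}", tex_name]


def compile_pdf(tex_file: Path, out_dir: Path) -> None:
    # Run from the .tex file's folder so relative image paths resolve.
    subprocess.run(latex_command(tex_file.name, out_dir.resolve()),
                   cwd=tex_file.parent, check=True)
```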
To include HTML-only output (e.g. a formatted man page or an exported Jupyter notebook), insert an “external” URL link. The linked page would be treated as an external reference in both HTML and PDF output, even if it ends up deployed to the same site as the HTML output: it wouldn’t be part of the document tree (e.g. not listed in the TOC).