Decision: Calibre data source
Book metadata in the user’s Calibre library will be parsed from the
metadata.opf file that Calibre writes to each book directory, rather
than from the metadata.db SQLite database in the library root.
Advantages of loading from OPF
-
Users will be managing their libraries with different versions of Calibre, and Calibre might change its database schema between major versions. If we loaded the library data from the SQLite database, we would have to implement query handlers for each schema version in active use, and potentially ship new ones for each new version of Calibre.
The OPF schema, on the other hand, is a widely used specification published as part of EPUB 2 in 2007 and unchanged since 2010. Reading the OPF records requires only a single implementation and is unlikely to need ongoing maintenance.
-
An implementation that parses text files on disk is simple to test, because fixture files can be committed to Git as they are. A SQLite interface would require building test databases from SQL scripts, since the test data must be stored in Git in a readable format. We would then either have to verify that our scripts create the databases in the same way as Calibre, or take on Calibre as a test dependency.
-
The Go standard library includes a package for parsing XML (encoding/xml), but it ships no SQLite driver: database/sql defines only the generic database interface. Using the OPF files as the data source means we avoid a dependency on a non-trivial, potentially undermaintained third-party library.
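The testing argument above can be illustrated with a small sketch. It assumes the loader reads through the io/fs.FS interface, which is a design choice made for this example and not something the decision prescribes. Under that assumption a test can substitute an in-memory testing/fstest.MapFS for the library directory. The names loadTitle and fixtureLibrary, and the directory layout in the fixture, are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// opfDoc decodes only the title; "metadata>title" matches on local names.
type opfDoc struct {
	Title string `xml:"metadata>title"`
}

// loadTitle reads dir/metadata.opf from fsys and returns the book title.
func loadTitle(fsys fs.FS, dir string) (string, error) {
	data, err := fs.ReadFile(fsys, path.Join(dir, "metadata.opf"))
	if err != nil {
		return "", err
	}
	var doc opfDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return "", fmt.Errorf("%s: %w", dir, err)
	}
	return doc.Title, nil
}

// fixtureLibrary returns an in-memory library with one hand-written book
// (assumed layout, not real Calibre output).
func fixtureLibrary() fstest.MapFS {
	return fstest.MapFS{
		"Jane Doe/Example Book (1)/metadata.opf": &fstest.MapFile{Data: []byte(
			`<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
  </metadata>
</package>`)},
	}
}

func main() {
	title, err := loadTitle(fixtureLibrary(), "Jane Doe/Example Book (1)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(title)
}
```

In production code the same function could be given os.DirFS pointed at the library root, so the test and the real code path would differ only in the file system passed in.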
Disadvantages
-
Parsing hundreds of XML files every time the library is loaded is probably significantly slower and more CPU-intensive than running SQL queries against a database designed for that purpose.
Potential mitigation: Cache the parsed data in an application-specific format that is quick to load.
-
Parsing the entire library before processing it would also use more memory than running targeted queries for specific tables or fields.
Potential mitigation: For some workflows, it may be possible to stream the data one book at a time, so that each record can be garbage-collected once it has been processed rather than accumulating in memory.
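The streaming mitigation could look roughly like the sketch below. It walks the library with fs.WalkDir and hands each decoded book to a callback before the next file is opened, so the walker itself keeps no reference to earlier books. The names eachBook, book and sampleLibrary are hypothetical, and the in-memory library exists only to make the example runnable.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// book decodes only the title; "metadata>title" matches on local names.
type book struct {
	Title string `xml:"metadata>title"`
}

// eachBook decodes every metadata.opf under fsys and passes the result to fn
// one book at a time. It retains nothing between calls.
func eachBook(fsys fs.FS, fn func(dir string, b book) error) error {
	return fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || d.Name() != "metadata.opf" {
			return nil
		}
		f, err := fsys.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		var b book
		if err := xml.NewDecoder(f).Decode(&b); err != nil {
			return fmt.Errorf("%s: %w", p, err)
		}
		return fn(path.Dir(p), b)
	})
}

// opf builds a minimal hand-written OPF document (not real Calibre output).
func opf(title string) *fstest.MapFile {
	return &fstest.MapFile{Data: []byte(
		`<package xmlns="http://www.idpf.org/2007/opf" version="2.0">` +
			`<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">` +
			`<dc:title>` + title + `</dc:title></metadata></package>`)}
}

// sampleLibrary returns a two-book in-memory library with an assumed layout.
func sampleLibrary() fstest.MapFS {
	return fstest.MapFS{
		"Author A/First Book (1)/metadata.opf":  opf("First Book"),
		"Author B/Second Book (2)/metadata.opf": opf("Second Book"),
	}
}

func main() {
	err := eachBook(sampleLibrary(), func(dir string, b book) error {
		fmt.Printf("%s: %s\n", dir, b.Title)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Whether memory actually stays flat depends on the callback: if it appends every book to a slice, the whole library accumulates again.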