I'd be better off if I could keep it down to a page a day; I find it as mindlessly enthralling as Tetris. I'm not even attempting the hard stuff yet, the Hakluyt or the Anatomy of Melancholy. Latin, Greek, superscripts as contractions, ligature characters, poetry, nested footnotes: tricky. Especially tricky because the Project Gutenberg standards, towards which DP tends its efforts, are all old skool Latin-1, completely linear text.

I was a bit sad taking the page numbers out of an index. The topic titles in a good index are not always enough to find the page referred to, because a good index may file a topic under an explicit term when the text identifies it only in context; "Clarissa Character, bankruptcy of" might point to a paragraph saying "From epistolary evidence, in this year his sister signed over her share of the inheritance completely, and it was lost with the whole." Hypertext can be very good at this, of course, and at footnotes and endnotes. Really excellent footnotes are a form of commented linking that hypertext is still thinking about. It seems a pity to be washing out links that we could be improving instead. On the other hand, do I feel like doing it all myself? Not that book, no. Maybe I'll think of a tool.
Really, embarrassingly, if the two goals are to allow fairly precise internal references and to be readable by both machines and humans, page numbers are not at all bad. Three or four digits every eighty lines? There aren't many HTML anchors smaller, let alone identifiers in newer, cooler schemes.
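One way page numbers could survive as links rather than be washed out: a minimal sketch, assuming a hypothetical convention in which every page break in an HTML edition carries an anchor such as `<a id="page214"></a>`, and each index line ends in comma-separated page numbers or ranges. The names `page_anchor` and `link_index_entry` are illustrative, not part of any Gutenberg or DP tooling, and the page numbers in the example are invented. The `page` prefix is there because HTML 4 ids had to begin with a letter.

```python
import re

# A trailing index token that looks like a page ("214") or a range ("301-303").
PAGE_TOKEN = re.compile(r"^(\d+)(?:-(\d+))?$")


def page_anchor(page: int) -> str:
    """Marker placed where `page` begins (hypothetical id scheme)."""
    return f'<a id="page{page}"></a>'


def link_index_entry(entry: str) -> str:
    """Link the page references in one index line to page anchors,
    keeping the visible numbers so human readers see what the book showed."""
    parts = entry.split(", ")
    # Walk back from the end: trailing numeric tokens are page references,
    # everything before them is the topic title and is left untouched.
    i = len(parts)
    while i > 0 and PAGE_TOKEN.match(parts[i - 1]):
        i -= 1
    title, refs = parts[:i], parts[i:]
    # A range links to its first page.
    linked = [
        f'<a href="#page{PAGE_TOKEN.match(ref).group(1)}">{ref}</a>'
        for ref in refs
    ]
    return ", ".join(title + linked)


if __name__ == "__main__":
    print(page_anchor(214))
    print(link_index_entry("Clarissa Character, bankruptcy of, 214, 301-303"))
```

A topic title whose last comma-separated piece is itself a bare number (a year, say) would be mistaken for a page reference, so a real index would still want a human pass.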
Many sites are storing images of the originals, although images present an even starker choice between easy-to-read formats, images good enough to be useful and attractive, and compression algorithms that will still be reversible later.
So wrote clew in Meta.